Sunday, July 15, 2012

The Role of Markup in the Digital Humanities

My paper from the Cologne colloquium has been published in:

Historical and Social Research / Historische Sozialforschung 37.3 (2012), 125-146.

It contains a fairly detailed description of my alternative to embedded XML markup as an overall system for representing texts in the digital humanities, and of how interoperable software can be built upon that foundation. Since it is a subscription-only journal, I can't publish the text here, but a free copy of that volume is reportedly being given to all attendees of this year's Digital Humanities conference in Hamburg. So it should be read by a wide audience.

Wednesday, July 11, 2012

Restricting Versions in Table View

One refinement suggested by my initial version of Table View was to restrict the versions being compared to some subset of those available. This has the advantage of further improving the signal-to-noise ratio, and does so in a purely digital way.

To help the user select a subset of versions intuitively I used a dropdown menu with the selected versions marked by a × sign. (A tick mark can't be used because browsers already mark the currently selected item with a tick.) This method is very compact. It always occupies the same space however many versions there are – something not achieved by the usual technique of a set of check boxes:
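As a rough illustration of the marking scheme, here is a minimal sketch that builds such a menu as HTML. The function name `version_menu`, the `name="versions"` attribute and the sigla in the usage line are assumptions made for the example, not the code actually used in Table View.

```python
from html import escape

def version_menu(versions, selected):
    """Build a <select> element in which the chosen versions carry a × prefix."""
    chosen = set(selected)
    options = []
    for siglum in versions:
        # Prefix chosen versions with a multiplication sign; others get no mark
        mark = "\u00d7 " if siglum in chosen else ""
        value = escape(siglum, quote=True)
        options.append(f'<option value="{value}">{mark}{escape(siglum)}</option>')
    return '<select name="versions">\n' + "\n".join(options) + "\n</select>"

print(version_menu(["Q1", "Q2", "F1"], ["Q1", "F1"]))
```

The menu stays one line high however long the `versions` list grows, which is the compactness argument made above. Toggling a mark when the user picks an item would need a small script in the browser, which is not shown here.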

Monday, July 2, 2012

Table View

As we struggle to find ways to represent textual variation effectively on screen, one of the persistent requests from various quarters has been for a table view: a hierarchical representation of the variants of a range within the chosen base text. This kind of view is used, for example, in the Cervantes hypertextual edition and in CollateX. Unlike the apparatus, which is a compact series of footnotes about variations in a text, table view shows variants in a strict rectilinear grid. Although variation is naturally overlapping in structure, not rectilinear or recursive, we can use such a format to clarify for the reader what is a variant of what across a number of versions – something side-by-side view cannot achieve.

Full text

One way to make table view work would be to show the text of all versions covering a particular range in the base text. Although this duplicates text between versions it is quite clear:

The rectilinear grid is implemented as a simple table, which can be seen by turning on the table cell borders:

This ensures that variants are vertically aligned, but since much of the text is the same, we might want to collapse the grid wherever the text is the same, and show only variants of the chosen base text above the line as highlighted alternatives:
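The collapsing step can be sketched as follows, assuming the grid is already available as one list of aligned cells per version. The function name `collapse_columns`, the sigla A, B and C and the sample text are all invented for illustration.

```python
def collapse_columns(rows, base):
    """Pair each cell of the base version with the readings that differ from it.

    rows maps a siglum to its list of aligned cells; every list is assumed
    to have the same length. An empty dict means all versions agree there.
    """
    segments = []
    for i, base_cell in enumerate(rows[base]):
        variants = {
            siglum: cells[i]
            for siglum, cells in rows.items()
            if siglum != base and cells[i] != base_cell
        }
        segments.append((base_cell, variants))
    return segments

rows = {
    "A": ["the quick ", "brown ", "fox"],
    "B": ["the quick ", "brown ", "fox"],
    "C": ["the quick ", "red ", "fox"],
}
for text, variants in collapse_columns(rows, "A"):
    print(repr(text), variants)
```

Only the segments with a non-empty dict need a stacked alternative above the base line; the rest can be shown once as ordinary running text.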

This reduces clutter, but introduces another problem: the context of part-word variants is now removed and they may be regarded as unreadable. Extending them to the nearest word-boundary overcomes this:
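A minimal sketch of that extension, assuming a variant is identified by a half-open character range within a version's text and that words are delimited by whitespace. The name `expand_to_word_boundaries` is hypothetical.

```python
def expand_to_word_boundaries(text, start, end):
    """Widen the range [start, end) until whitespace or an end of text bounds it."""
    # Move the start left while the preceding character is part of a word
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    # Move the end right while the next character is part of a word
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end

text = "the colour of"
start, end = expand_to_word_boundaries(text, 8, 9)  # the range covers only "u"
print(text[start:end])
```

Because only whitespace counts as a boundary here, any punctuation attached to a word is carried along with it. A real implementation might prefer a finer definition of a word.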

What this view highlights is another need: many versions are almost the same. In the Shakespeare example, Q1 and Q2 are practically identical, as are F1-F4. The differences are only minor punctuation changes. Collapsing these further introduces nested tables of variants that are best hidden from the reader:
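One way to decide which versions count as "almost the same" is to group readings that agree once punctuation is ignored. The sketch below does only that grouping; it is an assumption about how minor variants might be detected, not a description of how the nested tables are actually built. The function name, sigla and readings are invented.

```python
import string

# Translation table that deletes ASCII punctuation (other punctuation is kept)
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def group_minor_variants(readings):
    """Group sigla whose readings differ only in ASCII punctuation or spacing."""
    groups = {}
    for siglum, text in readings.items():
        # Normalise: drop punctuation, then collapse runs of whitespace
        key = " ".join(text.translate(_STRIP_PUNCT).split())
        groups.setdefault(key, []).append(siglum)
    return list(groups.values())

readings = {
    "Q1": "well, then",
    "Q2": "well then",
    "F1": "well; then",
    "F2": "good then",
}
print(group_minor_variants(readings))
```

Each group with more than one member is a candidate for a collapsed sub-table: one representative reading is shown, and the others appear only when the reader expands it.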

The underlined text-ranges can be expanded by clicking on them, and the same action collapses them again. In the expanded form the sigla are displayed as a guide to the reader:

How this table view differs from the others

This table view differs from other attempts in two key ways:

  1. It is generated directly from a merged multi-version text, not from a collation of many separate texts
  2. It has three options: 1) expansion to word-boundaries, 2) hiding merged text and 3) collapsing minor variants into sub-tables. These may be combined where desired to produce different effects.

The tables are generated as simple HTML. The cross-browser JavaScript and CSS required to animate and format them can optionally be generated as well.
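To give an idea of what "simple HTML" can mean here, the following sketch renders a grid of aligned cells as a plain table with one row per version. The function name `render_table` and the input layout are assumptions, and the output carries no styling or scripting.

```python
from html import escape

def render_table(rows):
    """Render one table row per version: siglum in a <th>, then one <td> per cell."""
    lines = ["<table>"]
    for siglum, cells in rows.items():
        tds = "".join(f"<td>{escape(cell)}</td>" for cell in cells)
        lines.append(f"<tr><th>{escape(siglum)}</th>{tds}</tr>")
    lines.append("</table>")
    return "\n".join(lines)

print(render_table({"A": ["the quick ", "brown ", "fox"],
                    "C": ["the quick ", "red ", "fox"]}))
```

Escaping every cell keeps characters such as `<` and `&` in the source text from being read as markup. Cell borders, highlighting and expand/collapse behaviour would be layered on through separate CSS and JavaScript.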