Build a parallel-text edition: A Practical Guide

To build a parallel-text edition, transcribe each text independently, segment both into comparable units (usually sentences or verses) with stable xml:ids, then record the correspondence between segments as standoff links in TEI — a <linkGrp> of <link> elements, kept separate from the texts themselves. Get a machine-assisted first-pass alignment, correct it by hand where the texts diverge, and render the result as linked side-by-side columns. The alignment, not the transcription, is the scholarly heart of this kind of edition.

What problem does a parallel-text edition solve?

It lets a reader see two related texts in correspondence: an original beside its translation, a Latin source beside a vernacular gloss, or two recensions of the same work. The value is not in the texts individually — those may exist already — but in the explicit, verifiable mapping between their parts. A good parallel-text edition turns "these are related" into "this sentence corresponds to that one", which is a citable scholarly claim.

How do I structure the source files?

Keep each text whole and self-contained, then express alignment as a separate layer. This standoff approach means you can revise the alignment without touching either transcription:

```xml
<!-- text A, segmented -->
<seg xml:id="a1">In the beginning was the Word.</seg>
<seg xml:id="a2">And the Word was with God.</seg>

<!-- text B, segmented -->
<seg xml:id="b1">Im Anfang war das Wort.</seg>
<seg xml:id="b2">Und das Wort war bei Gott.</seg>
```

Every segment that might be compared gets an xml:id. That id is the anchor the alignment links to.
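If the segments start out as a plain list of sentences, the ids can be assigned mechanically. The following is a minimal sketch, assuming the sentences are already split; the `segment` helper, the `<ab>` wrapper, and the `a`/`b` prefixes are illustrative choices, not part of any TEI tool.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespaced attribute out as xml:id.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def segment(sentences, prefix):
    """Wrap each sentence in a <seg> with a sequential xml:id (a1, a2, ...)."""
    block = ET.Element("ab")  # hypothetical container for the segments
    for n, sentence in enumerate(sentences, start=1):
        seg = ET.SubElement(block, "seg", {XML_ID: f"{prefix}{n}"})
        seg.text = sentence
    return block


text_a = segment(
    ["In the beginning was the Word.", "And the Word was with God."], "a"
)
serialized = ET.tostring(text_a, encoding="unicode")
print(serialized)
```

Once links point at these ids, treat them as fixed: renumbering the segments later would silently break the alignment.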

How do I record the alignment?

Use a <linkGrp> of <link> elements. Each <link> joins two (or more) segment ids with its @target:

```xml
<linkGrp type="translation">
  <link target="#a1 #b1"/>
  <link target="#a2 #b2"/>
</linkGrp>
```

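Because the links are standoff, you can store them in a third file pointing into both texts. In that case each pointer names its file as well as the id, for example textA.xml#a1 rather than #a1 (the file name here is illustrative). This keeps a clean separation: text A, text B, and the editorial alignment between them are three independent, individually revisable assets.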
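Keeping the alignment apart from the texts makes it easy for a pointer to drift out of date, so a small consistency check is worth having. The sketch below assumes un-namespaced fragments like the ones shown here (real TEI files put elements in the TEI namespace, so the tag lookup would need adjusting); `ids_in` and `dangling_targets` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def ids_in(xml_text):
    """Collect every xml:id declared in a text."""
    root = ET.fromstring(xml_text)
    return {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}


def dangling_targets(link_grp_xml, known_ids):
    """Return the pointers in <link target="..."> that match no known xml:id."""
    root = ET.fromstring(link_grp_xml)
    missing = []
    for link in root.iter("link"):
        for pointer in link.get("target", "").split():
            # Accept both "#a1" and "file.xml#a1" by keeping what follows "#".
            if pointer.split("#")[-1] not in known_ids:
                missing.append(pointer)
    return missing


text_a = '<ab><seg xml:id="a1">In the beginning was the Word.</seg><seg xml:id="a2">And the Word was with God.</seg></ab>'
text_b = '<ab><seg xml:id="b1">Im Anfang war das Wort.</seg><seg xml:id="b2">Und das Wort war bei Gott.</seg></ab>'
links = '<linkGrp type="translation"><link target="#a1 #b1"/><link target="#a2 #b3"/></linkGrp>'

known = ids_in(text_a) | ids_in(text_b)
missing = dangling_targets(links, known)
print(missing)
```

In this example the second link points at `#b3`, which neither text declares, so it is the only pointer reported.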

What alignment granularity should I choose?

This decision governs both effort and usefulness. Match it to what your readers actually compare.

Granularity	Effort	Best for
Paragraph	Low	Quick reading comparison
Sentence/verse	Medium	Most translation editions
Clause	High	Close stylistic analysis
Word	Very high	Linguistic/lexical study

Sentence or verse level is the workhorse default: fine enough to be genuinely useful, coarse enough to finish.

Can I automate the first pass?

Yes. For translations especially, a sentence aligner gets you most of the way, then you correct it. Tools like hunalign, or LF Aligner (a front end built on hunalign), take two sentence-segmented plain-text files and emit aligned pairs. With hunalign itself, a run looks roughly like this:

```bash
# hunalign expects one sentence per line in each file; an empty dictionary file is acceptable
touch null.dic
hunalign -text -realign null.dic textA.txt textB.txt > aligned.tsv
# columns: segmentA <tab> segmentB <tab> score ; fix merges/splits manually
```

Treat the output as a draft. The aligner will misjudge non-literal passages, omissions, and reordered clauses — exactly the places where your editorial judgement adds value. Convert the corrected TSV into <link> elements with a short script.
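The conversion script can be very small. This sketch assumes a hypothetical layout for the reviewed file, in which each row holds the xml:ids (not the sentence text) of the aligned segments: text A ids in the first column, text B ids in the second, space-separated where a merge or split is involved. Rows with one empty side are skipped here, on the reading that a TEI link joins at least two pointers.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tsv_to_link_grp(tsv_text, link_type="translation"):
    """Build a <linkGrp> from rows of 'ids in A <tab> ids in B'."""
    link_grp = ET.Element("linkGrp", {"type": link_type})
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # blank or malformed row
        ids_a, ids_b = row[0].split(), row[1].split()
        if not ids_a or not ids_b:
            continue  # segment with no counterpart: no link is written
        pointers = [f"#{seg_id}" for seg_id in ids_a + ids_b]
        ET.SubElement(link_grp, "link", {"target": " ".join(pointers)})
    return link_grp


# Illustrative reviewed alignment: two 1:1 pairs, one split, one omission.
reviewed = "a1\tb1\na2\tb2\na5\tb7 b8\na9\t\n"
link_grp = tsv_to_link_grp(reviewed)
targets = [link.get("target") for link in link_grp]
for link in link_grp:
    print(ET.tostring(link, encoding="unicode"))
```

Working from ids rather than sentence text means the script never has to match strings, so later corrections to a transcription do not disturb the alignment.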

How do I handle texts that do not line up?

Translators merge, split, omit, and reorder. Standoff links absorb this gracefully because a single <link> can connect one segment to several:

```xml
<!-- one source sentence rendered as two in translation -->
<link target="#a5 #b7 #b8"/>
<!-- a source sentence with no counterpart: leave #a9 unlinked, because a
     TEI <link> must point to at least two targets -->
```

Encoding the mismatch honestly is better than forcing a tidy one-to-one map. The interface can then show a one-to-many highlight, telling the reader the truth about how the texts relate.
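A quick summary of how the links are shaped helps when reviewing divergent passages. The sketch below assumes, as in the examples here, that ids beginning with `a` belong to text A and ids beginning with `b` to text B; the `link_shape` helper and the sample ids are illustrative.

```python
from collections import Counter


def link_shape(target, prefix_a="a", prefix_b="b"):
    """Describe a link as 'n:m', the number of A segments to B segments."""
    ids = [pointer.split("#")[-1] for pointer in target.split()]
    n_a = sum(seg_id.startswith(prefix_a) for seg_id in ids)
    n_b = sum(seg_id.startswith(prefix_b) for seg_id in ids)
    return f"{n_a}:{n_b}"


targets = ["#a1 #b1", "#a2 #b2", "#a5 #b7 #b8"]
all_ids = {"a1", "a2", "a5", "a9", "b1", "b2", "b7", "b8"}

shapes = Counter(link_shape(t) for t in targets)
linked = {p.split("#")[-1] for t in targets for p in t.split()}
unlinked = sorted(all_ids - linked)

print(dict(shapes))
print(unlinked)
```

Here the count shows two one-to-one links and one one-to-two split, and `a9` surfaces as a segment that appears in no link, which is how an omission shows up in the data.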

How should it look to a reader?

Render the two texts in side-by-side columns and let the <link> data drive interaction: hovering or clicking a segment highlights its counterpart across the gutter. A modest XSLT-to-HTML build plus a few lines of JavaScript reading the link targets gives you synchronised highlighting and scrolling. Keep it static where you can — the alignment is precomputed, so no server is required to deliver a responsive comparison view.
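Because the alignment is precomputed, the build step can hand the viewer a ready-made lookup table instead of making the browser parse link targets. A minimal sketch, assuming the link targets have already been read into a list of strings; `counterpart_index` is a hypothetical helper, and the JSON layout is one possible choice rather than a standard.

```python
import json
from collections import defaultdict


def counterpart_index(targets):
    """Map each segment id to the other ids that share a link with it."""
    index = defaultdict(set)
    for target in targets:
        ids = [pointer.split("#")[-1] for pointer in target.split()]
        for seg_id in ids:
            index[seg_id].update(other for other in ids if other != seg_id)
    # Sorted lists keep the JSON output stable between builds.
    return {seg_id: sorted(others) for seg_id, others in sorted(index.items())}


index = counterpart_index(["#a1 #b1", "#a5 #b7 #b8"])
print(json.dumps(index, indent=2))
```

Each entry lists every other member of the segment's link, so hovering over `b7` would light up both `a5` and its sibling `b8`, which makes a one-to-many correspondence visible as a group.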

Key Takeaways

A parallel-text edition's scholarly value is the explicit alignment between texts.
Transcribe each text independently; segment both with stable xml:ids.
Record correspondence as standoff <linkGrp>/<link>, separate from the texts.
Choose alignment granularity (sentence/verse is the usual default) by research need.
Use sentence aligners (hunalign, LF Aligner) for a first pass, then correct by hand.
Handle divergence with one-to-many links rather than forcing one-to-one maps.
Deliver as static, linked side-by-side columns with synchronised highlighting.

Frequently Asked Questions

What is a parallel-text edition?

A parallel-text edition presents two or more related texts side by side — typically an original and a translation, or two versions — aligned so readers can compare corresponding passages directly. The alignment between segments is the core scholarly contribution.

How are the two texts aligned in TEI?

Each segment is given an xml:id, and the correspondence is recorded with a <linkGrp> of <link> elements (or the @corresp attribute) connecting matching segments. The alignment data is separate from the text, so you can change it without re-editing either side.

What granularity should I align at?

Align at the smallest unit your readers compare — usually the sentence or verse, sometimes the clause. Word-level alignment is powerful for linguistic study but costly; paragraph-level is cheap but coarse. Match the granularity to the research question.

Can alignment be automated?

Partly. Sentence aligners like hunalign or LF Aligner produce a first pass for translations, and you correct it by hand. Automation saves time on bulk text but never replaces editorial judgement at difficult or non-literal passages.

How do I display a parallel-text edition?

Render the two texts in side-by-side columns with linked highlighting, so selecting a segment on one side highlights its counterpart. The <link> correspondences drive this interaction, usually via XSLT or a JavaScript viewer.

What if the texts diverge in structure?

Use one-to-many and many-to-one links. A <link> can connect one segment on one side to several on the other, handling cases where a translator merges or splits sentences. Standoff linking keeps these mismatches manageable.
