Transkribus Workflows

To handle abbreviations in Transkribus, decide on one policy for the whole project, either keeping them as written (diplomatic) or expanding them (reading), and apply it consistently in your ground truth, because the HTR model simply learns the mapping you demonstrate. If you expand dñs to dominus the same way on every training page, the model learns to expand it on new pages as well; if you waver, it tends to produce inconsistent, unusable output. The abbreviation tag then lets you preserve both the original mark and its expansion, so nothing is lost on export.

Expand or keep as written — which should I choose?

This is an editorial decision, not a technical one, and it drives everything downstream.

| Approach | What you record | Best for | Trade-off |
|---|---|---|---|
| Diplomatic | The mark as written (dñs) | Paleographic study, manuscript fidelity | Harder to search/read |
| Expanded | The full word (dominus) | Reading editions, indexing, search | Loses visual form unless tagged |
| Tagged both | Mark and expansion via tag | Editions needing both | More tagging effort |

For a searchable scholarly edition, a common default is expanded but tagged: readers get the full words, and the original mark survives in the markup.

How does the model learn abbreviations?

The recognition model has no dictionary of medieval shorthand; it learns from your ground truth. Whatever transcription sits beside the image of an ampersand, a macron, or any other abbreviation mark is what it learns to reproduce.

```text
Ground truth line A:  "dominus noster"   ← image shows  dñs nr̄
Ground truth line B:  "dñs noster"       ← same marks, dñs left unexpanded: inconsistent!
```

Feed it line A consistently and dñs (the mark over the n) reliably comes out as dominus. Feed it both styles and the model has to guess, so either form can appear unpredictably. Consistency in ground truth is the whole game.
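A quick way to catch mixed policy before training is to scan your ground truth for each abbreviation in your convention and count how many lines keep it as written versus expand it. This is a minimal Python sketch, assuming your ground truth lines are available as plain strings (for example, exported as text); the `CONVENTION` mapping and the function names are hypothetical. A hit on the expanded form can also be a word the scribe wrote out in full, so treat flagged entries as candidates to check, not confirmed errors.

```python
import unicodedata
from collections import Counter

# Hypothetical project convention: abbreviated form -> agreed expansion.
CONVENTION = {
    "d\u00f1s": "dominus",  # dñs
    "nr\u0304": "noster",   # nr̄ (r + combining macron U+0304)
}

PUNCT = ".,;:!?"

def _tokens(line: str) -> set[str]:
    """NFC-normalise a line and split it into punctuation-stripped words."""
    line = unicodedata.normalize("NFC", line)
    return {tok.strip(PUNCT) for tok in line.split()}

def policy_report(lines: list[str], convention: dict[str, str]) -> dict[str, Counter]:
    """For each abbreviation, count lines that keep it as written vs. expand it."""
    report = {}
    for abbr, expan in convention.items():
        abbr = unicodedata.normalize("NFC", abbr)
        expan = unicodedata.normalize("NFC", expan)
        counts = Counter()
        for line in lines:
            toks = _tokens(line)
            if abbr in toks:
                counts["as_written"] += 1
            if expan in toks:
                counts["expanded"] += 1
        report[abbr] = counts
    return report

def mixed_policies(report: dict[str, Counter]) -> list[str]:
    """Abbreviations that appear both as written and expanded across the lines."""
    return sorted(a for a, c in report.items() if c["as_written"] and c["expanded"])
```

Run on the two ground truth lines above, `mixed_policies(policy_report(lines, CONVENTION))` flags dñs, while nr̄ (expanded on both lines) passes.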

How do I tag abbreviations so both forms survive?

Apply the abbreviation (abbrev) tag, one of Transkribus's textual tags, to the span. It stores the abbreviated reading and its expansion together in the PAGE XML, and that pair maps cleanly to TEI on export.
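To see what the tag stores, you can read it back out of an exported PAGE XML file. This Python sketch assumes Transkribus's usual custom-attribute syntax on each TextLine (for example `abbrev {offset:0; length:3;expansion:dominus;}`), where offset and length index into the line's Unicode text and the expansion is a property of the tag; treat that format, and the helper names `parse_custom` and `abbreviations`, as assumptions to check against your own export. It does not decode escaped characters that may appear in property values, and offsets could drift for characters outside the Basic Multilingual Plane.

```python
import re
import xml.etree.ElementTree as ET

# Assumed Transkribus syntax: tagName {key:value; key:value;} repeated
TAG_RE = re.compile(r"(\w+)\s*\{([^}]*)\}")

def parse_custom(custom: str) -> list[tuple[str, dict[str, str]]]:
    """Split a custom attribute into (tag name, properties) pairs."""
    tags = []
    for name, body in TAG_RE.findall(custom):
        props = {}
        for item in body.split(";"):
            key, sep, value = item.partition(":")
            if sep:  # skip the empty fragment after the final ';'
                props[key.strip()] = value.strip()
        tags.append((name, props))
    return tags

def abbreviations(page_xml: str) -> list[tuple[str, str]]:
    """Return (as written, expansion) pairs for every abbrev tag on every TextLine."""
    root = ET.fromstring(page_xml)
    pairs = []
    # {*} matches any namespace, so the PAGE schema version does not matter here
    for line in root.findall(".//{*}TextLine"):
        unicode_el = line.find("{*}TextEquiv/{*}Unicode")
        text = (unicode_el.text or "") if unicode_el is not None else ""
        for name, props in parse_custom(line.get("custom", "")):
            if name != "abbrev":
                continue
            start = int(props.get("offset", "0"))
            end = start + int(props.get("length", "0"))
            pairs.append((text[start:end], props.get("expansion", "")))
    return pairs
```

For a line whose text is `dñs nr̄ est` with abbrev tags at offsets 0 and 4 (length 3 each), this returns the pairs (dñs, dominus) and (nr̄, noster).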

```xml
<!-- After export to TEI, a tagged abbreviation becomes: -->
<choice>
  <abbr>dñs</abbr>
  <expan>dominus</expan>
</choice>
```

A publishing stylesheet can then italicise the expansion (or only the supplied letters, if they are marked with ex), hide one form, or show both. That becomes a choice you make at render time, not at transcription time.
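A real edition would usually do this in XSLT or CSS. Purely to illustrate choosing the form at render time, here is a small Python sketch using the standard library's ElementTree; the `render` function and its three mode names are hypothetical. It italicises the whole expansion with `<i>` tags; italicising only the supplied letters would need ex elements inside expan, which this sketch does not treat separately.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
MODES = {"diplomatic", "reading", "both"}

def _text(el) -> str:
    """All text inside an element, or '' if the element is missing."""
    return "".join(el.itertext()) if el is not None else ""

def render(fragment: str, mode: str = "reading") -> str:
    """Flatten a TEI fragment to a string, resolving each <choice> by mode."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")

    def walk(el) -> str:
        if el.tag == TEI + "choice":
            abbr = _text(el.find(TEI + "abbr"))
            expan = _text(el.find(TEI + "expan"))
            if mode == "diplomatic":
                return abbr                  # mark as written
            if mode == "reading":
                return f"<i>{expan}</i>"     # expansion, italicised
            return f"{abbr} [{expan}]"       # both forms
        parts = [el.text or ""]
        for child in el:
            parts.append(walk(child))
            parts.append(child.tail or "")   # text following the child element
        return "".join(parts)

    return walk(ET.fromstring(fragment))
```

The same TEI therefore serves a diplomatic view and a reading view without re-transcribing anything.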

Why is my model producing gibberish on brevigraphs?

Two usual culprits:

  1. Mixed policy in training: the model saw the same mark handled two different ways.
  2. Too few examples: a rare brevigraph such as the con-/-us sign appears only a handful of times.

Fix the first by standardising existing ground truth; fix the second by adding more lines containing that mark before retraining. A model needs repeated, consistent exposure to learn an abbreviation reliably.
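To find marks that are too rare, count how often each special character appears across your ground truth. This Python sketch counts every non-ASCII code point after NFC normalisation, which catches precomposed letters, combining marks such as the macron, and brevigraph characters such as ꝯ, provided your transcription encodes them as distinct characters. The `min_examples` threshold is a hypothetical knob: there is no universal number, so pick one that suits your model and data.

```python
import unicodedata
from collections import Counter

def mark_counts(lines: list[str]) -> Counter:
    """Count every non-ASCII code point (special letters, combining marks, brevigraphs)."""
    counts = Counter()
    for line in lines:
        for ch in unicodedata.normalize("NFC", line):
            if ord(ch) > 127:
                counts[ch] += 1
    return counts

def rare_marks(lines: list[str], min_examples: int) -> list[tuple[str, str, int]]:
    """List (character, Unicode name, count) for marks seen fewer than min_examples times."""
    rare = [
        (ch, unicodedata.name(ch, "UNNAMED"), n)
        for ch, n in mark_counts(lines).items()
        if n < min_examples
    ]
    return sorted(rare, key=lambda item: item[2])  # rarest first
```

The marks this reports are the ones worth adding more training lines for before you retrain.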

What is the right workflow on a medieval collection?

A practical order of operations:

  1. Write a short transcription convention document: list each common abbreviation and its agreed expansion.
  2. Transcribe ground truth strictly to that convention.
  3. Apply the abbreviation tag where you need to preserve the original mark.
  4. Train (or fine-tune) and run recognition.
  5. On export to TEI, verify that the abbr/expan (or am/ex) elements came through.
  6. Spot-check that keyword search finds the expanded word.
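Steps 5 and 6 can be partly automated on the exported file. This Python sketch counts choice elements in a TEI export, checks that each contains both abbr and expan, and then builds a plain reading text (each choice replaced by its expansion) to confirm that expected keywords are findable. It assumes the export uses choice/abbr/expan as in the TEI example earlier; an export using am/ex without choice would need adjusting, and the keyword test is a local substring check, not Transkribus's own search. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def reading_text(el) -> str:
    """Plain reading text: each <choice> is replaced by its <expan> content."""
    if el.tag == TEI + "choice":
        expan = el.find(TEI + "expan")
        return "".join((expan if expan is not None else el).itertext())
    parts = [el.text or ""]
    for child in el:
        parts.append(reading_text(child))
        parts.append(child.tail or "")
    return "".join(parts)

def check_export(tei_xml: str, keywords: list[str]) -> dict:
    """Report abbreviation markup in a TEI export and which keywords are findable."""
    root = ET.fromstring(tei_xml)
    choices = root.findall(".//" + TEI + "choice")
    paired = [
        c for c in choices
        if c.find(TEI + "abbr") is not None and c.find(TEI + "expan") is not None
    ]
    text = reading_text(root).lower()
    return {
        "choices": len(choices),
        "abbr_expan_pairs": len(paired),
        # simple case-insensitive substring check, not a real search index
        "missing_keywords": [k for k in keywords if k.lower() not in text],
    }
```

If `abbr_expan_pairs` is lower than `choices`, or a word you know is in the text shows up in `missing_keywords`, look at the export settings before publishing.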

This captures medieval shorthand consistently, keeping it readable for editing, searchable for users, and faithful for paleographers.

Key Takeaways

  • Pick one abbreviation policy per project; consistency beats any individual choice.
  • The model learns abbreviation handling from your ground truth — it has no built-in expander.
  • Tag abbreviations to keep both the original mark and the expansion through to TEI.
  • Tagged abbreviations export to TEI abbr/expan, which a publishing stylesheet can render in italics.
  • Gibberish usually means mixed ground truth or too few examples of a rare brevigraph.
  • Storing expansions makes documents searchable while preserving paleographic fidelity.

Frequently Asked Questions

Should I expand abbreviations or transcribe them as written in Transkribus?

Decide once, at project level. For a diplomatic transcription, keep abbreviation marks as written; for a reading edition, expand them. The critical rule is consistency, because the HTR model learns whatever you teach it.

Can a Transkribus model learn to expand abbreviations automatically?

Yes. If your ground truth consistently expands a brevigraph to its full letters, and the mark appears often enough, the model learns that mapping and expands it on new pages. Mixed ground truth produces unpredictable, unusable output.

How do I record both the abbreviated and expanded forms?

Use the abbreviation tag (one of Transkribus's textual tags) to mark the span and store the expansion, so the original and resolved forms are both preserved in the PAGE XML and survive export to TEI.

Why does my model output gibberish on abbreviation marks?

Usually the training data mixed expansion styles, or the brevigraph is rare. Standardise the ground truth to one policy and add more lines containing the mark so the model has enough examples to learn it.

How are expanded letters distinguished in a scholarly edition?

Conventionally, expanded letters are italicised or wrapped in editorial markup. Transkribus abbreviation tags export to TEI abbr/expan (or am/ex) elements, which a publishing stylesheet can then render in italics.

Does abbreviation handling affect keyword search?

Yes, strongly. A reader searching a modern spelling will miss an unexpanded brevigraph. Storing the expansion makes the full word findable while the original mark stays visible for paleographic accuracy.