Transkribus Workflows

Tagging in Transkribus records a document's structure and meaning, not just its bare text, so both survive into your exported data. There are two kinds: structural tags, which label regions and lines by function (heading, paragraph, marginalia, page number), and text tags, which mark spans inside a line (person, place, date, abbreviation). Assign structure around the layout step and add text tags during correction, and your export carries headings, marginal notes and named entities as real, queryable markup.

Structural tags vs text tags — what's the difference?

Getting this distinction right shapes everything downstream:

Tag type    | Scope              | Examples                                    | Exports as
Structural  | Region / line      | heading, paragraph, marginalia, page-number | region type / custom attribute
Text        | Span within a line | person, place, date, abbrev, gap            | inline custom tag / attribute

Structural tags answer what is this block of the page; text tags answer what does this run of characters mean. Both flow into PAGE XML, and the TEI export maps many of them to elements like head, note, persName and placeName.

How do you tag region structure?

Region structure is set around layout analysis. After lines are detected, you can:

  1. Confirm or draw text regions (main column, header, margin).
  2. Assign each region a structure type — for example paragraph, heading, marginalia, page-number.
  3. Set or correct reading order so regions export in the right sequence.

```text
Page regions
  [heading]      "Memorandum 1641"
  [paragraph]    main body column
  [marginalia]   note in left margin   <- separate region, own structure type
  [page-number]  "fol. 12r"
```

Keeping marginalia in its own region with the marginalia type is the key move: it stops marginal notes interleaving with the main text in the export.
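
To check that marginalia really did land in its own region, you can read the exported PAGE XML back. The sketch below is a minimal illustration, not part of Transkribus: it assumes each region's structure is stored in its custom attribute as `structure {type:...;}` alongside `readingOrder {index:N;}` (falling back to the PAGE `type` attribute), and the sample page and the helpers `parse_custom` and `regions_by_function` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample page: heading, body and marginal note, each its own TextRegion.
# The custom-attribute format below is assumed to follow Transkribus's syntax.
PAGE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="fol12r.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r2" custom="readingOrder {index:2;} structure {type:marginalia;}"/>
    <TextRegion id="r1" custom="readingOrder {index:1;} structure {type:paragraph;}"/>
    <TextRegion id="r0" custom="readingOrder {index:0;} structure {type:heading;}"/>
  </Page>
</PcGts>"""

TAG_RE = re.compile(r"(\w+)\s*\{([^}]*)\}")

def parse_custom(custom):
    """Turn 'name {k:v; k2:v2;} ...' into a list of (name, {k: v}) pairs."""
    tags = []
    for name, body in TAG_RE.findall(custom or ""):
        props = {}
        for pair in body.split(";"):
            if ":" in pair:
                key, value = pair.split(":", 1)
                props[key.strip()] = value.strip()
        tags.append((name, props))
    return tags

def regions_by_function(page_xml):
    """Return (main_ids, marginalia_ids), each sorted by reading order."""
    root = ET.fromstring(page_xml)
    main, margin = [], []
    for region in root.iterfind(".//{*}TextRegion"):
        props = dict(parse_custom(region.get("custom")))
        index = int(props.get("readingOrder", {}).get("index", 0))
        # Prefer the structure tag; fall back to PAGE's own @type attribute.
        kind = props.get("structure", {}).get("type", region.get("type"))
        target = margin if kind == "marginalia" else main
        target.append((index, region.get("id")))
    return [rid for _, rid in sorted(main)], [rid for _, rid in sorted(margin)]

main_ids, margin_ids = regions_by_function(PAGE)
print(main_ids, margin_ids)  # ['r0', 'r1'] ['r2']
```

Because the marginal note is selected by its structure type rather than its position on the page, it stays out of the main sequence even though its reading-order index falls between body regions in a real document.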

How do you tag spans of text during correction?

Once you can read the transcription, select a span in the text editor and apply a text tag. Common ones:

  • person / place / date for named entities.
  • abbrev plus an expansion for shorthand and brevigraphs.
  • gap for illegible or damaged passages.
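
An abbrev tag keeps the diplomatic form in the text and the expansion in the tag, so an expanded reading can be generated later. A minimal Python sketch, assuming the tag is written as `abbrev {offset:N; length:M; expansion:...;}` in the line's custom attribute; the sample line and the helper names are hypothetical, and Transkribus's escaping of special characters inside tag values is ignored here.

```python
import re

TAG_RE = re.compile(r"(\w+)\s*\{([^}]*)\}")

def parse_custom(custom):
    """Turn 'name {k:v; k2:v2;} ...' into a list of (name, {k: v}) pairs."""
    tags = []
    for name, body in TAG_RE.findall(custom or ""):
        props = {}
        for pair in body.split(";"):
            if ":" in pair:
                key, value = pair.split(":", 1)
                props[key.strip()] = value.strip()
        tags.append((name, props))
    return tags

def expand_abbreviations(text, custom):
    """Replace each abbrev span with its expansion, leaving other text as-is."""
    spans = []
    for name, props in parse_custom(custom):
        if name == "abbrev" and "expansion" in props:
            start = int(props["offset"])
            spans.append((start, start + int(props["length"]), props["expansion"]))
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, expansion in sorted(spans, reverse=True):
        text = text[:start] + expansion + text[end:]
    return text

line = "Itm pd to ye clerk"
custom = ("readingOrder {index:0;} "
          "abbrev {offset:0; length:3; expansion:Item;} "
          "abbrev {offset:4; length:2; expansion:paid;} "
          "abbrev {offset:10; length:2; expansion:the;}")
print(expand_abbreviations(line, custom))  # Item paid to the clerk
```

Replacing from the end of the line backwards is the design choice that matters: expansions are usually longer than the abbreviations, so working left to right would shift every later offset.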

These spans become inline annotations in the export. A tagged line conceptually looks like this in PAGE XML custom syntax:

```text
Unicode:  "payd to Iohn Smyth of Yorke the xij day"
custom:   "person {offset:8; length:10;} place {offset:22; length:5;} date {offset:28; length:11;}"
```

The custom attribute records each tag's zero-based character offset and length within the line, which your transform later turns into TEI persName, placeName and date.
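
Reading such spans back out needs only the line text and the offsets. This sketch assumes Transkribus's `name {offset:N; length:M;}` custom syntax with zero-based character offsets; `parse_custom` and `entity_spans` are hypothetical helpers, and the offsets in the sample are counted against the example line.

```python
import re

TAG_RE = re.compile(r"(\w+)\s*\{([^}]*)\}")

def parse_custom(custom):
    """Turn 'name {k:v; k2:v2;} ...' into a list of (name, {k: v}) pairs."""
    tags = []
    for name, body in TAG_RE.findall(custom or ""):
        props = {}
        for pair in body.split(";"):
            if ":" in pair:
                key, value = pair.split(":", 1)
                props[key.strip()] = value.strip()
        tags.append((name, props))
    return tags

def entity_spans(text, custom, wanted=("person", "place", "date")):
    """Return (tag, substring) pairs for the requested text tags, in line order."""
    found = []
    for name, props in parse_custom(custom):
        if name in wanted:
            start = int(props["offset"])
            found.append((start, name, text[start:start + int(props["length"])]))
    return [(name, span) for _, name, span in sorted(found)]

line = "payd to Iohn Smyth of Yorke the xij day"
custom = ("readingOrder {index:0;} person {offset:8; length:10;} "
          "place {offset:22; length:5;} date {offset:28; length:11;}")
for tag, span in entity_spans(line, custom):
    print(tag, "->", span)
# person -> Iohn Smyth
# place -> Yorke
# date -> the xij day
```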

How do tags survive the export?

Both export formats preserve tags, but differently:

  • PAGE XML stores structure as region type attributes and span tags in the line custom attribute — the fullest record.
  • TEI maps common structures automatically (heading becomes head, marginalia becomes note), and you map the rest with your own XSLT.

If a custom tag does not appear where you expect in TEI, add an explicit rule to your transform. Never assume an unusual tag maps itself.
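
One way to follow that rule in a custom pipeline is to keep the tag-to-element mapping explicit and stop on anything unmapped. This is an illustrative Python sketch, not Transkribus's own TEI export or an XSLT: the `TEI_RULES` and `IGNORED` tables and `line_to_tei` are hypothetical, the custom syntax is assumed as above, and spans are assumed not to overlap.

```python
import re
from xml.sax.saxutils import escape

TAG_RE = re.compile(r"(\w+)\s*\{([^}]*)\}")

# Explicit tag -> TEI element rules (illustrative, not Transkribus's built-in mapping).
TEI_RULES = {"person": "persName", "place": "placeName", "date": "date"}
# Non-span entries in the custom attribute that carry no inline markup.
IGNORED = {"readingOrder", "structure", "textStyle"}

def parse_custom(custom):
    """Turn 'name {k:v; k2:v2;} ...' into a list of (name, {k: v}) pairs."""
    tags = []
    for name, body in TAG_RE.findall(custom or ""):
        props = {}
        for pair in body.split(";"):
            if ":" in pair:
                key, value = pair.split(":", 1)
                props[key.strip()] = value.strip()
        tags.append((name, props))
    return tags

def line_to_tei(text, custom, rules=TEI_RULES):
    """Wrap tagged spans in TEI elements; raise on any tag without a rule."""
    spans = []
    for name, props in parse_custom(custom):
        if name in IGNORED:
            continue
        if name not in rules:
            raise KeyError(f"no TEI rule for custom tag '{name}'")
        start = int(props["offset"])
        spans.append((start, start + int(props["length"]), rules[name]))
    out, pos = [], 0
    for start, end, element in sorted(spans):  # assumes spans do not overlap
        out.append(escape(text[pos:start]))
        out.append(f"<{element}>{escape(text[start:end])}</{element}>")
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)

line = "payd to Iohn Smyth of Yorke the xij day"
custom = "person {offset:8; length:10;} place {offset:22; length:5;} date {offset:28; length:11;}"
print(line_to_tei(line, custom))
# payd to <persName>Iohn Smyth</persName> of <placeName>Yorke</placeName> <date>the xij day</date>
```

Raising an error instead of silently dropping an unknown tag makes a missing rule visible at export time rather than after the TEI has been published.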

Should you tag tables and special structures?

Tables need dedicated table-tagging (cells, rows, columns) rather than plain structure tags, because a register's value is its grid. For lists, catchwords, signatures and running heads, assign matching structure types so the export distinguishes them from body paragraphs. The principle holds throughout: encode the function, not just the text, and your data stays useful for indexing and analysis.
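
Because a table's meaning lives in its grid, it is worth confirming that cells come out in the right rows and columns. A minimal sketch, assuming the PAGE export stores cells as `TableCell` elements with `row` and `col` attributes inside a `TableRegion`; the sample register and `table_grid` are hypothetical, and merged cells (`rowSpan`/`colSpan`) are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-by-two register excerpt; cells are deliberately out of order.
PAGE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page>
    <TableRegion id="t1">
      <TableCell row="0" col="0" id="c00"><TextLine id="l1"><TextEquiv><Unicode>Name</Unicode></TextEquiv></TextLine></TableCell>
      <TableCell row="0" col="1" id="c01"><TextLine id="l2"><TextEquiv><Unicode>Sum</Unicode></TextEquiv></TextLine></TableCell>
      <TableCell row="1" col="1" id="c11"><TextLine id="l4"><TextEquiv><Unicode>vjs</Unicode></TextEquiv></TextLine></TableCell>
      <TableCell row="1" col="0" id="c10"><TextLine id="l3"><TextEquiv><Unicode>Iohn Smyth</Unicode></TextEquiv></TextLine></TableCell>
    </TableRegion>
  </Page>
</PcGts>"""

def table_grid(page_xml):
    """Return each TableRegion as a list of rows, placing cells by row/col."""
    root = ET.fromstring(page_xml)
    tables = []
    for table in root.iterfind(".//{*}TableRegion"):
        cells = {}
        for cell in table.iterfind("{*}TableCell"):
            # Join the line-level transcriptions inside the cell.
            lines = [u.text or "" for u in cell.iterfind("{*}TextLine/{*}TextEquiv/{*}Unicode")]
            cells[(int(cell.get("row")), int(cell.get("col")))] = " ".join(lines)
        n_rows = max(r for r, _ in cells) + 1 if cells else 0
        n_cols = max(c for _, c in cells) + 1 if cells else 0
        # Empty string for any grid position that has no cell.
        tables.append([[cells.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)])
    return tables

print(table_grid(PAGE))  # [[['Name', 'Sum'], ['Iohn Smyth', 'vjs']]]
```

Placing cells by their row and column attributes, rather than by document order, is what keeps the grid intact when cells were drawn or saved out of sequence.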

A practical tagging order

  1. Run layout; fix regions and reading order.
  2. Assign structure types to every region.
  3. Recognise text.
  4. Correct text, applying text tags (entities, abbreviations, gaps) as you go.
  5. Export to PAGE/TEI and verify tags landed correctly.
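
Step 5 can be partly automated. The sketch below flags regions with no structure type and text tags whose span runs past the end of their line, which can indicate offsets that no longer match the text. It assumes the Transkribus custom-attribute syntax; the `SPAN_TAGS` set, the helper names and the sample page are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TAG_RE = re.compile(r"(\w+)\s*\{([^}]*)\}")
SPAN_TAGS = {"person", "place", "date", "abbrev", "gap"}  # assumed set of text tags to check

# Hypothetical page: one tag overruns its line, one region lacks a structure type.
PAGE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page>
    <TextRegion id="r1" custom="readingOrder {index:0;} structure {type:paragraph;}">
      <TextLine id="l1" custom="readingOrder {index:0;} person {offset:8; length:10;} date {offset:41; length:11;}">
        <TextEquiv><Unicode>payd to Iohn Smyth of Yorke the xij day</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
    <TextRegion id="r2" custom="readingOrder {index:1;}"/>
  </Page>
</PcGts>"""

def parse_custom(custom):
    """Turn 'name {k:v; k2:v2;} ...' into a list of (name, {k: v}) pairs."""
    tags = []
    for name, body in TAG_RE.findall(custom or ""):
        props = {}
        for pair in body.split(";"):
            if ":" in pair:
                key, value = pair.split(":", 1)
                props[key.strip()] = value.strip()
        tags.append((name, props))
    return tags

def check_tags(page_xml):
    """List untyped regions and text tags that run past the end of their line."""
    problems = []
    root = ET.fromstring(page_xml)
    for region in root.iterfind(".//{*}TextRegion"):
        tag_names = {name for name, _ in parse_custom(region.get("custom"))}
        if "structure" not in tag_names and not region.get("type"):
            problems.append(f"region {region.get('id')}: no structure type")
        for line in region.iterfind("{*}TextLine"):
            unicode_el = line.find("{*}TextEquiv/{*}Unicode")
            text = ""
            if unicode_el is not None and unicode_el.text:
                text = unicode_el.text
            for name, props in parse_custom(line.get("custom")):
                if name in SPAN_TAGS:
                    end = int(props["offset"]) + int(props["length"])
                    if end > len(text):
                        problems.append(f"line {line.get('id')}: {name} ends at {end}, text has {len(text)} chars")
    return problems

for problem in check_tags(PAGE):
    print(problem)
# line l1: date ends at 52, text has 39 chars
# region r2: no structure type
```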

Fixing regions and reading order first means recognised lines come out in the right sequence, with marginalia kept apart from the main text; applying text tags during correction means you tag only what you can actually read.

Key Takeaways

  • Transkribus has structural tags (regions/lines) and text tags (spans) — use both.
  • Assign structure types around layout; add text tags during correction.
  • Put marginalia in its own region with the marginalia type to keep it separate.
  • Tag named entities (person, place, date) for later extraction and indexing.
  • Tags export to PAGE XML (region types + custom attributes) and map into TEI.
  • Custom tags may need explicit XSLT mapping to reach the right TEI element.
  • Encode the page's function, not just its words, to keep data analysable.

Frequently Asked Questions

What is the difference between structural tags and text tags in Transkribus?

Structural tags label whole regions or lines by their function, such as heading, marginalia or paragraph. Text tags annotate spans inside a line, such as a person name, place or abbreviation. They serve different purposes and both export into your XML.

Do I need to tag structure before or after recognition?

You assign region-level structure during or after layout analysis, and you add text-level tags during correction once you can read the transcription. Settling regions and reading order early keeps the recognised text in the right order and makes the export cleaner; text tagging is usually a later pass.

Will my tags survive export to TEI or PAGE XML?

Yes. In PAGE XML, structural tags become region types and custom attributes, and text tags are stored in each line's custom attribute; the TEI export maps many of them to elements like head and note. Custom tags may need explicit mapping in your transform to land in the right TEI element.

How do I tag marginalia so it does not mix with the main text?

Give marginalia its own text region and assign it a marginalia structure type, separate from the main body region. That keeps its reading order and content distinct in the export rather than interleaved with the main column.

Can I tag named entities like people and places?

Yes, using text tags such as person, place and date over spans within a line. These enable later entity extraction and indexing, and export as attributes you can transform into TEI persName, placeName and date elements.

Does tagging affect recognition accuracy?

Correct region structure helps the layout and recognition steps read the page in the right order and keep columns or marginalia separate. Text-span tags do not change recognition but enrich the data you get out.