Transkribus Workflows

To export Transkribus transcriptions to TEI, open the document, choose Export, and select the TEI option for a ready-made TEI P5 file, or export PAGE XML and transform it yourself for maximum fidelity. The direct TEI export is the fastest path to an editable scholarly file, but PAGE XML preserves every baseline and coordinate, so serious editions usually export PAGE and convert with their own XSLT. Here is how to choose and execute either route cleanly.

TEI or PAGE XML — which should you export?

The right format depends on how much layout fidelity your edition needs:

Need | Best export | Why
--- | --- | ---
Quick editable text edition | TEI (direct) | Text-centred, ready to edit
Exact image-to-text linking | PAGE XML | Keeps zone coordinates
Custom encoding rules | PAGE, then your XSLT | Full control of mapping
Bulk / repeatable jobs | PAGE via API | Scriptable, consistent

PAGE XML is the fullest record: it stores text regions, baselines, line polygons and pixel coordinates. TEI is text-centred: it captures the words and structure but simplifies geometry. If in doubt, export PAGE and keep it as your source of truth.
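To make that geometry concrete, here is a minimal sketch that reads line text, baselines and polygon bounding boxes from a PAGE file using only the Python standard library. The sample document and the helper names (`parse_points`, `line_records`) are illustrative assumptions, and the namespace shown is the 2013-07-15 PAGE schema; check the root element of your own export, since other schema versions use a different namespace URI.

```python
import xml.etree.ElementTree as ET

# Assumption: the export uses the 2013-07-15 PAGE schema namespace.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

# Hand-written sample, not real Transkribus output.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page1.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,100 900,100 900,400 100,400"/>
      <TextLine id="r1l1">
        <Coords points="110,120 880,118 882,170 112,172"/>
        <Baseline points="110,165 880,163"/>
        <TextEquiv><Unicode>Item payd to the carpenter for</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def parse_points(points):
    """Turn 'x,y x,y ...' into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def line_records(page_xml):
    """Return one dict per TextLine: id, text, baseline and polygon bounding box."""
    root = ET.fromstring(page_xml)
    records = []
    for line in root.iterfind(".//pc:TextLine", NS):
        polygon = parse_points(line.find("pc:Coords", NS).get("points"))
        xs, ys = zip(*polygon)
        baseline = line.find("pc:Baseline", NS)
        records.append({
            "id": line.get("id"),
            "text": line.findtext("pc:TextEquiv/pc:Unicode", default="", namespaces=NS),
            "baseline": parse_points(baseline.get("points")) if baseline is not None else [],
            "bbox": (min(xs), min(ys), max(xs), max(ys)),
        })
    return records


for record in line_records(SAMPLE):
    print(record["id"], record["bbox"], record["text"])
```

None of this per-line geometry is guaranteed to survive a text-centred TEI export, which is the practical reason to archive the PAGE files.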

How do you run a direct TEI export?

  1. Open the document and select the pages (or the whole document).
  2. Click Export and tick TEI.
  3. Choose options: include line breaks, region structure and tags as needed.
  4. Download the resulting .xml file.

The output is a single TEI document with a teiHeader stub and a body containing your transcribed text, broken into structural divisions where your tags allowed. Treat the header as a placeholder — you will fill in real metadata (title, source, responsibility) by hand.

What does the exported TEI look like?

A typical fragment maps pages to pb, lines to lb, and tagged headings to head:

xml
<body>
  <pb n="1" facs="#page1"/>
  <head>Memorandum of Accompts 1641</head>
  <p>
    <lb n="1"/>Item payd to the carpenter for
    <lb n="2"/>worke done upon the church porch
  </p>
</body>

Note how lb markers preserve original line breaks. For a documentary edition that matters; for a reading edition you may strip them in a later pass.
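As a rough illustration of that later pass, the sketch below removes lb elements from a paragraph and collapses the remaining text into running prose. It assumes the export uses the standard TEI namespace; the function names are illustrative, and the sketch deliberately does not rejoin words hyphenated across lines, which needs an editorial decision of its own.

```python
import xml.etree.ElementTree as ET

# Assumption: the export uses the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
LB = f"{{{TEI_NS}}}lb"

SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <lb n="1"/>Item payd to the carpenter for
  <lb n="2"/>worke done upon the church porch
</p>"""


def strip_line_breaks(parent):
    """Remove every lb below parent in place, keeping the text that follows it."""
    previous = None
    for child in list(parent):
        if child.tag == LB:
            # ElementTree stores the text after an element in its tail,
            # so move it to the previous sibling or to the parent.
            tail = child.tail or ""
            if previous is None:
                parent.text = (parent.text or "") + tail
            else:
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
        else:
            strip_line_breaks(child)
            previous = child


def reading_text(elem):
    """Collapse the element's text into one whitespace-normalised string."""
    return " ".join("".join(elem.itertext()).split())


paragraph = ET.fromstring(SAMPLE)
strip_line_breaks(paragraph)
print(reading_text(paragraph))
# prints: Item payd to the carpenter for worke done upon the church porch
```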

Why export PAGE XML and transform it yourself?

PAGE XML retains information TEI discards, and a custom transform lets you encode exactly to your project's rules. A minimal XSLT skeleton turns PAGE TextLine elements into TEI lb plus text:

xslt
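<!-- Assumes the enclosing stylesheet binds the pc: prefix to the PAGE namespace
     and sets the TEI namespace as the default for output elements -->
<xsl:template match="pc:TextLine">
  <!-- The PAGE line id is reused as the lb number -->
  <lb n="{@id}"/>
  <xsl:value-of select="pc:TextEquiv/pc:Unicode"/>
</xsl:template>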

This route is more work but gives you reproducible, schema-conformant output and full control over how regions, tags and zones become TEI elements.
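If you want to prototype the mapping before committing to a stylesheet, the same idea can be sketched in Python with the standard library. This is one possible mapping under stated assumptions, not the mapping Transkribus itself applies: each TextRegion becomes a p, each TextLine becomes an lb followed by its text, and each line polygon is kept as a TEI zone that the lb points to through facs. The sample input and the function name `page_to_tei` are illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: the export uses the 2013-07-15 PAGE schema namespace.
PAGE_NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}
TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI)  # serialise TEI elements without a prefix

# Hand-written sample, not real Transkribus output.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page1.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <TextLine id="r1l1">
        <Coords points="110,120 880,118 882,170 112,172"/>
        <TextEquiv><Unicode>Item payd to the carpenter for</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Coords points="110,180 880,178 882,230 112,232"/>
        <TextEquiv><Unicode>worke done upon the church porch</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def page_to_tei(page_xml):
    """Return (facsimile, body) TEI elements built from one PAGE document."""
    root = ET.fromstring(page_xml)
    facsimile = ET.Element(f"{{{TEI}}}facsimile")
    surface = ET.SubElement(facsimile, f"{{{TEI}}}surface")
    body = ET.Element(f"{{{TEI}}}body")
    n = 0
    for region in root.iterfind(".//pc:TextRegion", PAGE_NS):
        p = ET.SubElement(body, f"{{{TEI}}}p")
        for line in region.iterfind("pc:TextLine", PAGE_NS):
            n += 1
            lb = ET.SubElement(p, f"{{{TEI}}}lb", n=str(n))
            coords = line.find("pc:Coords", PAGE_NS)
            if coords is not None:
                # Keep the line polygon as a zone and link the lb to it.
                zone = ET.SubElement(surface, f"{{{TEI}}}zone")
                zone.set(XML_ID, line.get("id"))
                zone.set("points", coords.get("points"))
                lb.set("facs", "#" + line.get("id"))
            text = line.findtext("pc:TextEquiv/pc:Unicode", default="", namespaces=PAGE_NS)
            lb.tail = text + "\n"
    return facsimile, body


facsimile, body = page_to_tei(SAMPLE)
print(ET.tostring(facsimile, encoding="unicode"))
print(ET.tostring(body, encoding="unicode"))
```

Numbering lines with a running counter rather than the PAGE id keeps n human-readable, while the id survives in the zone for image linking. Whichever language you use, the mapping rules belong in version control next to the PAGE files.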

How do you batch export a whole collection?

For more than a handful of documents, script it with the Transkribus REST API. You request each page's PAGE XML, then run your XSLT over the lot:

bash
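# Pseudo-pipeline: pull PAGE XML for every page, then transform to TEI.
# transkribus-export stands in for your own API client; it is not a real command.
mkdir -p page tei
while IFS= read -r doc; do
  transkribus-export --doc "$doc" --format page --out "page/$doc"
  # Given a directory as -s, Saxon writes one result per input file into the -o directory
  saxon -s:"page/$doc" -xsl:page2tei.xsl -o:"tei/$doc"
done < doc_ids.txt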

This keeps the conversion consistent across thousands of pages and re-runnable when your encoding rules evolve.
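If your conversion step is a Python function rather than an XSLT run, the same re-runnable pattern can be sketched as a small batch driver. Everything here is an assumption for illustration: the function name `batch_convert`, the page/<doc>/ directory layout, and the `convert` callable, which is expected to take PAGE XML text and return TEI text. The driver does not talk to the Transkribus API; it only processes files already downloaded.

```python
from pathlib import Path


def batch_convert(page_dir, tei_dir, convert):
    """Apply convert(page_xml_text) -> tei_text to every .xml file under page_dir.

    The directory layout under page_dir is mirrored under tei_dir. Failures are
    collected as (path, error) pairs so one bad page does not stop the run.
    """
    page_dir, tei_dir = Path(page_dir), Path(tei_dir)
    failures = []
    for src in sorted(page_dir.rglob("*.xml")):
        dest = tei_dir / src.relative_to(page_dir)
        try:
            result = convert(src.read_text(encoding="utf-8"))
        except Exception as exc:  # record the failure and continue with the next page
            failures.append((src, exc))
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(result, encoding="utf-8")
    return failures
```

A call such as `batch_convert("page", "tei", my_converter)` can be repeated whenever the encoding rules change; reviewing the returned failures tells you which pages need attention.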

How do you clean up and validate the result?

The export is a starting point, not a finished edition. Plan a post-processing pass:

  • Fill the teiHeader with real bibliographic and responsibility metadata.
  • Decide how to handle line breaks, hyphenation and abbreviations.
  • Map any custom Transkribus tags that did not translate automatically.
  • Validate against your project's TEI ODD or RNG schema and fix mismatches.

bash
# Validate against a Relax NG schema
jing tei_all.rng edition.xml

Only after validation passes should the file flow into your edition platform or publication pipeline.
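Schema validation will not notice a header that is valid but still a placeholder, so a small pre-flight check can complement it. The sketch below is an assumption about what is worth checking, not a Transkribus or TEI tool: it confirms the file is well-formed, that the title in the teiHeader is not empty, and that sourceDesc and body are present. It expects a single TEI root element in the standard TEI namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: the file has a TEI root element in the standard TEI namespace.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def preflight(tei_xml):
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        root = ET.fromstring(tei_xml)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    title = root.findtext(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
        default="",
        namespaces=TEI,
    )
    if not title.strip():
        problems.append("teiHeader title is empty")
    if root.find("tei:teiHeader/tei:fileDesc/tei:sourceDesc", TEI) is None:
        problems.append("sourceDesc is missing")
    if root.find("tei:text/tei:body", TEI) is None:
        problems.append("body is missing")
    return problems
```

Run it alongside, not instead of, the Relax NG validation: the schema checks structure, while this check catches metadata that was never filled in.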

Key Takeaways

  • Transkribus exports TEI directly or PAGE XML for you to transform.
  • PAGE XML is the fuller record (coordinates, baselines); TEI is text-centred.
  • For editions with exact image links, keep PAGE XML as your source of truth.
  • Custom XSLT over PAGE gives reproducible, schema-conformant TEI.
  • Use the REST API to batch-export and transform whole collections.
  • The export is a draft: fill the teiHeader and resolve tag mappings by hand.
  • Always validate the TEI against your project schema before publishing.

Frequently Asked Questions

Can Transkribus export directly to TEI?

Yes. The export dialog offers a TEI option that maps regions, lines and tags into a TEI P5 document. It is a serviceable starting point, but most editions need further refinement in an XML editor afterwards.

Should I export TEI or PAGE XML from Transkribus?

Export PAGE XML if you want the fullest, most faithful record of layout, baselines and coordinates, then transform it yourself. Export TEI directly when you want a quick, editable text-centred file and can tidy it later.

What gets lost in the Transkribus TEI export?

Pixel-precise zone coordinates and some layout nuance are simplified or dropped in TEI, since TEI is text-centred. If you need exact image-to-text linking, keep the PAGE XML alongside or use a custom XSLT that preserves facsimile zones.

How do I batch export a whole collection?

Use the web app export for moderate volumes, or the Transkribus REST API to script export of every document in a collection. The API returns PAGE XML per page that you can transform to TEI in bulk.

Does the export preserve my structural tags?

Structural and text tags you applied in Transkribus carry into PAGE XML as region types and custom attributes, and the TEI export maps many of them to elements like head and note. Unusual custom tags may need manual mapping in your transform.

Will the exported TEI validate against the TEI schema?

Transkribus TEI is generally well-formed and broadly P5-conformant, but you should validate it against your project's TEI ODD or RNG and fix any element or attribute mismatches before publishing.