
Write a minimal TEI header when the metadata is not yet stable and the priority is getting text encoded: early drafts, teaching files, single-use transcriptions and pilots. Drop the minimal approach as soon as a document is bound for publication, a repository or OAI-PMH harvesting. Those systems read header fields, and gaps there become real discovery and provenance failures. The header is cheap to expand later, but expensive to fix after a collection has shipped.

What is the absolute minimum?

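Inside teiHeader, the TEI schema requires only fileDesc. fileDesc in turn requires just three children: titleStmt (which must contain a title), publicationStmt and sourceDesc. Here is a valid header with nothing optional: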

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Letter from A. Smith to J. Brown, 1843</title>
    </titleStmt>
    <publicationStmt>
      <p>Unpublished working transcription.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Transcribed from MS 1843/17, County Archive.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```

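Placed inside a TEI element alongside the text it describes, that header validates against the standard TEI schema. It is enough to start encoding the same afternoon you open the manuscript.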

When does a minimal header actually fit?

Reach for it when the answer to "will this metadata change soon?" is yes:

  • Drafts and spikes — you are testing an encoding model, not publishing.
  • Teaching — students should learn body markup without drowning in profileDesc.
  • Single-use extraction — you need the text for one analysis, not a permanent edition.
  • Rapid pilots — proving a workflow before committing to a metadata application profile.

In all of these, a heavy header is premature optimisation. You would polish provenance fields that the next iteration rewrites anyway.

When is a minimal header the wrong call?

The failure mode is shipping a thin header into a system that depends on it. Watch for these signals:

| Signal | Why a minimal header hurts |
|---|---|
| Going into a repository | DC/MODS crosswalks pull from titleStmt and publicationStmt |
| OAI-PMH harvesting | Aggregators map header fields to Dublin Core; gaps mean poor discovery |
| Long-term preservation | sourceDesc and revisionDesc carry the provenance auditors expect |
| Multi-encoder team | Without encodingDesc, conventions drift across files |
| Citable scholarly edition | Readers need responsibility, licence and edition statements |

If any of these apply, invest in the header now — retrofitting metadata across hundreds of files later is far costlier.
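To make "invest in the header" concrete, here is one possible sketch of a deposit-ready fileDesc for the same letter. It fills the fields the signals above depend on: a responsibility statement, a publisher, a date, a licence and a structured source reference. The publisher, the encoder's name and the CC BY 4.0 licence are placeholder assumptions, not values from any real project. The xml:id "er" is chosen so that a change/@who="#er" pointer can resolve to it. Every element used comes from the TEI core and header modules.

```xml
<fileDesc>
  <titleStmt>
    <title>Letter from A. Smith to J. Brown, 1843</title>
    <!-- Who did the work: placeholder name -->
    <respStmt>
      <resp>Transcribed and encoded by</resp>
      <name xml:id="er">E. R.</name>
    </respStmt>
  </titleStmt>
  <publicationStmt>
    <!-- Placeholder publisher and licence: substitute your project's own -->
    <publisher>County Archive Digital Editions</publisher>
    <date when="2025">2025</date>
    <availability>
      <licence target="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</licence>
    </availability>
  </publicationStmt>
  <sourceDesc>
    <!-- Shelfmark as a machine-readable identifier rather than free prose -->
    <bibl>County Archive, <idno type="shelfmark">MS 1843/17</idno></bibl>
  </sourceDesc>
</fileDesc>
```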

How do you upgrade a header without breaking anything?

The optional blocks are siblings that follow fileDesc (with revisionDesc always last), so you can add them without touching your transcription or the existing fileDesc:

```xml
<teiHeader>
  <fileDesc> ... </fileDesc>
  <encodingDesc>
    <projectDesc><p>Diplomatic transcription; original spelling kept.</p></projectDesc>
  </encodingDesc>
  <profileDesc>
    <langUsage><language ident="en">English</language></langUsage>
  </profileDesc>
  <revisionDesc>
    <change when="2025-01-12" who="#er">Added editorial policy.</change>
  </revisionDesc>
</teiHeader>
```

Add encodingDesc when conventions need documenting, profileDesc for languages and participants, and revisionDesc the first time anyone edits the file.

Can you stop a header staying minimal forever?

The risk with "minimal for now" is that it never gets enriched. The most durable fix is to make richness a validation requirement: customise your ODD so that, say, publicationStmt must contain a licence (inside availability) and revisionDesc must exist. Then an under-filled header fails validation in CI, not in a reviewer's inbox six months on. That converts good intentions into an enforced standard.
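One way to express those two rules is a fragment of an ODD schemaSpec with embedded Schematron constraints. This is a sketch that assumes your ODD is compiled with the TEI Stylesheets (for example via Roma), which extract constraintSpec rules and bind the tei: prefix for you. The schema ident "letters" and the constraint idents are hypothetical. The four moduleRef lines are the usual bare minimum for a TEI schema.

```xml
<schemaSpec ident="letters" start="TEI"
    xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <!-- Minimal set of modules for a working TEI schema -->
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>

  <!-- Rule 1: every publicationStmt must state a licence -->
  <elementSpec ident="publicationStmt" mode="change">
    <constraintSpec ident="require-licence" scheme="schematron">
      <constraint>
        <sch:assert test="tei:availability/tei:licence">
          publicationStmt must contain availability/licence.
        </sch:assert>
      </constraint>
    </constraintSpec>
  </elementSpec>

  <!-- Rule 2: every teiHeader must carry a revisionDesc -->
  <elementSpec ident="teiHeader" mode="change">
    <constraintSpec ident="require-revisionDesc" scheme="schematron">
      <constraint>
        <sch:assert test="tei:revisionDesc">
          teiHeader must contain a revisionDesc.
        </sch:assert>
      </constraint>
    </constraintSpec>
  </elementSpec>
</schemaSpec>
```

Because the rules live in Schematron rather than in the RELAX NG content model, a header that is structurally valid but still thin passes the grammar check and is then reported by the Schematron pass.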

What does this mean for a whole collection?

Decide the policy once, at the collection level, not file by file. A common pattern: encode against a minimal profile during transcription, then run every file through an enrichment pass that fills profileDesc and revisionDesc before deposit. Documenting that two-stage policy in your project handbook keeps the trade-off deliberate rather than accidental.
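The enrichment pass can be as small as an XSLT identity transform. The sketch below is written in XSLT 1.0 so that any processor should run it. It copies each file unchanged except for the teiHeader. There it adds a profileDesc with a langUsage if one is missing, and puts a dated change at the head of revisionDesc, creating revisionDesc if needed. The lang and date parameters and the change wording are hypothetical placeholders. The transform also assumes that any existing revisionDesc uses change elements rather than listChange.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <!-- Hypothetical parameters: pass real values per file or batch -->
  <xsl:param name="lang" select="'en'"/>
  <xsl:param name="date" select="'2025-01-12'"/>

  <!-- Identity template: anything not matched below is copied unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="tei:teiHeader">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Keep fileDesc, encodingDesc, profileDesc etc.; revisionDesc is rebuilt below -->
      <xsl:apply-templates select="node()[not(self::tei:revisionDesc)]"/>
      <!-- Add a minimal profileDesc only when the file has none -->
      <xsl:if test="not(tei:profileDesc)">
        <profileDesc>
          <langUsage>
            <language ident="{$lang}"/>
          </langUsage>
        </profileDesc>
      </xsl:if>
      <!-- revisionDesc must be the last child of teiHeader -->
      <revisionDesc>
        <xsl:apply-templates select="tei:revisionDesc/@*"/>
        <!-- New entry first (newest-first ordering), then any existing changes -->
        <change when="{$date}">Header enriched before deposit.</change>
        <xsl:apply-templates select="tei:revisionDesc/node()"/>
      </revisionDesc>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With xsltproc, for example, you could pass real values per batch: `xsltproc --stringparam lang de --stringparam date 2025-03-01 enrich.xsl letter.xml`, where enrich.xsl is a hypothetical filename for the stylesheet above.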

Key Takeaways

  • The required minimum is fileDesc with titleStmt, publicationStmt and sourceDesc.
  • Minimal headers fit drafts, teaching, single-use jobs and pilots — unstable metadata.
  • Avoid minimal headers for repository deposit, OAI-PMH harvesting and preservation.
  • A minimal header is fully valid; richness is an editorial choice, not a validity one.
  • Upgrade by appending encodingDesc, profileDesc and revisionDesc — no disruption.
  • Enforce required header fields through your ODD so metadata cannot quietly stay thin.

Frequently Asked Questions

What is the minimum required in a TEI header?

TEI requires only fileDesc containing titleStmt (with a title), publicationStmt and sourceDesc. Everything else in teiHeader (encodingDesc, profileDesc, revisionDesc) is optional and can be added when you need it.

When should I write a minimal TEI header?

Use a minimal header for early drafts, single-use transcriptions, teaching examples and pilots where the metadata is not yet stable. It keeps you encoding instead of fighting metadata you will only revise later.

When is a minimal header a bad idea?

Avoid it for anything you will publish, deposit in a repository or harvest via OAI-PMH. Aggregators, citation tools and preservation systems read header fields, and gaps there become discovery and provenance problems.

Does a minimal header still validate against TEI?

Yes. A header with just titleStmt, publicationStmt and sourceDesc is fully valid against the TEI schema. Validity is about structure; richness is a separate editorial decision.

How do I upgrade a minimal header later?

Add the optional blocks incrementally: encodingDesc for your editorial rules, profileDesc for languages and people, and revisionDesc for the change log. Because they are appended siblings, retrofitting them does not disturb existing markup.

Can I enforce a richer header across a project?

Yes. Customise your ODD to make selected header elements mandatory. That turns "we should fill this in" into a validation error, which is the most reliable way to keep a collection's metadata consistent.