How to Get Started with TEI P5

To get started with TEI P5, create a single well-formed XML file with a teiHeader (your metadata) and a text element (your transcription), point it at the TEI All Relax NG schema with an xml-model processing instruction, and validate it in oXygen or VS Code. That four-part skeleton — declaration, schema link, header, text — is a complete, valid TEI document you can grow incrementally. You do not need to read the whole Guidelines first.

What exactly is TEI P5?

TEI P5 is the fifth and current edition of the Text Encoding Initiative Guidelines, an XML vocabulary maintained by the TEI Consortium for representing texts in the humanities. P5 has been the stable major version since 2007, with new releases roughly twice a year (version numbers look like 4.7.0). It is not a single fixed schema but a modular system: you select the modules and elements you need and generate a custom schema from them.

How do I create my first valid TEI file?

Start with this minimal skeleton. It validates against tei_all out of the box:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="https://tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng"
            type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A letter from Mary to John, 1789</title></titleStmt>
      <publicationStmt><p>Encoded by Elara Reed, 2024.</p></publicationStmt>
      <sourceDesc><p>Transcribed from MS Add. 12345, f. 3r.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body>
    <p>My dearest John, I write in haste&#8230;</p>
  </body></text>
</TEI>
```

fileDesc is the only mandatory child of teiHeader, and its three required children (titleStmt, publicationStmt, sourceDesc) are the only header elements you must supply. Everything else is optional and added when you need it.
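As one possible next step, the sketch below grows the same header slightly: a structured publicationStmt replaces the prose paragraph, and an optional profileDesc records the language of the text. The project name and the licence are hypothetical placeholders, not part of the original example.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt><title>A letter from Mary to John, 1789</title></titleStmt>
    <!-- Structured alternative to a prose <p>; the publisher name is a placeholder -->
    <publicationStmt>
      <publisher>Hypothetical Letters Project</publisher>
      <date when="2024">2024</date>
      <availability>
        <licence target="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</licence>
      </availability>
    </publicationStmt>
    <sourceDesc><p>Transcribed from MS Add. 12345, f. 3r.</p></sourceDesc>
  </fileDesc>
  <!-- Optional: add this only once you need it -->
  <profileDesc>
    <langUsage><language ident="en">English</language></langUsage>
  </profileDesc>
</teiHeader>
```

This fragment would take the place of the teiHeader in the skeleton; the rest of the file stays unchanged.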

Which tools should a beginner install?

| Tool | Cost | Best for |
| --- | --- | --- |
| oXygen XML Editor | Paid (academic discount) | Validation, XPath, author mode, XSLT |
| VS Code + Scholarly XML | Free | Lightweight encoding, Relax NG validation |
| jEdit | Free | Older but solid XML plugin |
| `jing` (command line) | Free | Batch validation in scripts |

For a first project, VS Code with the Scholarly XML extension is genuinely enough. Move to oXygen when you start writing XPath, doing find-and-replace across many files, or transforming with XSLT.

Should I use tei_all or a smaller customisation?

tei_all includes every element and is the right schema while you are learning, because nothing you try will be rejected as "not in the schema". For real projects, switch to a tighter customisation so the schema actively prevents mistakes. The lightest official starting points are TEI Bare (a stripped skeleton) and TEI Simple (distributed in current releases as TEI simplePrint). You tailor these with an ODD file rather than editing the schema by hand.
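To make the idea of an ODD concrete, here is a minimal sketch of what such a customisation file might look like. The schema name `my_letters`, the title, and the choice of `persName` and `placeName` are illustrative assumptions; the four modules `tei`, `header`, `core`, and `textstructure` are the ones a TEI schema is normally built on.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hypothetical letters customisation</title></titleStmt>
      <publicationStmt><p>Unpublished working draft.</p></publicationStmt>
      <sourceDesc><p>Written from scratch as an example.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body>
    <!-- ident names the generated schema; start names the permitted root element -->
    <schemaSpec ident="my_letters" start="TEI">
      <!-- The base modules -->
      <moduleRef key="tei"/>
      <moduleRef key="header"/>
      <moduleRef key="core"/>
      <moduleRef key="textstructure"/>
      <!-- Pull in only the elements the sources demand (an assumed selection) -->
      <moduleRef key="namesdates" include="persName placeName"/>
    </schemaSpec>
  </body></text>
</TEI>
```

An ODD like this is itself a TEI document. It is typically compiled into a `.rng` file with the TEI's Roma web tool or the TEI Stylesheets, and the resulting schema is then referenced from `xml-model` in place of tei_all.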

How do I validate and fix errors?

Validation is constant, not a final step. In oXygen, errors appear underlined as you type. On the command line:

```bash
jing tei_all.rng letter.xml
# letter.xml:8:14: error: element "titel" not allowed here
```

Read errors literally: a typo'd element name, a missing required child, or wrong nesting account for almost all of them. The line and column point you straight to it.
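The three error types named above might look like this in practice. Each fragment is deliberately invalid and reuses the letter example for illustration; the exact wording of the message depends on the validator.

```xml
<!-- 1. Typo'd element name: "titel" is not a TEI element -->
<titleStmt><titel>A letter from Mary to John, 1789</titel></titleStmt>

<!-- 2. Missing required child: this fileDesc has no sourceDesc -->
<fileDesc>
  <titleStmt><title>A letter from Mary to John, 1789</title></titleStmt>
  <publicationStmt><p>Encoded by Elara Reed, 2024.</p></publicationStmt>
</fileDesc>

<!-- 3. Wrong nesting: p sits directly inside text, with no body around it -->
<text>
  <p>My dearest John, I write in haste&#8230;</p>
</text>
```

In each case the fix is local: correct the spelling to `title`, add the missing `sourceDesc`, or wrap the paragraph in `body`.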

What should I avoid as a beginner?

The classic trap is encoding appearance instead of meaning. If a word was italic, ask why — is it a foreign phrase (<foreign>), a title (<title>), or emphasis (<emph>)? Tag the function, and let a stylesheet decide it looks italic. Other early mistakes: inventing your own element names (use existing TEI ones), forgetting the namespace declaration, and starting too elaborate. A simple file you finish beats an ambitious one you abandon.
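A short before-and-after sketch may make the distinction clearer. The sentence is invented for illustration; the elements and the `rend` and `xml:lang` attributes are standard TEI.

```xml
<!-- Appearance only: records that three spans were italic, but not why -->
<p>She read <hi rend="italic">Evelina</hi> <hi rend="italic">in medias res</hi>,
   which I <hi rend="italic">never</hi> would.</p>

<!-- Function: the same sentence, tagged for what each italic span is -->
<p>She read <title>Evelina</title> <foreign xml:lang="la">in medias res</foreign>,
   which I <emph>never</emph> would.</p>
```

If the original appearance also matters to the project, `rend="italic"` can be kept on the semantic element (for example `<emph rend="italic">`), so nothing is lost by tagging the function first.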

Key Takeaways

A valid TEI document needs only a declaration, a schema link, a teiHeader, and a text element.
The three mandatory children of fileDesc are titleStmt, publicationStmt, and sourceDesc.
Validate against tei_all while learning, then tighten with a customisation later.
Use Relax NG (.rng) for the clearest error messages.
Start free in VS Code; move to oXygen when you need XPath and XSLT.
Encode meaning, not appearance — capture what a feature is.
Grow the file incrementally; do not read the entire Guidelines first.

Frequently Asked Questions

What is the minimum I need to start encoding in TEI P5?

A plain-text editor or oXygen, one well-formed .xml file with a teiHeader and a text element, and the TEI P5 All schema referenced via a processing instruction. That alone validates and is a legitimate starting point.

Do I need oXygen XML Editor or can I use free tools?

oXygen is the de facto standard and worth its licence for serious work, but you can start free with VS Code plus the Scholarly XML or XML extension, or with jEdit. All of these can validate against a Relax NG schema.

Should I learn the whole TEI Guidelines before starting?

No. The full Guidelines describe roughly 580 elements; most editions use 30 to 60. Start from a customisation like TEI Bare or TEI Simple and add elements as your sources demand them.

What schema language should a beginner target?

Relax NG (the .rng file the TEI ships). It gives the clearest error messages in oXygen and VS Code, and the TEI generates it as the primary schema from the ODD source.

How do I validate my first TEI file?

Reference tei_all.rng via an xml-model processing instruction at the top of the file, then run validation in your editor or with jing tei_all.rng myfile.xml on the command line.

What is the most common beginner mistake in TEI?

Over-tagging: encoding presentational detail like bold or indentation instead of meaning. Capture what a feature is (a heading, a name, a correction), not how it looked on the page.
