Validate TEI against a schema whenever a file will be published, archived, shared, or fed to processing tools, which is almost always. Validation is cheap, automatable, and catches structural mistakes before they reach readers or break a transformation. The only time to skip it is for genuinely disposable scratch encoding you are about to delete. The harder decision is which schema and which layer of validation to use, not whether to validate at all.
There are really three questions hiding inside "should I validate": well-formed vs valid, tei_all vs custom, and structure vs business rules.
Is validation different from well-formedness?
Yes, and conflating them is a common mistake. Well-formedness is pure XML syntax: every tag closes, there is one root, special characters are escaped. Any XML parser checks it for free. Validation goes further and asks whether the document obeys a schema — are these the right elements, in the right places, with the right attributes?
```xml
<!-- well-formed but INVALID TEI: persName cannot contain a div -->
<persName><div>Mary</div></persName>
```
That parses fine, but a TEI schema rejects it. You need both checks; well-formedness alone proves nothing about encoding correctness.
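The distinction can be seen with any XML parser. The following is a minimal sketch using Python's standard-library `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical and chosen for illustration. It only tests well-formedness and knows nothing about TEI, which is exactly the point.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML. This says nothing about TEI validity."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Legal XML syntax, but not legal TEI: persName cannot contain a div.
invalid_tei = "<persName><div>Mary</div></persName>"
# The <div> is never closed, so this is not even XML.
broken_xml = "<persName><div>Mary</persName>"

print(is_well_formed(invalid_tei))  # True
print(is_well_formed(broken_xml))   # False
```

The parser accepts the first fragment even though a TEI schema would reject it, so a passing parse should be read as "syntax is fine" and nothing more.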
When should I validate against tei_all versus a custom schema?
This is a real trade-off, not a default.
| Scenario | Use | Why |
|---|---|---|
| Learning, prototyping | tei_all | Accepts anything legal in TEI; nothing blocks you |
| Active production | Custom ODD schema | Rejects elements your project banned; enforces house style |
| Receiving external files | tei_all first | Confirm it is legal TEI before checking it fits your profile |
| Final archive deposit | Custom + Schematron | Strictest gate before something becomes permanent |
A custom schema is more work but turns validation into an active guardrail: if your project decided to use corr rather than reg, the schema can forbid reg outright, so a slip is caught at validation time rather than in review.
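To make the guardrail idea concrete, here is a rough Python sketch of the effect a custom schema has when it forbids an element. In a real project the ban belongs in the ODD and is enforced by the generated schema; this script is only an illustration, and the function name `banned_elements_used` and the sample fragment are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def banned_elements_used(tei_text, banned):
    """Return the sorted local names of banned TEI elements that occur in tei_text."""
    root = ET.fromstring(tei_text)
    found = set()
    for el in root.iter():
        # ElementTree spells namespaced tags as "{namespace}local".
        if isinstance(el.tag, str) and el.tag.startswith("{" + TEI_NS + "}"):
            local = el.tag.split("}", 1)[1]
            if local in banned:
                found.add(local)
    return sorted(found)


# Hypothetical fragment in which an encoder used reg despite the project ban.
sample = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    "<choice><orig>vpon</orig><reg>upon</reg></choice>"
    "</p>"
)

print(banned_elements_used(sample, {"reg"}))  # ['reg']
```

A custom schema gives the same answer automatically on every validation run, without anyone having to maintain a separate script.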
What does each validation layer actually catch?
Think in layers, each catching what the one below cannot:
- Well-formedness: syntax only.
- Relax NG / XSD: which elements and attributes are allowed where (structure).
- Schematron: context rules and cross-references, such as "a rdg must have @wit", "@from must precede @to", "every @ref must resolve".
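```xml
<!-- assumes the schema declares the tei prefix with sch:ns and a key named 'ids',
     e.g. <xsl:key name="ids" match="tei:listPerson/tei:person" use="@xml:id"/> -->
<sch:rule context="tei:persName[@ref]">
  <sch:assert test="key('ids', substring-after(@ref,'#'))">
    @ref points to a person not declared in listPerson.
  </sch:assert>
</sch:rule>
```
Relax NG cannot express that assertion, because it has no way to resolve a pointer such as @ref against the rest of the document. If your edition relies on linked entities or witnesses, Schematron is where real quality control lives.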
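For readers who want to see what such a cross-reference rule checks in procedural terms, the following Python sketch performs a similar lookup with the standard library. It is not a substitute for Schematron; the function name `unresolved_person_refs` and the sample document are hypothetical, and the sketch assumes that pointers are local (`#id`) and that any `person` element in the file counts as a declaration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def unresolved_person_refs(tei_text):
    """Return local @ref pointers on persName that match no person/@xml:id."""
    root = ET.fromstring(tei_text)
    # Collect every declared person identifier in the document.
    declared = {p.get(XML_ID) for p in root.iter(TEI + "person")}
    missing = []
    for name in root.iter(TEI + "persName"):
        ref = name.get("ref")
        # Only local pointers of the form "#id" are checked (an assumption).
        if ref and ref.startswith("#") and ref[1:] not in declared:
            missing.append(ref)
    return missing


# Hypothetical document: "mary" is declared, "jane" is not.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc><particDesc><listPerson>
    <person xml:id="mary"><persName>Mary</persName></person>
  </listPerson></particDesc></profileDesc></teiHeader>
  <text><body><p>
    <persName ref="#mary">Mary</persName> wrote to
    <persName ref="#jane">Jane</persName>.
  </p></body></text>
</TEI>"""

print(unresolved_person_refs(sample))  # ['#jane']
```

Keeping the rule in Schematron is still preferable in practice, since it then runs in the same editor and pipeline as the rest of the validation.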
How do I run validation in practice?
For a single file, your editor does it live; oXygen underlines errors as you type. For a corpus, automate it. A pre-commit hook or CI step that runs jing over every changed file stops invalid TEI entering the repository:
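```bash
# validate all TEI files against the project schema
status=0
for f in data/*.xml; do
  # jing exits non-zero when a file is invalid; remember the failure
  jing schema/myEdition.rng "$f" || { echo "INVALID: $f" >&2; status=1; }
done
exit "$status"  # a non-zero exit is what makes the hook or CI step fail
```
Add a second pass with an ISO Schematron processor against the .sch file for the business rules. Batch validation on every commit is the single highest-value habit in a TEI project.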
When is it reasonable not to validate?
Validation has costs: setting up schemas, slower iteration if you validate on every keystroke against tei_all, and false confidence (a valid file can still be editorially wrong). It is reasonable to defer validation when you are sketching a structure you expect to throw away, or experimenting with an encoding approach before committing. But the moment a file is destined for anyone else's eyes or any tool, validate it. The cost of an invalid file surfacing downstream — a broken transform, a rejected deposit — vastly exceeds the cost of validating early.
Does validation prove my encoding is correct?
No, and treating it as a quality guarantee is the deepest trap. Validation proves conformance to a schema, not soundness of judgement. A file that tags a city as a persName validates perfectly if the structure is legal. Validation is necessary but never sufficient; pair it with human review, sampling, and Schematron rules that encode as much of your editorial policy as can be made mechanical.
Key Takeaways
- Validate any file that will be published, archived, shared, or processed — skip only true scratch work.
- Well-formedness (syntax) and validation (schema conformance) are different checks; you need both.
- Use tei_all for learning and incoming files; use a custom ODD schema as a production guardrail.
- Relax NG checks structure; Schematron checks context rules and cross-references.
- Automate batch validation on every commit with jing plus a Schematron pass.
- A valid file can still be editorially wrong: validation is necessary, not sufficient.
Frequently Asked Questions
Is well-formedness the same as validation?
No. Well-formedness only checks XML syntax (closed tags, one root, escaped characters). Validation additionally checks that elements, attributes, and nesting obey a schema. A file can be well-formed but invalid TEI.
When should I validate against tei_all versus a custom schema?
Validate against tei_all while exploring or learning, because it accepts anything legal in TEI. Switch to a custom ODD-derived schema for production, where you want the schema to actively reject elements your project has decided not to use.
Do I need to validate every single file?
For any file that will be published, archived, or processed by tools, yes. Validate in batch on commit. For quick throwaway notes or scratch encoding you are about to discard, it is reasonable to skip it.
What does Schematron catch that Relax NG cannot?
Context-dependent and cross-reference rules: a wit pointer must resolve to a declared witness, a date range must be ordered, an element is required only in a certain context. Relax NG validates structure, not these business rules.
Can validation guarantee my encoding is correct?
No. Validation proves the file obeys the schema, not that your editorial judgements are right. A perfectly valid file can still tag a place name as a person. Validation is necessary but not sufficient for quality.