Choose a diplomatic transcription when fidelity to the document matters — for editions, linguistic study, or any analysis of the source's own spelling and layout — and a normalised one when readability and search matter more, as in a teaching text or a finding aid. For most working projects the right answer is neither extreme but a documented semi-diplomatic middle, ideally encoded so a single file yields both views. This guide gives you a decision procedure and the encoding to back it.
What is the core difference?
Diplomatic transcription reproduces what is on the page; normalised transcription rewrites it for a modern reader. Everything else follows from that one axis.
| Feature | Diplomatic | Normalised |
|---|---|---|
| Spelling | original | modernised |
| Abbreviations | kept (or marked) | expanded |
| u/v, i/j | as written | standardised |
| Punctuation | original | modern |
| Line breaks | preserved | run on |
| Best for | editions, linguistics | reading, search, teaching |
How do I decide for my project?
Work backwards from the question the transcription must answer. Run this quick decision procedure:
```text
1. Will anyone study the source's own spelling/forms? yes → lean diplomatic
2. Is the main goal readability or full-text search? yes → lean normalised
3. Both audiences? → encode both layers
4. Solo, time-boxed, internal use? → semi-diplomatic default
5. Unsure? → semi-diplomatic; you can derive normalised later
```

The asymmetry that decides ties: you can always generate a normalised reading from a careful diplomatic base, but you cannot recover the original spelling from a normalised text. So when in doubt, capture more, not less.
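As a sketch, the procedure reduces to a few ordered checks. This code follows one interpretation of it: questions 1 to 3 are tested first, and questions 4 and 5 both fall through to the semi-diplomatic default. The function and its two yes/no inputs are hypothetical names chosen for illustration.

```python
def recommend(studies_spelling: bool, needs_reading_or_search: bool) -> str:
    """One reading of the decision procedure; earlier questions take precedence."""
    if studies_spelling and needs_reading_or_search:
        return "encode both layers"  # question 3: both audiences
    if studies_spelling:
        return "lean diplomatic"  # question 1
    if needs_reading_or_search:
        return "lean normalised"  # question 2
    # Questions 4 and 5 (solo internal work, or simply unsure) both land here.
    return "semi-diplomatic"


print(recommend(studies_spelling=True, needs_reading_or_search=True))  # encode both layers
```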
What does semi-diplomatic look like in practice?
Semi-diplomatic keeps original spelling but resolves the things that only obscure meaning. A typical, defensible policy:
```text
KEEP: original spelling, word order, original capitalisation
EXPAND: abbreviations (mark expansions, e.g. italics or <expan>)
NORMALISE silently: u/v, i/j, long s → s, ligatures
RECORD: line breaks as | or in a layout layer, not inline prose
```

Write this down in a one-page transcription guideline and apply it uniformly. The guideline, not your memory, is what makes the result reproducible across days and across people.
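The silent normalisations can be applied mechanically at output time, which keeps the diplomatic base intact. The sketch below uses a hypothetical helper, not a standard tool. Long s and ligatures map cleanly. Its two u/v rules, however, are only heuristics for early modern English spelling: they misfire on words such as "queue". It leaves i/j out because that normalisation needs more context than a pattern can see.

```python
import re

# Silent character normalisations: long s and common Unicode ligatures.
CHAR_MAP = str.maketrans({"ſ": "s", "ﬁ": "fi", "ﬂ": "fl", "ﬀ": "ff", "ﬃ": "ffi", "ﬄ": "ffl"})

CONSONANT = "[b-df-hj-np-tv-zB-DF-HJ-NP-TV-Z]"
VOWEL = "[aeiouAEIOU]"


def normalise_silently(text: str) -> str:
    """Apply the policy's silent normalisations (heuristic u/v, no i/j)."""
    text = text.translate(CHAR_MAP)
    # Heuristic: word-initial v before a consonant is vocalic (vpon -> upon).
    text = re.sub(rf"\b([vV])(?={CONSONANT})",
                  lambda m: "u" if m.group(1) == "v" else "U", text)
    # Heuristic: u between two vowels is consonantal (haue -> have).
    # It misfires on words such as "queue", so review the output.
    text = re.sub(rf"(?<={VOWEL})u(?={VOWEL})", "v", text)
    return text
```

For example, `normalise_silently("I haue a ﬁre")` returns `"I have a fire"`, and `normalise_silently("vpon the ſea")` returns `"upon the sea"`.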
How do I keep both layers in one file?
Encode the choice rather than re-keying. TEI gives you paired elements so each token carries both readings, and your publishing layer renders whichever the reader wants:
```xml
<p>
  <!-- abbreviation: original mark vs expansion -->
  <choice><abbr>dns</abbr><expan>d<ex>omi</ex>nus</expan></choice>
  <!-- spelling: as-written vs regularised -->
  <choice><orig>vpon</orig><reg>upon</reg></choice>
</p>
```

From this single source you can output a diplomatic view (`abbr`, `orig`) and a normalised view (`expan`, `reg`) with one stylesheet switch, with no second transcription pass.
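A TEI project would normally make this switch in an XSLT stylesheet. The same selection logic can be sketched with a swapped-in tool, Python's standard-library `xml.etree.ElementTree`. `VIEWS` and `render` are hypothetical names. The sketch resolves only `choice` and plain text, not the attributes or other editorial elements a full edition would also handle. It strips namespaces, so it works on namespaced TEI as well as on the bare snippet above.

```python
import xml.etree.ElementTree as ET

# Which child of <choice> each view keeps.
VIEWS = {"diplomatic": {"abbr", "orig"}, "normalised": {"expan", "reg"}}


def local(tag: str) -> str:
    """Strip a namespace such as {http://www.tei-c.org/ns/1.0} from a tag."""
    return tag.rsplit("}", 1)[-1]


def render(elem: ET.Element, view: str) -> str:
    """Return the text of elem, resolving every <choice> for one view."""
    keep = VIEWS[view]
    parts = [elem.text or ""]
    for child in elem:
        if local(child.tag) == "choice":
            for reading in child:
                if local(reading.tag) in keep:
                    # itertext() joins nested text: d + <ex>omi</ex> + nus.
                    parts.append("".join(reading.itertext()))
        else:
            parts.append(render(child, view))
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts)


SAMPLE = ("<p><choice><abbr>dns</abbr><expan>d<ex>omi</ex>nus</expan></choice> "
          "<choice><orig>vpon</orig><reg>upon</reg></choice></p>")
p = ET.fromstring(SAMPLE)
print(render(p, "diplomatic"))  # dns vpon
print(render(p, "normalised"))  # dominus upon
```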
What pitfalls should I avoid?
The fatal one is silent inconsistency: expanding some abbreviations but not others, normalising u/v here and not there, with nothing written down. A reader then cannot tell whether a form is the scribe's or the editor's. Two more traps: over-normalising, which destroys linguistic evidence you may later want, and mixing layout into prose, which makes line ends invisible. Decide, document, and apply uniformly; that single habit prevents most rework.
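Silent inconsistency can be caught mechanically if you can list each occurrence's original form alongside what the transcription actually prints. The hypothetical check below flags any original form rendered more than one way. The sample occurrences are invented for illustration.

```python
from collections import defaultdict


def find_inconsistencies(decisions):
    """Map each original form to its outputs when it was rendered more than one way.

    decisions: iterable of (orig, shown) pairs, one per occurrence, where
    shown is the form the transcription actually prints.
    """
    outputs = defaultdict(set)
    for orig, shown in decisions:
        outputs[orig].add(shown)
    return {orig: sorted(shown) for orig, shown in outputs.items() if len(shown) > 1}


# Hypothetical occurrences: "vpon" was regularised once and left alone once.
sample = [("vpon", "upon"), ("the", "the"), ("vpon", "vpon")]
print(find_inconsistencies(sample))  # {'vpon': ['upon', 'vpon']}
```

Treat hits as candidates for review rather than errors. A homograph may legitimately be resolved differently in different contexts.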
How does the choice affect search and analysis?
It is not cosmetic. A normalised layer makes full-text search and naive word-frequency counts behave predictably, because `upon` is always spelled one way. A diplomatic layer preserves the spelling variation a historical linguist studies. Holding both (search on `reg`, analyse on `orig`) is why the dual-layer encoding above is worth the small extra effort.
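The difference shows up as soon as you count. In this toy sketch, the tokens are invented, and each one carries the `orig` and `reg` forms a dual-layer file would supply. Counting the `reg` layer gathers every variant under one search term. Counting the `orig` layer keeps the variants apart.

```python
from collections import Counter

# Hypothetical tokens, each with its diplomatic (orig) and normalised (reg) form.
tokens = [("vpon", "upon"), ("upon", "upon"), ("vpon", "upon"), ("the", "the")]

search_layer = Counter(reg for _, reg in tokens)      # one spelling per word
analysis_layer = Counter(orig for orig, _ in tokens)  # spelling variation kept

print(search_layer["upon"])                             # 3
print(analysis_layer["vpon"], analysis_layer["upon"])   # 2 1
```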
Key Takeaways
- Diplomatic = fidelity to the source; normalised = readability and search.
- Decide from the question your transcription must answer, not from habit.
- Semi-diplomatic (keep spelling, expand abbreviations, normalise `u`/`v` and `i`/`j`) is the safest default.
- You can derive normalised from diplomatic but not the reverse; when unsure, capture more.
- Encode both layers with TEI `choice`/`orig`/`reg` and `abbr`/`expan` so one file yields both views.
- The worst pitfall is silent inconsistency; write a one-page guideline and apply it uniformly.
- Search on the normalised layer, analyse on the diplomatic layer.
Frequently Asked Questions
What is a diplomatic transcription?
A diplomatic transcription reproduces the source as written — original spelling, abbreviations, capitalisation, line breaks and even errors — without modernising. It prioritises fidelity to the document over readability.
What is a normalised transcription?
A normalised transcription regularises the text for a modern reader. It expands abbreviations, standardises spelling (including u/v and i/j), and modernises punctuation and capitalisation. It prioritises readability and searchability over documentary fidelity.
Which should I choose for a beginner project?
If you must pick one, semi-diplomatic is the safest default: keep original spelling but expand abbreviations and silently normalise u/v and i/j. It is honest about the source yet usable, and you can derive a fully normalised reading layer later.
Can I have both diplomatic and normalised in one file?
Yes, and it is best practice. Encode the diplomatic reading and the normalised reading together with TEI choice, orig and reg (or abbr and expan), so a single source produces both views without re-keying.
What pitfall ruins a transcription most often?
Inconsistency — silently normalising in some places and not others, with no written policy. Decide your rules up front, record them in transcription guidelines, and apply them uniformly so the result is reproducible.
Does the choice affect text mining and search?
Strongly. Original spelling fragments search results and breaks naive frequency counts, while normalisation can erase the very variation a linguist wants. Keeping both layers lets you search on the normalised form and analyse on the diplomatic one.