Digital Scholarly Editions

To plan a digital scholarly edition, fix three things before you transcribe a single line: your editorial model (documentary, critical, or genetic), your encoding schema (almost always TEI P5 with a customised ODD), and your delivery layer (how readers will actually use it). Write these into a short project charter, run a two-week pilot on your hardest pages, and only then scale up. The order matters — model drives schema, schema drives interface, and reversing that order is the single most expensive mistake in the field.

What is a digital scholarly edition, really?

A digital scholarly edition is not a PDF of a book put online, and it is not a bare image gallery. It is a structured, machine-readable representation of one or more textual witnesses, with an explicit editorial method, that lets readers move between facsimile, transcription, apparatus, and commentary. The "scholarly" part means every transformation from source to screen is documented and defensible. That documentation requirement is what separates an edition from a digitisation project, and it is why planning matters more here than in almost any other digital humanities work.

Which editorial model should I choose?

This is your first and most consequential decision because it sets your transcription policy and your TEI module list.

| Model | Captures | Typical TEI focus |
| --- | --- | --- |
| Documentary | One witness, as written | <sourceDoc>, zones, line-level layout |
| Critical | A reconstructed text from many witnesses | textcrit (app, lem, rdg) |
| Genetic | The process of writing and revision | <add>, <del>, <subst>, change stages |

If you cannot decide, ask what question the edition answers. "What does this manuscript say?" is documentary. "What did the author intend across all surviving copies?" is critical. "How did the text come to be?" is genetic.
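To make the critical model's apparatus concrete, here is a minimal sketch that reads one TEI `<app>` entry and lists its lemma and variant reading by witness. The sample line and the sigla A and B are invented for illustration; only the element names (`app`, `lem`, `rdg`), the `wit` attribute, and the TEI namespace come from TEI P5.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample: one verse line with a single variant. A and B are hypothetical sigla.
SAMPLE = """
<l xmlns="http://www.tei-c.org/ns/1.0">The
  <app>
    <lem wit="#A">quick</lem>
    <rdg wit="#B">quicke</rdg>
  </app>
fox</l>
"""


def readings(xml_text):
    """Return (kind, witnesses, text) for every lemma or reading in the fragment."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for app in root.iter(f"{TEI}app"):
        for child in app:
            kind = child.tag.rsplit("}", 1)[-1]  # drop the namespace prefix
            found.append((kind, child.get("wit"), (child.text or "").strip()))
    return found


for kind, wit, text in readings(SAMPLE):
    print(kind, wit, text)
```

A documentary or genetic edition would not need this structure at all, which is one way to see how the choice of model determines the markup you have to plan for.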

How do I scope the work realistically?

Scope in pages and throughput, not vibes. Count your folios, estimate your encoding rate from a real pilot, and budget for the invisible half of the work: collation, proofreading, interface, and accessibility testing, which together tend to take about as long as the encoding itself.

```text
pages       = 240
encode_rate = 12 pages/day        (measured, not hoped)
encode_days = 240 / 12            = 20 days
overhead    = encode_days * 1.0   = 20 days  (collation, review, build, a11y)
total_days  = 20 + 20             = 40 person-days  -> ~8 working weeks for one encoder
```
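The same arithmetic can be sketched as a small function so the estimate is easy to rerun when the pilot gives you a new rate. The function name, the dictionary keys, and the five-day working week are assumptions made for this example; the default overhead factor of 1.0 mirrors the doubling rule above.

```python
import math


def estimate_person_days(pages, pages_per_day, overhead_factor=1.0):
    """Estimate effort from a measured encoding rate.

    overhead_factor is the share of encoding time added for collation,
    review, build, and accessibility work (1.0 doubles the estimate).
    """
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    encode_days = math.ceil(pages / pages_per_day)
    overhead_days = math.ceil(encode_days * overhead_factor)
    total_days = encode_days + overhead_days
    return {
        "encode_days": encode_days,
        "overhead_days": overhead_days,
        "total_days": total_days,
        "working_weeks": total_days / 5,  # assumes a five-day week
    }


estimate = estimate_person_days(pages=240, pages_per_day=12)
print(estimate)
```

Rerunning it with the low end of your measured rate gives a pessimistic bound that is worth putting in the charter alongside the headline figure.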

A pilot on five genuinely difficult pages will surface most of the markup problems you would otherwise hit at page 180.

What goes in the project charter?

Keep it to two pages and put it under version control. It should name the editorial model, the schema and ODD, the metadata standard, the licence, the responsible editors, and the long-term home for the data. The charter is also where you record your sigla scheme and your persistent-identifier policy. Treat it as a living contract; when you change a rule, you change the charter and note the date.
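One way to keep the charter honest is to mirror its headings in a small record and check that none is left blank. This is only a sketch: the class, its field names, and the `missing_fields` helper are hypothetical and simply restate the items listed above.

```python
from dataclasses import dataclass, fields


@dataclass
class Charter:
    """Hypothetical record mirroring the charter headings listed above."""
    editorial_model: str = ""
    schema_and_odd: str = ""
    metadata_standard: str = ""
    licence: str = ""
    responsible_editors: str = ""
    long_term_home: str = ""
    sigla_scheme: str = ""
    identifier_policy: str = ""
    last_changed: str = ""  # date of the most recent rule change


def missing_fields(charter):
    """Return the names of charter fields that are still empty."""
    return [f.name for f in fields(charter) if not getattr(charter, f.name).strip()]


draft = Charter(editorial_model="documentary", licence="CC BY 4.0")
print(missing_fields(draft))
```

The charter itself can stay a prose document; a check like this only confirms that every heading has been filled in before encoding starts.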

How do I structure the files?

Separate concerns from the start so the edition outlives any one tool:

```text
edition/
  data/        # TEI/XML source: the scholarly asset
  schema/      # ODD + compiled RNG/Schematron
  images/      # or IIIF manifest URLs, not bitmaps in the repo
  build/       # generated HTML, never edited by hand
  docs/        # editorial declaration, charter, changelog
```

Your data directory is the thing that matters in twenty years. The build directory is disposable; you must be able to regenerate it from data plus schema with one command.
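As a rough illustration of what "one command" can mean, the sketch below wipes the build directory and regenerates one HTML page per TEI file using only the Python standard library. It is an assumption-laden toy, not a recommended pipeline: the `build` function is hypothetical, it only renders the title and `<p>` elements, and it checks well-formedness rather than validating against your schema. Real projects typically use XSLT or a dedicated publishing tool for this step.

```python
import html
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

TEI = "{http://www.tei-c.org/ns/1.0}"


def build(data_dir, build_dir):
    """Wipe build_dir and regenerate one HTML page per TEI file in data_dir."""
    data_dir, build_dir = Path(data_dir), Path(build_dir)
    shutil.rmtree(build_dir, ignore_errors=True)  # build output is disposable
    build_dir.mkdir(parents=True)
    written = []
    for source in sorted(data_dir.glob("*.xml")):
        root = ET.parse(source).getroot()  # raises ParseError if not well-formed
        title = root.findtext(f".//{TEI}title", default=source.stem)
        text_el = root.find(f"{TEI}text")
        paragraphs = []
        if text_el is not None:
            paragraphs = ["".join(p.itertext()).strip() for p in text_el.iter(f"{TEI}p")]
        lines = [
            "<!doctype html>",
            '<html lang="en">',
            '<head><meta charset="utf-8"><title>' + html.escape(title) + "</title></head>",
            "<body>",
            "<h1>" + html.escape(title) + "</h1>",
        ]
        lines += ["<p>" + html.escape(text) + "</p>" for text in paragraphs]
        lines += ["</body>", "</html>"]
        target = build_dir / (source.stem + ".html")
        target.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(target.name)
    return written
```

Calling `build("edition/data", "edition/build")` twice in a row should produce identical output, which is a quick test that nothing in the build directory was edited by hand.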

What are the common pitfalls?

The biggest is encoding before the schema is stable, which forces re-encoding. The second is letting presentation leak into the source — colours, font choices, and HTML classes belong in stylesheets, never in TEI. The third is ignoring accessibility until launch; a facsimile-and-transcription view needs keyboard navigation and alt text from the design stage. The fourth is choosing a bespoke database before you have proven you cannot deliver with static files plus IIIF.
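The second pitfall can be caught mechanically. The sketch below flags attributes that a project has decided count as display styling; the rule set (`style` and `class`) and the function name are hypothetical choices for this example. Note that `style` is a legitimate TEI attribute for describing how the source itself looks, so whether to forbid it is an editorial policy decision, not something TEI imposes.

```python
import xml.etree.ElementTree as ET

# Hypothetical project rule: attribute names treated as display styling.
PRESENTATION_ATTRS = {"style", "class"}

# Invented fragment for illustration.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    '<hi style="color: red">urgent</hi> and <hi rend="italic">stressed</hi>'
    "</p>"
)


def presentation_leaks(xml_text):
    """Return (element, attribute, value) for each flagged attribute."""
    root = ET.fromstring(xml_text)
    leaks = []
    for element in root.iter():
        name = element.tag.rsplit("}", 1)[-1]  # drop the namespace prefix
        for attr in sorted(PRESENTATION_ATTRS & set(element.attrib)):
            leaks.append((name, attr, element.attrib[attr]))
    return leaks


print(presentation_leaks(SAMPLE))
```

In practice a rule like this would more usually live in the project's Schematron, so that it runs with the rest of validation.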

Key Takeaways

  • Decide the editorial model first; it governs schema, apparatus, and interface.
  • Standardise on TEI P5 with a customised ODD and validate continuously.
  • Scope by measured encoding throughput, then double the resulting estimate to cover collation and review.
  • Keep a two-page charter under version control as a living contract.
  • Separate data, schema, and build so the edition survives tool changes.
  • Run a pilot on your hardest five pages before committing to a workflow.
  • Plan accessibility and persistent identifiers from day one, not at launch.

Frequently Asked Questions

What is the first decision when planning a digital scholarly edition?

Define your editorial model — documentary, critical, or genetic — because it drives your transcription policy, your TEI schema, and your interface. Everything downstream depends on this choice, so settle it in writing before any encoding begins.

Do I need TEI to plan a digital edition?

Not strictly, but TEI P5 is the de facto standard, and publishing tools such as TEI Publisher and EVT expect it. Choosing TEI early gives you interoperability, validation, and a large community of reusable stylesheets.

How long does it take to produce a digital scholarly edition?

Plan in transcription throughput, not calendar months. A trained encoder manages roughly 8–15 manuscript pages of full TEI per day; divide your page count by your measured rate to get encoding days, then double the result for collation, review, and interface work.

What should go in an editorial declaration?

Your transcription policy, normalisation rules, how you treat abbreviations and corrections, your witness sigla, and responsibility statements. It lives in the TEI header, in the editorialDecl element inside encodingDesc, and makes every choice auditable.
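A quick way to confirm that a file actually carries its declaration is to look for `editorialDecl` inside `encodingDesc` in the header. The sketch below does that and reports which sections are present; the function name and sample documents are invented, while `correction` and `normalization` are standard TEI child elements of `editorialDecl`.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample documents for illustration.
WITH_DECL = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <editorialDecl>
        <correction><p>Scribal errors are kept and flagged.</p></correction>
        <normalization><p>Original spelling is retained.</p></normalization>
      </editorialDecl>
    </encodingDesc>
  </teiHeader>
</TEI>
"""
WITHOUT_DECL = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader/></TEI>'


def editorial_decl_sections(xml_text):
    """Return the child element names of editorialDecl, or None if it is absent."""
    root = ET.fromstring(xml_text.strip())
    decl = root.find("tei:teiHeader/tei:encodingDesc/tei:editorialDecl", NS)
    if decl is None:
        return None
    return [child.tag.rsplit("}", 1)[-1] for child in decl]


print(editorial_decl_sections(WITH_DECL))
print(editorial_decl_sections(WITHOUT_DECL))
```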

How do I keep a digital edition sustainable from day one?

Separate your data (TEI/XML) from your presentation layer, version everything in Git, and prefer static or standards-based delivery (IIIF, plain TEI) over a bespoke database that one developer maintains.

What is the most common planning mistake?

Encoding before agreeing the model and schema. Teams discover six months in that their markup cannot express what the edition needs, forcing a costly re-encode. A two-week pilot on five hard pages prevents this.