Digital Scholarly Editions

Always put your edition's source under version control — Git history of your TEI is non-negotiable provenance. The genuine decision is narrower: should you publish multiple citable versions to readers? Do that only when the edition is a living research object that scholars will cite at specific points in time — actively revised texts, evolving apparatus, or collaborative editions. For a critical edition that will be finished and then stand, a single tagged, citable release plus internal Git history is the right, lighter-weight choice.

What does "versioning an edition" actually mean?

Two different things hide under one word. Internal versioning is your Git repository: every commit, branch, and diff, used by the team to track who changed what and why. Published versioning is a curatorial act — you release named, citable snapshots (v1.0, v2.0) that readers can cite and that you commit to keeping available forever. Conflating these leads people to either skip Git ("we don't need versions yet") or over-engineer a public version-switcher nobody uses. Separate them and decide each on its own merits.

Should I always use Git internally?

Yes, without exception. Even a one-person, one-manuscript edition benefits from a commit history: it records editorial reasoning, lets you revert a bad normalisation, and provides provenance for every reading. The cost is near zero. Initialise it the day you start:

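```bash
git init edition
cd edition                          # git init creates the directory; the rest runs inside it
echo "build/" >> .gitignore         # never commit generated HTML
echo "images/*.tif" >> .gitignore   # facsimiles are served via IIIF, not stored in Git
git add .gitignore data/ schema/ docs/   # assumes these directories already exist
git commit -m "Initial transcription, witnesses A-C, ch.1"
```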
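
Before the first commit it can be worth confirming that the ignore rules behave as intended. The following is a minimal sketch using `git check-ignore`; the function names and the sample paths are illustrative assumptions, not part of any standard tooling.

```bash
# Sketch: confirm which paths the ignore rules cover.
# is_ignored and report_ignored are hypothetical helper names.

is_ignored() {
    # git check-ignore exits 0 when the path matches an ignore rule, 1 when it does not.
    git check-ignore -q "$1"
}

report_ignored() {
    for path in "$@"; do
        if is_ignored "$path"; then
            echo "ignored: $path"
        else
            echo "not ignored: $path"
        fi
    done
}

# Usage, run from the repository root (sample paths are placeholders):
#   report_ignored build/index.html images/f001r.tif data/witness.xml
```

A transcription file that shows up as "ignored" here would be silently missing from the history, so the check is cheapest to run on day one.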

Commit semantically — one editorial decision per commit — so the log reads as an editorial diary, not a backup dump.

When is published, citable versioning worth it?

Weigh the benefit (readers can cite a stable point) against the cost (you maintain every version forever). The signals that justify it:

| Signal | Lean toward published versions |
|---|---|
| Edition revised for years | Yes: readers cite a moving target |
| Apparatus/readings change | Yes: citations must pin a state |
| Collaborative, ongoing | Yes: releases mark agreed states |
| Finished critical edition | No: one release suffices |
| Small documentary edition | No: current build + changelog |

If all three "yes" signals describe your edition, build a version-switcher. Otherwise, publish one citable release and keep the rest in Git.

How do I make Git handle TEI well?

Git diffs line by line, so XML formatting determines whether your history is readable. Run the same pretty-printer with the same settings before every commit, so that a one-word change shows up as a one-line diff rather than a whole reformatted paragraph:

```bash
xmllint --format witness.xml --output witness.xml
git add witness.xml
git commit -m "Emend 'obscuram' to 'obscaenam' (ch.3, conjecture ER)"
```
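
Formatting by hand is easy to forget, so a check along the following lines could run before each commit. This is a sketch only: it assumes `xmllint` is installed, the function names are hypothetical, and it inspects the working-tree copy of each file rather than the staged content.

```bash
# Sketch of a formatting check, assuming xmllint (libxml2) is on the PATH.

# Succeeds only when the file is byte-identical to its own pretty-printed form.
is_normalised() {
    xmllint --format "$1" 2>/dev/null | cmp -s - "$1"
}

# Prints every file that still needs formatting; returns non-zero if any do.
check_normalised() {
    rc=0
    for file in "$@"; do
        if ! is_normalised "$file"; then
            echo "needs formatting: $file"
            rc=1
        fi
    done
    return "$rc"
}

# Possible use inside a hypothetical .git/hooks/pre-commit script
# (the unquoted substitution assumes file names without spaces):
#   check_normalised $(git diff --cached --name-only --diff-filter=ACM -- '*.xml') || exit 1
```

A file that is not well-formed also fails the check, because `xmllint` then produces no output to compare against.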

Tag releases so a citation can resolve to an exact state:

```bash
git tag -a v1.0 -m "First public edition, ch.1-12"
git push --tags
```

A reader citing v1.0 can always retrieve precisely what they read.
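
Retrieval from a tag might look like the sketch below. The helper names, the `edition-` file prefix, and the example path are assumptions made for illustration; the underlying commands are plain `git show` and `git archive`.

```bash
# Sketch: recover the text exactly as it stood at a tagged release.

# Print one file as it was at the given tag, without touching the working tree.
show_at_version() {
    git show "$1:$2"
}

# Export the whole tagged tree as a tarball to keep alongside the release.
export_version() {
    git archive --format=tar.gz --prefix="edition-$1/" -o "edition-$1.tar.gz" "$1"
}

# Usage (tag and path are placeholders):
#   show_at_version v1.0 data/witness.xml
#   export_version v1.0
```

Neither command changes the current checkout, so a past state can be inspected while editorial work continues.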

What does published versioning cost in practice?

It is ongoing curation, not a one-off. Each released version must stay online and citable indefinitely, each release needs a changelog explaining what differs from the previous one, and the interface needs a way for readers to choose a version. That is a real hosting and labour commitment that stretches past the project's funded life. Underestimating it is how editions end up with a broken "v2" link and an unreachable "v1".
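
If commits have been kept to one editorial decision each, the raw material for a changelog can be drafted from the history. The sketch below assumes releases are tagged as described above; `draft_changelog` is a hypothetical helper, and its output is a starting point that an editor still needs to rewrite for readers.

```bash
# Sketch: list commit subjects between two releases, oldest first.
draft_changelog() {
    from=$1
    to=${2:-HEAD}   # default to the current state when no second tag is given
    echo "Changes from $from to $to"
    git log --reverse --no-merges --pretty=tformat:'- %s' "$from..$to"
}

# Usage (tag names and output file are placeholders):
#   draft_changelog v1.0 v2.0 > CHANGELOG-v2.0.txt
```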

Can I keep the data versioned but the site single?

Yes, and for most editions this is the sweet spot. Maintain full Git history of the TEI for provenance and accountability, but publish only the current build to readers alongside a human-readable changelog. You get complete internal auditability — every reading traceable to a commit — without promising to host a museum of past websites. Reserve public multi-version delivery for editions where citation precision genuinely demands it.

Key Takeaways

  • Always version the source in Git; that decision is settled, not optional.
  • Distinguish internal Git history from published, citable public versions.
  • Publish multiple versions only for living, actively cited editions.
  • Commit semantically, one editorial decision per commit, for a usable history.
  • Pretty-print XML before committing so diffs stay readable.
  • Tag releases so citations resolve to an exact textual state.
  • Published versioning is indefinite curation labour, not a one-time release.

Frequently Asked Questions

Should every digital edition be under version control?

Yes — the source TEI/XML should always live in Git regardless of project size. The real decision is whether to publish multiple citable versions to readers, which is a heavier commitment that not every edition needs.

What is the difference between internal versioning and published versions?

Internal versioning is your Git history — every commit, for the team. Published versioning means releasing tagged, citable snapshots (v1.0, v1.1) that readers can reference and that you promise to keep available. The first is mandatory; the second is a curatorial choice.

When is publishing multiple versions worth the cost?

When the edition is actively revised over years, when scholars cite specific readings that may change, or when the edition is a moving research object. For a one-time critical edition that will not change, a single citable release is enough.

How does Git handle TEI editions specifically?

Git compares files line by line, so keep one logical unit per line or use a pretty-printer to keep diffs readable. Commit semantically, one editorial decision per commit, and tag releases. Large binary facsimiles should stay out of Git; reference them via IIIF instead.

What are the costs of published versioning?

You must keep every released version online and citable indefinitely, document what changed between them, and design an interface that lets readers pick a version. That is an ongoing commitment of curation labour and hosting, not a one-off.

Can I version the data without versioning the website?

Yes, and it is often the right call. Keep rich Git history of the TEI for provenance, but publish only the current build to readers, with a changelog. You get full internal accountability without the burden of maintaining many live public versions.