Transkribus and eScriptorium are the two leading platforms for handwritten text recognition (HTR), and the honest answer is that they trade convenience against control. Transkribus is a polished hosted service with a huge public model library and a pay-per-page credit model; eScriptorium is open-source software you self-host, free of per-page fees but requiring servers and some technical effort. Choose Transkribus to start fast; choose eScriptorium for data sovereignty and large-scale economics.
At a glance: how do they compare?
| Dimension | Transkribus | eScriptorium |
|---|---|---|
| Licence / delivery | Hosted SaaS, credit-based | Open source (GPL), self-hosted |
| Recognition engine | PyLaia / proprietary | Kraken |
| Public models | Large library | Few; bring your own |
| Cost shape | Per page (credits) | Hardware / hosting only |
| Infrastructure | None needed | Server + GPU recommended |
| Export | PAGE, ALTO, TEI, DOCX, PDF | PAGE, ALTO, text |
| Best for | Fast start, smaller jobs | Large, controlled, long-term |
Which costs less over a whole project?
Transkribus charges credits roughly per recognised page, so a 50,000-page job has a real, scaling bill. eScriptorium has no per-page charge — your cost is the server and, ideally, a GPU. For a few thousand pages, Transkribus is almost always cheaper once you count your own time. Past a tipping point (often tens of thousands of pages), self-hosting eScriptorium becomes the economical choice, provided you have the technical capacity to run it.
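To find where the tipping point falls for your own project, a minimal break-even sketch can help. It assumes a flat hosted price per page and treats self-hosting as a fixed outlay (hardware plus staff time) with negligible per-page cost. The class name and every figure below are hypothetical placeholders, not actual Transkribus prices or hosting quotes.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    # All figures are hypothetical: substitute real credit prices and quotes.
    hosted_cost_per_page: float   # credits converted to currency, per recognised page
    self_host_fixed_cost: float   # server/GPU purchase or rental for the project
    self_host_admin_cost: float   # staff time to install, update and run it

    def hosted_total(self, pages: int) -> float:
        """Pay-per-page cost grows linearly with the number of pages."""
        return pages * self.hosted_cost_per_page

    def self_hosted_total(self, pages: int) -> float:
        """Assumes the marginal per-page cost of self-hosting is negligible."""
        return self.self_host_fixed_cost + self.self_host_admin_cost

    def break_even_pages(self) -> float:
        """Page count at which both options cost the same."""
        return (self.self_host_fixed_cost + self.self_host_admin_cost) / self.hosted_cost_per_page


model = CostModel(hosted_cost_per_page=0.25, self_host_fixed_cost=4000.0, self_host_admin_cost=8000.0)
print(model.break_even_pages())
```

With these invented figures the break-even is 48,000 pages; below that the hosted option is cheaper, above it self-hosting wins. Real projects should also price in training runs and the risk of the admin estimate being too low.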
How hard is each to set up?
Transkribus needs only a browser and an account — zero infrastructure. eScriptorium typically runs via Docker:
```bash
# Minimal eScriptorium dev stack (illustrative)
# The repository README lists further steps (environment settings, creating an admin user); follow it for current instructions.
git clone https://gitlab.com/scripta/escriptorium.git
cd escriptorium
docker compose up -d
# then browse to http://localhost:8080
```

For production you want a GPU for training and tuned worker counts. That is a meaningful sysadmin commitment, so weigh whether your team can sustain it for the project's lifetime.
What about training custom models?
Both train custom models, but the workflow differs:
- Transkribus trains on its servers from the web app. You assign training/validation pages, pick a base model, and read a CER chart. No command line needed.
- eScriptorium trains Kraken models. You can train in the UI or drop to the Kraken CLI for full control:
```bash
# Train a Kraken HTR model from PAGE XML ground truth (use -f alto for ALTO files)
ketos train -o my_model -f page training/*.xml
# Evaluate the best checkpoint on held-out data
ketos test -m my_model_best.mlmodel -f page eval/*.xml
```

The CLI route gives reproducible, scriptable training, which suits research that must be documented and re-run.
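Both workflows judge a model by character error rate (CER), the figure Transkribus charts during training. A common definition is the character-level edit distance between the reference transcription and the recognised text, divided by the length of the reference. The sketch below implements that definition in plain Python; the function names are illustrative, and each platform's own scorer may normalise text (Unicode forms, whitespace) differently before counting.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (or match at no cost)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


print(cer("Anno Domini", "Anno Domlni"))  # one substitution in 11 characters
```

CER is 0 for a perfect transcription and can exceed 1 when the recognised text is much longer than the reference, so compare models on the same held-out pages.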
Can you migrate between them?
Yes — and you should keep this option open. Both speak PAGE XML and ALTO XML, the lingua franca of layout-plus-text. Export ground truth from one, import to the other, and you keep your most valuable asset: corrected transcription. Region types and custom tags may need a small remapping pass, but the heavy lifting transfers cleanly. Avoid lock-in by always retaining PAGE/ALTO exports.
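The remapping pass can be small. The sketch below assumes Transkribus-style PAGE XML, where region labels live in the `custom` attribute as `structure {type:...;}`, and rewrites those labels from a lookup table using only the standard library. The function name and the label names in the usage note are hypothetical; check how your own exports encode region types (some tools use the standard PAGE `type` attribute instead) before adapting it.

```python
import re
import xml.etree.ElementTree as ET

# Matches a Transkribus-style structure tag inside a PAGE `custom` attribute,
# e.g. custom="readingOrder {index:0;} structure {type:heading;}".
STRUCTURE_RE = re.compile(r"structure \{type:([^;}]+);\}")


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def remap_region_types(page_xml: str, mapping: dict[str, str]) -> str:
    """Rename region labels on every TextRegion; unmapped labels are kept."""
    root = ET.fromstring(page_xml)
    # Re-emit the document's own PAGE namespace as the default namespace,
    # whichever schema version the exporter used. Note: this is global state.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])

    def rename(match: re.Match) -> str:
        label = match.group(1)
        return f"structure {{type:{mapping.get(label, label)};}}"

    for elem in root.iter():
        if local_name(elem.tag) != "TextRegion":
            continue
        custom = elem.get("custom")
        if custom:
            elem.set("custom", STRUCTURE_RE.sub(rename, custom))
    return ET.tostring(root, encoding="unicode")
```

For example, `remap_region_types(xml_text, {"Main": "paragraph"})` turns `structure {type:Main;}` into `structure {type:paragraph;}` and leaves other labels and the reading order untouched. Matching on local tag names lets the same function handle files from different PAGE schema versions.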
Which should you choose for your archive?
- Pick Transkribus if you want results today, your project is small-to-medium, you value the public model library, or you have no one to run servers.
- Pick eScriptorium if you need full control of data and infrastructure, your project is large or open-ended, you want GPL openness, or per-page credits would dominate your budget.
Many institutions sensibly do both: prototype in Transkribus, then scale production on eScriptorium once the workflow is proven.
Key Takeaways
- The core trade-off is convenience (Transkribus) versus control (eScriptorium).
- Accuracy depends mainly on the model and its training data, not the platform; a well-trained model on either can reach comparable CER.
- Transkribus needs no infrastructure; eScriptorium needs a server and ideally a GPU.
- For small jobs Transkribus is cheaper; at very large scale self-hosting wins.
- PAGE/ALTO XML lets you migrate ground truth between them — keep those exports.
- eScriptorium's Kraken CLI offers reproducible, scriptable training for research.
- A hybrid approach (prototype in Transkribus, scale in eScriptorium) is common.
Frequently Asked Questions
Is eScriptorium free?
eScriptorium is open-source under the GPL and free to install on your own server. Some institutions also run hosted instances with no per-page fees, but the hardware or hosting behind any instance is still a real cost, paid by you or by the host.
Which is more accurate, Transkribus or eScriptorium?
Accuracy depends mainly on the model and its training data, not the platform. Both use comparable deep-learning engines, so a well-trained model in either can reach similar CER. Transkribus has a larger public model library, while eScriptorium relies on Kraken models you supply or train.
Can I move my data between the two platforms?
Yes, via PAGE XML and ALTO XML, which both support. You can export ground truth from one and import it into the other, though region and tagging conventions may need light remapping.
Does eScriptorium require technical skills to run?
Self-hosting eScriptorium needs Docker and basic server administration, plus a GPU for fast training. If you only have a hosted instance, day-to-day use is browser-based and no harder than Transkribus.
Which platform is better for a large funded project?
Transkribus is faster to start and needs no infrastructure, which suits short or smaller projects. For very large or long-running work where credit costs add up, a self-hosted eScriptorium can be cheaper and gives full data control.
Do both support training custom models?
Yes. Transkribus trains models in its web app on its servers; eScriptorium trains Kraken models, either in the interface or via the Kraken command line for full control over parameters.