To train a Transkribus HTR model, transcribe a few thousand words of ground truth, split it into training and validation pages, pick a matching base model to fine-tune, and launch a training job — Transkribus handles the rest server-side. With a strong base model, roughly 5,000 to 15,000 transcribed words of a single consistent hand will reach a usable character error rate (CER), often under 8%. Below is how to get there without wasting effort.
How much ground truth do you really need?
Ground truth is page transcription you have corrected to near-perfection. The amount you need depends heavily on whether you fine-tune a base model:
| Scenario | Ground truth target | Expected CER |
|---|---|---|
| Single hand, strong base model | 5k–15k words | 5–8% |
| Single hand, from scratch | 50k+ words | 6–10% |
| Multiple hands, base model | 25k–50k words | 7–12% |
| Highly degraded or unusual script | 30k+ words | varies |
The single biggest lever is consistency: 8,000 carefully corrected words beat 20,000 sloppy ones. Transcription errors in ground truth teach the model the wrong thing.
What is a base model and which should you pick?
A base model is a pre-trained network you build on top of. Fine-tuning means your data refines an existing model rather than starting from zero, which can cut the ground truth you need by an order of magnitude. Choose a base model whose script, language and period are closest to your material — a 17th-century German Kurrent super-model for German court hands, an English secretary model for early-modern English, and so on. If nothing matches well, a broad multilingual handwriting super-model is the safe default.
How do you prepare and split your ground truth?
- Transcribe pages until they are essentially error-free; tag obvious structure if relevant.
- Decide on a validation split, usually 10–20% of pages held out.
- In the training dialog, assign pages to Training Set and Validation Set explicitly, or let Transkribus sample automatically.
The validation set is sacred: it must be pages the model never trains on, so the reported CER reflects real-world performance, not memorisation.
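If you prefer to choose the held-out pages yourself rather than let Transkribus sample them, the split can be sketched in a few lines of Python. This is a minimal illustration, not part of Transkribus: the function name `split_pages`, the page numbering and the fixed seed are all assumptions made for the example.

```python
import random


def split_pages(page_ids, val_fraction=0.15, seed=42):
    """Split page IDs into disjoint training and validation lists.

    Hypothetical helper for planning a split offline; the resulting
    lists would still be assigned by hand in the training dialog.
    """
    if not 0 < val_fraction < 1:
        raise ValueError("val_fraction must be between 0 and 1")
    pages = list(page_ids)
    # A seeded generator makes the split reproducible across runs.
    rng = random.Random(seed)
    rng.shuffle(pages)
    # Hold out at least one page, even for very small collections.
    n_val = max(1, round(len(pages) * val_fraction))
    val = sorted(pages[:n_val])
    train = sorted(pages[n_val:])
    return train, val


# 120 pages with one sixth held out gives 100 training and 20 validation pages.
train, val = split_pages(range(1, 121), val_fraction=1 / 6)
print(len(train), len(val))  # prints: 100 20
```

Sampling at random across the whole collection, rather than holding out the last few pages, keeps the validation set representative if the hand or paper condition drifts over the volume.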
```text
Total ground truth: 120 pages (~24,000 words)
Training set: 100 pages
Validation set: 20 pages <- never seen during training
```
Launching the training job
In the web app, open Models, then Train Model. Configure:
- Model name and a description noting the source, hand and date range.
- Base model to fine-tune (or none).
- Training / validation assignment.
- Epochs — leave the default (often around 50) unless you have a reason to change it.
```text
Train Model
name: Smith_Diary_1640s_v1
base model: English Secretary Hand XVII (super-model)
epochs: 50
train pages: 100
val pages: 20
```
Start the job. Training runs on Transkribus servers; you can close the tab and come back when it is done.
How do you read the learning curve and CER?
When training finishes, open the model's accuracy chart. It plots CER against epoch for both training and validation data. You want the validation curve to fall and then flatten. If validation CER bottoms out and starts rising while training CER keeps falling, the model is overfitting — you trained too long or have too little data.
A healthy result shows validation CER settling at a low, stable value. Note that number; it is the figure you quote and the baseline you try to beat with the next version.
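The same reading of the curve can be written down as a small check. The sketch below assumes you have copied the per-epoch CER values off the chart by hand into two lists of fractions; the function name `summarize_curve`, the `tolerance` threshold and the sample numbers are illustrative assumptions, not Transkribus output.

```python
def summarize_curve(train_cer, val_cer, tolerance=0.002):
    """Summarise a learning curve given per-epoch CER as fractions (0.05 = 5%).

    Flags overfitting when validation CER has risen from its minimum by
    more than `tolerance` while training CER has kept falling.
    """
    if not val_cer or len(train_cer) != len(val_cer):
        raise ValueError("need two non-empty lists of equal length")
    # Index of the epoch with the lowest validation CER (first one on ties).
    best_idx = min(range(len(val_cer)), key=val_cer.__getitem__)
    val_rose = val_cer[-1] - val_cer[best_idx] > tolerance
    train_fell = train_cer[-1] < train_cer[best_idx]
    return {
        "best_epoch": best_idx + 1,  # epochs are counted from 1
        "best_val_cer": val_cer[best_idx],
        "final_val_cer": val_cer[-1],
        "overfitting": val_rose and train_fell,
    }


# Made-up curves for illustration only.
overfit = summarize_curve(
    train_cer=[0.18, 0.10, 0.06, 0.04, 0.03, 0.02, 0.015],
    val_cer=[0.20, 0.12, 0.08, 0.06, 0.055, 0.06, 0.07],
)
healthy = summarize_curve(
    train_cer=[0.18, 0.09, 0.06, 0.05, 0.045],
    val_cer=[0.20, 0.10, 0.07, 0.06, 0.06],
)
print(overfit["best_epoch"], overfit["overfitting"])
print(healthy["best_epoch"], healthy["overfitting"])
```

The tolerance exists because validation CER wobbles slightly from epoch to epoch; a rise smaller than that is treated as a plateau rather than overfitting.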
When should you iterate versus accept the model?
If validation CER meets your target (under 5% for editing, under 10% for search), apply the model to the rest of your collection and move on. If it falls short, the fix is almost always more ground truth from the pages where errors cluster, not more epochs. Add 5,000–10,000 words from the hardest pages, retrain as v2, and compare.
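To find the pages where errors cluster, you can score each page yourself. The sketch below uses the common definition of CER (character-level edit distance divided by the length of the reference transcription) and assumes you have the corrected text and the model output for each page as plain strings. The function names and sample strings are hypothetical, and Transkribus may normalise text differently when it reports CER, so treat the numbers as a ranking aid rather than a match for the chart.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate of `hyp` measured against reference `ref`."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def hardest_pages(pages, target=0.05):
    """Return (page_id, cer) pairs above `target`, worst first.

    `pages` maps a page ID to a (reference, model_output) tuple.
    """
    scored = ((pid, cer(ref, hyp)) for pid, (ref, hyp) in pages.items())
    return sorted((p for p in scored if p[1] > target),
                  key=lambda p: p[1], reverse=True)


# Illustrative strings only; real pages would be far longer.
sample = {
    "p001": ("the quick brown fox", "the quick brown fox"),
    "p002": ("received of my lord", "receiued of my lord"),
    "p003": ("abcdefghij", "abXdeYghZj"),
}
for page_id, score in hardest_pages(sample):
    print(page_id, f"{score:.1%}")
```

Pages at the top of this list are the ones worth correcting and adding to the training set for v2.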
Key Takeaways
- Fine-tune a matching base model; it dramatically cuts the ground truth you need.
- For one consistent hand, 5k–15k clean words often reaches a usable CER.
- Consistency of transcription matters more than raw volume.
- Always hold out a validation set so reported CER is honest.
- Read the learning curve: a rising validation curve signals overfitting.
- To improve, add ground truth from error-heavy pages — not more epochs.
- Version your models (v1, v2) so you can measure real progress.
Frequently Asked Questions
How much ground truth do I need to train a Transkribus model?
Roughly 5,000 transcribed words gets a single consistent hand started when you fine-tune a strong base model, and 5,000 to 15,000 clean words usually reaches a usable CER. Collections with several hands need more, around 25,000 to 50,000 words. From scratch with no base model, expect to need 50,000 words or more.
What is a base model and should I use one?
A base model is a pre-trained model you fine-tune on top of, so your data refines existing knowledge instead of starting blank. Always use one when a public model matches your script and period, because it slashes the ground truth you need.
What CER counts as a good Transkribus model?
Under 10% CER is workable for keyword search and rough reading; under 5% is good for editing; under 2.5% rivals a careful human first pass. The right target depends on whether you need search or a publishable edition.
Why split ground truth into training and validation sets?
The validation set is held out from training so Transkribus can measure honest accuracy on pages the model never saw. Without it you only see training-set CER, which flatters the model and hides overfitting.
Can one model read several different hands?
Yes, if you include enough ground truth from each hand. Mixed-hand models are slightly less accurate per scribe than a dedicated single-hand model, but they are far more practical for collections written by many people.
How long does training take in Transkribus?
Training runs server-side and typically takes from under an hour to a few hours depending on dataset size and the number of epochs. You do not need to keep your browser open; the model appears in your collection when the job finishes.