OCR & HTR Pipelines

Choose the right handwritten text recognition (HTR) model by matching the model's training data to your script, language and period, then prove the match by measuring character error rate (CER) on a sample of your own pages. The instinct to grab the biggest, most general model is usually wrong: a specialised model trained on the hand you face will typically beat a larger generic one. Before you even consider training your own, you should be able to name your document's script family, language and approximate date, because those three facts drive the selection.

How do I pick a model for my manuscript?

Start by characterising the document along the axes that matter to a recognition model:

| Axis | Example values | Why it matters |
| --- | --- | --- |
| Script / hand | secretary, Kurrent, italic, Greek minuscule | Dominant driver of accuracy |
| Language | Latin, Early Modern English, German | Drives the language model / dictionary |
| Period | 16th c., 19th c. | Letterforms and abbreviations shift over time |
| Material | manuscript, print, mixed | Manuscript models ≠ print models |
| Condition | clean, faded, bleed-through | May need a model trained on noisy data |

Then shortlist two or three pretrained models whose training data overlaps your axes, and test them. The match in hand and era dominates: a model trained on the same scribal school will usually read your page better than a generic one trained on far more data.
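As a rough illustration of the shortlisting step, the sketch below scores model cards against a document profile. The `Profile` fields, the model names and the weights are all assumptions made for this example. The weights only reflect the point above that hand and era dominate, and nothing here corresponds to a real repository API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Profile:
    """Axes that describe either your document or a model's training data."""
    script: str
    language: str
    year_from: int
    year_to: int
    material: str


# Illustrative weights only: script and period count most.
WEIGHTS = {"script": 3, "period": 2, "language": 1, "material": 1}


def overlap_score(doc: Profile, card: Profile) -> int:
    """Sum the weights of every axis on which the card matches the document."""
    score = 0
    if doc.script == card.script:
        score += WEIGHTS["script"]
    # Periods match when the two date ranges intersect.
    if doc.year_from <= card.year_to and card.year_from <= doc.year_to:
        score += WEIGHTS["period"]
    if doc.language == card.language:
        score += WEIGHTS["language"]
    if doc.material == card.material:
        score += WEIGHTS["material"]
    return score


def shortlist(doc: Profile, cards: Dict[str, Profile], k: int = 3) -> List[str]:
    """Return up to k model names, best overlap first, dropping zero-overlap cards."""
    scored = [(overlap_score(doc, card), name) for name, card in cards.items()]
    scored = [(s, n) for s, n in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical document and model cards, for illustration only.
doc = Profile("secretary", "English", 1550, 1600, "manuscript")
cards = {
    "secretary_16c": Profile("secretary", "English", 1500, 1599, "manuscript"),
    "generic_latin": Profile("mixed", "Latin", 1400, 1900, "manuscript"),
    "print_19c": Profile("antiqua", "German", 1800, 1900, "print"),
}
print(shortlist(doc, cards))  # ['secretary_16c', 'generic_latin']
```

A score like this only decides which models are worth testing. The ranking that matters comes from measured CER on your own pages.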

Can a generic model read any handwriting?

No. Generic or "super" models exist (Transkribus' broad multi-language models, general-purpose Kraken recognition models) and they are a reasonable first try when you have no closer match. But on a distinctive script such as English secretary hand, German Kurrent or Greek minuscule, a specialised model typically wins by several CER points. Use a generic model as a fallback or a starting point for fine-tuning, not as the default answer.

Where do I find pretrained HTR models?

Three main sources, each with model cards you must actually read:

  • Transkribus public models: a searchable marketplace in which each listing states the language, date range, training-set size and reported CER.
  • Kraken / eScriptorium repositories — open models, many for non-Latin scripts, distributed as .mlmodel files.
  • Zenodo — archived, citable models with DOIs and documentation.

The model card's stated training data and CER tell you whether the model is even worth testing. The reported CER, however, was measured on the model authors' material, not yours, so treat it as a screen, not a guarantee.

How do I compare models objectively?

Never trust demo pages. Transcribe a small held-out sample of your own documents as ground truth, run each candidate, and compute error rates:

```bash
# Run two candidate Kraken models over the same sample, then score
kraken -i sample.png out_modelA.txt segment -bl ocr -m secretary_16c.mlmodel
kraken -i sample.png out_modelB.txt segment -bl ocr -m generic_latin.mlmodel

# Compare each to your ground truth.
# eval_cer.py is a placeholder for your own scoring script; it is not part of Kraken.
python eval_cer.py gt.txt out_modelA.txt    # e.g. CER 6.2%
python eval_cer.py gt.txt out_modelB.txt    # e.g. CER 11.8%
```

Twenty to thirty representative lines are enough to separate a good match from a poor one. Pick on measured CER, not on reputation.
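For readers who do not already have a scoring script, here is a minimal sketch of what a CER calculation can look like: Levenshtein distance between ground truth and model output, divided by the length of the ground truth. The function names and the normalisation step (Unicode NFC plus collapsed whitespace) are assumptions of this example. Choose a normalisation that suits your transcription conventions and apply it identically to every candidate.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or free if equal
            ))
        prev = curr
    return prev[-1]


def normalise(text: str) -> str:
    """Assumed normalisation: NFC form, runs of whitespace collapsed to one space."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of hypothesis against reference, as a fraction."""
    ref, hyp = normalise(reference), normalise(hypothesis)
    if not ref:
        raise ValueError("reference transcription is empty")
    return edit_distance(ref, hyp) / len(ref)


# Three character errors against 19 reference characters.
print(f"{cer('the quick brown fox', 'the quikc brown fax'):.1%}")  # 15.8%
```

Because the denominator is the reference length, CER can exceed 100% when a model inserts a lot of spurious text.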

When should I train my own model?

Train (or fine-tune) only when no pretrained model clears your target CER on the sample. A decision rule that saves weeks:

```text
Best pretrained CER on your sample:
  ≤ target              → use it as-is
  slightly above        → FINE-TUNE the closest model (needs hundreds of lines)
  far above / no match  → TRAIN from scratch (needs thousands of lines)
```

Fine-tuning the closest pretrained model is almost always the right middle path: it needs an order of magnitude less ground truth than training from scratch and converges faster, because the base model already knows the script family.
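The decision rule can be written down as a small function so that the thresholds are explicit rather than decided case by case. This is only a sketch: the text above does not define "slightly above", so the `margin` parameter and its default of five percentage points are assumptions you should set for your own project.

```python
def next_step(best_cer: float, target: float, margin: float = 0.05) -> str:
    """Map the best pretrained CER on your sample to an action.

    best_cer and target are fractions (0.06 means 6%).
    margin is an assumed width for the "slightly above target" band.
    """
    if best_cer <= target:
        return "use as-is"
    if best_cer <= target + margin:
        return "fine-tune closest model"
    return "train from scratch"


print(next_step(0.04, target=0.05))  # use as-is
print(next_step(0.06, target=0.05))  # fine-tune closest model
print(next_step(0.30, target=0.05))  # train from scratch
```

Fixing `target` and `margin` before you look at results keeps the choice from drifting toward whichever option you already preferred.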

A pragmatic selection workflow

  1. Characterise script, language, period, material, condition.
  2. Shortlist 2–3 pretrained models from the sources above by reading model cards.
  3. Transcribe ~25 lines of your own material as ground truth.
  4. Run each model; compare CER and word error rate (WER).
  5. If the best clears target, ship it; if close, fine-tune it; if hopeless, train from scratch.

This puts measurement before commitment and stops you training a model you never needed.
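Step 4 of the workflow can be sketched as a ranking of candidates by CER, with WER reported alongside. The same edit-distance routine serves both metrics: it runs over characters for CER and over whitespace-separated tokens for WER. The model names and sample lines are hypothetical, and splitting on whitespace is an assumed tokenisation.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance over any sequence (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance divided by reference length."""
    if len(ref) == 0:
        raise ValueError("reference is empty")
    return edit_distance(ref, hyp) / len(ref)


def rank_models(
    gt_lines: List[str], outputs: Dict[str, List[str]]
) -> List[Tuple[str, float, float]]:
    """Return (model, CER, WER) tuples sorted by CER, best first."""
    gt_chars = "\n".join(gt_lines)
    gt_words = " ".join(gt_lines).split()
    rows = []
    for name, lines in outputs.items():
        cer = error_rate(gt_chars, "\n".join(lines))
        wer = error_rate(gt_words, " ".join(lines).split())
        rows.append((name, cer, wer))
    return sorted(rows, key=lambda row: row[1])


# Hypothetical ground truth and model outputs.
gt = ["the quick brown fox"]
outputs = {
    "generic_latin": ["the quikc brown fax"],
    "secretary_16c": ["the quick brown fox"],
}
for name, cer, wer in rank_models(gt, outputs):
    print(f"{name}: CER {cer:.1%}, WER {wer:.1%}")
```

WER is usually much higher than CER on the same output, because a single wrong character makes the whole word count as an error. Compare like with like when reading model cards.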

Key Takeaways

  • Match the model to script, language, period, material and condition before anything else.
  • A specialised model on the right hand beats a larger generic model nearly every time.
  • Read model cards for training data and CER, but treat their reported CER as a screen, not proof.
  • Compare candidates on a held-out sample of your documents using CER/WER — never on demo pages.
  • Fine-tune the closest pretrained model before training from scratch; it needs far less ground truth.
  • Train from scratch only when no pretrained model clears your target on your own sample.

Frequently Asked Questions

How do I pick an HTR model for my manuscript?

Match the model's training data to your script, language and period first, then test two or three candidates on a sample page and compare CER. The closest match in hand and era almost always wins, even over a larger generic model.

Can a generic HTR model read any handwriting?

No single model reads all hands well. So-called generic or "super" models cover common cases but underperform specialised models on distinctive scripts like secretary hand, Kurrent or Greek minuscule.

When should I train my own HTR model instead of using a pretrained one?

Train your own when no pretrained model gets below your target CER on a sample, and you have or can create a few thousand ground-truth lines. Otherwise fine-tune the closest pretrained model — it is faster and needs less data.

Where do I find pretrained HTR models?

Transkribus' public model marketplace, the Kraken/eScriptorium model repositories, and Zenodo collections host pretrained HTR models with documented training data, language and CER. Always read the model card before using one.

How do I compare HTR models objectively?

Transcribe a held-out sample of your own material as ground truth, run each candidate model over it, and compare character error rate (CER) and word error rate (WER). Test on your documents, never on the model's own demo pages.