Transkribus Workflows

To pick a public Transkribus model, open the Models tab, switch the filter to Public models, and filter by the combination that matches your document: language, script type, and century. Then read each candidate's model card for its training-set description and validation CER, and run a quick test on 3-5 of your own pages before committing. The headline accuracy figure is only a starting clue; the real decision comes from that small test, because a model's score is measured against its own validation data, not yours.

How do I browse the public model library?

Inside the desktop or web app, the Models view defaults to My models. Change the dropdown to Public models and you get the full published catalogue — several hundred models covering everything from medieval Latin charters to nineteenth-century German Kurrent and early modern Dutch notarial hands.

Filter aggressively. The most useful facets are:

  • Language (Latin, German, English, French, Dutch, Spanish, etc.)
  • Script type (Gothic/Textura, Bastarda, Secretary, Kurrent, Antiqua, humanist cursive)
  • Date range the model was trained on

A model trained on 1550-1650 German Kurrent will usually perform poorly on 1850 Kurrent even though both are "Kurrent", because handwriting drifts from generation to generation.
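The faceting logic above can be sketched in a few lines of Python. This is only an illustration: the `ModelCard` class, its field names, and the three catalogue entries are invented for the example and do not correspond to any Transkribus API or to real published models. The point it shows is that the date check is an interval overlap, so two models sharing a language and script label can still be separated by period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical stand-in for the facets shown on a model card."""
    name: str
    language: str
    script: str
    start_year: int
    end_year: int


def matches(card: ModelCard, language: str, script: str,
            doc_start: int, doc_end: int) -> bool:
    """True when language and script match and the date ranges overlap."""
    same_language = card.language.lower() == language.lower()
    same_script = card.script.lower() == script.lower()
    overlaps = card.start_year <= doc_end and doc_start <= card.end_year
    return same_language and same_script and overlaps


# Invented example entries, not real catalogue data.
catalogue = [
    ModelCard("Example Kurrent A", "German", "Kurrent", 1550, 1650),
    ModelCard("Example Kurrent B", "German", "Kurrent", 1800, 1900),
    ModelCard("Example Secretary", "English", "Secretary", 1550, 1650),
]

# Documents written in German Kurrent between 1840 and 1860.
candidates = [c.name for c in catalogue
              if matches(c, "German", "Kurrent", 1840, 1860)]
print(candidates)  # ['Example Kurrent B']
```

Filtering on language alone would have returned both Kurrent models; adding the date overlap leaves only the one whose training period covers the documents.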

What do the numbers on a model card mean?

Each card reports a CER (Character Error Rate) on the model's own validation set, plus the number of words and pages used in training.

| Field | What it tells you | What it does not tell you |
| --- | --- | --- |
| Validation CER | Error rate on the model's own held-out pages | How it performs on your material |
| Training words | Rough robustness and coverage | Whether your script style is represented |
| Date / language | Period and language match | Regional hand variation |

Treat a 4% validation CER as "promising," not "guaranteed." Two models with identical CER can behave completely differently on your scans.
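For reference, CER is conventionally defined as the character-level edit distance between the model output and a reference transcription, divided by the length of the reference. The formula below is that standard definition; how a given model card normalizes whitespace, punctuation, or abbreviations before counting is not stated here and is left as an assumption.

```latex
\mathrm{CER} = \frac{S + D + I}{N}
% S = substituted characters, D = deleted characters,
% I = inserted characters, N = characters in the reference transcription

% Worked example: a 4% CER on a page of N = 1000 characters
% corresponds to S + D + I = 0.04 \times 1000 = 40 character edits.
```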

Specific model or generic super-model?

For a long time you chose narrow, single-collection models. Now Transkribus also publishes broad multilingual super-models (the "Text Titan" line and similar) trained on millions of words across many hands and centuries.

text
Decision rule:
  if (your script + language + period) is inside a specific model's scope:
      use the specific model        # usually the lower CER
  else:
      use a generic super-model     # robust fallback, slightly higher CER

Specific models win on accuracy when they match; super-models win on coverage when your material is mixed, multilingual, or simply unusual.

How do I test a model before committing credits to a full run?

Upload a representative handful of pages — varied ink density, a clean page and a stained one — and run recognition.

bash
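# Conceptual only: transkribus-run is an illustrative placeholder, not a documented Transkribus CLI.
# The idea: run model 12345 on a 5-page sample collection (in practice, start the run from the app).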
transkribus-run --collection 88231 --model 12345 --pages 1-5
# then inspect CER against a quick manual transcription of one page

Transcribe one page by hand, compute CER against the model output, and you have a real, material-specific number in under thirty minutes. That single figure should drive the decision far more than the card's advertised score.
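The CER computation itself needs nothing beyond a Levenshtein distance. Below is a minimal sketch in plain Python, assuming both transcriptions are available as plain strings; the function names and the two sample strings are invented for illustration, and no normalization (whitespace, case, abbreviations) is applied, so the result may differ slightly from the figure the platform would report.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of hypothesis measured against reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented sample: manual transcription vs. model output for one line.
manual = "Anno domini 1652"
model_output = "Anno donini 1852"
print(f"{cer(manual, model_output):.1%}")  # 12.5%
```

Two substituted characters in a 16-character reference give 2/16 = 12.5%. For a real test, pass the full page text rather than a single line so one difficult word does not dominate the figure.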

When should I stop searching and start training?

If your best public test result sits below roughly 10% CER, correct and move on, because that is faster than building a model. If every candidate lands above 15%, pick the closest as a base model and fine-tune. Results between the two thresholds are a judgment call that depends on how many pages you still have to transcribe. Training from a related base typically needs only a few hundred lines of ground truth and converges in a fraction of the epochs needed from scratch.
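The thresholds can be written down as a small triage helper. This is a sketch, not a prescribed workflow: the function name, the returned labels, and the sample results are invented, and treating the band between the two thresholds as a separate "judgment call" outcome is an assumption of the example.

```python
def next_step(best_cer: float,
              correct_below: float = 0.10,
              train_above: float = 0.15) -> str:
    """Map the best CER measured on your own pages to a suggested action."""
    if best_cer < 0:
        raise ValueError("CER cannot be negative")
    if best_cer < correct_below:
        return "correct and move on"
    if best_cer > train_above:
        return "fine-tune from the closest base model"
    # Assumption: the band between the thresholds is left to the user.
    return "judgment call"


# Invented sample results from a 5-page test, keyed by model name.
results = {"model A": 0.18, "model B": 0.08}
best_model = min(results, key=results.get)
print(best_model, "->", next_step(results[best_model]))
# model B -> correct and move on
```

Only the best candidate matters for the decision, which is why the helper takes a single number rather than the whole result set.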

Key Takeaways

  • Switch the Models filter to Public models and facet by language, script, and date — not just language.
  • A model's advertised CER is measured on its own data; only a sample run on your pages is decisive.
  • Use a specific model when it matches your hand; fall back to a generic super-model otherwise.
  • A real-material CER under ~10% means correct-and-go; above ~15% means find a better match or train.
  • The closest-matching public model usually also makes the best base model for later fine-tuning.
  • Browsing models is free; only recognition runs consume credits, so test cheaply and early.

Frequently Asked Questions

Where do I find public Transkribus models?

Open the Models tab in Transkribus and switch the filter from 'My models' to 'Public models'. The browser lists every published model with its language, century coverage, and validation CER.

What is a good CER to expect from a public model?

A well-matched public model on similar material typically gives 5-10% CER out of the box. Anything under 10% is usually worth correcting rather than retyping; above 15% you should look for a better match or plan training.

Should I use a generic super-model or a specific one?

Use a specific model when your script, language, and date fall squarely inside its training data. Reach for a generic super-model like the Transkribus 'Text Titan' family only when no specific model matches, since broad coverage trades some accuracy for flexibility.

Can I combine a public model with my own training?

Yes. Pick the closest public model as a base model and fine-tune it on a few hundred lines of your own ground truth. This usually beats training from scratch and converges far faster.

Why does a highly-rated public model do badly on my pages?

The reported CER reflects the model's own validation set, not your material. Script style, ink, scan quality, and layout differ; always run a small, low-cost test on your own pages before trusting the headline number.

Do public models cost credits to run?

Running any model consumes Transkribus credits per page, whether the model is public or your own. Browsing and selecting models is free; only the recognition run is billed.