Getting started with Transkribus takes four steps: create a free account at app.transkribus.org, upload your scanned pages into a collection, run automatic layout analysis, then apply a public recognition model to produce editable text. You can reach a first machine transcription of a few pages in well under ten minutes, and your starter credit allowance covers roughly 100 pages before you pay anything. This guide walks the whole path without the options that overwhelm beginners.
What is Transkribus and what does it actually do?
Transkribus is a platform for Handwritten Text Recognition (HTR) and OCR built specifically for historical documents. Unlike generic OCR, it reads cursive hands, secretary script and degraded print that ordinary tools fail on. It does three things in sequence: detects the layout (where lines and regions are), recognises the text on each line, and lets you correct that text in a side-by-side editor. The corrected output becomes training data for models tuned to your exact material.
The product you want is the web app, not the old desktop client. Everything below assumes the browser version.
How do I create an account and a collection?
- Go to app.transkribus.org and register with an email address.
- Confirm the email and log in. New accounts get a free credit allowance automatically.
- Create a collection, which is just a folder that groups related documents. Name it something like Parish Registers 1700-1750.
Collections matter later because models, jobs and sharing all operate at the collection or document level. Keep one project per collection rather than dumping everything into one bucket.
How do I upload and prepare my pages?
Upload by dragging image files (or a PDF) into your collection, or import from a IIIF manifest URL if your source archive publishes one. Order matters, so name files with zero-padded numbers (page_001.jpg, page_002.jpg) before uploading.
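The renaming advice above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name `pad_rename`, the `page_` prefix and the three-digit width are hypothetical choices, the files are assumed to end in `.jpg`, and the only digits in each filename are assumed to be the page number.

```bash
# Rename scans such as scan1.jpg, scan2.jpg, scan10.jpg to page_001.jpg,
# page_002.jpg, page_010.jpg so alphabetical order equals page order.
# Assumptions (illustrative, not part of Transkribus): files end in .jpg and
# the only digits in each name are the page number.
pad_rename() (
  dir=$1
  for f in "$dir"/*.jpg; do
    [ -e "$f" ] || continue                       # directory has no .jpg files
    digits=$(basename "$f" .jpg | tr -cd '0-9')   # scan10 -> 10
    [ -n "$digits" ] || continue                  # no number in the name: skip
    num=$(printf '%s' "$digits" | sed 's/^0*//')  # drop leading zeros (avoids octal parsing)
    new="$dir/$(printf 'page_%03d.jpg' "${num:-0}")"
    [ "$f" = "$new" ] || mv -n -- "$f" "$new"     # -n: never overwrite an existing file
  done
)

# Usage: pad_rename ./scans
```

Try it on a copy of your scans first, since filenames that contain other digits (a year, for example) would need a different extraction rule.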
A quick local prep pass pays off:
```bash
# Batch-resize oversized TIFFs and convert to quality JPEG before upload (ImageMagick)
mogrify -resize 3000x3000\> -quality 92 -format jpg *.tif
```

| Aspect | Good | Avoid |
|---|---|---|
| Resolution | 300 DPI+ | under 200 DPI |
| Lighting | even, no glare | hard shadows |
| Skew | straight pages | tilted scans |
| Format | TIFF / PNG / quality JPEG | heavily compressed JPEG |
Running layout analysis and recognition
Open a document, select the pages, and run Layout Analysis. Transkribus detects text regions and draws baselines under each line. Review a couple of pages — if lines are merged or skipped, you can re-run with a different layout model, but for clean pages the default usually works.
Next, choose Text Recognition and pick a model. For a first run, select a public model that matches your material rather than training your own. Start the job and watch the status move from queued to running to done.
```text
Job pipeline:  upload -> layout analysis -> text recognition -> correct
Credits spent: 0         0                  ~1 per page         0
```
How do I correct and read the results?
Open a recognised page in the Text tab. The image sits on the left, the transcription on the right, line by line. Click a line to place your cursor and fix errors. Use keyboard navigation to move between lines quickly. Corrected pages are saved as new versions, so you never lose the raw machine output.
If a model is hitting roughly 95% character accuracy (a character error rate, or CER, of 5%), correction is fast clean-up. If it is wildly wrong, the model probably does not match your script, so try a different public model before assuming you need to train one.
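To make the accuracy figure concrete, here is a rough sketch of the arithmetic. It treats CER as the number of character errors divided by the number of characters in the page. The function name `cer_percent` and the sample figures (75 errors on a 1,500-character page) are made up for illustration; they are not Transkribus output.

```bash
# cer_percent ERRORS TOTAL_CHARS
# Prints the character error rate as a percentage with one decimal place.
# Hypothetical helper for illustration only.
cer_percent() {
  LC_ALL=C awk -v e="$1" -v n="$2" 'BEGIN {
    if (n <= 0) exit 1               # an empty page has no defined CER
    printf "%.1f\n", 100 * e / n
  }'
}

cer_percent 75 1500    # prints 5.0, i.e. about 95% character accuracy
```

At that rate a 1,500-character page needs about 75 single-character fixes, which is why a 5% CER feels like clean-up rather than retyping.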
What should I do after my first transcription?
Once you trust the workflow, the natural next moves are: tag structure (headings, marginalia), export to a usable format, and — if accuracy is not good enough — collect ground truth and train a custom model. Each of those is a deliberate next step rather than something to wrestle with on day one.
Key Takeaways
- Use the web app at app.transkribus.org; the Java desktop client is deprecated.
- The free starter allowance covers about 100 pages, enough to evaluate the platform properly.
- Group work into collections, one project per collection.
- Upload at 300 DPI+ with sensible filenames; clean scans beat clever models.
- The pipeline is always layout → recognition → correction; only recognition spends credits.
- Begin with a public model that matches your script before considering custom training.
- Corrected pages double as future ground truth, so accuracy compounds over time.
Frequently Asked Questions
Is Transkribus free to use?
Creating an account and uploading documents is free, and the layout step costs nothing. Text recognition is paid in credits, but new accounts receive a free starter allowance (around 100 credits) so you can transcribe roughly 100 pages before paying.
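A quick budget check along these lines can be sketched in shell. It assumes the figures quoted in this guide (about 100 free credits and roughly 1 credit per recognised page); the function name `credits_to_buy` and its defaults are hypothetical, and the actual cost per page may differ by model, so check your account before relying on it.

```bash
# credits_to_buy PAGES [FREE_CREDITS] [CREDITS_PER_PAGE]
# Prints how many credits you would need beyond the free allowance.
# Defaults (100 free, 1 per page) are this guide's approximate figures.
credits_to_buy() (
  pages=$1
  free=${2:-100}
  per_page=${3:-1}
  short=$(( pages * per_page - free ))
  [ "$short" -gt 0 ] || short=0      # allowance covers everything
  echo "$short"
)

credits_to_buy 80     # prints 0: fits inside the starter allowance
credits_to_buy 250    # prints 150: credits still needed after the free ones
```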
Do I need to install software to use Transkribus?
No. The current product is the Transkribus web app at app.transkribus.org, which runs entirely in your browser. The old Java desktop client (Transkribus eXpert) is deprecated and no longer recommended for new users.
What image quality does Transkribus need?
Aim for 300 DPI or more, sharp focus and even lighting. JPEG, PNG, TIFF and PDF all work. Avoid heavy JPEG compression and skewed pages, since poor scans hurt both layout detection and recognition accuracy.
Which model should a beginner pick first?
Start with a public model that matches your script and language, such as Transkribus Print Multilingual for typeset text or a general handwriting super-model. Run it on a few pages before deciding whether you need a custom model.
How long does my first transcription take?
Layout analysis and recognition for a handful of pages usually finish in one to a few minutes once the job reaches the front of the queue. A whole document of hundreds of pages may take an hour or more depending on load.
Can I correct the machine transcription?
Yes. The text editor sits beside the page image, and you click any line to edit it. Corrections are saved as versions, and corrected pages later become the ground truth you use to train a better model.