When Transkribus skips, merges, or scrambles lines, the cause is almost always layout detection, not text recognition: the engine reads along baselines, so a missing or misplaced baseline produces a missing or garbled line. The fix is a three-step loop — re-run layout analysis with a better layout model, manually repair the few baselines it still gets wrong, then re-run text recognition on just those lines. You rarely need to retype anything; you correct the geometry and let recognition flow over it again.
Why does the order of operations matter?
Transkribus runs two separate models per page. Layout analysis finds text regions and draws baselines; text recognition then transcribes along each baseline. Because they run as separate steps, you can correct and re-run either one on its own, which is the key to efficient fixing.
```text
Layout model → text regions + baselines
      │
      ▼
Text model   → characters along each baseline
```

If you correct text without fixing the underlying baseline, the error returns the next time you re-run recognition. Always repair geometry first.
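A toy model makes the dependency concrete. The `Line` class and `recognize` function below are hypothetical stand-ins, not Transkribus code: this "text model" reads perfectly, but only along existing baselines, so the only thing that can go wrong is the geometry.

```python
from dataclasses import dataclass


@dataclass
class Line:
    ink: str            # what is actually written on the page
    has_baseline: bool  # did layout analysis draw a baseline under it?


def recognize(lines):
    """Toy 'text model': perfect at reading, but only along existing baselines."""
    return [line.ink for line in lines if line.has_baseline]


page = [
    Line("Item the first", True),
    Line("faint second line", False),  # e.g. browned ink read as background
    Line("Item the third", True),
]

# Patching the transcript by hand does not survive a re-run of recognition.
transcript = recognize(page)
transcript.insert(1, "faint second line")
transcript = recognize(page)
print(transcript)  # ['Item the first', 'Item the third']

# Fixing the geometry does.
page[1].has_baseline = True
print(recognize(page))  # ['Item the first', 'faint second line', 'Item the third']
```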
Why is Transkribus skipping lines?
Missing baselines come from a handful of recurring causes:
- Faint or browned ink the detector reads as background.
- Tight line spacing where two lines fuse into one detected band.
- Marginalia and headers outside the main text region's box.
- A layout model trained on different material — a single-column-prose model will ignore a two-column register.
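To see how faint ink and tight spacing defeat line detection, here is a deliberately simple stand-in: a classic horizontal projection-profile line finder. It is not how Transkribus's trained layout models work internally; the synthetic "image" (rows of ink darkness from 0 to 1) and the threshold values are made up for illustration.

```python
def row_profile(image):
    """Sum ink darkness (0 = blank paper, 1 = black ink) along each pixel row."""
    return [sum(row) for row in image]


def detect_bands(image, threshold):
    """Return (first_row, last_row) spans whose ink sum exceeds threshold.
    Each span stands in for one detected text line."""
    bands, start = [], None
    for y, value in enumerate(row_profile(image)):
        if value > threshold and start is None:
            start = y                     # entering a band of ink
        elif value <= threshold and start is not None:
            bands.append((start, y - 1))  # leaving it
            start = None
    if start is not None:                 # band runs to the bottom edge
        bands.append((start, len(image) - 1))
    return bands


W = 10
blank, faint, dark = [0.0] * W, [0.2] * W, [0.9] * W

# Three well-spaced lines; the middle one is written in faint ink.
spaced = [dark, dark, blank, blank, faint, faint, blank, blank, dark, dark]
# Two dark lines with no pale row between them (tight spacing).
tight = [dark, dark, dark, dark, blank, blank]

print(detect_bands(spaced, 3.0))  # [(0, 1), (8, 9)]  faint line missed
print(detect_bands(spaced, 1.0))  # [(0, 1), (4, 5), (8, 9)]
print(detect_bands(tight, 3.0))   # [(0, 3)]  two lines fused into one band
```

Lowering the threshold recovers the faint line, but no threshold splits the tight pair, because there is no pale row between them. That is the situation where you split the baseline by hand.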
The first move is to re-run layout analysis with a layout model that matches your page structure. Transkribus offers generic line-detection plus trainable field/P2PaLA models for structured pages.
How do I manually repair baselines?
For the stragglers, edit in the canvas. Select the line tool, and:
- Draw a new baseline left-to-right under any text line the model missed.
- For a merged pair, delete the bad baseline and draw two correct ones, or use the split handle.
- Nudge a baseline's start/end points so it covers the whole line, including a trailing flourish.
- Make sure each baseline sits inside the correct text region — a baseline outside any region is dropped.
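Once a page is exported as PAGE XML (the format Transkribus uses for layout and transcription), the last two checks above can be automated. This is a hypothetical sketch: it assumes the 2013-07-15 PAGE schema, where `Coords` and `Baseline` carry a `points="x,y x,y ..."` attribute, and a left-to-right script. It flags lines with no baseline, baselines drawn right-to-left, and baselines that stray outside their parent `TextRegion` polygon.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def child(el, name):
    """First direct child with the given local tag name, or None."""
    return next((c for c in el if local(c.tag) == name), None)


def parse_points(points):
    """'x1,y1 x2,y2 ...' -> [(x1, y1), (x2, y2), ...]"""
    return [tuple(float(v) for v in pair.split(",")) for pair in points.split()]


def inside(pt, polygon):
    """Ray-casting point-in-polygon test (points exactly on an edge are ambiguous)."""
    x, y = pt
    hit = False
    for i in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # this edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


def check_baselines(page_xml):
    """Return (line_id, problem) pairs for suspicious baselines."""
    problems = []
    for region in ET.fromstring(page_xml).iter():
        if local(region.tag) != "TextRegion":
            continue
        coords = child(region, "Coords")
        if coords is None:
            continue
        polygon = parse_points(coords.get("points"))
        for line in region:
            if local(line.tag) != "TextLine":
                continue
            base = child(line, "Baseline")
            if base is None:
                problems.append((line.get("id"), "no baseline"))
                continue
            pts = parse_points(base.get("points"))
            if pts[0][0] > pts[-1][0]:
                problems.append((line.get("id"), "drawn right-to-left"))
            if not all(inside(p, polygon) for p in pts):
                problems.append((line.get("id"), f"outside region {region.get('id')}"))
    return problems


SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page0042.tif" imageWidth="1000" imageHeight="1000">
    <TextRegion id="r1">
      <Coords points="100,100 900,100 900,500 100,500"/>
      <TextLine id="l1"><Baseline points="120,150 880,150"/></TextLine>
      <TextLine id="l2"><Baseline points="880,220 120,220"/></TextLine>
      <TextLine id="l3"><Baseline points="120,600 880,600"/></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

print(check_baselines(SAMPLE))
```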
Then re-run text recognition on just the affected lines, or on that page, rather than on the whole document.
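If you keep a PAGE XML export from before and after your edits, a short diff tells you which lines need re-recognition. This hypothetical helper assumes line `id` attributes stay stable between exports; it reports new lines and lines whose baseline `points` changed in any way, and ignores deleted lines since there is nothing left to read.

```python
import xml.etree.ElementTree as ET


def baselines(page_xml):
    """Map each TextLine id to its Baseline 'points' string."""
    result = {}
    for el in ET.fromstring(page_xml).iter():
        if el.tag.rsplit("}", 1)[-1] != "TextLine":
            continue
        for sub in el:
            if sub.tag.rsplit("}", 1)[-1] == "Baseline":
                result[el.get("id")] = sub.get("points")
    return result


def changed_lines(before_xml, after_xml):
    """Ids of lines that are new or whose baseline geometry changed."""
    before, after = baselines(before_xml), baselines(after_xml)
    return sorted(i for i, pts in after.items() if before.get(i) != pts)


BEFORE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"><Page>
<TextRegion id="r1">
  <TextLine id="l1"><Baseline points="10,50 400,50"/></TextLine>
  <TextLine id="l2"><Baseline points="10,90 400,90"/></TextLine>
</TextRegion></Page></PcGts>"""

# Simulated edits: l2's end point is nudged right, and a missed line l3 is drawn.
AFTER = BEFORE.replace("10,90 400,90", "10,90 480,90").replace(
    "</TextRegion>",
    '  <TextLine id="l3"><Baseline points="10,130 400,130"/></TextLine>\n</TextRegion>',
)

print(changed_lines(BEFORE, AFTER))  # ['l2', 'l3']
```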
What about skew and image quality?
A page scanned at a slight angle is a top cause of merged baselines, because the detector's horizontal bands cut across two slanted lines.
```bash
# Deskew and lift contrast before layout analysis (example with ImageMagick 7).
# -normalize stretches contrast; a hard "-threshold 50%" can turn faint or browned ink into background.
magick page0042.tif -deskew 40% -normalize page0042_clean.tif
```

Re-upload the cleaned image, or apply Transkribus's built-in image preprocessing, then re-run layout analysis. Many "baseline bugs" simply vanish once the page is straight and the contrast lifted.
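Small angles matter because the drift grows with line width: a straight line rotated by θ rises or falls by width × tan θ across the page. In the sketch below, the 2400 px width (roughly 8 inches at 300 dpi) and the 1.5° angle are illustrative assumptions, not measurements.

```python
import math


def vertical_drift(line_width_px, skew_deg):
    """Vertical rise or fall, in pixels, of a straight text line of the given
    width when the page is rotated by skew_deg degrees."""
    return line_width_px * math.tan(math.radians(skew_deg))


print(f"{vertical_drift(2400, 1.5):.1f} px")  # 62.8 px
```

Compare the result with the spacing between baselines on your own pages: once the drift approaches that spacing, a horizontal band that starts on one line ends on its neighbour, which is exactly the merged-line failure.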
Baseline error cheat-sheet
| Symptom | Likely cause | Fix |
|---|---|---|
| Whole lines missing from transcript | No baseline drawn (faint ink) | Re-run layout; draw missing baselines |
| Two lines as one garbled line | Single baseline over two rows | Split baseline; deskew page |
| Marginal notes ignored | Region box too small | Extend or add a text region |
| Lines transcribed but out of order | Wrong reading order | Renumber reading order, not baselines |
| Same failure on every page | Mismatched layout model | Train/select a field model on samples |
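The reading-order row can be checked mechanically. This sketch assumes that each `TextLine` in the PAGE XML export carries a `custom="readingOrder {index:N;}"` attribute, as Transkribus exports appear to do, and that lines inside one region should read top to bottom (put separate columns in separate regions). The function name is hypothetical; it flags regions whose index order disagrees with the vertical order of their baselines.

```python
import re
import xml.etree.ElementTree as ET

ORDER_RE = re.compile(r"readingOrder\s*\{\s*index:\s*(\d+)")


def local(tag):
    """Drop the '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def out_of_order_regions(page_xml):
    """Ids of TextRegions whose reading order is not top-to-bottom."""
    flagged = []
    for region in ET.fromstring(page_xml).iter():
        if local(region.tag) != "TextRegion":
            continue
        lines = []  # (reading-order index, mean baseline y, line id)
        for line in region:
            if local(line.tag) != "TextLine":
                continue
            match = ORDER_RE.search(line.get("custom", ""))
            base = next((c for c in line if local(c.tag) == "Baseline"), None)
            if match is None or base is None:
                continue
            ys = [float(p.split(",")[1]) for p in base.get("points").split()]
            lines.append((int(match.group(1)), sum(ys) / len(ys), line.get("id")))
        by_index = [lid for _, _, lid in sorted(lines)]
        by_position = [lid for _, _, lid in sorted(lines, key=lambda t: t[1])]
        if by_index != by_position:
            flagged.append(region.get("id"))
    return flagged


SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"><Page>
<TextRegion id="r1">
  <TextLine id="l1" custom="readingOrder {index:0;}"><Baseline points="100,150 800,150"/></TextLine>
  <TextLine id="l2" custom="readingOrder {index:2;}"><Baseline points="100,220 800,220"/></TextLine>
  <TextLine id="l3" custom="readingOrder {index:1;}"><Baseline points="100,290 800,290"/></TextLine>
</TextRegion>
<TextRegion id="r2">
  <TextLine id="l4" custom="readingOrder {index:0;}"><Baseline points="100,400 800,400"/></TextLine>
  <TextLine id="l5" custom="readingOrder {index:1;}"><Baseline points="100,470 800,470"/></TextLine>
</TextRegion></Page></PcGts>"""

print(out_of_order_regions(SAMPLE))  # ['r1']
```

Note that the baselines themselves are untouched here: only the numbering is wrong, which matches the cheat-sheet's advice to renumber the reading order rather than redraw anything.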
When should I train a layout model?
If you fight the same baseline failures across hundreds of pages — a fixed two-column ledger, say, or a consistent marginal-gloss layout — stop correcting page by page. Annotate 15-25 representative pages with correct regions and baselines, train a field model, and apply it to the whole collection. The up-front cost pays back within a few dozen pages.
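One simple way to make the 15-25 annotated pages representative is to spread them evenly across the collection, so that changes in hand or layout over time are covered. The helper below is a hypothetical sketch of that assumption; where you know unusual layouts exist, swap those pages in by hand.

```python
def pick_training_pages(total_pages, n=20):
    """Return n page numbers (1-based) spread evenly across the collection.
    If the collection has n pages or fewer, every page is returned."""
    if n >= total_pages:
        return list(range(1, total_pages + 1))
    step = total_pages / n
    # Take the middle page of each equal-sized slice of the collection.
    return [int(i * step + step / 2) + 1 for i in range(n)]


print(pick_training_pages(400, 20)[:5])  # [11, 31, 51, 71, 91]
```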
Key Takeaways
- Baseline errors are a layout problem; recognition only reads what the baselines define.
- Always fix geometry first, then re-run recognition — never retype to paper over a bad baseline.
- Deskewing and contrast cleanup eliminate a large share of merged-line errors.
- Keep every baseline inside the right text region, or it is silently dropped.
- Reading order is separate: fix baselines, then renumber the reading order before export.
- For repeating collection-wide failures, train a field/P2PaLA layout model rather than hand-correcting.
Frequently Asked Questions
Why is Transkribus skipping lines on my page?
The layout model failed to draw a baseline for those lines, usually because of faint ink, close line spacing, or a marginal note the detector ignored. Re-run layout analysis with a better-matched layout model or draw the missing baselines manually.
What is the difference between a baseline and a text region in Transkribus?
A text region is the box that groups lines belonging together (a column or paragraph), while a baseline is the single line the script sits on. Recognition reads along baselines, so a missing baseline means a missing transcribed line.
Why are two lines being merged into one?
A single baseline was drawn across two written lines, often when line spacing is tight or the page is skewed. Split the baseline at the correct point, or re-run layout analysis after deskewing the image.
Can I fix baselines without retyping the text?
Yes. Correct the baseline geometry first, then re-run text recognition on just the affected lines or page. Recognition then reads along the corrected baselines, so there is no need to retype the text by hand.
Does the reading order affect baseline errors?
Reading order does not create baseline errors, but a wrong reading order makes good baselines export in the wrong sequence. Fix baselines first, then check and renumber the reading order before export.
Should I train a layout model to fix recurring baseline problems?
Yes. If the same baseline failures repeat across a whole collection, training a field or P2PaLA layout model on representative pages (or selecting an existing one that matches your layout) fixes them at scale far better than correcting page by page.