Troubleshooting: Tell apart confusable letterforms

When two letterforms keep getting confused, the fix is almost never staring harder at the single glyph — it is gathering context. Resolve confusables by checking the scribe's own unambiguous examples elsewhere in the manuscript, by testing each candidate reading against the word and its language, and by recording any remaining doubt explicitly rather than guessing silently. This article walks through the recurring failure modes and the diagnostic that fixes each one.

What are the most common confusable pairs?

Most transcription errors cluster around a handful of culprits. Knowing the list turns vague unease into a targeted check.

| Confusable | Distinguishing signal | Most common in |
| --- | --- | --- |
| `i n m u` (minims) | dots on i, ligature shape, word sense | Gothic, cursive |
| long `ſ` vs `f` | crossbar full (f) vs half/none (ſ) | print and hand to ~1800 |
| `c` vs `t` | t has a stroke above the bowl | Secretary, court hand |
| `u` vs `v` | position in word, not shape | medieval to early modern |
| `r` rotunda vs `2`/`z` | follows o, b, p | Gothic |
| `e` vs `o` | open vs closed top | worn or faded hands |

Why do minims cause so many errors?

Minims are the short vertical strokes that build i, n, m, u. In a hand that does not join or dot them, the word minimum really can look like a picket fence of fifteen identical strokes (3 + 1 + 2 + 1 + 3 + 2 + 3). The root cause is that the information you need is not in the strokes: it is in the spacing, any surviving i-dots, and the word's meaning. Count the minims first, then partition them against a plausible word. If more than one partition still makes sense, that ambiguity is real and must be recorded.
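The count-then-partition step can be sketched in a few lines of Python. This is a minimal illustration, not a tool from any transcription package: the stroke counts assume an undotted, unjoined hand, and the function names and the tiny word list are made up for the example.

```python
# Strokes per minim letter (assumption: an undotted, unjoined hand).
MINIMS = {"i": 1, "n": 2, "u": 2, "m": 3}


def minim_count(word: str) -> int:
    """Count the minims in a word; non-minim letters contribute nothing."""
    return sum(MINIMS.get(ch, 0) for ch in word)


def partitions(strokes: int) -> list[str]:
    """Every sequence of i, n, u, m that uses exactly `strokes` minims."""
    if strokes == 0:
        return [""]
    results = []
    for letter, cost in MINIMS.items():
        if cost <= strokes:
            results.extend(letter + rest for rest in partitions(strokes - cost))
    return results


def plausible(strokes: int, wordlist: set[str]) -> list[str]:
    """Words made only of minim letters whose stroke count matches."""
    return sorted(
        w for w in wordlist
        if set(w) <= set(MINIMS) and minim_count(w) == strokes
    )


print(minim_count("minimum"))                   # 15
print(partitions(3))                            # ['iii', 'in', 'iu', 'ni', 'ui', 'm']
print(plausible(3, {"in", "ni", "um", "non"}))  # ['in', 'ni']
```

The number of raw partitions grows quickly with the stroke count, which is why the word list does the real work: filtering known words by stroke count is far cheaper than reading through every possible partition.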

How do I build a per-scribe reference sheet?

This is the single highest-leverage fix. Instead of trusting a printed alphabet, harvest the scribe's own letters:

```text
1. Pick the troublesome letter, e.g. final-position r.
2. Find THREE clear, uncontested instances on nearby pages.
3. Crop them side by side.
4. Note what makes them unambiguous (a following space, a known word).
5. Compare your doubtful glyph against these, not against a textbook.
```

A scribe is internally consistent far more often than they match a type sample, so their own hand is the best key to their hand.
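If you keep the sheet digitally, a small record structure is enough to track which letters still lack their three clear examples. The sketch below is one possible shape, not a standard format: the class names, the fields, and the folio references in the usage lines are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

REQUIRED_INSTANCES = 3  # the "three clear instances" rule


@dataclass
class Instance:
    folio: str      # where the clear example sits
    word: str       # the securely read word it comes from
    why_clear: str  # what makes the reading uncontested


@dataclass
class ReferenceSheet:
    scribe: str
    letters: dict[str, list[Instance]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def add(self, letter: str, instance: Instance) -> None:
        self.letters[letter].append(instance)

    def incomplete(self) -> dict[str, int]:
        """Letters that still need examples, with how many are missing."""
        return {
            letter: REQUIRED_INSTANCES - len(found)
            for letter, found in self.letters.items()
            if len(found) < REQUIRED_INSTANCES
        }


# Hypothetical entries, for illustration only.
sheet = ReferenceSheet(scribe="Hand A")
sheet.add("final r", Instance("f. 3r, line 2", "pater", "followed by a space"))
sheet.add("final r", Instance("f. 3v, line 9", "super", "known word"))
print(sheet.incomplete())  # {'final r': 1}
```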

How should I record a letter I cannot resolve?

Never bury a guess. Make the doubt visible and machine-tractable so later analysis can see it:

```xml
<!-- TEI: a reading you are unsure of -->
<choice>
  <unclear reason="faded" cert="medium">non</unclear>
  <unclear reason="faded" cert="low">uon</unclear>
</choice>
```
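Recording doubt this way pays off because the alternatives can be read back programmatically. The following sketch uses Python's standard `xml.etree.ElementTree` to list the readings in such a fragment, most certain first. It assumes the fragment carries no XML namespace, as in the snippet here; a full TEI file normally declares the TEI namespace, and the element lookups would need to account for that. The function name and the ordering table are illustrative.

```python
import xml.etree.ElementTree as ET

# The fragment from this article, without a namespace (an assumption).
FRAGMENT = """
<choice>
  <unclear reason="faded" cert="medium">non</unclear>
  <unclear reason="faded" cert="low">uon</unclear>
</choice>
"""

# Sort order for the cert values; anything unrecognised sorts last.
CERT_ORDER = {"high": 0, "medium": 1, "low": 2, "unknown": 3}


def alternatives(choice: ET.Element) -> list[tuple[str, str, str]]:
    """Return (reading, cert, reason) per <unclear> child, most certain first."""
    rows = [
        (u.text or "", u.get("cert", "unknown"), u.get("reason", ""))
        for u in choice.findall("unclear")
    ]
    return sorted(rows, key=lambda row: CERT_ORDER.get(row[1], 3))


choice = ET.fromstring(FRAGMENT.strip())
for reading, cert, reason in alternatives(choice):
    print(f"{reading}\t{cert}\t{reason}")
```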

In a plain-text workflow, a bracket convention works too — for example m[?]nimum — as long as it is documented in your transcription guidelines so it is not mistaken for editorial deletion.
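A documented bracket convention is also easy to audit. As a minimal sketch, assuming `[?]` stands for exactly one unresolved character (your own guidelines may define it differently), the hypothetical helper below lists every word that still carries a marker, with its line number, so that open questions can be reviewed in one pass.

```python
import re

# Assumption: "[?]" marks one unresolved character.
UNRESOLVED = re.compile(r"\[\?\]")


def unresolved_words(transcription: str) -> list[tuple[int, str]]:
    """Return (line number, word) for every word containing a [?] marker."""
    hits = []
    for number, line in enumerate(transcription.splitlines(), start=1):
        for word in line.split():
            if UNRESOLVED.search(word):
                hits.append((number, word))
    return hits


sample = "in m[?]nimum\nnon est\nu[?]n"
print(unresolved_words(sample))  # [(1, 'm[?]nimum'), (3, 'u[?]n')]
```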

Can software tell confusable letters apart for me?

A handwriting-text-recognition (HTR) model trained on enough pages of the same scribe internalises context and routinely beats a beginner on minims, because it has seen which strokes resolve to which words. But it has two failure modes you must manage: it commits confidently even when the source is genuinely ambiguous, and it inherits any systematic bias in your ground truth. Treat model output as a strong first draft, then review the low-confidence regions it flags.

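```python
# Surface the lines the model was least sure about, for human review.
# Assumes `page` is an HTR result whose lines expose id, text and confidence.
REVIEW_THRESHOLD = 0.80

for line in page.lines:
    if line.confidence < REVIEW_THRESHOLD:
        print(f"{line.id}\t{line.text}\t{line.confidence:.2f}")
```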

A diagnostic checklist for stubborn cases

When a letter still will not yield, run this order: (1) count strokes; (2) test each candidate word against the language and date; (3) compare three of the scribe's own clear examples; (4) check the parallel place in any other copy of the text; (5) if doubt remains, record it with a reason and confidence and move on. Spending an hour on one glyph that the source itself left ambiguous is wasted; documenting the ambiguity is the correct, defensible outcome.

Key Takeaways

Resolve confusables with context — the scribe's own clear examples, the word, the language — not by staring at one glyph.
Minims (i n m u) cause the most errors; count strokes, then partition against a plausible word.
Long ſ versus f is decided by the crossbar: full for f, half or none for long s.
Many hands treat u/v and i/j as positional variants; transcribe diplomatically, normalise separately.
Build a per-scribe reference sheet from three uncontested instances of each tricky letter.
Record unresolved letters explicitly with TEI unclear or a documented bracket convention — never silently guess.
HTR helps with context but commits confidently to ambiguous cases; review its low-confidence output.

Frequently Asked Questions

Why do minim letters like i, n, m, u get confused?

In Gothic and many cursive hands these letters are built from identical vertical strokes (minims) with little or no joining, so three minims in a row could read as in, ni, ui, iu or m. You disambiguate from word context, surviving dots on i, and the scribe's habits elsewhere on the page.

How do I tell long s from f?

Long s has a crossbar only on the left of the stem (or none at all); f has a full crossbar running through. When the photo is poor, check whether the supposed letter ever appears word-finally — long s rarely does in many hands, while f does freely.

What is the fastest way to resolve a confusable letter?

Build a per-scribe reference sheet: find three unambiguous instances of each tricky letter elsewhere in the same manuscript and compare. The scribe's own habits beat any printed alphabet table.

Are u and v interchangeable in old manuscripts?

Often yes — many medieval and early-modern hands treat u and v (and i and j) as positional variants of one letter, not two. Transcribe what is on the page in a diplomatic layer and normalise to modern u/v in a separate reading layer.

How should I record an unresolved letter?

Use a documented convention: enclose your best reading in square brackets or mark it with a TEI unclear element giving a reason and a confidence value. Never silently guess, because a hidden guess corrupts downstream analysis.

Can HTR models fix confusable letters automatically?

Partly. A model trained on enough of the same scribe learns context and beats a beginner on minims, but it still struggles with genuinely ambiguous cases and will commit confidently to one reading, so you must review its output rather than trust it.
