Resolve ambiguous historical place names: A Practical Guide

To resolve an ambiguous historical place name you turn one written form into one intended place by ranking candidate matches against contextual evidence — co-occurring places, the administrative hierarchy, the source date, and known routes — then confirming the top candidate and recording your confidence. The name alone almost never decides it; the surrounding document does. This guide walks the full workflow with worked examples.

Why is the same toponym so often ambiguous?

Three forces produce ambiguity. First, repetition: England alone has dozens of Newtons and Newports. Second, orthographic drift: a single place is spelled "Cirencester", "Cisseter", and "Ciceter" across three centuries. Third, reassignment: a name detaches from one settlement and attaches to a successor or a parish reorganisation. Resolution is the discipline of collapsing those forms onto one referent with stated evidence.

Which context signals should I weight?

Rank candidates by how well each fits the document, not by fame. The strongest signals, roughly in order:

Spatial co-occurrence — other places named in the same record cluster geographically.
Administrative containment — "Stratton in the hundred of Branch" eliminates every Stratton outside Wiltshire.
Temporal fit — a place must exist at the source's date; a 1340 charter cannot mean a Tudor foundation.
Itinerary — if the source is a journey, neighbouring stops bound the candidate.

How do I build a candidate-and-context workflow?

Generate candidates from a gazetteer, then score them against the other resolved places in the document. A simple proximity weighting works well:

python

from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    lat1, lon1, lat2, lon2 = map(radians, [*a, *b])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = sin(dlat/2)**2 + cos(lat1)*cos(lat2)*sin(dlon/2)**2
    return 2 * 6371 * asin(sqrt(h))

def best_candidate(candidates, context_points):
    # context_points: coords of already-resolved places in the doc
    def score(c):
        return min(haversine_km((c["lat"], c["lon"]), p) for p in context_points)
    return min(candidates, key=score)   # nearest to the document's centre of gravity

This picks the candidate closest to where the document's other places already sit — the opposite of defaulting to the largest.

How should I handle spelling variation?

Normalise before matching, but keep the original. Run the raw form through a light transliteration and a phonetic key so "Sarisburia" can reach "Salisbury":

python

import unicodedata
def fold(s):
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode()
    return s.lower().replace("ph", "f").replace("ck", "k")

Store both the verbatim toponym and the folded key; never overwrite the source spelling.

How do I record uncertainty?

Every resolution carries a confidence value and a one-line rationale.

Decision	Confidence	Note
Stratton -> Stratton St Margaret	0.95	"in Highworth hundred" stated
Newport -> Newport (Salop)	0.60	nearest to two co-named places
Aston -> ?	0.20	five candidates, no context

A 0.20 row is a research task flagged for later; a silent guess is a future error.

What pitfalls recur most?

Largest-place bias. The famous Newport is rarely the clerk's Newport.
Anachronism. Matching to a place that did not yet exist at the source date.
Discarding the original spelling. It is the only audit trail you have.
Over-automation. Let software rank; let a historian decide the hard ones.

Key Takeaways

Resolution chooses one referent; the gazetteer only supplies candidates.
Weight candidates by spatial co-occurrence and administrative containment, not fame.
Normalise spelling for matching but always retain the verbatim form.
Score candidates against already-resolved places in the same document.
Record a confidence value and rationale for every decision.
Reserve manual review for low-confidence, consequential matches.

Frequently Asked Questions

What makes a historical place name ambiguous?

Ambiguity comes from one name referring to several places (Newport, Springfield), spelling variation across centuries, and names that moved or were reassigned. Resolution means choosing the single intended referent using contextual evidence.

What context signals help most when disambiguating?

Co-occurring places in the same document, the administrative hierarchy named (parish within hundred within county), the date of the source, and the route or itinerary if one is implied. These constrain candidates far more than the name alone.

Should I disambiguate by hand or with software?

Use software to generate and rank candidates, then confirm by hand for anything consequential. Fully automatic toponym resolution still errs on rare and obsolete places, which is exactly where historians work.

How do I record a resolution I am unsure about?

Store a confidence value and a short note on the evidence used. A flagged low-confidence match is recoverable; a silent guess presented as fact is not.

Can a gazetteer resolve names on its own?

A gazetteer supplies candidates and their attributes, but resolution is the act of choosing among them. The gazetteer is the lookup table; your contextual rules are the decision procedure.

What is the single most common mistake?

Defaulting to the largest or best-known place. The biggest Newport is rarely the one a parish register clerk meant, so weight candidates by proximity to other named places in the source.

Why is the same toponym so often ambiguous? ​

Which context signals should I weight? ​

How do I build a candidate-and-context workflow? ​

How should I handle spelling variation? ​

How do I record uncertainty? ​

What pitfalls recur most? ​

Key Takeaways ​

Frequently Asked Questions ​

What makes a historical place name ambiguous? ​

What context signals help most when disambiguating? ​

Should I disambiguate by hand or with software? ​

How do I record a resolution I am unsure about? ​

Can a gazetteer resolve names on its own? ​

What is the single most common mistake? ​

Related reading ​