Historical Gazetteers & Place Data

When a historical place name maps to the wrong modern location, the cause is almost always one of four things: a same-name modern town with higher prominence, an outright renaming, a one-to-many administrative split, or a string match made without a period or regional filter. The fix is to stop matching on bare strings and instead resolve through identifiers, period filters and an explicit relationship type. This guide walks through the common failures, how to diagnose each one quickly, and the fix that holds.

Why does my historical name match the wrong modern place?

The classic failure: "Boston" in a 17th-century English document resolves to Boston, Massachusetts because the modern city dwarfs the Lincolnshire town on population. Diagnose it by checking whether the chosen candidate's country or region contradicts the document's known context. The fix is a two-line guard — restrict candidates to the period-plausible region before ranking.

python
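def filter_candidates(cands, period_region):
    # period_region: set of region or country codes plausible for the source
    keep = [c for c in cands if c.get("region") in period_region]
    # Fall back to the full list only with a flag, so the caller can queue
    # the mention for review instead of silently accepting the global winner.
    needs_review = not keep
    return (keep or cands), needs_review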

If the keep list empties, do not silently fall back to the global winner — flag the mention for review. Silent fallback is how the wrong Boston gets into a published dataset.
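To see the guard in context, here is a minimal sketch of filtering before ranking. The `resolve_mention` name, the `region` and `population` keys, and the population figures are assumptions for illustration only, not values from any particular gazetteer.

```python
def resolve_mention(cands, period_region):
    """Filter to the period-plausible region, then rank by prominence.

    Returns (best_candidate, needs_review). Field names are illustrative.
    """
    keep = [c for c in cands if c["region"] in period_region]
    needs_review = not keep      # nothing plausible survived: a human must look
    pool = keep or cands
    if not pool:
        return None, True
    best = max(pool, key=lambda c: c["population"])
    return best, needs_review


# Illustrative candidates; populations are rough placeholders.
candidates = [
    {"name": "Boston", "region": "US", "population": 650_000},
    {"name": "Boston", "region": "GB", "population": 45_000},
]

best, needs_review = resolve_mention(candidates, {"GB"})
print(best["region"], needs_review)  # GB False
```

Ranking happens only inside the filtered pool, so the larger city cannot win unless the filter leaves nothing else, and in that case the result carries a review flag.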

How do I handle renamed and vanished places?

A renamed place (Christiania to Oslo, Constantinople to Istanbul) will never string-match its modern form. A vanished settlement has no modern successor at all. For both, map to coordinates plus a typed relationship rather than forcing a one-to-one equivalence.

| Situation | Wrong fix | Right fix |
| --- | --- | --- |
| Renamed | Drop the record | Link via renamed_to, keep both names |
| Depopulated | Force nearest modern town | Coordinates + former settlement note |
| Merged | Pick one survivor | merged_into with all parts listed |
| Split | Single foreign key | One-to-many typed relationships |
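One way to hold these links in code is a small typed record. The sketch below is an assumption about shape, not a standard schema: the `Relation` values, the `PlaceLink` fields, and the vanished village "Example Parva" with its coordinates are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relation(Enum):
    RENAMED_TO = "renamed_to"
    MERGED_INTO = "merged_into"
    SPLIT_INTO = "split_into"
    DEPOPULATED = "depopulated"


@dataclass(frozen=True)
class PlaceLink:
    historical_name: str
    relation: Relation
    modern_name: Optional[str]      # None when there is no modern successor
    lat: Optional[float] = None
    lon: Optional[float] = None
    note: str = ""


links = [
    PlaceLink("Christiania", Relation.RENAMED_TO, "Oslo"),
    PlaceLink("Constantinople", Relation.RENAMED_TO, "Istanbul"),
    # Hypothetical vanished village; the coordinates are placeholders.
    PlaceLink("Example Parva", Relation.DEPOPULATED, None,
              lat=52.0, lon=-1.0, note="former settlement"),
]


def modern_names(historical_name):
    """Return every modern name linked to a historical name (may be empty)."""
    return [link.modern_name for link in links
            if link.historical_name == historical_name and link.modern_name]


print(modern_names("Christiania"))    # ['Oslo']
print(modern_names("Example Parva"))  # []
```

An empty result for the depopulated village is the intended behaviour: the record still exists with coordinates and a note, but nothing pretends to be its successor.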

What causes one-to-many mapping problems?

Administrative reform is the usual culprit. A historical parish may have been split among three modern civil parishes, or several merged into one. If your schema has a single modern_id column you are forced to discard real information. Model the join as its own table of typed relationships so a query can answer "which modern units cover this historical parish?" without guessing.
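A relationship table of this kind might look like the following sketch, using Python's built-in `sqlite3` module. The table name `place_relation`, its columns, and the `hist:`/`mod:` identifiers are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE place_relation (
    historical_id TEXT NOT NULL,
    modern_id     TEXT NOT NULL,
    relation      TEXT NOT NULL
        CHECK (relation IN ('split_into', 'merged_into', 'renamed_to')),
    PRIMARY KEY (historical_id, modern_id, relation)
);
""")

# Hypothetical rows: one parish split three ways, two parishes merged into one.
rows = [
    ("hist:parish_a", "mod:cp_1", "split_into"),
    ("hist:parish_a", "mod:cp_2", "split_into"),
    ("hist:parish_a", "mod:cp_3", "split_into"),
    ("hist:parish_b", "mod:cp_9", "merged_into"),
    ("hist:parish_c", "mod:cp_9", "merged_into"),
]
conn.executemany("INSERT INTO place_relation VALUES (?, ?, ?)", rows)


def modern_units(historical_id):
    """Which modern units cover this historical unit?"""
    cur = conn.execute(
        "SELECT modern_id FROM place_relation "
        "WHERE historical_id = ? ORDER BY modern_id",
        (historical_id,),
    )
    return [row[0] for row in cur]


print(modern_units("hist:parish_a"))  # ['mod:cp_1', 'mod:cp_2', 'mod:cp_3']
```

Because the relationship lives in its own table, the same structure answers the reverse question (which historical parishes fed into `mod:cp_9`) with a query on `modern_id`.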

Should I match on strings or identifiers?

Identifiers, every time you can get them. String matching breaks on spelling drift ("Salop" for Shropshire), abbreviation, and exonyms. The robust pattern is: try an identifier crosswalk first (Wikidata, GeoNames, a national gazetteer), fall back to normalised string matching only for the residue, and mark every string-matched record for human review. A useful normaliser strips diacritics, lowercases, and collapses common historical spelling variants you observe in your own data.
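The identifier-first pattern can be sketched as below. Everything named here is an assumption for illustration: the `VARIANTS` map, the `resolve` function, the record keys, and the placeholder identifiers (which are not real Wikidata or GeoNames IDs).

```python
import unicodedata

# Project-specific spelling variants; extend this from your own data.
VARIANTS = {"salop": "shropshire"}


def normalise(name):
    """Strip diacritics, lowercase, collapse whitespace, apply known variants."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks. Letters with no decomposition (e.g. "ø") pass through.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    key = " ".join(stripped.lower().split())
    return VARIANTS.get(key, key)


def resolve(record, crosswalk, name_index):
    """Try the identifier crosswalk first; fall back to a normalised string match."""
    source_id = record.get("source_id")
    if source_id in crosswalk:
        return {"modern_id": crosswalk[source_id],
                "match_method": "identifier", "needs_review": False}
    key = normalise(record["name"])
    if key in name_index:
        # String matches are provisional: always route them to review.
        return {"modern_id": name_index[key],
                "match_method": "string", "needs_review": True}
    return {"modern_id": None, "match_method": "none", "needs_review": True}


crosswalk = {"src:001": "ID-EXAMPLE-1"}      # placeholder identifiers
name_index = {"shropshire": "ID-EXAMPLE-2"}

print(resolve({"name": "Salop"}, crosswalk, name_index)["match_method"])  # string
```

Writing `match_method` onto every result is what makes a later audit of string-only matches possible.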

bash
# quick audit: how many of my mappings rested on string match alone?
csvgrep -c match_method -m "string" mapping.csv | csvstat --count

If that count is large, your dataset's reliability rests on the weakest method. Prioritise converting those to identifier-based links.

How do I deal with transliteration and exonyms?

Keep three fields, not one: the original-script form, a normalised Latin transliteration, and the modern endonym. Map through a variant table so the historical exonym ("Leghorn") reaches the modern endonym ("Livorno") via an explicit row, not a fuzzy direct match. This makes the chain auditable and lets you reuse the same variant table across projects.
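A variant table with those three fields could be sketched as follows. The `VariantRow` shape, the `kind` column, and the `to_endonym` helper are assumptions for illustration; the lookup is deliberately exact, with no fuzzy fallback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariantRow:
    original: str        # form as attested, in its original script
    normalised: str      # normalised Latin form, used as the lookup key
    modern_endonym: str  # the name used locally today
    kind: str            # e.g. "exonym" or "transliteration"


VARIANT_TABLE = [
    VariantRow("Leghorn", "leghorn", "Livorno", "exonym"),
    VariantRow("Москва", "moskva", "Москва", "transliteration"),
]

# Index by the normalised form so every hit traces back to one explicit row.
BY_NORMALISED = {row.normalised: row for row in VARIANT_TABLE}


def to_endonym(normalised_name) -> Optional[str]:
    """Return the modern endonym for a normalised form, or None if no row exists."""
    row = BY_NORMALISED.get(normalised_name.lower())
    return row.modern_endonym if row else None


print(to_endonym("Leghorn"))  # Livorno
print(to_endonym("Legorn"))   # None
```

A miss returns `None` instead of the nearest spelling, so a new variant has to be added as a reviewed row before it can resolve.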

How should I document an uncertain mapping?

Never let a shaky mapping look as clean as a certain one. Add a confidence flag (high / medium / low) and a one-line note recording the evidence and the gazetteer version used. Reviewers and reusers can then filter on confidence, and an honest "low, successor inferred from boundary overlap" is far more valuable than a tidy but unsupported match.
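As a minimal sketch, a mapping decision can carry its confidence and evidence alongside the link itself. The `MappingDecision` fields, the identifiers, and the gazetteer version string are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


_RANK = {Confidence.LOW: 0, Confidence.MEDIUM: 1, Confidence.HIGH: 2}


@dataclass(frozen=True)
class MappingDecision:
    historical_id: str
    modern_id: str
    confidence: Confidence
    note: str                # one line of evidence
    gazetteer_version: str   # which release the decision was made against


decisions = [
    MappingDecision("hist:001", "mod:101", Confidence.HIGH,
                    "identifier crosswalk", "example-gazetteer 2024-01"),
    MappingDecision("hist:002", "mod:102", Confidence.LOW,
                    "successor inferred from boundary overlap",
                    "example-gazetteer 2024-01"),
]


def at_least(items, floor):
    """Keep only decisions at or above the given confidence level."""
    return [d for d in items if _RANK[d.confidence] >= _RANK[floor]]


print([d.historical_id for d in at_least(decisions, Confidence.MEDIUM)])  # ['hist:001']
```

Filtering excludes the low-confidence row without deleting it, so the inferred mapping and its note remain available to anyone who wants to re-examine it.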

Key Takeaways

  • The wrong-modern-match bug is usually prominence bias — add a period and region filter before ranking.
  • Map renamed and vanished places with typed relationships, never a forced one-to-one equivalence.
  • Model administrative splits and merges as a relationship table, not a single foreign key.
  • Prefer identifier crosswalks; treat every string match as provisional and flag it for review.
  • Handle exonyms through an explicit variant table holding original, normalised and modern forms.
  • Record a confidence flag and evidence note so uncertain mappings never masquerade as certain.

Frequently Asked Questions

Why does my historical name match the wrong modern place?

Almost always because a same-name modern town outranks the correct one on population, or because the place was renamed entirely. Add a period filter and a regional bounding box before trusting any string match.

How do I handle a place that no longer exists?

Map it to coordinates and a 'former settlement' note rather than forcing it onto a modern successor. Record the relationship type — succeeded_by, merged_into, depopulated — so downstream users understand the link.

What causes one-to-many mapping problems?

A single historical unit is often split into several modern units, or several historical parishes are merged into one. Model these as explicit typed relationships, not a single foreign key, or you will lose information silently.

Should I match on name strings or on identifiers?

Match on stable identifiers wherever possible. String matching is a last resort that should always be reviewed; identifiers from GeoNames, Wikidata or a national gazetteer survive spelling drift and renaming.

How do I deal with transliteration and exonyms?

Store the original-script form, a normalised Latin form, and the modern endonym separately. Map through a variant table rather than trying to match the historical exonym directly to the modern endonym.

How do I document a mapping decision I am unsure about?

Add a confidence flag and a short note with your evidence and the gazetteer version. An honest 'low confidence, see note' is far more useful than a clean-looking but wrong match.