Historical Gazetteers & Place Data

To reconcile place names in your data, match each free-text name to a stable identifier in an authority — Wikidata, GeoNames, Pleiades or the World Historical Gazetteer — so that every spelling variant resolves to one ID. The practical engine for tabular data is OpenRefine: cluster variants first, run a reconciliation service, then confirm matches by hand. The deliverable is an ID column you can trust, plus an honest flag on whatever did not match.

What is reconciliation, and what is it not?

Reconciliation produces identifiers, not prettier strings. Cleaning turns "Yorke" into "York"; reconciliation turns "York" into Q42462. That ID is what lets you join across datasets, pull coordinates, and link to other projects. If you finish with a tidy text column and no IDs, you have cleaned, not reconciled.

How do I prepare data before reconciling?

Match rate depends almost entirely on prep. Do this first:

  1. Cluster and merge spelling variants (OpenRefine's key-collision and nearest-neighbour clustering).
  2. Strip qualifiers like "near", "parish of", "co." that block exact matches.
  3. Split multi-valued cells so one cell holds one place.
  4. Add a country or region column to constrain candidates.

Reconciling cleaned, deduplicated strings can lift the auto-match rate from under half to well over 80 percent on typical archival data.
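
The prep steps above can be sketched outside OpenRefine as well. The following is a minimal Python illustration, not OpenRefine's own implementation: the qualifier list, the ";" separator and the function names are assumptions chosen for the example, and the fingerprint only approximates key-collision clustering (it will not catch variants such as "Yorke", which need nearest-neighbour matching).

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical qualifier list; extend it for your own sources.
QUALIFIERS = re.compile(r"\b(?:near|parish of)\b\s*|\bco\.\s*", re.IGNORECASE)


def split_places(cell, sep=";"):
    """Split a multi-valued cell so each item holds one place."""
    return [part.strip() for part in cell.split(sep) if part.strip()]


def strip_qualifiers(name):
    """Remove qualifiers that block exact matches."""
    return QUALIFIERS.sub("", name).strip()


def fingerprint(name):
    """Key for key-collision clustering: ASCII-fold, lowercase,
    drop punctuation, then sort and de-duplicate the tokens."""
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    tokens = re.sub(r"[^\w\s]", "", folded.lower()).split()
    return " ".join(sorted(set(tokens)))


def cluster(names):
    """Group names sharing a fingerprint; keep groups with more than one spelling."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [group for group in groups.values() if len(set(group)) > 1]


cleaned = [strip_qualifiers(p) for p in split_places("near York; parish of Whitby; co. Durham")]
variants = cluster(["York", "york", "YORK.", "Whitby"])
```

Here `cleaned` holds one bare place name per item and `variants` lists the spellings that collapse to the same key, which are the candidates to merge before reconciling.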

How do I reconcile a spreadsheet in OpenRefine?

OpenRefine is the standard tool. The flow:

1. Import CSV  ->  2. Cluster the place column  ->  3. Reconcile
   Reconcile > Start reconciling... > add a Wikidata service
   choose type: human settlement (Q486972) or your closest type
4. Review: auto-matched cells show a single linked name; cells listing candidates need a click
5. Edit column > Add column based on this column... > name it "qid", expression: cell.recon.match.id

For the World Historical Gazetteer, add the WHG reconciliation service URL in the same dialog instead of the Wikidata one; the same confirm/reject loop applies.
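
Behind the dialog, OpenRefine talks to the service over the Reconciliation Service API, sending a batch of queries and reading back scored candidates. The sketch below only builds the request and reads a response; it makes no network call. The endpoint URL is an assumption to verify before use, the function names are hypothetical, and the sample response is hand-written for illustration rather than real service output.

```python
import json
from urllib.parse import urlencode

# Assumed address of the public Wikidata reconciliation service; check it before use.
WIKIDATA_RECON_URL = "https://wikidata.reconci.link/en/api"


def build_queries(names, type_id="Q486972", limit=3):
    """Build one query per name, keyed q0, q1, ... as the API expects."""
    return {
        f"q{i}": {"query": name, "type": type_id, "limit": limit}
        for i, name in enumerate(names)
    }


def encode_request(queries):
    """Form-encode the batch as the `queries` parameter of a POST body."""
    return urlencode({"queries": json.dumps(queries)})


def auto_matches(response):
    """Map each query key to the ID the service flagged as a match, else None."""
    matches = {}
    for key, payload in response.items():
        hit = next((c for c in payload.get("result", []) if c.get("match")), None)
        matches[key] = hit["id"] if hit else None
    return matches


queries = build_queries(["York", "Whitby"])
body = encode_request(queries)

# Hand-written sample in the response shape; the score is illustrative only.
sample_response = {
    "q0": {"result": [{"id": "Q42462", "name": "York", "score": 100, "match": True}]},
    "q1": {"result": []},
}
matched = auto_matches(sample_response)
```

Rows whose value is `None` correspond to the cells that need a manual decision in the review step.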

Which authority should I reconcile against?

It depends on coverage versus period accuracy. Store more than one ID where useful.

Authority                  | Strength                        | Best for
---------------------------|---------------------------------|------------------------------
Wikidata                   | Broad, multilingual, stable hub | General linking, a default ID
World Historical Gazetteer | Cross-period historical places  | Period-aware projects
Pleiades                   | Ancient Mediterranean           | Classical sources
GeoNames                   | Modern coordinates              | Present-day settlements

A common pattern is a Wikidata QID as the primary hub plus a specialist ID for period precision.
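
One way to hold a hub ID alongside specialist IDs is a small record type. This is a sketch under assumptions: the class, field names and authority keys are hypothetical, and the specialist ID shown is a placeholder, not a real identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PlaceRecord:
    source_name: str                     # the name as it appears in your data
    wikidata_qid: Optional[str] = None   # primary hub ID
    other_ids: Dict[str, str] = field(default_factory=dict)  # authority -> ID

    def best_id(self, prefer: Optional[str] = None) -> Optional[Tuple[str, str]]:
        """Return (authority, id), preferring a specialist authority when present."""
        if prefer and prefer in self.other_ids:
            return prefer, self.other_ids[prefer]
        if self.wikidata_qid:
            return "wikidata", self.wikidata_qid
        return None


york = PlaceRecord("Yorke", wikidata_qid="Q42462",
                   other_ids={"whg": "PLACEHOLDER-ID"})  # placeholder, not a real WHG ID
```

Calling `york.best_id(prefer="whg")` returns the specialist ID when one is stored and falls back to the QID otherwise, so period-aware code and general linking can share one record.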

How do I extract and store the results?

After confirming matches, pull the IDs into permanent columns. In OpenRefine's GREL:

cell.recon.match.id        // the matched identifier, e.g. Q42462
cell.recon.match.name      // the authority's label
cell.recon.judgment        // "matched", "none", or "new"

Export qid, the authority label and the judgment so unmatched rows (judgment == "none") are explicit and auditable.
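
Once exported, the judgment column makes it easy to separate confirmed rows from those still needing review. A minimal sketch, assuming the export has columns named place, qid, label and judgment (hypothetical names; the sample rows are illustrative):

```python
import csv
import io


def split_by_judgment(rows):
    """Separate confirmed matches from rows that still need review."""
    matched, review = [], []
    for row in rows:
        confirmed = row["judgment"] == "matched" and row["qid"]
        (matched if confirmed else review).append(row)
    return matched, review


def to_csv(rows, columns=("place", "qid", "label", "judgment")):
    """Serialise rows with a fixed column order so exports stay comparable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"place": "York", "qid": "Q42462", "label": "York", "judgment": "matched"},
    {"place": "Jorvik?", "qid": "", "label": "", "judgment": "none"},
]
matched, review = split_by_judgment(rows)
review_csv = to_csv(review)
```

Writing the review rows to their own file keeps unmatched names visible instead of letting them disappear into blank cells.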

How do I keep it reproducible and honest?

Two disciplines. First, record provenance: the authority, its snapshot date, the type you reconciled against, and the matching parameters. Second, never force weak matches: leave low-confidence names flagged and review them in batch. A reconciliation run you cannot re-execute, or one that hid its failures, is worse than none because it looks authoritative while being silently wrong.
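
Provenance can live in a small sidecar file written next to the exported data. The sketch below is one possible shape, not a standard: the field names, parameter keys and example values are all assumptions for illustration.

```python
import json
from datetime import date


def provenance_record(authority, snapshot_date, type_id, parameters, total, matched):
    """Collect what someone would need to re-run or audit the reconciliation."""
    return {
        "authority": authority,
        "snapshot_date": snapshot_date,
        "reconciled_type": type_id,
        "parameters": parameters,
        "run_date": date.today().isoformat(),
        "rows_total": total,
        "rows_matched": matched,
        "rows_unmatched": total - matched,  # failures are reported, not hidden
    }


# Illustrative values only.
record = provenance_record(
    authority="Wikidata",
    snapshot_date="2024-01-01",
    type_id="Q486972",
    parameters={"limit": 3, "auto_match": True},
    total=200,
    matched=170,
)
sidecar_text = json.dumps(record, indent=2, sort_keys=True)
```

Saving `sidecar_text` beside the export means the unmatched count travels with the data rather than depending on someone's memory of the run.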

Key Takeaways

  • Reconciliation yields stable identifiers, not just cleaned strings.
  • Cluster, normalise and split cells before reconciling to raise the auto-match rate sharply.
  • OpenRefine with a reconciliation service is the standard tool for tabular place data.
  • Choose the authority by coverage versus period accuracy; store multiple IDs when useful.
  • Export the judgment column so unmatched names stay explicit and recoverable.
  • Record authority, snapshot date and parameters so the run is reproducible.

Frequently Asked Questions

What does it mean to reconcile place names?

Reconciliation matches each free-text place name in your data to a stable identifier in an authority such as Wikidata, GeoNames or the World Historical Gazetteer. The output is an ID column, not a cleaned-up string.

What is the best tool for reconciling a spreadsheet of places?

OpenRefine with a reconciliation service is the standard choice for tabular data. It clusters variants, queries the authority, and lets you confirm or reject each match interactively.

Should I reconcile against Wikidata or a specialist gazetteer?

Reconcile against Wikidata for broad coverage and a stable hub ID, and against a specialist gazetteer like the World Historical Gazetteer or Pleiades when period accuracy matters. You can store both IDs.

How do I improve match quality before reconciling?

Cluster and normalise spellings first, strip qualifiers such as "near" or "parish of", and split multi-valued cells. Reconciling clean, deduplicated strings dramatically raises the auto-match rate.

What do I do with names that don't match anything?

Leave them unreconciled with a flag and review them in batch. Forcing a weak match silently corrupts your data; an unmatched flag is a recoverable, honest state.

How do I keep reconciliation reproducible?

Record the authority, its version or snapshot date, the matching parameters, and export both the input and the confirmed ID mapping. Then anyone can re-run or audit the process.