Historical Gazetteers & Place Data

A historical gazetteer is a structured list of places with stable identifiers, names that vary over time, locations with honest uncertainty, and time spans. To build one you model the schema first (place vs. name vs. attestation), populate it from your sources, geolocate cautiously, link to an authority such as Pleiades or GeoNames, and publish in Linked Places Format. The schema choices you make in the first hour decide whether the gazetteer stays usable at 5,000 entries.

What problem does a gazetteer actually solve?

Free-text place names in archival data are unsortable, unmappable, and unjoinable. "Salisbury", "New Sarum", and "Sarisburia" name one place; "Newport" names a dozen. A gazetteer gives each real place a stable ID so every variant spelling, every record series, and every map layer can anchor to the same entity. That anchor is what later geocoding, network analysis, and deduplication all depend on.
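To make the anchoring idea concrete, here is a minimal sketch of a lookup index from spellings to place IDs. The sample rows are hypothetical (only `wilt-0042` echoes this article's own example ID), and normalisation is deliberately crude: it only strips whitespace and folds case. The point is that one spelling can resolve to several candidate IDs ("Newport"), and several spellings can resolve to one ID ("Salisbury", "New Sarum").

```python
from collections import defaultdict

# Hypothetical (place_id, toponym) pairs, as they might come from a names table.
NAMES = [
    ("wilt-0042", "Salisbury"),
    ("wilt-0042", "New Sarum"),
    ("wilt-0042", "Sarisburia"),
    ("mon-0007", "Newport"),   # hypothetical ID
    ("iow-0003", "Newport"),   # hypothetical ID
]


def build_index(names):
    """Map each normalised spelling to the set of place IDs it may refer to."""
    index = defaultdict(set)
    for place_id, toponym in names:
        index[toponym.strip().casefold()].add(place_id)
    return index


def resolve(index, toponym):
    """Return candidate place IDs for a free-text name, sorted for stable output."""
    return sorted(index.get(toponym.strip().casefold(), set()))


index = build_index(NAMES)
print(resolve(index, "new sarum"))  # ['wilt-0042']
print(resolve(index, "Newport"))    # ['iow-0003', 'mon-0007']
```

When `resolve` returns more than one ID, that is a disambiguation task for a human, not something the index should silently settle.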

How should I structure the data model?

Use three related tables, not one wide sheet. A place is the entity; names and attestations hang off it.

```sql
CREATE TABLE places (
  place_id      TEXT PRIMARY KEY,   -- e.g. 'wilt-0042'
  place_type    TEXT,               -- parish, manor, hundred, settlement
  lat           REAL,
  lon           REAL,
  uncertainty_m INTEGER,            -- radius in metres
  start_year    INTEGER,
  end_year      INTEGER
);

CREATE TABLE names (
  name_id    INTEGER PRIMARY KEY,
  place_id   TEXT REFERENCES places(place_id),
  toponym    TEXT,
  language   TEXT,                  -- BCP 47 tag, e.g. en, la, xno (Anglo-Norman)
  start_year INTEGER,
  end_year   INTEGER,
  is_primary INTEGER                -- 1 for the display label
);
```

Keep a third attestations table linking each name to the exact source — a charter, a census line, a map sheet — with a citation. That is what makes a claim defensible rather than asserted.
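One way to sketch that third table, using Python's built-in `sqlite3` and an in-memory database, is shown below. The `attestations` column names (`source_type`, `citation`, `year`) are illustrative assumptions rather than a standard, and the inserted rows are placeholders, not real references. Note that SQLite only enforces `REFERENCES` constraints when the `foreign_keys` pragma is switched on for the connection.

```python
import sqlite3

# places and names as defined above, plus a hypothetical attestations table.
SCHEMA = """
CREATE TABLE places (
  place_id      TEXT PRIMARY KEY,
  place_type    TEXT,
  lat           REAL,
  lon           REAL,
  uncertainty_m INTEGER,
  start_year    INTEGER,
  end_year      INTEGER
);
CREATE TABLE names (
  name_id    INTEGER PRIMARY KEY,
  place_id   TEXT REFERENCES places(place_id),
  toponym    TEXT,
  language   TEXT,
  start_year INTEGER,
  end_year   INTEGER,
  is_primary INTEGER
);
CREATE TABLE attestations (
  attestation_id INTEGER PRIMARY KEY,
  name_id        INTEGER NOT NULL REFERENCES names(name_id),
  source_type    TEXT,           -- charter, census line, map sheet, ...
  citation       TEXT NOT NULL,  -- a name without a citation is only an assertion
  year           INTEGER         -- date of the source, if known
);
"""

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
con.executescript(SCHEMA)

# Placeholder rows for illustration; the citation text is not a real reference.
con.execute("INSERT INTO places (place_id, place_type) VALUES (?, ?)", ("wilt-0042", "parish"))
con.execute(
    "INSERT INTO names (name_id, place_id, toponym, language, is_primary) VALUES (?, ?, ?, ?, ?)",
    (1, "wilt-0042", "Sarisburia", "la", 0),
)
con.execute(
    "INSERT INTO attestations (name_id, source_type, citation) VALUES (?, ?, ?)",
    (1, "charter", "placeholder citation"),
)

# Every name for a place, together with the evidence behind it.
rows = con.execute(
    """
    SELECT n.toponym, a.source_type, a.citation
    FROM names AS n
    JOIN attestations AS a ON a.name_id = n.name_id
    WHERE n.place_id = ?
    """,
    ("wilt-0042",),
).fetchall()
print(rows)  # [('Sarisburia', 'charter', 'placeholder citation')]
```

Because `citation` is `NOT NULL` and `name_id` must point at a real name, the database itself refuses an unsourced or orphaned attestation.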

Where do I get the place data from?

Work from primary sources outward. Transcribe names directly from your charters, parish registers, or tithe maps rather than copying a modern atlas. Then cross-check against authorities:

| Source | Best for | Watch out for |
|---|---|---|
| Pleiades | Ancient Mediterranean | Sparse outside the classical world |
| GeoNames | Modern settlements, coordinates | Modern boundaries, anachronistic names |
| Wikidata | Cross-referencing, multilingual labels | Variable quality; verify each match |
| Ordnance Survey / IGN | National historical sheets | Licence terms differ by country |
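Cross-checking usually starts with a fuzzy comparison between your transcribed names and the labels in an authority extract you have downloaded. The sketch below assumes such an extract as a list of `(uri, label)` pairs; the `example.org` URIs are hypothetical stand-ins, not real authority records. It uses the standard library's `difflib.get_close_matches` (default cutoff 0.6) after folding case and stripping diacritics. It only proposes candidates for a human to confirm, which matters given the "Watch out for" column above.

```python
import difflib
import unicodedata

# Hypothetical authority extract: (authority_uri, label) pairs.
AUTHORITY = [
    ("https://example.org/authority/1", "Salisbury"),
    ("https://example.org/authority/2", "Old Sarum"),
    ("https://example.org/authority/3", "Wilton"),
]


def normalise(name: str) -> str:
    """Fold case and strip diacritics so spelling noise matters less."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


def candidate_matches(toponym, authority, cutoff=0.6):
    """Return authority rows whose labels resemble the toponym, best match first."""
    by_label = {}
    for uri, label in authority:
        by_label.setdefault(normalise(label), []).append((uri, label))
    hits = difflib.get_close_matches(normalise(toponym), list(by_label), n=3, cutoff=cutoff)
    return [row for key in hits for row in by_label[key]]


print(candidate_matches("Sarisburia", AUTHORITY))
```

String similarity says nothing about date or location, so a hit here is a lead to check against the source, not a match to record.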

How do I assign coordinates honestly?

Never fabricate precision. Pick a point you can justify — a church, a market cross, a parish centroid — and record an uncertainty_m radius beside it. For a deserted medieval village known only from a charter, a 2,000 m radius around a best guess is legitimate; six decimal places of latitude is not.

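```python
import csv

# One row per place; uncertainty_m is the radius (in metres) within which
# the true location is believed to lie, around the recorded point.
rows = [
    {"place_id": "wilt-0042", "lat": 51.0688, "lon": -1.7945,
     "uncertainty_m": 500, "place_type": "parish"},
]

with open("places.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```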
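You can also keep the number of decimal places in line with the stated uncertainty, so the coordinate never looks more precise than the radius beside it. This is a minimal sketch, assuming roughly 111,320 m per degree of latitude (an approximation). It picks the fewest decimal places whose grid step is no coarser than `uncertainty_m`. A degree of longitude is shorter than a degree of latitude away from the equator, so longitude rounded to the same places is never coarser than intended.

```python
METRES_PER_DEGREE_LAT = 111_320  # approximate; varies slightly with latitude


def decimals_for_uncertainty(uncertainty_m: float, max_decimals: int = 6) -> int:
    """Fewest decimal places whose step (in metres of latitude) is <= uncertainty_m."""
    if uncertainty_m <= 0:
        raise ValueError("uncertainty_m must be positive")
    for d in range(max_decimals + 1):
        if METRES_PER_DEGREE_LAT * 10 ** -d <= uncertainty_m:
            return d
    return max_decimals


def round_to_uncertainty(lat: float, lon: float, uncertainty_m: float):
    """Round a coordinate pair so its precision does not exceed its uncertainty."""
    d = decimals_for_uncertainty(uncertainty_m)
    return round(lat, d), round(lon, d)


print(decimals_for_uncertainty(500))   # 3 (a step of about 111 m)
print(decimals_for_uncertainty(2000))  # 2 (a step of about 1.1 km)
```

For the 2,000 m deserted-village case above, two decimal places are already as precise as the evidence allows.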

Add a closeMatch column holding the authority URI, for example https://pleiades.stoa.org/places/108794. Prefer closeMatch to exactMatch unless the temporal and spatial extents genuinely coincide — a Roman Sorviodunum is not strictly the same entity as the modern parish that succeeded it.
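The closeMatch/exactMatch decision can be made explicit in code so it is applied consistently. In the sketch below, `choose_link` and its rule are hypothetical: `exactMatch` only when both time spans are fully known and identical, and the authority's point lies within your `uncertainty_m` radius (great-circle distance by the haversine formula); everything else defaults to `closeMatch`. The returned dict is shaped like a Linked Places `links` entry (`type`, `identifier`), but check the field names against the LPF specification. The sample records are placeholders.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def choose_link(uri, local, authority):
    """Pick exactMatch only when span and location both coincide; else closeMatch."""
    local_span = (local["start_year"], local["end_year"])
    auth_span = (authority["start_year"], authority["end_year"])
    # Unknown bounds (None) never count as coinciding.
    same_span = None not in local_span and local_span == auth_span
    dist = haversine_m(local["lat"], local["lon"], authority["lat"], authority["lon"])
    same_place = dist <= local["uncertainty_m"]
    link_type = "exactMatch" if same_span and same_place else "closeMatch"
    return {"type": link_type, "identifier": uri}


# Placeholder records: same point, different time spans -> closeMatch.
local = {"lat": 51.0688, "lon": -1.7945, "uncertainty_m": 500,
         "start_year": 1200, "end_year": 1900}
roman = {**local, "start_year": 43, "end_year": 410}
print(choose_link("https://pleiades.stoa.org/places/108794", local, roman))
```

Defaulting to `closeMatch` mirrors the advice above: the weaker claim is the safe one unless you can show both extents agree.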

What are the common pitfalls?

  • Renaming vs. re-identifying. A new spelling is a new name row, not a new place. Mint IDs sparingly.
  • One-row-per-place flattening. Packing every variant into a single comma-separated column feels easier, but it destroys your ability to query, date, or cite individual names.
  • Silent coordinate guessing. Without an uncertainty field, users cannot tell measured from estimated.
  • Skipping citations. An entry without a source is an assertion, not evidence.
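Several of these pitfalls can be caught mechanically before they spread. Below is a minimal audit sketch over rows held as plain dicts. The `audit` function and the sample rows are hypothetical. It flags coordinates without `uncertainty_m`, names that look comma-packed, and names with no cited attestation. The comma check is a heuristic: a genuine toponym containing a comma would also be flagged.

```python
def audit(places, names, attestations):
    """Return human-readable warnings for common gazetteer pitfalls."""
    problems = []
    for p in places:
        # Silent coordinate guessing: a point with no stated uncertainty.
        if p.get("lat") is not None and p.get("uncertainty_m") is None:
            problems.append(f"{p['place_id']}: coordinate without uncertainty_m")
    cited = {a["name_id"] for a in attestations if a.get("citation")}
    for n in names:
        # Flattening: several variants packed into one name row.
        if "," in n["toponym"]:
            problems.append(f"name {n['name_id']}: several names packed into one row")
        # Skipping citations: a name with no supporting source.
        if n["name_id"] not in cited:
            problems.append(f"name {n['name_id']}: no cited attestation")
    return problems


# Hypothetical sample rows.
places = [
    {"place_id": "wilt-0042", "lat": 51.0688, "lon": -1.7945, "uncertainty_m": 500},
    {"place_id": "wilt-0043", "lat": 51.1, "lon": -1.8},
]
names = [
    {"name_id": 1, "place_id": "wilt-0042", "toponym": "Salisbury"},
    {"name_id": 2, "place_id": "wilt-0043", "toponym": "Wilton, Wiltune"},
]
attestations = [{"name_id": 1, "citation": "placeholder citation"}]

for warning in audit(places, names, attestations):
    print(warning)
```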

Key Takeaways

  • Model places, names, and attestations as separate linked tables from day one.
  • Give every place a stable, opaque internal ID you never reuse or renumber.
  • Record coordinate uncertainty explicitly; avoid false precision.
  • Link to Pleiades, GeoNames, or Wikidata with closeMatch/exactMatch chosen deliberately.
  • Cite the source for every name and location claim.
  • Start with a coherent 50–500 place slice rather than chasing scale.
  • Export to Linked Places Format only once the internal model is stable.

Frequently Asked Questions

What is the minimum data a historical gazetteer entry needs?

At minimum you need a stable internal ID, a primary name, a place type, a coordinate with an uncertainty flag, and a temporal span. Everything else — variant names, citations, links to Pleiades or Wikidata — is enrichment you add later.
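As a sketch of that minimum, a dataclass can refuse to construct an entry that lacks the required fields while leaving enrichment optional. `GazetteerEntry` is a hypothetical name, and treating `end_year=None` as "still extant" is an assumption of this sketch rather than a rule from the text. The sample values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GazetteerEntry:
    # Required minimum.
    place_id: str
    primary_name: str
    place_type: str
    lat: float
    lon: float
    uncertainty_m: int
    start_year: int
    end_year: Optional[int]  # assumption: None means the place is still extant
    # Enrichment, added later.
    variant_names: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)

    def __post_init__(self):
        if not (self.place_id and self.primary_name and self.place_type):
            raise ValueError("place_id, primary_name and place_type are required")
        if self.uncertainty_m <= 0:
            raise ValueError("uncertainty_m must be a positive radius in metres")
        if self.end_year is not None and self.start_year > self.end_year:
            raise ValueError("start_year is after end_year")


entry = GazetteerEntry("demo-0001", "Example Parish", "parish",
                       51.0, -1.8, 2000, 1100, None)
print(entry.variant_names)  # []
```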

Should I store one row per place or one row per name?

Use two tables: a places table keyed by a stable ID, and a names table with many rows per place. Flattening names into one comma-separated column makes disambiguation and search far harder later.
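If you inherit a sheet that has already been flattened, you can split it back into name rows. This is a minimal sketch, assuming a hypothetical `names` column of comma-separated variants and assuming the first variant listed is the display label. Language, dates, and citations cannot be recovered from such a column and have to be added back by hand.

```python
import csv
import io

# Hypothetical flattened input: all variants in one quoted column.
FLAT_CSV = 'place_id,names\nwilt-0042,"Salisbury, New Sarum, Sarisburia"\n'


def split_names(flat_rows):
    """Turn one-row-per-place records into one-row-per-name records."""
    names = []
    next_id = 1
    for row in flat_rows:
        variants = [v.strip() for v in row["names"].split(",") if v.strip()]
        for position, toponym in enumerate(variants):
            names.append({
                "name_id": next_id,
                "place_id": row["place_id"],
                "toponym": toponym,
                "is_primary": int(position == 0),  # assumption: first listed is primary
            })
            next_id += 1
    return names


name_rows = split_names(csv.DictReader(io.StringIO(FLAT_CSV)))
for r in name_rows:
    print(r)
```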

Do I need exact coordinates for every place?

No. Record what you can defend and flag the rest. A parish centroid with a 5 km uncertainty radius is more honest and more useful than a fabricated coordinate to six decimals.

What format should I publish in?

Work internally in SQLite or CSV, then export to Linked Places Format GeoJSON-LD for interoperability with the World Historical Gazetteer and Peripleo when you are ready to share.
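The export step can be sketched as a function that turns a `places` row, its `names` rows, and its authority links into a GeoJSON Feature shaped after Linked Places Format. Treat this as an approximation. The field layout (`properties.title`, `when.timespans`, `names[].toponym`/`lang`, `types`, `links`) follows LPF as understood here, but the `@context` URL and `BASE_URI` are assumptions to confirm against the published LPF specification. This sketch does not carry `uncertainty_m` across, because LPF has its own conventions for geometry certainty that you should check in the spec. The sample values are placeholders.

```python
import json

# Assumption: LPF v1.1 context URL; confirm the current one in the LPF specification.
LPF_CONTEXT = "https://raw.githubusercontent.com/LinkedPasts/linked-places/master/linkedplaces-context-v1.1.jsonld"
BASE_URI = "https://example.org/gazetteer/"  # hypothetical namespace for your place IDs


def to_lpf_feature(place, names, links):
    """Build one Feature from a places row, its names rows, and authority links."""
    timespan = {"start": {"in": str(place["start_year"])}}
    if place["end_year"] is not None:
        timespan["end"] = {"in": str(place["end_year"])}
    primary = next(n["toponym"] for n in names if n["is_primary"])
    return {
        "@id": BASE_URI + place["place_id"],
        "type": "Feature",
        "properties": {"title": primary},
        "when": {"timespans": [timespan]},
        "names": [{"toponym": n["toponym"], "lang": n["language"]} for n in names],
        "types": [{"label": place["place_type"]}],
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [place["lon"], place["lat"]]},
        "links": links,
    }


def to_lpf_collection(features):
    return {"type": "FeatureCollection", "@context": LPF_CONTEXT, "features": features}


# Placeholder data for illustration.
place = {"place_id": "wilt-0042", "place_type": "parish", "lat": 51.0688,
         "lon": -1.7945, "start_year": 1200, "end_year": None}
names = [{"toponym": "Salisbury", "language": "en", "is_primary": 1},
         {"toponym": "Sarisburia", "language": "la", "is_primary": 0}]
links = [{"type": "closeMatch", "identifier": "https://pleiades.stoa.org/places/108794"}]

doc = to_lpf_collection([to_lpf_feature(place, names, links)])
text = json.dumps(doc, ensure_ascii=False, indent=2)
print(text)
```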

How do I handle a place that changed name or jurisdiction?

Keep one place record and attach time-bounded name and attestation rows. Mint a new place ID only when the entity itself is genuinely different, not merely renamed.
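Time-bounded name rows let you ask which name a single place carried in a given year without minting new IDs. This is a minimal sketch over dicts mirroring the `names` table. The helper `names_in_year` and the sample rows ("Oldtown", "Newtown", and their years) are hypothetical, and a `None` bound is treated as open-ended.

```python
def names_in_year(names, place_id, year):
    """Return the toponyms of place_id whose validity span includes year."""
    hits = []
    for n in names:
        if n["place_id"] != place_id:
            continue
        starts_ok = n["start_year"] is None or n["start_year"] <= year
        ends_ok = n["end_year"] is None or year <= n["end_year"]
        if starts_ok and ends_ok:
            hits.append(n["toponym"])
    return hits


# Hypothetical rows: one place, renamed once, same place_id throughout.
NAME_ROWS = [
    {"place_id": "demo-0001", "toponym": "Oldtown", "start_year": 1100, "end_year": 1399},
    {"place_id": "demo-0001", "toponym": "Newtown", "start_year": 1400, "end_year": None},
]

print(names_in_year(NAME_ROWS, "demo-0001", 1250))  # ['Oldtown']
print(names_in_year(NAME_ROWS, "demo-0001", 1500))  # ['Newtown']
```

Overlapping spans are allowed on purpose: if two names were in use at once, both come back.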

How big should my first gazetteer be?

Aim for a coherent slice — one county, parish cluster, or record series — of 50 to 500 places. A small, well-modelled gazetteer beats a large, inconsistent one.