When to Design a gazetteer data model

Design a dedicated gazetteer data model when your places have identity that persists while their names, locations and administrative roles change over time — and skip it when a flat table of name, coordinates and source answers every question you will ever ask. The deciding signal is not the size of your data but whether change over time and typed relationships between places are things you need to query. This article lays out the trade-offs so you can make that call deliberately rather than by default.

When is a flat table genuinely enough?

If every place in your project is a stable modern location, mentioned once, with a single accepted name and one coordinate pair, you do not need a model — you need a tidy CSV.

text

place_id,name,lat,lon,source
P0001,Winchester,51.0632,-1.3080,OS Open Names 2024
P0002,Romsey,50.9890,-1.4980,OS Open Names 2024

Building a relational schema around that is effort you will not recover. The honest test: write down the five questions you actually need to answer. If none of them involve dates, name variants, or relationships between places, stop here.

What signals say you do need a model?

Reach for a real model when any of these appear in your sources:

A place is known by several names across periods (Eboracum, Jorvik, York).
Administrative roles change — a parish becomes part of a district, then a unitary authority.
You must express relationships: contains, succeeded_by, merged_into.
Coordinates themselves are uncertain or change as boundaries move.

Each of these breaks a flat table, because a single row can no longer hold a value that is true only for a span of years.

What does the core of a good model look like?

The non-negotiable idea is to separate the place (a stable identity) from its attestations (time-scoped facts about it). Names, locations and relationships all become attestations with a temporal range.

sql

CREATE TABLE place (
  place_id    TEXT PRIMARY KEY,
  place_type  TEXT          -- settlement, parish, region
);

CREATE TABLE name_attestation (
  place_id  TEXT REFERENCES place(place_id),
  name      TEXT,
  lang      TEXT,
  year_from INTEGER,
  year_to   INTEGER,
  source    TEXT
);

CREATE TABLE relationship (
  subject   TEXT REFERENCES place(place_id),
  predicate TEXT,           -- contains, succeeded_by, merged_into
  object    TEXT REFERENCES place(place_id),
  year_from INTEGER,
  year_to   INTEGER
);

That structure lets "York" change name, location and parent unit across a thousand years without ever losing its identity. Everything else — sources, certainty flags, alternate coordinates — hangs off the same pattern.

Should I adopt Linked Places Format instead?

If sharing or federation is on your roadmap, adopt Linked Places Format (LPF) rather than inventing a schema. LPF already models names, when-spans, geometries and typed relations as JSON, and tools like the World Historical Gazetteer ingest it directly.

Choice	Best when	Cost
Flat CSV	Stable modern places, no time dimension	Lowest; breaks on change
Custom relational	Complex queries, private project	Schema design + maintenance
Linked Places Format	You will share or federate data	Learning curve, JSON tooling
Graph database	Relationship traversal dominates queries	Ops overhead, smaller talent pool

Inventing a bespoke model is justified mainly when LPF cannot express a relationship your sources demand.

Relational or graph — how do I choose?

Most gazetteers run comfortably on relational tables, which are easy to back up, query with SQL, and share. Choose a graph database only when traversal is the main job — chasing successor chains many hops deep, or resolving deep containment hierarchies on every query. If your dominant questions are "what was this called in 1450?" rather than "trace every successor across ten centuries", relational wins on simplicity.

How do I know the model has gone too far?

Complexity has a smell. If routine questions need joins nobody can write without the schema diagram open, or volunteers stop entering data because the form is intimidating, the model has outrun the project. Prune back toward the queries you actually run, and push rare edge cases into a free-text note rather than a new table.

Key Takeaways

The trigger for a model is change-over-time and typed relationships, not data volume.
A flat CSV is the right answer for stable modern places; do not over-engineer it.
Separate stable place identity from time-scoped attestations of name, location and role.
Adopt Linked Places Format when you plan to share or federate, before inventing a schema.
Relational suits most gazetteers; reserve graph databases for traversal-heavy workloads.
A model that blocks contributors or needs unwritable joins has gone too far — simplify.

Frequently Asked Questions

When should I design a custom gazetteer data model?

Design one when your places change identity over time, carry multiple attested names, or need typed relationships an off-the-shelf gazetteer cannot express. If a flat spreadsheet of name plus coordinates answers all your questions, you do not need a model yet.

When is a custom gazetteer model overkill?

For a single project with stable modern places and no temporal change, a CSV with id, name, lat, lon and source is enough. Building a relational or graph model for that case wastes effort you will never recover.

Should I adopt Linked Places Format instead of inventing a schema?

If you intend to share or federate data, adopt Linked Places Format rather than a bespoke schema. Inventing your own model is justified mainly when LPF cannot express a relationship your sources require.

What is the core of a good gazetteer data model?

A stable place identifier separated from its names, with names, locations and relationships each modelled as time-scoped attestations rather than single fixed values. That separation is what lets a place change over time without losing identity.

Relational or graph database for a gazetteer?

Relational handles most gazetteers well and is easier to share. Reach for a graph database only when relationship traversal — successor chains, containment hierarchies — is the dominant query pattern.

How do I know my model is too complex?

If common questions take multi-table joins nobody on the team can write unaided, or contributors avoid entering data because the form is daunting, the model has outrun the project. Simplify toward the queries you actually run.

When is a flat table genuinely enough? ​

What signals say you do need a model? ​

What does the core of a good model look like? ​

Should I adopt Linked Places Format instead? ​

Relational or graph — how do I choose? ​

How do I know the model has gone too far? ​

Key Takeaways ​

Frequently Asked Questions ​

When should I design a custom gazetteer data model? ​

When is a custom gazetteer model overkill? ​

Should I adopt Linked Places Format instead of inventing a schema? ​

What is the core of a good gazetteer data model? ​

Relational or graph database for a gazetteer? ​

How do I know my model is too complex? ​

Related reading ​