When you geocode historical addresses, the core problem is that modern geocoders match against today's street network, so renamed streets, renumbered houses and demolished terraces silently fail or land in the wrong place. The fix is a layered workflow: clean and contextualise the address strings, run an automated pass to clear the easy 50-80%, then resolve the stubborn remainder against georeferenced period maps and street directories — recording the method for every point. This guide diagnoses the recurring failures and gives fixes that hold.
Why do my addresses fail to match at all?
Run a single test address through your geocoder and read the raw response before blaming the data. The usual root causes, in order of frequency:
- Missing context — "12 High Street" with no town. Append town, county, country.
- Historical street name — the street was renamed or never existed in OSM.
- Dirty string — OCR noise, abbreviations ("St.", "Rd."), trailing occupier names from a directory.
- Wrong geocoder for the job — Nominatim is fine for places, weak on house numbers.
Fix the cheap causes first. A cleaning pass often lifts the match rate 15-20 points before you touch a map.
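To make "read the raw response" concrete, here is a minimal sketch that classifies how deep a match went. It assumes Nominatim's JSON output requested with addressdetails=1, where each hit carries an "address" object that may contain "house_number" and "road" keys. The function name match_level and the sample record are illustrative, and the sample is hand-written rather than real geocoder output.

```python
def match_level(results):
    """Classify a parsed Nominatim JSON response (a list of hits) by match depth."""
    if not results:
        return "no match"
    # Only the top-ranked hit is inspected here.
    address = results[0].get("address", {})
    if "house_number" in address:
        return "house"
    if "road" in address:
        return "street"
    return "place"


# Hand-written sample shaped like a Nominatim hit; not real geocoder output.
sample = [{
    "lat": "53.96",
    "lon": "-1.08",
    "display_name": "High Street, York, England, United Kingdom",
    "address": {"road": "High Street", "city": "York",
                "country": "United Kingdom"},
}]

print(match_level(sample))
print(match_level([]))
```

A "street" or "place" result for an address that includes a house number is the signal that the geocoder dropped the number, which points to the fourth cause above rather than to dirty data.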
How do I clean address strings reliably?
Do this as a reproducible step, not by hand. A small Python pass with pandas standardises the worst offenders:
```python
import pandas as pd

df = pd.read_csv("directory_1881.csv")

# Expand abbreviations, then collapse runs of whitespace.
# The optional full stop sits after \b so "St." becomes "Street", not "Street.".
# Caveat: this also expands "St" where it means "Saint"; handle those separately.
repl = {r"\bSt\b\.?": "Street", r"\bRd\b\.?": "Road",
        r"\bPl\b\.?": "Place", r"\s+": " "}

s = df["address"].str.strip()
for pat, rep in repl.items():
    s = s.str.replace(pat, rep, regex=True)

# Strip the occupier name that trade directories prefix (a leading "Surname, ").
s = s.str.replace(r"^[A-Z][a-z]+,\s*", "", regex=True)

df["address_clean"] = s + ", York, North Yorkshire, UK"
df.to_csv("directory_1881_clean.csv", index=False)
```
Adding the town and county to every row is the single highest-impact fix for "wrong city" errors.
Why did addresses land in the wrong city or country?
This is an ambiguity failure. "Newcastle, High Street" matches Newcastle upon Tyne, Newcastle-under-Lyme and several abroad. Two fixes:
- Append administrative context (county + country), as above.
- Constrain the search envelope. Most geocoders accept a bounding box; restrict it to your region so out-of-area hits are impossible.
```text
Nominatim: &viewbox=-1.30,54.05,-0.90,53.90&bounded=1
```
After bounding, re-run and the scatter of distant points collapses.
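As a sketch of how the bounded query might be assembled in a script, the helper below builds a Nominatim search URL with the viewbox and bounded parameters. The function name bounded_search_url and the sample address are illustrative. The base URL is the public Nominatim endpoint, whose usage policy expects an identifying User-Agent and a low request rate, so a self-hosted instance may suit large batches better.

```python
from urllib.parse import urlencode


def bounded_search_url(address, west, north, east, south,
                       base="https://nominatim.openstreetmap.org/search"):
    """Build a search URL that only returns hits inside the given box."""
    params = {
        "q": address,
        "format": "jsonv2",
        "addressdetails": 1,   # include the per-hit address breakdown
        "limit": 1,
        # viewbox takes two opposite corners as lon,lat,lon,lat
        "viewbox": f"{west},{north},{east},{south}",
        "bounded": 1,          # turn the viewbox into a hard filter
    }
    return f"{base}?{urlencode(params)}"


# Same envelope around York as the parameter string above.
url = bounded_search_url("12 High Street, York", -1.30, 54.05, -0.90, 53.90)
print(url)
```

The function only builds the URL; sending the request, throttling and caching are left to whatever HTTP client the project already uses.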
How do I place streets that no longer exist?
No automated geocoder can find a demolished or renamed street — the authority is the period map. Workflow:
- Georeference the relevant historical map sheet.
- Locate the street on it; digitise its centreline.
- Snap the unmatched address points to that line, interpolating by house number where the directory gives a range.
- Set match_method = map-traced on every such point.
This is slow but it is the only honest way, and it is where most of the real historical value lives.
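The snapping and interpolation step can be sketched in plain Python. This is a simplified illustration, not a full method: it assumes the digitised centreline is a list of (x, y) vertices in a projected coordinate system in metres, that house numbers run evenly from one end of the line to the other, and it ignores odd/even sides of the street. The names point_along and place_house are hypothetical.

```python
import math


def point_along(line, fraction):
    """Return the (x, y) point at `fraction` (0..1) of a polyline's length."""
    segments = list(zip(line, line[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    target = max(0.0, min(1.0, fraction)) * sum(lengths)
    for (a, b), length in zip(segments, lengths):
        if length > 0 and target <= length:
            t = target / length
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        target -= length
    return line[-1]  # rounding left us just past the final vertex


def place_house(line, number, low, high):
    """Place a house number within the directory's range [low, high]."""
    fraction = 0.5 if high == low else (number - low) / (high - low)
    return {"xy": point_along(line, fraction), "match_method": "map-traced"}


# Illustrative centreline with one bend, 150 m long in total.
centreline = [(0, 0), (100, 0), (100, 50)]
print(place_house(centreline, 11, 1, 31))
```

In a GIS the same operation is usually done with a linear-referencing tool; the point here is only that every interpolated position carries its match_method with it.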
Reading match quality honestly
A geocoder's confidence score is a triage tool, not a verdict. Build a status field and validate by eye:
| match_method | meaning | trust |
|---|---|---|
| exact | matched a current address point | high — but verify renamed streets |
| interpolated | placed along a street by number | medium |
| map-traced | snapped to a georeferenced period map | medium-high, source-backed |
| centroid | fell back to town/parish centre | low — flag clearly |
Never let a centroid fallback look identical to a surveyed point on the final map.
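One way to keep the status field honest is to derive trust from match_method in code rather than typing it by hand. The sketch below mirrors the table above; the field names trust and needs_flag and the function annotate are assumptions for illustration, and the records are made up.

```python
# Trust levels copied from the table above.
TRUST = {
    "exact": "high",
    "interpolated": "medium",
    "map-traced": "medium-high",
    "centroid": "low",
}


def annotate(record):
    """Attach a trust level, and flag centroid fallbacks for distinct styling."""
    method = record["match_method"]
    if method not in TRUST:
        # Fail loudly so an unlabelled point never reaches the final map.
        raise ValueError(f"unknown match_method: {method!r}")
    return {**record, "trust": TRUST[method], "needs_flag": method == "centroid"}


# Made-up records for illustration.
points = [
    {"id": 1, "match_method": "exact"},
    {"id": 2, "match_method": "centroid"},
]
annotated = [annotate(p) for p in points]
for row in annotated:
    print(row)
```

Styling the map layer on needs_flag (a hollow symbol, for instance) is one simple way to stop a centroid from passing as a surveyed point.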
What match rate should I expect, and when do I stop?
Historical batches typically reach 50-80% automated; modern data reaches 95%+. The unmatched residual is the work, not a failure. Stop when remaining records are either truly unlocatable (record them as centroid with a note) or would cost more to resolve than they add. Always export the method and confidence columns so a reader can weight every point.
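To track progress against that range, a short tally of the method column is enough. This sketch assumes that "exact" and "interpolated" are the methods produced by the automated pass and that unmatched records carry None; both are assumptions about how a project labels its data, and the batch below is invented for illustration.

```python
from collections import Counter


def match_summary(methods, automated=("exact", "interpolated")):
    """Summarise a column of match_method values (None means unmatched)."""
    counts = Counter(methods)
    total = sum(counts.values())
    hits = sum(counts[m] for m in automated)
    return {
        "total": total,
        "automated_rate": hits / total if total else 0.0,
        "by_method": dict(counts),
    }


# Invented batch of ten records.
batch = (["exact"] * 6 + ["interpolated"] + ["map-traced"] * 2 + [None])
summary = match_summary(batch)
print(summary)
```

Re-running the tally after each manual session shows how quickly the residual is shrinking, which helps with the judgement about when further work stops paying for itself.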
Key Takeaways
- Modern geocoders match today's network; historical addresses need cleaning, context and period maps.
- Append town, county and country to kill "wrong city" ambiguity errors.
- Constrain the geocoder's bounding box to your region.
- Clean strings reproducibly with a scripted pass before any map work.
- Place vanished streets by tracing a georeferenced period map and snapping points.
- Record match_method and confidence so approximate points are never mistaken for surveyed ones.
- Expect 50-80% automated; treat the residual as the real task.
Frequently Asked Questions
Why does a modern geocoder fail on historical addresses?
Modern geocoders match against today's street network, so renamed streets, renumbered houses, demolished terraces and former place names return no match or land at the wrong point. Historical addresses need a historical gazetteer or a georeferenced map, not Google or Nominatim alone.
My addresses geocode to the wrong country or city — why?
Almost always missing or ambiguous context. 'Newcastle, High Street' matches several Newcastles; append the county and country, restrict the geocoder's bounding box, and the false hits disappear.
How do I geocode a street that no longer exists?
Use a georeferenced historical map to locate the street, digitise it, and snap your address points to it manually. No automated geocoder can place a demolished street; the period map is your authority.
What is a reasonable match rate for historical addresses?
Expect 50-80% automated before manual work, far below the 95%+ of modern data. Treat the unmatched residual as the real job, not a failure, and resolve it against period maps and directories.
Should I trust the confidence score a geocoder returns?
Treat it as a triage signal, not truth. A high score on a renamed street can still be wrong; always validate a sample against a period map, and flag every record with the method used to place it.
How do I record geocoding uncertainty?
Add columns for match method (exact, interpolated, map-traced, centroid), positional confidence, and source. Never let an approximate point look identical to a surveyed one in the final layer.