GIS for History

When a historical railway network misbehaves, the cause is usually one of four faults: disconnected line topology, missing time attributes, off-line stations, or a CRS/georeferencing error. Work through them in that order. The single most common complaint — "my routing finds no path" — is nearly always topology: segments that look connected on screen but do not share exact endpoints. Here is the fault-tree.

Why does routing fail across my network?

Routing needs a connected graph. If two segments meet visually but their endpoints differ by even a fraction of a metre, the network is broken at that node. Diagnose and fix with a snapping/topology pass:

python
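import geopandas as gpd
from shapely.ops import snap, unary_union

rail = gpd.read_file("railways_1890.shp")  # tolerance is in CRS units; assumes metres
geoms = list(rail.geometry)
# snap each line to all the others within 1 m, one at a time, so that two
# near-miss endpoints end up on the same coordinate instead of swapping places
for i, geom in enumerate(geoms):
    others = unary_union(geoms[:i] + geoms[i + 1:])
    geoms[i] = snap(geom, others, tolerance=1.0)
rail["geometry"] = geoms
# re-node: the union splits lines wherever they now cross or touch
noded = unary_union(geoms)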

In QGIS, run Vector > Topology Checker with a "must not have dangles" rule to locate the breaks, then repair them with the Snap geometries to layer algorithm in the Processing Toolbox. Routing will find paths only once segments truly touch.
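To see where the breaks are before snapping anything, it can help to list dangling endpoints that sit suspiciously close to each other. The following is a minimal sketch using only the standard library; the function name `find_near_misses` and the sample coordinates are illustrative assumptions, and the input is a plain list of coordinate sequences rather than a GeoDataFrame.

```python
from collections import Counter
from itertools import combinations
from math import dist

def find_near_misses(lines, tolerance=1.0):
    """Return pairs of dangling endpoints that lie within `tolerance` of each other.

    `lines` is a list of coordinate sequences, one per segment (hypothetical
    input; with GeoPandas it could be built from each geometry's coords).
    """
    # Count how many segments start or end at each exact coordinate.
    endpoint_use = Counter()
    for coords in lines:
        endpoint_use[tuple(coords[0])] += 1
        endpoint_use[tuple(coords[-1])] += 1

    # A dangle is an endpoint used by exactly one segment.
    dangles = [pt for pt, n in endpoint_use.items() if n == 1]

    # Two distinct dangles within tolerance are a likely unsnapped junction.
    return [(a, b) for a, b in combinations(dangles, 2) if dist(a, b) <= tolerance]

# Made-up sample data, in metres.
lines = [
    [(0.0, 0.0), (100.0, 0.0)],
    [(100.3, 0.0), (200.0, 0.0)],    # starts 0.3 m away from the first segment's end
    [(200.0, 0.0), (200.0, 150.0)],  # shares an exact endpoint with the second
]
near_misses = find_near_misses(lines)
print(near_misses)
```

Genuine line ends (termini) also count as dangles, so only the pairs that fall within the tolerance are reported. The pairwise comparison is quadratic in the number of dangles, which is usually acceptable for a diagnostic pass but not for very large networks.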

How do I make the network time-aware?

A railway map is not one network — it is a different network every year. Lines open, double, close and lift. Store dates per segment and slice by year:

python
rail["opened"] = rail["opened"].astype("Int64")
rail["closed"] = rail["closed"].astype("Int64")

def network_in(year):
    return rail[(rail["opened"] <= year) &
                (rail["closed"].isna() | (rail["closed"] > year))]

net_1880 = network_in(1880)

Without opened/closed fields you cannot honestly answer "what could you reach by rail in 1880", which is usually the whole point.

Why don't my stations sit on the track?

Stations digitised as separate points drift off the line, breaking station-to-station distance and connectivity. Snap them on:

python
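from shapely.ops import unary_union

# merge all segments into one linear geometry to project onto
rail_lines = unary_union(rail.geometry)
# measure each station's position along the track, then place it there
stations["geometry"] = stations.geometry.apply(
    lambda p: rail_lines.interpolate(rail_lines.project(p))
)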

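This projects each station onto the nearest point of the line, so it lies on the track to within floating-point precision. Where a later step needs the point and the line to coincide, compare with a small tolerance rather than strict equality.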
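The projection step also gives the distance along the line, which is what station-to-station distances rely on. As a rough illustration of what happens under the hood, here is a library-free sketch for a single polyline; `project_onto_polyline` and the coordinates are hypothetical, and a real workflow would more likely use the project and interpolate calls shown above.

```python
from math import dist

def project_onto_polyline(point, coords):
    """Return (snapped_point, distance_along) for the closest position on a polyline."""
    best = None
    travelled = 0.0  # length of the polyline before the current piece
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            continue  # skip zero-length pieces
        # Fraction along this piece of the perpendicular foot, clamped to the piece.
        t = ((point[0] - x1) * dx + (point[1] - y1) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
        foot = (x1 + t * dx, y1 + t * dy)
        offset = dist(point, foot)
        seg_len = seg_len_sq ** 0.5
        if best is None or offset < best[0]:
            best = (offset, foot, travelled + t * seg_len)
        travelled += seg_len
    if best is None:
        raise ValueError("polyline has no usable segments")
    return best[1], best[2]

# Made-up track with one bend, and two stations digitised a few metres off it.
track = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)]
station_a, along_a = project_onto_polyline((25.0, 4.0), track)
station_b, along_b = project_onto_polyline((103.0, 25.0), track)
print(station_a, station_b)
print(along_b - along_a)  # along-track distance between the two stations
```

The along-track distance follows the bend in the line, which is why it differs from the straight-line distance between the two original points.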

How should junctions and shared track be modelled?

Two rules prevent most network corruption:

  • Split lines at every junction, so each segment is a clean edge between two nodes.
  • Share, don't duplicate, joint track — where two companies ran over the same metals, store one geometry referenced by both routes.

Duplicated overlapping lines create phantom parallel edges that inflate connectivity and ruin centrality measures.
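One way to picture the two rules is a segment table that holds each physical piece of track once, plus a route table that only references segment ids. The sketch below uses plain dictionaries; the station names, company names, ids and lengths are all invented for illustration, and a real dataset would more likely keep these as two related tables.

```python
# One record per physical piece of track, split at the junction so that
# each record is a clean edge between two nodes (all values are made up).
segments = {
    "s1": {"from": "Ayton", "to": "Junction", "length_km": 12.0},
    "s2": {"from": "Junction", "to": "Beeford", "length_km": 8.0},  # joint track
    "s3": {"from": "Junction", "to": "Crossby", "length_km": 15.0},
}

# Routes reference segments by id instead of carrying their own geometry.
routes = {
    "Company X main line": ["s1", "s2"],
    "Company Y branch": ["s3", "s2"],  # runs over the same s2 metals
}

def build_edges(segments):
    """One graph edge per physical segment, however many routes use it."""
    return {sid: (seg["from"], seg["to"]) for sid, seg in segments.items()}

def routes_using(segment_id, routes):
    """Names of all routes that run over a given segment."""
    return sorted(name for name, ids in routes.items() if segment_id in ids)

edges = build_edges(segments)
print(len(edges))                  # physical edges in the network
print(routes_using("s2", routes))  # both companies share this one edge
```

Because the joint track exists once, the graph has three edges rather than four, and questions about who ran where are answered from the route table instead of from duplicated geometry.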

A diagnostic table

Symptom                              | Root cause                   | Fix
Routing finds no path                | Endpoints not snapped        | Snap + topology check
Reachability looks wrong for a year  | No date attributes           | Add opened/closed, filter
Station ignored by joins             | Point off the line           | Project station onto line
Inflated connectivity                | Duplicated shared track      | One geometry, two references
Whole line in the wrong place        | CRS / georeferencing error   | Verify CRS and residuals

What if a whole line is in the wrong place?

A segment displaced wholesale is rarely a digitising slip — it is a coordinate-system or georeferencing fault in the underlying map. Before re-tracing, confirm the layer's CRS matches the project and check the georeferencing residuals of the scanned map you digitised from. Fixing geometry that was correct all along just hides the real bug.
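Checking residuals usually comes down to comparing where each control point should be with where the transformation actually put it. The following sketch computes per-point residuals and the overall RMSE; the control point names and coordinates are invented, and the function name `residual_report` is an assumption rather than part of any georeferencing tool.

```python
from math import hypot, sqrt

def residual_report(control_points):
    """Return (residuals, rmse, worst_point_name) for a set of control points.

    Each control point is (name, (x_expected, y_expected), (x_transformed, y_transformed)),
    with all coordinates in the same map units.
    """
    residuals = {
        name: hypot(tx - ex, ty - ey)
        for name, (ex, ey), (tx, ty) in control_points
    }
    rmse = sqrt(sum(r * r for r in residuals.values()) / len(residuals))
    worst = max(residuals, key=residuals.get)
    return residuals, rmse, worst

# Made-up control points, in metres.
control_points = [
    ("church", (1000.0, 2000.0), (1003.0, 2004.0)),
    ("bridge", (1500.0, 2500.0), (1504.0, 2503.0)),
    ("mill",   (2000.0, 1000.0), (2039.0, 1052.0)),  # suspiciously far off
]
residuals, rmse, worst = residual_report(control_points)
print(residuals)
print(round(rmse, 2), worst)
```

A single badly placed control point can dominate the RMSE, so it is worth looking at the individual residuals as well as the summary figure before deciding whether the scan needs re-georeferencing.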

Turn the network into a graph

For connectivity and centrality, convert the cleaned lines to a graph and let NetworkX do the analysis:

python
import networkx as nx

G = nx.Graph()
# one edge per segment, keyed by its exact end coordinates;
# assumes single-part LineStrings already split at junctions
for geom in net_1880.geometry:
    coords = list(geom.coords)
    G.add_edge(coords[0], coords[-1], length=geom.length)
print(nx.number_connected_components(G))   # >1 means gaps remain

More than one connected component is a fast signal that topology issues survive.
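Knowing that several components exist is only half the job; the next question is where the gap between them lies. The sketch below takes components as plain collections of coordinate tuples (for instance, the node sets that `nx.connected_components(G)` yields) and reports the closest pair of nodes between each pair of components. The function name `closest_gaps` and the sample coordinates are illustrative assumptions.

```python
from itertools import combinations
from math import dist

def closest_gaps(components):
    """For every pair of components, find the nearest pair of nodes.

    Returns a list of (distance, index_a, index_b, node_a, node_b),
    sorted so the smallest gaps come first.
    """
    gaps = []
    for (i, a), (j, b) in combinations(enumerate(components), 2):
        # Brute-force search over all node pairs; fine for a diagnostic pass.
        d, p, q = min((dist(p, q), p, q) for p in a for q in b)
        gaps.append((d, i, j, p, q))
    return sorted(gaps)

# Made-up components, in metres: two nearly touching, one far away.
components = [
    {(0.0, 0.0), (100.0, 0.0)},
    {(100.4, 0.0), (200.0, 0.0)},
    {(5000.0, 5000.0), (5100.0, 5000.0)},
]
gaps = closest_gaps(components)
print(gaps[0])
```

A gap of well under a metre between two components points to an unsnapped endpoint, whereas a gap of several kilometres more plausibly reflects a genuinely separate line or a missing segment.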

Key Takeaways

  • Diagnose in order: topology → missing dates → off-line stations → CRS errors.
  • "No route found" almost always means endpoints are not snapped.
  • Make the network time-aware with opened/closed fields and slice by year.
  • Project stations onto the line so joins and along-line distances work.
  • Split lines at junctions; share joint track instead of duplicating it.
  • A whole line displaced is a CRS/georeferencing fault, not a digitising slip.
  • Convert to a NetworkX graph; more than one component reveals lingering gaps.

Frequently Asked Questions

Why does my railway network have gaps that break routing?

Line segments do not share exact endpoints, so the network is topologically disconnected. Snap endpoints within a small tolerance and run a topology check; routing needs lines that actually touch at junctions.

How do I handle a railway that opened and closed at different dates?

Add opened and closed date fields to each segment, then filter by year to build a snapshot. A line valid in 1880 may be gone by 1970, so the network must be time-aware, not a single static layer.

Why do my stations not sit on the line?

Stations digitised separately drift off the track. Snap station points to the nearest line, or generate them from the line geometry, so spatial joins and distance-along-line calculations work.

How should I model junctions and shared track?

Split lines at every junction so each segment is an edge between two nodes, and give shared track a single geometry referenced by both routes rather than duplicating it. Duplicated overlapping lines corrupt network analysis.

What causes a railway to appear in the wrong place entirely?

Almost always a CRS or georeferencing error in the source map. Confirm the layer's coordinate system and the georeferencing residuals before assuming the digitised geometry is wrong.

Which tools are best for historical railway networks?

QGIS with the Topology Checker and the network analysis tools, PostGIS with pgRouting for routing at scale, and Python (GeoPandas plus NetworkX) for converting the lines into a graph for connectivity analysis.