Reach for the sf package when your historical mapping needs to be scripted, reproducible and joined to tabular analysis in R, and when your places already have coordinates or can be geocoded to them. Skip sf in favour of QGIS when the work is mostly interactive digitising, georeferencing map scans, or done by collaborators who do not code. The deciding signal is whether mapping is one step in a larger R pipeline or the main task in itself.
When does sf fit your project?
sf treats spatial features as ordinary data frames with a geometry column, so the dplyr verbs you already use (filter, mutate, joins, summarise) work on them directly. That makes it ideal when:
- Your data lives in R and you want maps generated from the same script as your tables and charts.
- You need to join attributes (census, trade, mortality) to places repeatedly.
- Reproducibility matters: a reviewer should regenerate the map from code.
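The "ordinary data frame" point can be sketched with a few dplyr verbs. This is a minimal illustration, assuming an inline table of ports whose names, column names and tonnage figures are invented for the example rather than taken from any dataset.

```r
library(sf)
library(dplyr)

# Hypothetical ports with invented tonnage figures, for illustration only
ports_demo <- tibble(
  port    = c("Bristol", "Liverpool", "Hull"),
  coast   = c("west", "west", "east"),
  tonnage = c(120, 450, 200),
  lon     = c(-2.59, -2.99, -0.33),
  lat     = c(51.45, 53.41, 53.74)
) |>
  st_as_sf(coords = c("lon", "lat"), crs = 4326)

# filter() and mutate() behave as on any tibble; the geometry column rides along
west_ports <- ports_demo |>
  filter(coast == "west") |>
  mutate(share = tonnage / sum(tonnage))

# summarise() totals the attribute and combines each group's points into one geometry
by_coast <- ports_demo |>
  group_by(coast) |>
  summarise(total_tonnage = sum(tonnage))
```

Both results are still sf objects, so they can go straight into `geom_sf()` or another join.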
```r
library(sf)
library(tidyverse)

ports <- read_csv("data/ports.csv") |> # lon, lat columns
  st_as_sf(coords = c("lon", "lat"), crs = 4326)

ggplot(ports) +
  geom_sf(aes(size = tonnage)) +
  theme_void()
```
When should you not use sf?
Be honest about the costs. Prefer another tool when:
- You are georeferencing a scanned historical map. QGIS's georeferencer is far better suited.
- The work is heavy manual digitising of features by tracing. Again, QGIS.
- Collaborators cannot read R, so a click-driven GUI lowers friction.
- You need one quick static map and will never revisit it. Datawrapper or QGIS may be faster.
sf's strength is the pipeline; if there is no pipeline, its overhead buys you little.
How do you decide between sf and QGIS?
| Signal | Lean sf | Lean QGIS |
|---|---|---|
| Mapping is part of a coded analysis | Yes | No |
| Need to repeat over many datasets | Yes | No |
| Georeferencing raster map scans | No | Yes |
| Manual feature digitising | No | Yes |
| Non-coding collaborators | No | Yes |
| Output must be reproducible from script | Yes | Sometimes |
Many projects use both: digitise and georeference in QGIS, export to GeoPackage, then analyse and render reproducibly in sf.
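The hand-off from QGIS to sf might look like the sketch below. It is only an illustration: the layer name `mills` and the two points are invented, and sf itself writes the GeoPackage to a temporary file so the example runs without a QGIS export.

```r
library(sf)

# Stand-in for a layer digitised in QGIS: two invented points (hypothetical data)
digitised <- st_as_sf(
  data.frame(name = c("Mill A", "Mill B"),
             lon  = c(-1.55, -1.47),
             lat  = c(53.80, 53.38)),
  coords = c("lon", "lat"), crs = 4326
)

# In practice QGIS writes this file; here sf writes it so the sketch is runnable
gpkg <- tempfile(fileext = ".gpkg")
st_write(digitised, gpkg, layer = "mills", quiet = TRUE)

# A GeoPackage can hold several layers, so list them before reading
layer_names <- st_layers(gpkg)$name

# Read the named layer; the CRS stored in the file comes back with it
mills <- st_read(gpkg, layer = "mills", quiet = TRUE)
st_crs(mills)
```

Because the CRS travels inside the file, the R side does not need to guess what the QGIS project used.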
How do coordinate systems trip up historians?
The most common failure is leaving the CRS undefined. Set it explicitly, and project before measuring:
```r
ports_bng <- st_transform(ports, 27700) # British National Grid, metres
st_distance(ports_bng[1, ], ports_bng[2, ]) # planar distance in metres
```
With no CRS set, sf treats coordinates as bare planar numbers, so distances and areas computed on lat/long values come out in degree-based units that mean nothing on the ground. For British material, EPSG:27700 is the usual projected target; choose the appropriate national grid elsewhere.
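The difference between declaring a CRS and transforming to one is easy to blur, so here is a minimal sketch with two invented points. It assumes the coordinates were recorded as WGS84 lat/long, which is something you have to know from your source; sf cannot infer it.

```r
library(sf)

# Two invented points with no CRS recorded, as often happens after a CSV import
pts <- st_as_sf(
  data.frame(place = c("A", "B"), lon = c(-1.5, -0.1), lat = c(53.8, 51.5)),
  coords = c("lon", "lat")
)
is.na(st_crs(pts)) # TRUE: sf does not know what the numbers mean

# st_set_crs() declares what the coordinates already are; it moves nothing
pts_wgs84 <- st_set_crs(pts, 4326)

# st_transform() recomputes the coordinates in the target CRS
pts_bng <- st_transform(pts_wgs84, 27700)

# With no CRS the result is a bare planar number in degree units;
# after projection the result carries metre units
d_raw <- st_distance(pts[1, ], pts[2, ])
d_bng <- st_distance(pts_bng[1, ], pts_bng[2, ])
```

Use `st_set_crs()` only to record a fact about existing coordinates, and `st_transform()` whenever you want the numbers themselves to change.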
How does sf cope with boundaries that moved?
sf has no concept of time. You supply the right boundary layer for each period yourself, then join:
```r
counties_1851 <- st_read("data/counties-1851.gpkg")

mortality |>
  left_join(counties_1851, by = "county") |> # returns a plain data frame with a geometry column
  st_as_sf() |> # restore the sf class so geom_sf() can draw it
  ggplot() +
  geom_sf(aes(fill = death_rate))
```
If your boundaries shift across the study span, load period-specific polygons (from sources like the Historical GIS boundary datasets) and map each slice against the boundaries valid for its date.
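One way to organise period-specific layers is a named list keyed by year, joined slice by slice. The sketch below is an assumption about workflow, not a feature of sf: the counties, years and death rates are invented, and the toy rectangles carry no CRS, whereas real layers would come from `st_read()` with one file per period.

```r
library(sf)
library(dplyr)

# Helper that builds a rectangular toy "county" (hypothetical geometry)
rect <- function(xmin, xmax) {
  st_polygon(list(rbind(c(xmin, 0), c(xmax, 0), c(xmax, 1), c(xmin, 1), c(xmin, 0))))
}

# Invented boundaries: the border between the two counties moves between the dates
boundaries <- list(
  "1851" = st_sf(county = c("Westshire", "Eastshire"),
                 geometry = st_sfc(rect(0, 1), rect(1, 2))),
  "1891" = st_sf(county = c("Westshire", "Eastshire"),
                 geometry = st_sfc(rect(0, 1.5), rect(1.5, 2)))
)

# Invented mortality figures, one row per county per year
mortality_demo <- tibble(
  year       = c(1851, 1851, 1891, 1891),
  county     = c("Westshire", "Eastshire", "Westshire", "Eastshire"),
  death_rate = c(22.1, 24.3, 18.0, 19.5)
)

# Join each year's rows to the polygons valid for that year, then stack the slices
slices <- lapply(names(boundaries), function(yr) {
  boundaries[[yr]] |>
    left_join(filter(mortality_demo, year == as.numeric(yr)), by = "county")
})
mapped <- do.call(rbind, slices)
```

From there, `ggplot(mapped) + geom_sf(aes(fill = death_rate)) + facet_wrap(~year)` would draw one panel per period, each against its own boundaries.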
What if you only have place names?
sf cannot map a name. Geocode first against a historical gazetteer to obtain coordinates, then convert to sf with st_as_sf(). If geocoding is itself most of the work, your real task is gazetteer reconciliation, not sf mapping, which is a useful signal about where to spend effort.
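The geocode-then-convert step can be sketched as a plain table join. Everything here is hypothetical: the gazetteer is a two-row stand-in for a real one, and "Bristoll" is a variant spelling invented to show what an unmatched name looks like.

```r
library(sf)
library(dplyr)

# Place names as they appear in the sources (invented examples)
records <- tibble(place = c("Kingston upon Hull", "Bristol", "Bristoll"))

# Hypothetical gazetteer extract; a real one would be far larger
gazetteer <- tibble(
  place = c("Kingston upon Hull", "Bristol"),
  lon   = c(-0.33, -2.59),
  lat   = c(53.74, 51.45)
)

matched <- left_join(records, gazetteer, by = "place")

# Names with no coordinates are the reconciliation workload
unmatched <- filter(matched, is.na(lon))

# st_as_sf() rejects missing coordinates by default, so convert matched rows only
places_sf <- matched |>
  filter(!is.na(lon)) |>
  st_as_sf(coords = c("lon", "lat"), crs = 4326)
```

The size of `unmatched` relative to `records` is a quick measure of how much of the project is reconciliation rather than mapping.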
Key Takeaways
- Use sf when mapping is part of a scripted, reproducible R pipeline.
- Prefer QGIS for georeferencing scans and heavy manual digitising.
- Always set the CRS explicitly; project to metres before measuring distance.
- sf is time-blind: supply period-specific boundary layers for changing borders.
- Geocode place names against a gazetteer before sf can map them.
- A common pattern is QGIS for prep, sf for reproducible analysis and rendering.
- If geocoding dominates, the real task is reconciliation, not sf.
Frequently Asked Questions
When is the sf package the right choice over QGIS?
Choose sf when your mapping is part of a scripted, reproducible R analysis and you already work in the tidyverse. Choose QGIS for heavy interactive digitising, georeferencing scans, or when collaborators are not coders.
What coordinate system should historical points use?
Store data in a clearly recorded CRS, commonly EPSG:4326 for lat/long, and transform to a projected CRS like British National Grid (EPSG:27700) for distance and area work. Always set st_crs() explicitly; never leave it NA.
Can sf handle changing historical boundaries?
Yes, but you must supply time-specific boundary layers yourself. sf has no notion of time; you load the correct historical polygon set for each period and join your data to the boundaries valid for that date.
Is sf suitable if my places are only names, not coordinates?
Not directly. You first geocode the place names against a gazetteer to obtain coordinates, then bring those points into sf. Without coordinates there is nothing for sf to map.
How heavy is sf to install and run?
sf depends on the GDAL, GEOS and PROJ libraries, which the CRAN binary packages bundle on Windows and macOS. On Linux, install those system libraries first. Once installed, sf handles typical historical datasets of thousands of features easily.
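To confirm which versions of those libraries your installation is linked against, a short check along these lines should be enough:

```r
library(sf)

# Named character vector of the external library versions sf was built with
versions <- sf_extSoftVersion()
versions
```

Recording this output alongside your script helps a reviewer reproduce the same environment.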