R for the Humanities

To visualise history well with ggplot2, build each chart from a tidy data frame, map variables to aesthetics deliberately, plot rates rather than raw counts when populations change, make uncertainty visible, and set a single house theme so every figure in a collection matches. ggplot2's layered grammar lets you do all of this declaratively, which helps keep historical figures consistent and defensible.

How does the grammar of graphics help historians?

ggplot2 builds a plot in layers: data, aesthetic mappings, geoms, scales, and theme. You describe what maps to what, not pixel positions. A minimal time series of burials:

```r
library(ggplot2)
ggplot(burials, aes(x = year, y = count)) +
  geom_col(fill = "grey30") +
  labs(title = "Burials, St Mary's parish, 1700-1799",
       x = NULL, y = "Burials per year",
       caption = "Source: Parish register, transcribed 2024")
```

Because each component is explicit, you can swap geom_col for geom_line, or add a smoother, without rebuilding from scratch.
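As a rough sketch of that swap, assuming a hypothetical `burials` data frame with `year` and `count` columns (the values below are simulated for illustration, not transcribed from any register):

```r
library(ggplot2)

# Hypothetical toy data standing in for a transcribed register
set.seed(1)
burials <- data.frame(
  year  = 1700:1799,
  count = rpois(100, lambda = 30)
)

p_line <- ggplot(burials, aes(x = year, y = count)) +
  geom_line(colour = "grey30") +                 # replaces geom_col()
  geom_smooth(method = "loess", formula = y ~ x,
              se = FALSE, colour = "black") +    # extra layer: a loess smoother
  labs(x = NULL, y = "Burials per year")
```

Only the geom lines change; the data, the mappings and the labels are untouched.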

Should you plot counts or rates?

This is the most consequential decision in historical visualisation. If the underlying population grew, a rising count of, say, criminal convictions may reflect nothing more than a larger population. Normalise:

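```r
library(dplyr)    # mutate()
library(ggplot2)

# Assumes convictions has columns year, n and population
convictions |>
  mutate(rate_per_1000 = (n / population) * 1000) |>
  ggplot(aes(year, rate_per_1000)) +
  geom_line()
```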

Use raw counts only for genuinely closed corpora, such as a fixed bundle of correspondence, where there is no population at risk to standardise against.
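A small worked example may make the stakes concrete. The figures below are invented purely for illustration: the count of convictions doubles, yet the rate per 1,000 falls.

```r
library(dplyr)

# Invented figures, for illustration only
convictions <- data.frame(
  year       = c(1750, 1800, 1850),
  n          = c(120, 180, 240),
  population = c(40000, 72000, 120000)
)

convictions <- convictions |>
  mutate(rate_per_1000 = (n / population) * 1000)

convictions$rate_per_1000
# 3.0 2.5 2.0
```

Plotted as counts, this series would suggest a crime wave; plotted as rates, it shows a decline.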

How do you make historical date axes behave?

If your dates are stored as text, ggplot2 treats them as discrete categories and cannot space decades sensibly. Convert first, then control the breaks:

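```r
library(dplyr)    # mutate()
library(ggplot2)

# Assumes events$date holds ISO-style strings such as "1789-07-14"
events |>
  mutate(date = lubridate::ymd(date)) |>
  ggplot(aes(date, value)) +
  geom_line() +
  scale_x_date(date_breaks = "20 years", date_labels = "%Y")
```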

For year-only data, treat the year as numeric and use scale_x_continuous(breaks = seq(1700, 1900, 50)).
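A minimal sketch of the year-only case, assuming the years arrive as text in a hypothetical `series` data frame (the values are invented):

```r
library(ggplot2)

# Hypothetical year-only series; years start out as character strings
years_chr <- as.character(seq(1700, 1900, by = 10))

series <- data.frame(
  year  = as.integer(years_chr),     # convert text to numeric before plotting
  value = seq(50, 250, by = 10)
)

p_years <- ggplot(series, aes(year, value)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1700, 1900, 50))  # ticks every 50 years
```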

How do you show uncertainty honestly?

Historical figures are often estimates. Visual weight should reflect confidence:

```r
ggplot(estimates, aes(year, mid)) +
  geom_ribbon(aes(ymin = low, ymax = high), fill = "grey80") +
  geom_line(colour = "grey20")
```

Other tactics: lower alpha on interpolated points, dash reconstructed segments with linetype, or facet attested versus modelled data. The aim is that a reader never mistakes an educated guess for a record.
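One possible way to combine the linetype and alpha tactics, assuming a hypothetical `status` column that marks each row as attested or reconstructed (the data are invented). The boundary year is listed under both statuses so the solid and dashed segments join up:

```r
library(ggplot2)

# Invented data; 1730 appears twice so the two line segments meet
estimates <- data.frame(
  year   = c(1700, 1710, 1720, 1730, 1730, 1740, 1750),
  mid    = c(40, 42, 45, 47, 47, 52, 50),
  status = c("attested", "attested", "attested", "attested",
             "reconstructed", "reconstructed", "reconstructed")
)

p_unc <- ggplot(estimates, aes(year, mid)) +
  geom_line(aes(linetype = status)) +   # one line per status value
  geom_point(aes(alpha = status)) +     # fainter points for reconstructed values
  scale_linetype_manual(values = c(attested = "solid",
                                   reconstructed = "dashed")) +
  scale_alpha_manual(values = c(attested = 1, reconstructed = 0.4))
```

Mapping `status` inside `aes()` rather than hard-coding styles means the legend documents the distinction automatically.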

How do you keep colour accessible?

Colour-code by category only with a tested palette, and never let colour carry the meaning alone:

```r
ggplot(df, aes(year, value, colour = region, linetype = region)) +
  geom_line() +
  scale_colour_viridis_d()
```

The viridis palettes are designed to be perceptually uniform and to stay legible under common forms of colour blindness. Adding linetype means the chart survives greyscale printing, which still matters for journal figures.

How do you enforce a house style?

Set the theme once and reuse it everywhere:

```r
theme_set(
  theme_minimal(base_size = 12, base_family = "serif") +
    theme(plot.title = element_text(face = "bold"),
          plot.caption = element_text(colour = "grey40", hjust = 0))
)
```

Wrap recurring labelling in a small helper so captions, source lines and sizing never drift between figures one and forty.
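Such a helper might look like the sketch below. The function name `label_figure()` and its arguments are hypothetical, chosen here for illustration; adapt them to your own caption conventions.

```r
library(ggplot2)

# Hypothetical helper: name and arguments are illustrative, not a ggplot2 API
label_figure <- function(title, source, y = NULL) {
  labs(
    title   = title,
    x       = NULL,                      # years rarely need an axis title
    y       = y,
    caption = paste("Source:", source)   # same caption prefix on every figure
  )
}

fig_labels <- label_figure(
  title  = "Burials, St Mary's parish, 1700-1799",
  source = "Parish register, transcribed 2024",
  y      = "Burials per year"
)
```

Because `labs()` returns an object that can be added to a plot, the result is used like any other layer, for example `ggplot(burials, aes(year, count)) + geom_col() + fig_labels`.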

A pre-export checklist worth keeping:

| Check | Why it matters |
|---|---|
| Rate vs count chosen deliberately | Avoids population-growth artefacts |
| Axis types are Date/numeric | Decades and centuries land correctly |
| Source cited in caption | Provenance travels with the figure |
| Palette colourblind-safe | Inclusive and print-robust |
| Uncertainty visible | No estimate disguised as fact |
| Saved with ggsave() at fixed size | Reproducible dimensions and DPI |
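For the last item on the checklist, a fixed-size export could be sketched as follows. The file name, the 16 x 10 cm dimensions and the 300 DPI setting are assumptions for illustration; use whatever your publisher specifies. The plot data are invented.

```r
library(ggplot2)

# Invented toy data so the example runs on its own
p <- ggplot(
  data.frame(year  = 1700:1709,
             count = c(31, 28, 35, 30, 27, 33, 29, 36, 32, 30)),
  aes(year, count)
) +
  geom_col()

# Written to a temporary folder here; swap in your project's figures directory
out_file <- file.path(tempdir(), "fig01_burials.png")

ggsave(out_file, plot = p,
       width = 16, height = 10, units = "cm",  # explicit physical size
       dpi = 300)                              # explicit resolution
```

Passing `plot`, `width`, `height`, `units` and `dpi` explicitly avoids depending on the size of whatever graphics window happens to be open.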

Key Takeaways

  • Build charts from tidy data and explicit aesthetic mappings.
  • Plot rates, not raw counts, whenever the population changes over time.
  • Convert dates to real Date/numeric types before setting axis breaks.
  • Make uncertainty visible with ribbons, alpha or line types.
  • Use viridis plus a second non-colour channel for accessibility.
  • Lock a house theme with theme_set() for collection-wide consistency.
  • Export with ggsave() so figure dimensions and DPI are reproducible.

Frequently Asked Questions

Should I plot counts or rates for historical populations?

Plot rates when the underlying population changes over time; otherwise, growth in raw counts may simply reflect more people. Counts are fine for closed collections like a fixed set of letters, but normalise to a denominator for demographic claims.

How do I show uncertainty in a ggplot2 chart?

Use geom_ribbon() or geom_errorbar() for ranges, lower the alpha on uncertain points, or shade reconstructed periods. Never present an estimated figure with the same visual weight as a directly attested one.

Why do my historical date axes look wrong?

ggplot2 needs a real Date or numeric type, not a character string. Convert with lubridate first, then control breaks with scale_x_date() or scale_x_continuous() so decades and centuries land on sensible ticks.

How do I make charts colourblind-safe?

Use scale_colour_viridis_d() or a tested palette, and never rely on colour alone. Add direct labels, line types or facets so the chart still reads in greyscale or for colourblind viewers.

How do I keep a consistent house style across many figures?

Define a theme once with theme_set() and a small wrapper function, then reuse it for every figure. This keeps fonts, sizing and captions uniform across a whole collection or publication.