Historical Data Visualisation

To visualise a historical time series well, plot a rate rather than a raw count whenever your denominator changes, keep the time axis continuous and date-typed so missing years stay visible, and overlay a light smoothing line on top of the raw data. Those three decisions fix most of the misreadings that plague historical charts. The rest of this guide turns that into a repeatable workflow.

What makes historical time series different?

Unlike sensor or financial data, historical series are shaped by archival survival. A spike in recorded witchcraft trials may reflect a clerk who kept better registers, not a wave of accusations. Three structural problems recur:

  • Changing denominators — population, literacy, or the number of surviving parishes all drift.
  • Irregular sampling — annual data with gaps, or a mix of decadal censuses and continuous registers.
  • Calendar discontinuities — the 1752 Gregorian switch in Britain, regnal years, or fiscal years that start in March.

Decide how you will handle each of these before you draw anything.
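For the fiscal-year case, one way to make the decision explicit is a small helper that assigns each dated record to the year in which its accounting year opened. This is only a sketch: the function name `accounting_year` and the 1 March start are assumptions for illustration, and regnal years or the 1752 calendar change would need their own source-specific rules.

```python
from datetime import date


def accounting_year(d: date, start_month: int = 3) -> int:
    """Return the year in which the accounting year containing `d` began.

    `start_month` is an assumption about the source (3 = years opening on 1 March).
    """
    return d.year if d.month >= start_month else d.year - 1


# A January entry belongs to the accounting year that opened the previous March.
print(accounting_year(date(1801, 1, 15)))  # 1800
print(accounting_year(date(1801, 3, 1)))   # 1801
```

Recording the rule in code rather than applying it by hand keeps the choice visible and easy to state in the caption.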

Counts or rates — which should you plot?

If the population behind your data changes, plot a rate. Burials in a town of 2,000 are not comparable to burials in the same town once it reaches 20,000. Compute the rate against the best denominator you have and document it in the caption.

python
import pandas as pd

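# Assumes one row per year with columns "year", "burials" and "population".
# Nanosecond timestamps in pandas start at 1677-09-21, so earlier years may need a PeriodIndex.
df = pd.read_csv("burials.csv", parse_dates=["year"])
df = df.set_index("year")
# Burials per 1,000 inhabitants; NaN wherever the population is unknown.
df["rate_per_1000"] = df["burials"] / df["population"] * 1000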

When no denominator exists, index to a base year (value / value_at_1800 * 100) and say so on the axis.
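A minimal sketch of the base-year index, assuming 1800 is the chosen base and using made-up counts purely for illustration:

```python
import pandas as pd

# Illustrative values only, not real data.
counts = pd.Series(
    [40, 50, 65],
    index=pd.to_datetime(["1800-01-01", "1801-01-01", "1802-01-01"]),
    name="burials",
)

base = pd.Timestamp("1800-01-01")          # assumed base year
index_1800 = counts / counts.loc[base] * 100

print(index_1800.round(1).tolist())        # [100.0, 125.0, 162.5]
```

The base year reads as 100 by construction, so an axis label such as "Index (1800 = 100)" tells readers what the numbers mean.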

How do I handle missing and irregular years?

Reindex onto a complete date range so gaps are explicit rather than smoothed over:

python
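# Assumes every index value is a 1 January timestamp; labels not in `full` are dropped.
full = pd.date_range(df.index.min(), df.index.max(), freq="YS")
df = df.reindex(full)          # missing years become NaN
# Markers keep an isolated year visible when both of its neighbours are missing.
ax = df["rate_per_1000"].plot(marker="o")
ax.set_ylabel("Burials per 1,000")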

matplotlib breaks the line at NaN, which is exactly what you want: a visible gap signals absence, not zero.
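It can also help to list the missing years so they can be named in the caption. The sketch below uses a short invented series in which 1703 and 1704 are absent from the source; the variable names are illustrative.

```python
import pandas as pd

# Illustrative series with 1703 and 1704 absent from the source.
rate = pd.Series(
    [31.0, 28.5, 30.2, 27.9],
    index=pd.to_datetime(["1701-01-01", "1702-01-01", "1705-01-01", "1706-01-01"]),
)

# Reindex onto every year start between the first and last observation.
full = pd.date_range(rate.index.min(), rate.index.max(), freq="YS")
rate = rate.reindex(full)

# Years that exist on the axis but carry no observation.
missing_years = rate[rate.isna()].index.year.tolist()
print(missing_years)  # [1703, 1704]
```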

Smoothing without hiding the signal

Annual historical counts are noisy. A centred rolling mean reveals the trend, and the chart stays honest as long as the raw series remains in view.

python
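# Any window that contains a NaN returns NaN by default (min_periods equals the window).
df["smooth"] = df["rate_per_1000"].rolling(window=11, center=True).mean()
ax = df["rate_per_1000"].plot(alpha=0.3, label="annual")
df["smooth"].plot(ax=ax, linewidth=2, label="11-yr mean")
ax.legend()                    # the labels only appear once a legend is drawn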

Odd windows (5, 11, 21) centre the mean on an actual year, with the same number of years on either side; an even window has no middle year. State the window in the caption, because readers cannot reverse-engineer it.
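Gaps interact with smoothing. By default a pandas rolling mean returns NaN for every window that touches a missing year, so one gap blanks several smoothed values. The sketch below, on invented numbers, compares that default with a more lenient `min_periods` setting; the threshold of 4 is an assumption to tune, not a recommendation.

```python
import pandas as pd

nan = float("nan")
years = pd.date_range("1700-01-01", periods=11, freq="YS")
# Illustrative rates with one missing year (1705).
rate = pd.Series([10, 12, 11, 13, 12, nan, 14, 13, 15, 14, 16], index=years, dtype=float)

# Default: a window needs all 5 values, so the single gap blanks 5 smoothed years.
strict = rate.rolling(window=5, center=True).mean()

# Lenient: accept windows with at least 4 observed years.
lenient = rate.rolling(window=5, center=True, min_periods=4).mean()

# Keep the gap honest: do not report a smoothed value for the unobserved year.
lenient_masked = lenient.where(rate.notna())

print(strict.notna().sum())          # 2
print(lenient.notna().sum())         # 9
print(lenient_masked.notna().sum())  # 8
```

Whichever setting is used, it belongs in the caption alongside the window length.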

Which chart and axis choices work?

| Decision | Default for history | Why |
| --- | --- | --- |
| Chart type | Line for continuous series, step for stock-on-a-date | Steps avoid implying interpolation between censuses |
| Y-axis baseline | Zero for counts/rates, non-zero allowed for indices | Truncated axes exaggerate change |
| X-axis | Real date type, equal time spacing | Prevents decades being squeezed by missing rows |
| Multiple series | Direct labels at line ends | Beats a legend the eye must hop to |
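A step chart for census-style data might look like the following sketch. The population figures and dates are invented for illustration, and `where="post"` is one reasonable choice: it holds each value flat until the next census instead of drawing a slope between them.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative decadal census totals, not real data.
census = pd.Series(
    [2000, 2600, 3100, 4200],
    index=pd.to_datetime(["1801-01-01", "1811-01-01", "1821-01-01", "1831-01-01"]),
)

fig, ax = plt.subplots()
# where="post" holds each census value flat until the next census date.
ax.step(census.index, census.values, where="post", marker="o")
ax.set_ylim(bottom=0)          # zero baseline for an absolute quantity
ax.set_ylabel("Population")
```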

For comparing several places, prefer small multiples (one panel per region) over a tangle of coloured lines.
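Small multiples can be sketched with shared axes so that panels remain directly comparable. The region names and values below are placeholders, not real data.

```python
import matplotlib.pyplot as plt
import pandas as pd

years = pd.date_range("1700-01-01", periods=5, freq="YS")
# Placeholder rates per 1,000 for three hypothetical regions.
regions = pd.DataFrame(
    {
        "North": [30.0, 32.0, 29.0, 35.0, 31.0],
        "South": [25.0, 27.0, 26.0, 24.0, 28.0],
        "West": [40.0, 38.0, 41.0, 37.0, 39.0],
    },
    index=years,
)

# One panel per region; shared axes keep scales identical across panels.
fig, axes = plt.subplots(1, len(regions.columns), sharex=True, sharey=True, figsize=(9, 3))
for ax, name in zip(axes, regions.columns):
    ax.plot(regions.index, regions[name])
    ax.set_title(name)
    ax.set_ylim(bottom=0)      # zero baseline for rates
axes[0].set_ylabel("Burials per 1,000")
fig.autofmt_xdate()            # tilt date labels so they do not overlap
```

Sharing the y-axis is a deliberate choice here: it lets the reader compare levels between regions, at the cost of flattening panels with small values.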

Annotating the historical context

A bare line tells the reader nothing about why. Add light vertical markers for the events that frame the data — a plague year, an enclosure act, a boundary change — but keep them subordinate to the data with thin grey lines and small type. Two or three annotations clarify; ten compete with the trend.
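One possible way to draw such markers with matplotlib is shown below. The events and values are hypothetical, and the grey colour, 0.8 line width and 8 pt type are illustrative choices rather than fixed rules.

```python
import matplotlib.pyplot as plt
import pandas as pd

years = pd.date_range("1700-01-01", periods=6, freq="YS")
# Illustrative rates, not real data.
rate = pd.Series([30.0, 29.0, 45.0, 31.0, 28.0, 27.0], index=years)

# Hypothetical events used only to demonstrate the styling.
events = {
    pd.Timestamp("1702-01-01"): "epidemic year",
    pd.Timestamp("1704-01-01"): "boundary change",
}

fig, ax = plt.subplots()
ax.plot(rate.index, rate.values, color="black")
for when, label in events.items():
    # Thin grey line drawn behind the data so it stays subordinate.
    ax.axvline(when, color="grey", linewidth=0.8, zorder=0)
    # x is in data units (dates); y is a fraction of the axes height.
    ax.text(when, 0.98, " " + label, transform=ax.get_xaxis_transform(),
            ha="left", va="top", fontsize=8, color="grey")
ax.set_ylabel("Burials per 1,000")
```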

Key Takeaways

  • Plot rates, not counts, whenever the underlying population changes.
  • Reindex to a full date range so missing years render as gaps, never as joined lines.
  • Smooth with a centred odd-window rolling mean and always keep the raw series visible.
  • Start counts and rates at a zero baseline; only indices justify a truncated axis.
  • Use small multiples instead of spaghetti when comparing many series.
  • Add sparse, subordinate annotations for the historical events that shaped the data.
  • Always caption the denominator, smoothing window and source so the chart is reproducible.

Frequently Asked Questions

Should I plot counts or rates for historical time series?

Plot rates (per 1,000 population or per record) whenever the denominator changes over the period, because raw counts often just track the size of the surviving archive rather than the phenomenon you care about.

How do I handle irregular or missing years in a historical series?

Keep the time axis continuous and date-typed so gaps render as visible breaks; never let your tool silently join points across a missing decade, which implies data you do not have.

What is the best baseline for a historical line chart?

For absolute quantities start the y-axis at zero; for index series or rates of change a non-zero baseline is acceptable, but label it explicitly so readers are not misled by a truncated axis.

How should I smooth noisy historical counts?

Use a centred rolling mean (commonly 5 or 11 years for annual data) and always show the raw series faintly underneath so smoothing does not hide volatility.

Which tool should a historian start with for time-series charts?

It depends on the goal: Datawrapper suits fast publishable charts, pandas plus matplotlib in Python suits reproducible analysis, and ggplot2 is the natural choice if your workflow is already R-based.