Historical Data Visualisation

The fastest way to avoid misleading historical charts is to audit four things before you publish: the axis baseline, whether you are showing counts or rates, how gaps are drawn, and whether the geography matches the period. Accidentally deceptive history charts usually fail at least one of those four checks, and each has a concrete fix.

This is a troubleshooting guide: each section names a symptom, the root cause, and the repair.

Why does my trend look dramatic but feel wrong?

Symptom: a bar chart shows a "collapse" or "explosion" that contemporaries would not have recognised. Root cause: a truncated y-axis. Because a bar's length encodes its value, starting the axis at 95 instead of 0 turns a 3% change into a visual cliff.

Fix: for bar charts, always anchor the y-axis at zero. If the interesting variation is genuinely small, switch encoding — use a line chart (which legitimately encodes slope and position), or plot the change/difference directly. State the axis range in the caption either way.
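The size of that "visual cliff" can be checked with simple arithmetic. The sketch below is illustrative only: the values 100 and 97 are made up to match the 3% example, and `drawn_drop` is a hypothetical helper, not part of any plotting library.

```python
def drawn_drop(before, after, baseline=0.0):
    """Fraction by which a bar visibly shrinks when the axis starts at `baseline`."""
    if before <= baseline or after < baseline:
        raise ValueError("baseline must sit below the plotted values")
    drawn_before = before - baseline  # bar length actually drawn
    drawn_after = after - baseline
    return (drawn_before - drawn_after) / drawn_before

# Illustrative values: a real fall of 3%.
true_drop = drawn_drop(100, 97)                     # axis anchored at zero
truncated_drop = drawn_drop(100, 97, baseline=95)   # axis starts at 95

print(f"zero baseline: {true_drop:.0%}, truncated at 95: {truncated_drop:.0%}")
```

With the axis starting at 95, the second bar is drawn at 2 units against 5, so a 3% fall is rendered as a 60% shrinkage in bar length.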

Am I plotting counts when I should plot rates?

Symptom: "recorded crime tripled over the century" or "more letters survive from the 1700s". Root cause: the denominator changed. Population grew, archives preserved more recent material, registration improved. A raw count conflates the phenomenon with its recording.

Fix: normalise. Convert to a rate per 1,000 people, per surviving record, or per unit of known coverage. In pandas:

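```python
# Assumes df is a pandas DataFrame indexed by year,
# with "count" and "population" columns.

# Misleading: a raw count over time implies a real trend
df["count"].plot()

# Better: normalise by the changing denominator
df["rate_per_1000"] = df["count"] / df["population"] * 1000
df["rate_per_1000"].plot()
```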

Where coverage itself is uncertain, plot the count and the coverage so the reader can judge.
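To see how a count and a rate can tell opposite stories, here is a small self-contained sketch. The figures are invented purely for illustration (they come from no archive), and the column names are assumptions chosen to match the snippet above.

```python
import pandas as pd

# Invented figures for illustration only: the count triples, but so does the population.
df = pd.DataFrame(
    {
        "year": [1800, 1850, 1900],
        "count": [200, 400, 600],
        "population": [100_000, 200_000, 300_000],
    }
).set_index("year")

df["rate_per_1000"] = df["count"] / df["population"] * 1000

# Growth factor between the first and last year, for each measure
growth_in_count = df["count"].iloc[-1] / df["count"].iloc[0]
growth_in_rate = df["rate_per_1000"].iloc[-1] / df["rate_per_1000"].iloc[0]

print(df)
print(f"count grew x{growth_in_count:.1f}, rate grew x{growth_in_rate:.1f}")
```

In this toy case the count triples while the rate does not move at all, which is exactly the situation a raw-count chart would misreport as a trend.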

How do I stop a chart inventing data in the gaps?

Symptom: a smooth line or filled choropleth runs straight through a period or region with no records. Root cause: line plots join consecutive points by default, so a missing stretch is bridged as though it had been measured. The reader cannot tell the difference between "value was steady" and "we have nothing".

Fix: break the series at gaps and mark unsampled areas explicitly. With pandas (which plots through matplotlib), make sure the gap years hold NaN so the line breaks rather than joining across the void:

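```python
import numpy as np

# Assumes df has "year" and "value" columns.
# Mask the years with no surviving records so they plot as a gap.
df.loc[df["year"].between(1645, 1660), "value"] = np.nan
df.set_index("year")["value"].plot()  # line now breaks over the gap
```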
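The masking approach above assumes a row already exists for every year. When the gap years are simply absent from the table, one option is to reindex onto the full range of years so the missing ones appear as NaN. A minimal sketch, using an invented series and assuming integer years:

```python
import pandas as pd

# Invented series for illustration: no rows exist for 1643-1646.
df = pd.DataFrame(
    {
        "year": [1640, 1641, 1642, 1647, 1648],
        "value": [12.0, 14.0, 13.0, 9.0, 10.0],
    }
)

series = df.set_index("year")["value"]

# Build every year from first to last; years without a row become NaN.
full_years = range(int(series.index.min()), int(series.index.max()) + 1)
with_gaps = series.reindex(full_years)

print(with_gaps)
# with_gaps.plot() would now draw two separate segments instead of one bridged line
```

Without the reindex, the plot would connect 1642 directly to 1647, because nothing in the data marks those four years as missing.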

For maps, use a distinct "no data" fill (often a hatch pattern), never the lightest colour in the sequence, which reads as a real low value.
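The same separation can be enforced before any drawing happens, by classifying values so that missing regions never receive a position on the colour ramp. This is a sketch under stated assumptions: `colour_class`, `NO_DATA`, the region names and the break points are all hypothetical.

```python
import math
from bisect import bisect_right

NO_DATA = "no data"  # to be drawn with its own hatch or grey fill, outside the ramp

def colour_class(value, breaks):
    """Return an index into the sequential ramp, or NO_DATA for a missing value."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return NO_DATA
    return bisect_right(breaks, value)

# Hypothetical region values; None marks an unsampled region.
values = {"north": 3.2, "south": None, "east": 0.0, "west": 7.5}
breaks = [2.0, 5.0]  # three classes: below 2, from 2 up to 5, and 5 or above

classes = {region: colour_class(v, breaks) for region, v in values.items()}
print(classes)
```

Here "east" is a genuine zero and lands in the lowest class, while "south" stays outside the ramp entirely, so the two cannot be confused on the map.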

Do my colours or geography lie?

Four quieter traps:

| Symptom | Root cause | Fix |
| --- | --- | --- |
| Regions look ranked by intensity | Choropleth not normalised by area/population | Use rate, or a dot-density / proportional symbol map |
| Sequential data looks categorical | Rainbow palette implies false breaks | Use a perceptually uniform sequential ramp (viridis) |
| Map attributes data to wrong units | Modern boundaries over historical counts | Use period-correct or a single stable geography |
| Two series look correlated | Dual y-axes scaled to align | Index both to a base year on one axis |
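Indexing to a base year, the fix suggested for dual axes, takes one line in pandas. The sketch below uses invented figures, and the series names and the base year are assumptions chosen for illustration.

```python
import pandas as pd

# Invented figures for illustration: two series on very different scales.
df = pd.DataFrame(
    {
        "grain_price": [40.0, 50.0, 60.0],
        "burials": [800.0, 900.0, 1000.0],
    },
    index=[1700, 1710, 1720],
)

base_year = 1700

# Divide each column by its own base-year value; every series equals 100 in 1700.
indexed = df / df.loc[base_year] * 100

print(indexed)
# indexed.plot() would put both series on one shared axis
```

Because both series now share one axis and one unit (percent of the base year), their relative movement comes from the data rather than from how two independent axes happened to be scaled.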

How do I check a chart before publishing?

Run a four-point audit: (1) does the bar baseline start at zero; (2) is it a rate or a count, and is that the right one; (3) are gaps and "no data" visibly distinct from real lows; (4) do the geography and the axis range match the period. If you cannot answer all four from the chart alone, the reader cannot either, so add annotation or fix the encoding.

Key Takeaways

  • Bar charts must start the value axis at zero; line charts need not, but always label the range.
  • Plot rates, not raw counts, whenever the underlying population or record coverage changed over the period.
  • Draw gaps and "no data" as explicit breaks or hatching so the chart never interpolates across the void.
  • Avoid dual y-axes; index series to a common base year on a single axis instead.
  • Use perceptually uniform sequential palettes (viridis) for ordered data, not rainbow ramps.
  • Match maps to period-correct boundaries, or aggregate to one stable geography.
  • Caption every non-obvious choice — baseline, denominator, gap handling — so the reader can audit it.

Frequently Asked Questions

What is the most common way historical charts mislead?

Truncated or inconsistent axes are among the most common culprits, along with plotting raw counts when a rate is meant. Both exaggerate or invent trends that the underlying records do not support.

Should a bar chart's y-axis always start at zero?

Yes — for bar charts the bar length encodes the value, so a non-zero baseline distorts the comparison and is genuinely misleading. Line charts may use a non-zero axis because they encode position and slope, but you should label the range clearly.

How do I avoid implying data exists where there is none?

Draw registration gaps, missing years and unsampled regions as explicit breaks or hatched areas rather than letting a line or choropleth interpolate across them. A smooth line over a five-year gap silently invents data.

Why is plotting raw counts over a long period misleading?

Because the underlying population usually changed. A rising count of recorded crimes may simply reflect a growing population or better record-keeping; converting to a rate per 1,000 people, or normalising by record coverage, removes that confound.

Are dual y-axes ever acceptable in historical charts?

Rarely. Dual axes let you align two unrelated series arbitrarily to imply a correlation that the scaling invented. Prefer indexing both series to a common base year on a single axis, or use small multiples.

How should I handle changing administrative boundaries on a map?

Map each period to the boundaries that actually existed at that time, or aggregate everything to a single stable geography. Draping modern boundaries over historical counts attributes data to units that did not yet exist.
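Aggregating to a single stable geography usually means maintaining a crosswalk from each historical unit to the stable region that contains it. A minimal sketch, assuming every historical unit nests wholly inside one stable region (units split between regions would need apportionment weights, which are not shown here). All names and counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical crosswalk: each historical unit maps to exactly one stable region.
CROSSWALK = {
    "parish_a": "region_1",
    "parish_b": "region_1",
    "parish_c": "region_2",
}

def aggregate_to_stable(counts, crosswalk):
    """Sum historical-unit counts into stable regions; fail loudly on unmapped units."""
    totals = defaultdict(int)
    for unit, count in counts.items():
        if unit not in crosswalk:
            raise KeyError(f"no stable region for historical unit {unit!r}")
        totals[crosswalk[unit]] += count
    return dict(totals)

counts_1801 = {"parish_a": 120, "parish_b": 80, "parish_c": 50}  # invented counts
stable_1801 = aggregate_to_stable(counts_1801, CROSSWALK)
print(stable_1801)
```

Raising on an unmapped unit is a deliberate choice: silently dropping it would create exactly the kind of invisible gap the rest of this guide warns against.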