To plot historical data with matplotlib reliably, build figures with the object-oriented API (fig, ax = plt.subplots()), pin your styling explicitly, and treat every chart as a reproducible artefact generated by a saved script rather than an interactive session. The difference between a one-off screenshot and a defensible figure is almost entirely about discipline: explicit axes, documented data provenance, and exported vector output.
This guide collects the practices that keep a whole collection of historical charts consistent, from census time series to parish-register counts.
Why use the object-oriented interface, not pyplot?
The plt.plot() style relies on a hidden "current axes" that becomes ambiguous the moment you add a second panel. The object-oriented approach makes every element explicit and is trivial to wrap in a function you can reuse across dozens of sources.
```python
import matplotlib.pyplot as plt

# df is assumed to be a pandas DataFrame with year, baptisms and burials columns.
fig, ax = plt.subplots(figsize=(8, 4.5))
ax.plot(df["year"], df["baptisms"], label="Baptisms")
ax.plot(df["year"], df["burials"], label="Burials")
ax.set_xlabel("Year")
ax.set_ylabel("Recorded events")
ax.set_title("St Mary's parish, 1650-1750")
ax.legend()
fig.savefig("parish_events.pdf", bbox_inches="tight")
```
How do you handle pre-1677 historical dates?
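The limit comes from pandas rather than from matplotlib itself: pandas' default nanosecond-resolution datetime64[ns] cannot represent dates before 1677-09-21. For medieval or early-modern series, conversion typically raises an out-of-bounds error or, with errors="coerce", silently turns the dates into NaT. Two robust options: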
- Plot the year as an integer on a numeric x-axis and format ticks yourself. This is the simplest fix for annual data.
- Use Python's stdlib datetime objects (which reach back to year 1) directly; matplotlib accepts them via its own dates module, bypassing the nanosecond ceiling.
For sub-annual early dates, store a Julian-day float and relabel ticks with a FuncFormatter.
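Both options can be sketched for annual data as follows. The counts are made up purely for illustration, and the function names plot_integer_years and plot_stdlib_dates are hypothetical helpers, not part of matplotlib.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for saved scripts
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MaxNLocator

# Illustrative values only, not real register data.
years = [1348, 1349, 1350, 1351, 1352]
counts = [14, 96, 41, 17, 12]


def plot_integer_years(years, counts):
    """Option 1: treat the year as a plain number on a numeric x-axis."""
    fig, ax = plt.subplots(figsize=(8, 4.5))
    ax.plot(years, counts, marker="o")
    # Keep ticks on whole years and print them without decimals.
    ax.xaxis.set_major_locator(MaxNLocator(integer=True))
    ax.xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f"{int(x)}"))
    ax.set_xlabel("Year")
    return fig, ax


def plot_stdlib_dates(years, counts):
    """Option 2: stdlib dates, converted by matplotlib's own dates module."""
    dates = [dt.date(y, 1, 1) for y in years]  # no pandas Timestamp involved
    fig, ax = plt.subplots(figsize=(8, 4.5))
    ax.plot(dates, counts, marker="o")
    ax.xaxis.set_major_locator(mdates.YearLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
    ax.set_xlabel("Year")
    return fig, ax


fig_int, ax_int = plot_integer_years(years, counts)
fig_dt, ax_dt = plot_stdlib_dates(years, counts)
```

The integer-year version is usually enough when every observation is annual; the stdlib-date version keeps real calendar positions if some points fall within a year.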
How should I style charts consistently across a project?
Define one style and apply it everywhere. Avoid per-chart manual tweaks — they are the main source of drift.
```python
import matplotlib.pyplot as plt

plt.style.use("seaborn-v0_8-whitegrid")
plt.rcParams.update({
    "font.size": 11,
    "axes.titlesize": 13,
    "figure.dpi": 110,
    "savefig.dpi": 300,
})
```
Keep this block in a small plotstyle.py module that every figure script imports.
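One possible shape for that module is sketched below. The function name apply_style and the constants STYLE_NAME and RC_OVERRIDES are assumptions made for illustration; the check against plt.style.available is there because the seaborn-v0_8-* style names only exist in matplotlib 3.6 and later.

```python
# plotstyle.py: a sketch of the shared style module described above.
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for saved scripts
import matplotlib.pyplot as plt

STYLE_NAME = "seaborn-v0_8-whitegrid"
RC_OVERRIDES = {
    "font.size": 11,
    "axes.titlesize": 13,
    "figure.dpi": 110,
    "savefig.dpi": 300,
}


def apply_style():
    """Apply the base style if available, then the project-wide overrides."""
    # Skip the base style on older matplotlib versions instead of raising.
    if STYLE_NAME in plt.style.available:
        plt.style.use(STYLE_NAME)
    plt.rcParams.update(RC_OVERRIDES)


# In each figure script this would be: from plotstyle import apply_style
apply_style()
fig, ax = plt.subplots(figsize=(8, 4.5))
```

Wrapping the settings in a function, rather than running them at import time, keeps the styling step visible in every figure script.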
Choosing the right chart for the source
| Source type | Recommended chart | Avoid |
|---|---|---|
| Annual counts (baptisms, prices) | Line chart | Bar per year (too dense) |
| Categorical shares (occupations) | Horizontal bar | Pie (hard to compare) |
| Distribution of ages at death | Histogram or KDE | Single mean |
| Sparse, irregular events | Stem / scatter | Smoothed line |
| Two correlated series | Dual-panel, shared x | Twin y-axis (misleading) |
Twin y-axes are popular and dangerous: they let any two series appear correlated by accident. Prefer stacked subplots with sharex=True.
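A minimal sketch of the stacked alternative, using invented numbers for two series (the variable names and values are illustrative assumptions only):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for saved scripts
import matplotlib.pyplot as plt

# Illustrative values only, not real data.
years = list(range(1700, 1711))
wheat_price = [32, 35, 31, 40, 44, 38, 36, 41, 55, 60, 48]
burials = [110, 118, 105, 130, 142, 121, 117, 125, 160, 171, 139]

# Two panels stacked vertically; sharex=True links their x-limits.
fig, (ax_price, ax_burials) = plt.subplots(
    2, 1, sharex=True, figsize=(8, 6)
)
ax_price.plot(years, wheat_price)
ax_price.set_ylabel("Wheat price")
ax_burials.plot(years, burials)
ax_burials.set_ylabel("Burials")
ax_burials.set_xlabel("Year")  # only the bottom panel needs the x label
```

Each series keeps its own honest y-scale, and the shared x-axis still lets the reader compare timing between the panels.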
How do I represent uncertainty honestly?
Historical counts are rarely complete. Make the gaps visible:
```python
# ax and df as in the first example; df needs low, high and estimate columns.
ax.fill_between(df["year"], df["low"], df["high"], alpha=0.25, label="Plausible range")
ax.plot(df["year"], df["estimate"], color="black", label="Central estimate")
```
State in the caption exactly what the band means: under-registration, a smoothing window, or a sampling interval. A shaded band with no explanation is worse than none.
Exporting figures you can defend
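Save vector PDF or SVG for line work and maps; reserve PNG at 300 DPI for raster-heavy images. Always pass bbox_inches="tight" to stop matplotlib clipping rotated tick labels. Commit the generating script next to the image so any reviewer can regenerate it. Byte-for-byte identical files additionally need the same matplotlib version and fonts, plus fixed file metadata: PDF output, for example, embeds a creation date by default.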
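The export step could be wrapped in a small helper along these lines. The name save_figure and its arguments are hypothetical; passing metadata={"CreationDate": None} asks the PDF backend to omit its default timestamp, which removes one source of run-to-run differences but is not by itself a guarantee of identical bytes.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for saved scripts
import matplotlib.pyplot as plt


def save_figure(fig, out_dir, stem):
    """Write a vector PDF and a 300 DPI PNG of the same figure."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pdf_path = out_dir / f"{stem}.pdf"
    png_path = out_dir / f"{stem}.png"
    # Vector output: DPI is irrelevant for line work; drop the timestamp.
    fig.savefig(pdf_path, bbox_inches="tight", metadata={"CreationDate": None})
    # Raster output: fix the resolution explicitly.
    fig.savefig(png_path, dpi=300, bbox_inches="tight")
    return pdf_path, png_path


# Illustrative values only.
fig, ax = plt.subplots(figsize=(8, 4.5))
ax.plot([1650, 1700, 1750], [40, 55, 47])
with tempfile.TemporaryDirectory() as tmp:
    pdf_path, png_path = save_figure(fig, tmp, "parish_events")
```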
Key Takeaways
- Always use fig, ax = plt.subplots(): explicit axes scale to multi-panel figures and stay reproducible.
- Pre-1677 dates break the default datetime64 path; plot integer years or stdlib datetime instead.
- Centralise styling in one rcParams block or style file; never tweak charts by hand one at a time.
- Match the chart to the source; avoid twin y-axes and per-year bars.
- Show uncertainty with fill_between or errorbar, and explain the band in the caption.
- Export vector PDF/SVG for line work, or PNG at 300 DPI for raster-heavy images, with bbox_inches="tight", and keep the generating script.
- Pin your matplotlib version so a figure rendered today renders identically next year.
Frequently Asked Questions
Should I use the pyplot or object-oriented matplotlib interface?
Use the object-oriented interface (fig, ax = plt.subplots()) for any figure you intend to reuse or publish. The plt.plot() shortcut hides state that becomes ambiguous once you have several axes, and it makes scripts harder to reproduce.
How do I handle dates before 1677 in matplotlib?
Dates that pass through pandas inherit its default nanosecond-resolution range, roughly 1677 to 2262. For earlier dates, plot the year as a plain integer on a numeric axis, use stdlib datetime objects, or convert to a Julian-day float and label ticks manually.
What DPI should I export historical figures at?
Save at 300 DPI for print and as vector PDF or SVG where possible, since line charts and maps stay crisp at any zoom. Use bbox_inches="tight" so labels are not clipped.
How do I show uncertainty in counts from incomplete records?
Use shaded bands with ax.fill_between() for ranges, or error bars with ax.errorbar(). Always state in the caption what the band represents — sampling error, source gaps, or an estimate.
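For the error-bar variant, note that ax.errorbar() takes distances from the central value rather than absolute bounds. A short sketch with invented numbers (all values and variable names are illustrative assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for saved scripts
import matplotlib.pyplot as plt

# Illustrative values only, not real data.
years = [1650, 1675, 1700, 1725, 1750]
estimate = [120, 135, 150, 148, 170]
low = [110, 120, 138, 140, 155]
high = [140, 160, 175, 165, 200]

# Convert absolute bounds into non-negative offsets below and above.
lower_err = [e - l for e, l in zip(estimate, low)]
upper_err = [h - e for h, e in zip(high, estimate)]

fig, ax = plt.subplots(figsize=(8, 4.5))
ax.errorbar(
    years,
    estimate,
    yerr=[lower_err, upper_err],  # row 0: below, row 1: above
    fmt="o",
    capsize=3,
    label="Estimate with plausible range",
)
ax.set_xlabel("Year")
ax.legend()
```

Error bars suit sparse points such as these; a continuous annual series usually reads better with a shaded band.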
Why do my axis labels overlap when plotting many years?
Limit ticks with ax.xaxis.set_major_locator(MaxNLocator(integer=True)) or a YearLocator, and rotate labels with fig.autofmt_xdate(). Plotting every year as a tick is almost never readable.
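As a rough illustration on a numeric year axis, the sketch below caps the number of ticks and rotates the labels. The nbins=8 setting and the placeholder counts are assumptions chosen for the example, not recommendations from the text.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for saved scripts
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# 101 annual observations with placeholder counts (illustrative only).
years = list(range(1650, 1751))
counts = [100 + (y % 7) * 3 for y in years]

fig, ax = plt.subplots(figsize=(8, 4.5))
ax.plot(years, counts)
# Ask for a small number of ticks, all on whole years.
ax.xaxis.set_major_locator(MaxNLocator(nbins=8, integer=True))
# Rotate and right-align the x tick labels.
fig.autofmt_xdate(rotation=45)
ax.set_xlabel("Year")
```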
Can I make matplotlib output reproducible?
Yes. Pin your matplotlib version, set an explicit style with plt.style.use(), fix figure size and DPI, and avoid interactive tweaks. Save the generating script alongside the image.