How to Visualise Cultural Trends

To visualise cultural trends, normalise your counts into relative frequencies, bin them by a sensible time unit, plot a line with an honest uncertainty band, and always show corpus density (documents per bin) alongside so readers can spot composition artefacts. The most common mistake, plotting raw counts, produces charts that track the survival of text, not culture. The steps below take each of these in turn.

Step 1: Normalise before you plot anything

Raw word or document counts rise simply because more text survives from recent decades. Convert to a rate:

python

import pandas as pd

df = pd.read_parquet("hits.parquet")        # one row per occurrence, with 'year'
totals = pd.read_csv("tokens_per_year.csv") # year, total_tokens
tokens = totals.set_index("year")["total_tokens"]

# Count hits per year; years with text but no hits become 0 rather than NaN
counts = df.groupby("year").size().reindex(tokens.index, fill_value=0).rename("hits")
trend = (counts / tokens * 1_000_000).rename("per_million_tokens")

Now the y-axis is "occurrences per million tokens" — comparable across periods of wildly different corpus size.

Step 2: Choose a time bin that matches your data

Yearly bins look precise but are noisy when sparse. If most years have under ~20 documents, bin by decade. The rule: each bin should hold enough documents that its estimate is stable. Show the binning choice in the caption.
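One way to bin by decade is sketched below, assuming hits and token totals are already per-year Series indexed by year. The function name and the toy numbers are invented for illustration. The sketch pools hits and tokens within each decade before dividing, rather than averaging yearly rates, so a sparse year does not get the same weight as a dense one.

```python
import pandas as pd

# Hypothetical per-year inputs; in practice these come from your own corpus.
hits = pd.Series({1841: 2, 1843: 0, 1847: 9, 1852: 4, 1858: 6})
tokens = pd.Series({1841: 50_000, 1843: 20_000, 1847: 300_000,
                    1852: 120_000, 1858: 80_000})

def per_million_by_decade(hits, tokens):
    """Pool hits and tokens within each decade, then convert to a rate."""
    decade = (tokens.index.to_numpy() // 10) * 10   # 1847 -> 1840
    aligned_hits = hits.reindex(tokens.index, fill_value=0)
    pooled_hits = aligned_hits.groupby(decade).sum()
    pooled_tokens = tokens.groupby(decade).sum()
    return (pooled_hits / pooled_tokens * 1_000_000).rename("per_million_tokens")

by_decade = per_million_by_decade(hits, tokens)
```

The same pattern works for any bin width: replace the `// 10 * 10` step with whatever maps a year to the start of its bin.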

How do I keep smoothing honest?

A rolling mean tames jitter but can manufacture smooth "trends" from noise. The honest pattern is: smooth and reveal.

python

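# `trend` is the per-million Series from Step 1
smooth = trend.rolling(window=3, center=True).mean()  # centred 3-bin window; the end bins are NaN
ax = smooth.plot(label="3-bin rolling mean")
trend.plot(ax=ax, alpha=0.3, marker="o", linestyle="none", label="raw")  # raw points stay visible
ax.legend()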

State the window length in words. A reader who can see the raw points knows how much to trust the line.

Step 3: Plot corpus density next to the trend

This single habit prevents most false discoveries. A spike in 1847 is exciting until you see that a 900-page periodical joined the corpus that year. Add a second panel:

Panel	Shows	Catches
Trend (per-million)	Apparent cultural change	The "finding"
Documents per bin	Corpus composition	Artefacts masquerading as findings

If the trend spike coincides with a density spike, treat it as suspect until proven otherwise.
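A minimal two-panel sketch with matplotlib follows, assuming `trend` (rate per bin) and `docs_per_bin` (document count per bin) are pandas Series sharing the same index. Both names, the helper function and the toy values are illustrative, not drawn from a real corpus.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy values for illustration only.
bins = [1820, 1830, 1840, 1850, 1860]
trend = pd.Series([12.0, 14.5, 41.0, 16.0, 17.5], index=bins)
docs_per_bin = pd.Series([35, 40, 310, 42, 44], index=bins)

def plot_trend_with_density(trend, docs_per_bin):
    """Stack the trend above corpus density on a shared time axis."""
    fig, (ax_trend, ax_docs) = plt.subplots(
        2, 1, sharex=True, figsize=(7, 5),
        gridspec_kw={"height_ratios": [3, 1]},
    )
    ax_trend.plot(trend.index, trend.values, marker="o")
    ax_trend.set_ylabel("Occurrences per million tokens")
    ax_docs.bar(docs_per_bin.index, docs_per_bin.values, width=8, color="grey")
    ax_docs.set_ylabel("Documents per bin")
    ax_docs.set_xlabel("Decade")
    return fig, (ax_trend, ax_docs)

fig, axes = plot_trend_with_density(trend, docs_per_bin)
```

In the toy data the 1840 bin jumps in both panels at once, which is the pattern to treat as suspect. Calling `fig.savefig(...)` writes the figure to disk.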

Which chart type should I use?

One or a few series → line chart, direct-labelled at the line ends rather than a legend.
Many categories → small multiples (faceted lines on a shared axis), not one crowded chart.
Composition over time → a stacked area only if absolute totals matter; for comparing individual trends, lines beat stacks every time.

Avoid dual y-axes — they let you imply correlations that aren't there.
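Small multiples can be sketched with matplotlib as below, assuming a wide DataFrame with one column per term and one row per time bin. The terms, values and function name are invented. `sharey=True` keeps every panel on a common scale so heights stay comparable across panels.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering
import matplotlib.pyplot as plt
import pandas as pd

# Invented rates per million tokens, one column per term.
rates = pd.DataFrame(
    {"machinery": [5, 9, 14, 22], "steam": [3, 8, 19, 17],
     "loom": [11, 10, 7, 4], "railway": [0, 1, 12, 30]},
    index=[1800, 1825, 1850, 1875],
)

def small_multiples(rates, ncols=2):
    """Draw one line panel per column on shared x and y axes."""
    nrows = -(-len(rates.columns) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True,
                             figsize=(4 * ncols, 2.5 * nrows), squeeze=False)
    panels = axes.ravel()
    for ax, term in zip(panels, rates.columns):
        ax.plot(rates.index, rates[term])
        ax.set_title(term, loc="left")   # panel title doubles as the direct label
    for ax in panels[len(rates.columns):]:
        ax.set_visible(False)            # hide any unused panels
    return fig, axes

fig, axes = small_multiples(rates)
```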

Step 4: Show uncertainty

A bare line claims a precision your data rarely has. Bootstrap-resample documents within each bin and draw the 95% band:

python

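import numpy as np

def band(values, n=1000, seed=None):
    """95% percentile-bootstrap interval for the mean of one bin's values."""
    rng = np.random.default_rng(seed)   # pass a seed for a reproducible band
    values = np.asarray(values)
    # Resample with replacement n times and record each resample's mean
    boots = [rng.choice(values, size=len(values), replace=True).mean()
             for _ in range(n)]
    return np.percentile(boots, [2.5, 97.5])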

If a single horizontal line fits inside the band across the whole range, the data cannot distinguish the trend from no change.
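Applying the bootstrap bin by bin might look like the sketch below. It assumes a table with one row per document, a `decade` column and a per-document `rate` column. These column names, the function name and the values are hypothetical. The band here is for the mean of per-document rates, which is not the same quantity as the pooled per-million rate when documents differ greatly in length.

```python
import numpy as np
import pandas as pd

# Hypothetical per-document rates (per million tokens), tagged with their bin.
docs = pd.DataFrame({
    "decade": [1840] * 5 + [1850] * 5,
    "rate":   [10.0, 14.0, 9.0, 30.0, 12.0, 21.0, 19.0, 25.0, 18.0, 22.0],
})

def bootstrap_bands(docs, n=2000, seed=0):
    """Per-bin mean with a 95% percentile-bootstrap interval."""
    rng = np.random.default_rng(seed)
    rows = {}
    for decade, rates in docs.groupby("decade")["rate"]:
        values = rates.to_numpy()
        # Resample documents with replacement: shape (n, documents in bin)
        resampled = rng.choice(values, size=(n, len(values)), replace=True)
        lower, upper = np.percentile(resampled.mean(axis=1), [2.5, 97.5])
        rows[decade] = {"mean": values.mean(), "lower": lower, "upper": upper}
    return pd.DataFrame.from_dict(rows, orient="index")

bands = bootstrap_bands(docs)
# To draw: ax.fill_between(bands.index, bands["lower"], bands["upper"], alpha=0.2)
```

The 1840 bin contains one outlying document (30.0), so its band comes out visibly wider than the 1850 band, which is the information a bare line would hide.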

Step 5: Label like a publication, not a notebook

Title states the finding ("Mentions of 'machinery' per million words, 1700-1900"), axes are labelled with units, the source and corpus size sit in a caption, and colour is colour-blind safe. These finishing steps are what separate a credible figure from a screenshot.
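The finishing steps can be sketched in matplotlib as follows. The data values are invented and the caption text is a placeholder to fill in with your own source and corpus size. `tableau-colorblind10` is one of matplotlib's bundled style sheets, and other colour-blind-safe palettes would serve equally well.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering
import matplotlib.pyplot as plt

plt.style.use("tableau-colorblind10")  # colour-blind-friendly palette bundled with matplotlib

# Invented values for illustration.
years = [1700, 1750, 1800, 1850, 1900]
per_million = [2.0, 3.5, 9.0, 21.0, 26.0]

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(years, per_million, marker="o")
ax.set_title("Mentions of 'machinery' per million words, 1700-1900", loc="left")
ax.set_xlabel("Year (50-year bins)")
ax.set_ylabel("Occurrences per million words")
# Caption with source and corpus size; the wording is a placeholder.
fig.text(0.01, 0.01, "Source: <corpus name>, N = <number of documents>",
         ha="left", va="bottom", fontsize=8)
fig.subplots_adjust(bottom=0.18)  # leave room for the caption
```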

Key Takeaways

Normalise to relative frequency (per million tokens) before plotting — raw counts track text survival.
Bin by a unit dense enough for stable estimates; state the binning in the caption.
Smooth and reveal: show the rolling mean and the raw points together.
Always plot documents-per-bin beside the trend to catch corpus-composition artefacts.
Use lines for few series, small multiples for many; avoid dual axes and comparison-by-stacked-area.
Add a bootstrap confidence band so readers can judge the trend's reliability.
Finish with publication-quality labels, units, source and accessible colour.

Frequently Asked Questions

Should I plot raw counts or relative frequencies?

Almost always relative frequencies (per million words, or as a share). Raw counts mostly track how much text survives from each period, not genuine cultural change.

How do I smooth a noisy trend without lying?

A rolling mean over a few time bins is fine, but always show the underlying points or a confidence band too, and state the window. Heavy smoothing hides the noise that tells readers how much to trust the line.

Why does my trend spike in one specific year?

Usually a corpus composition change — a big document or source joined that year — not a real cultural shift. Plot documents-per-bin alongside the trend to catch it.

Linear or log scale for cultural frequency data?

Use a log y-axis when you care about proportional change or when values span orders of magnitude; use linear for additive comparisons and when zero is meaningful.

What chart type suits trends over time?

A line chart for one or a few series; small multiples (faceted lines) when comparing many. Avoid stacked areas for comparison — they make individual trends hard to read.

How do I show uncertainty in a trend?

Bootstrap within each time bin and draw the confidence band, or show the raw scatter behind the line. A bare line implies a precision you usually don't have.
