To quantify uncertainty in a historical estimate, separate its two main sources (sampling variability, and measurement or coding ambiguity), then express each as a range rather than a single number. Use the bootstrap for sampling uncertainty, Monte Carlo simulation to propagate uncertain conversion factors and deflators, and an explicit sensitivity analysis for debatable assumptions. The deliverable is never a bare figure; it is an estimate with an honest interval and a sentence explaining where the range comes from. This guide runs the full workflow with concrete examples.
What sources of uncertainty actually matter?
Two dominate historical work, and they are not the same:
- Sampling uncertainty, because you analysed a subset of surviving records. Standard formulas address this.
- Measurement and coding uncertainty, because a hand was ambiguous, a date was a guess, an occupation was interpreted, a value was deflated. Standard formulas ignore this, yet it is often the larger share.
A confidence interval that captures only sampling error while a quarter of your records were coded on a coin-flip is misleadingly tight. The first job is to name every source of slack, not just the convenient statistical one.
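To make the point concrete, here is a minimal sketch that compares a sampling-only interval with one that also redraws the ambiguous codes. Everything in it is invented for illustration: the 400 records, the literacy coding, and the assumption that each ambiguous record is a fair coin flip between the two codes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 400 records coded 1 (signed) or 0 (made a mark).
n = 400
coded = rng.integers(0, 2, size=n)      # best-guess codes
ambiguous = np.zeros(n, dtype=bool)
ambiguous[: n // 4] = True              # a quarter were coin-flip codings

def replicate_means(recode, B=5000):
    """Bootstrap the literacy share, optionally redrawing ambiguous codes."""
    reps = np.empty(B)
    for b in range(B):
        values = coded.copy()
        if recode:
            # Assumption: each ambiguous record is equally likely to be 0 or 1.
            values[ambiguous] = rng.integers(0, 2, size=ambiguous.sum())
        resample = rng.integers(0, n, size=n)   # indices drawn with replacement
        reps[b] = values[resample].mean()
    return reps

sampling_only = replicate_means(recode=False)
with_coding = replicate_means(recode=True)

for label, reps in [("sampling only", sampling_only),
                    ("sampling + coding", with_coding)]:
    lo, hi = np.percentile(reps, [2.5, 97.5])
    print(f"{label}: {lo:.3f} to {hi:.3f}")
```

Under these assumptions the second interval should come out wider, because it carries the coding ambiguity as well as the sampling error. A different model of the ambiguity (say, 70/30 rather than 50/50) would change how much wider.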
How do I get a confidence interval without distributional assumptions?
Use the bootstrap. Resample your records with replacement many times, recompute the statistic each time, and read off the percentiles. It makes no normality assumption and handles medians, ratios and other awkward statistics gracefully:
```python
import numpy as np

rng = np.random.default_rng(42)

def boot_ci(data, stat=np.median, B=10000, alpha=0.05):
    """Percentile bootstrap interval for stat(data)."""
    data = np.asarray(data)
    # recompute the statistic on B resamples drawn with replacement
    reps = [stat(rng.choice(data, size=len(data), replace=True))
            for _ in range(B)]
    lo, hi = np.percentile(reps, [100*alpha/2, 100*(1-alpha/2)])
    return lo, hi

# example: median age at marriage from a sample
# (ages_at_marriage is your own 1-D array of ages)
boot_ci(ages_at_marriage)  # e.g. (24.1, 26.8)
```

With 10,000 resamples the interval is stable. For a median age at marriage of 25.4, a 95% bootstrap interval of 24.1-26.8 tells the reader far more than the point alone.
How do I propagate uncertainty in a deflator or conversion factor?
When an estimate runs through an uncertain factor (a price index, a bushel-to-litre conversion, a currency rate), treat the factor as a distribution and run a Monte Carlo simulation. Sample the factor, recompute, repeat:
```python
import numpy as np

rng = np.random.default_rng(42)

# real wage with an uncertain CPI (best guess 142, standard deviation 8)
nominal = 38.0
cpi = rng.normal(142, 8, size=10000)
real = nominal / cpi * 100
lo, hi = np.percentile(real, [2.5, 97.5])  # propagated 95% range
```

The output spread is your propagated uncertainty. With a standard deviation of 8 on the index, the 95% range works out to roughly 24.1-30.1. Reporting a real wage of 26.8 (95% range about 24.1-30.1) is honest; reporting 26.8 alone pretends the index was exact.
Confidence interval versus sensitivity range: which do I need?
Both, and they answer different questions.
| tool | answers | varies |
|---|---|---|
| confidence interval | how much would this wobble if I resampled? | the data, under a fixed model |
| sensitivity range | how much does my answer depend on a debatable choice? | the assumption (coding rule, gap-fill method) |
| Monte Carlo | how does input uncertainty flow to the output? | uncertain factors |
If a different but defensible coding rule moves your estimate by more than the confidence interval, the assumption is your dominant uncertainty, and a confidence interval alone would hide it. Run the analysis under each plausible rule and report the spread.
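A sensitivity analysis of this kind can be sketched in a few lines. The register below, the three coding rules for partial signatures, and the helper `literacy_share` are all hypothetical; the point is only the pattern of recomputing one estimate under each defensible rule and comparing the spread with a bootstrap interval.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical register: 0 = mark only, 1 = partial signature, 2 = full signature.
signatures = np.array([0] * 90 + [1] * 30 + [2] * 80)

# Each rule maps a signature type to a literacy score (all assumed).
rules = {
    "partial counts as literate": np.array([0.0, 1.0, 1.0]),
    "partial counts as illiterate": np.array([0.0, 0.0, 1.0]),
    "partial counts as half": np.array([0.0, 0.5, 1.0]),
}

def literacy_share(data, weights):
    """Mean literacy score of the records under one coding rule."""
    return weights[data].mean()

# Sensitivity range: the same data under every rule.
estimates = {name: literacy_share(signatures, w) for name, w in rules.items()}
sensitivity_range = max(estimates.values()) - min(estimates.values())

# Bootstrap interval under one fixed rule, for comparison.
w = rules["partial counts as half"]
reps = [literacy_share(rng.choice(signatures, size=signatures.size), w)
        for _ in range(5000)]
lo, hi = np.percentile(reps, [2.5, 97.5])
ci_width = hi - lo

for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
print(f"sensitivity range {sensitivity_range:.3f}, CI width {ci_width:.3f}")
```

In this invented register the three rules give shares of 0.40, 0.475 and 0.55, a spread of 0.15. If that spread exceeds the bootstrap interval's width, the coding rule is the dominant uncertainty and deserves the headline caveat.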
How should I present uncertainty to non-statisticians?
Lead with a range in plain language and show it visually (error bars, a shaded band, or a fan chart for projections), never as a lone line implying precision the data lacks. Pair the number with one sentence of provenance: "literacy rose 12 points (95% CI 4-20), though the result is sensitive to how we counted partial signatures." That sentence carries the size, the statistical spread and the assumption risk together, which is exactly what an informed reader needs.
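If you report many estimates, it can help to generate the provenance sentence from the numbers so the interval and caveat are never dropped. The helper below is a hypothetical convenience, not a standard function; its name and parameters are invented here.

```python
def provenance_sentence(what, estimate, lo, hi, unit, caveat, level=95):
    """Format an estimate, its interval and its main assumption risk."""
    return (f"{what} {estimate:g} {unit} ({level}% CI {lo:g}-{hi:g}), "
            f"though the result is sensitive to {caveat}.")

sentence = provenance_sentence(
    "literacy rose", 12, 4, 20, "points",
    "how we counted partial signatures",
)
print(sentence)
```

Making the caveat a required argument is the design choice that matters: the function cannot be called without naming the assumption the result depends on.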
When is a single number honest?
Only for an exact, physical count: 412 wills in the box, 28 surviving letters. The moment you generalise, interpret, deflate, or fill a gap, the precision of a bare number is overstated and you owe the reader a range. Treat "single number, no interval" as a claim that there is genuinely nothing to be uncertain about, which for almost any estimate is false.
A workflow you can reuse
- List every source of uncertainty, sampling and non-sampling alike.
- Bootstrap the sampling component for a distribution-free interval.
- Monte Carlo any uncertain factor through to the final estimate.
- Run a sensitivity analysis over each debatable assumption.
- Combine into a reported range and one plain-language provenance sentence.
- Visualise with bands or error bars; never publish a naked point estimate that generalises.
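The bootstrap and Monte Carlo steps above can be combined in one loop, so that each replicate draws both a fresh resample and a fresh value of the uncertain factor. This is a rough sketch on invented data (120 simulated nominal wages, a CPI of 142 with an assumed standard deviation of 8); it does not cover the sensitivity step, which would mean rerunning the loop under each alternative assumption.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical inputs: 120 nominal wages and an uncertain CPI.
wages = rng.normal(38.0, 6.0, size=120)
B = 10_000

reps = np.empty(B)
for b in range(B):
    sample = rng.choice(wages, size=wages.size, replace=True)  # sampling uncertainty
    cpi = rng.normal(142, 8)                                   # factor uncertainty (assumed sd)
    reps[b] = np.median(sample) / cpi * 100

point = np.median(wages) / 142 * 100
lo, hi = np.percentile(reps, [2.5, 97.5])
print(f"real median wage {point:.1f} (95% range {lo:.1f}-{hi:.1f})")
```

The resulting range reflects both sources at once, which is usually what the reader should see first; reporting the separate components alongside it shows which source dominates.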
Key Takeaways
- Quantify both sampling and coding/measurement uncertainty; the latter often dominates and is usually ignored.
- Use the bootstrap for distribution-free confidence intervals on most statistics, including medians and ratios.
- Propagate uncertain deflators and conversions with Monte Carlo simulation.
- Distinguish a confidence interval (resampling) from a sensitivity range (assumptions); report both.
- Present uncertainty as a visible range with a plain-language reason, not a lone number.
- A single figure is only honest for an exact count of what physically survives.
Frequently Asked Questions
What kinds of uncertainty should I quantify in historical estimates?
At least two: sampling uncertainty from working with a subset of records, and measurement or coding uncertainty from ambiguous, transcribed or interpreted sources. Many historical estimates are dominated by the second kind, which standard statistical formulas ignore.
How do I get a confidence interval if I do not know the underlying distribution?
Use the bootstrap: resample your data with replacement thousands of times, recompute the estimate each time, and take the 2.5th and 97.5th percentiles. It needs no distributional assumption and handles odd statistics like medians and ratios well.
How do I propagate uncertainty through a deflator or conversion factor?
Treat the factor as a distribution rather than a fixed number and run a Monte Carlo simulation, sampling the factor and recomputing your estimate many times. The spread of results is your propagated uncertainty, which a single point estimate hides.
What is the difference between a confidence interval and a sensitivity range?
A confidence interval quantifies sampling variability under a fixed model; a sensitivity range shows how the estimate moves when you change a debatable assumption such as a coding rule or a missing-data method. Historical work usually needs both, reported separately.
How should I present uncertainty to readers who are not statisticians?
Show a range and the reasoning behind it in plain language, and visualise it with error bars, shaded bands or fan charts rather than a single line. A point estimate with no visible uncertainty implies false precision.
Is it ever honest to give a single number with no uncertainty?
Only for an exact count of what physically survives, such as 412 wills in a box. Any estimate that generalises, interprets, or fills gaps should carry an explicit range, because the precision of a bare number is almost always overstated.