How to Sample Historical Records: A Practical Guide

To sample historical records well, define the population you want to describe, build a frame that lists every unit in it, choose a probability method such as systematic or stratified sampling, fix the sample size from your target margin of error, and record the design so others can reproduce it. A good sample lets you make defensible statements about a whole register, parish, or run of court rolls from a fraction of the work.

Why sample at all?

Full transcription of a long series, say forty years of poor-law admissions, can consume person-years of work. A probability sample of 1,500 admissions can estimate the share of female applicants to within about 2.5 percentage points at 95 percent confidence, and can estimate the median age with useful precision too. The point of sampling is inference: every record you draw stands in, with known probability, for many you did not read.
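The percentage-point figure for a sample of 1,500 can be checked with the usual margin-of-error formula. This is a sketch under several assumptions: simple random sampling, the worst case p = 0.5, the normal approximation, and no finite-population correction. `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion.

    Normal approximation z * sqrt(p(1-p)/n); p = 0.5 is the worst case.
    No finite-population correction is applied.
    """
    return z * math.sqrt(p * (1 - p) / n)

e = margin_of_error(1_500)
print(f"n = 1,500 -> margin of about {e * 100:.1f} percentage points")
```

It reports about 2.5 percentage points. Allowing for the finite size of the series would shrink the figure slightly.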

How do I define the population and frame?

The population is the abstract set you want conclusions about: "all baptisms in the diocese, 1750-1799." The frame is the concrete list you can actually draw from: the surviving, legible registers. Where they diverge — lost volumes, illegible folios — your inference quietly narrows. Write down the gap.

```text
Population : all diocesan baptisms 1750-1799 (estimated 92,000)
Frame      : 41 surviving registers, 88,400 legible entries
Coverage   : 96.1% of the estimated population
```
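The coverage figure is simple arithmetic. Keeping it in a script beside the sample makes the gap explicit, and the script is easy to update if more registers turn up. The figures are the ones from the design note. The variable names are illustrative.

```python
# Hypothetical record of the frame-population gap for the diocesan example.
population_estimate = 92_000   # estimated baptisms 1750-1799 (an estimate, not a count)
frame_entries = 88_400         # legible entries in the 41 surviving registers

coverage = frame_entries / population_estimate
gap = population_estimate - frame_entries   # lost or illegible entries
print(f"Coverage: {coverage:.1%}; about {gap:,} baptisms fall outside the frame")
```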

Which sampling method should I choose?

Method	When to use	Strength	Watch out for
Simple random	Frame is a digital list	Unbiased, simple maths	Awkward to apply to bound volumes
Systematic (every kth)	Bound or filmed sources	Even coverage, easy in practice	Periodicity in the source
Stratified	Compare known subgroups	Precise per-stratum estimates	Need strata sizes in advance
Cluster	Records grouped by place	Cheap fieldwork	Larger sampling error

Systematic sampling is the practical default for archival work. To take a 1-in-20 sample of 88,400 entries, pick a random start between 1 and 20, then read every 20th entry.

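```python
import random

N = 88_400                  # legible entries in the frame
k = 20                      # interval for a 1-in-20 sample
rng = random.Random(1750)   # fixed (arbitrary) seed so the draw can be repeated
start = rng.randint(1, k)   # random start, 1..k inclusive
selected = list(range(start, N + 1, k))  # entry numbers start, start + k, ...
print(len(selected), "records, starting at", start)
```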

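Guard against periodicity. If the source repeats a pattern (for example, a column header on every 20th line), an interval that matches the cycle can land on the same point in it every time and bias the sample. If you suspect a cycle, you have three options. You can choose an interval that shares no common factor with the cycle length. You can draw several systematic subsamples with independent random starts. Or you can stratify.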
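The lock-on effect shows up clearly in a small simulation. The sketch assumes a hypothetical source that repeats a pattern every 20 entries. A 1-in-k draw visits period / gcd(k, period) distinct points in the cycle. An interval of 20 therefore hits the same point every time, while 21, which shares no factor with 20, reaches every position. `positions_hit` is an illustrative helper, not a standard function.

```python
import math

def positions_hit(start: int, k: int, n_records: int, period: int) -> set[int]:
    """Positions within a repeating cycle that a 1-in-k systematic draw visits.

    Entries are numbered from 1; (i - 1) % period is where entry i falls
    in a cycle of length `period` (e.g. a header every 20 lines).
    """
    return {(i - 1) % period for i in range(start, n_records + 1, k)}

period = 20  # hypothetical: the source repeats a pattern every 20 entries
for k in (20, 21):
    hit = positions_hit(start=7, k=k, n_records=88_400, period=period)
    print(f"k = {k}: gcd = {math.gcd(k, period)}, visits {len(hit)} of {period} positions")
```

With k = 20 every pick lands on the same position in the cycle. With k = 21 the picks cover all 20 positions.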

How big does the sample need to be?

For a proportion, the worst case is p = 0.5, because p(1-p) peaks there and so gives the largest required size. The classic formula is n = z^2 * p(1-p) / e^2. At 95 percent confidence (z = 1.96) and a margin e = 0.03, that gives 1.96^2 * 0.25 / 0.03^2 ≈ 1,067.1, or about 1,067 records (round up to 1,068 in practice). When the sample is a large fraction of the frame, apply a finite-population correction, which shrinks the requirement.
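The sketch below computes the size and then applies one common form of the finite-population correction, Cochran's n = n0 / (1 + (n0 - 1) / N). It assumes simple random sampling and the 88,400-entry frame from the baptism example. The function names are ours.

```python
import math

def sample_size(e: float, p: float = 0.5, z: float = 1.96) -> float:
    """Classic size for estimating a proportion: n0 = z^2 * p(1-p) / e^2."""
    return z**2 * p * (1 - p) / e**2

def with_fpc(n0: float, N: int) -> float:
    """Cochran's finite-population correction: n = n0 / (1 + (n0 - 1) / N)."""
    return n0 / (1 + (n0 - 1) / N)

n0 = sample_size(0.03)          # about 1,067.1 before rounding
n_fpc = with_fpc(n0, N=88_400)  # frame size from the baptism example
print(math.ceil(n0), math.ceil(n_fpc))
```

With these inputs it prints 1068 1055: first the uncorrected 1,067.1 rounded up, then the corrected size. The correction saves only about a dozen records here because the sample is roughly 1.2 percent of the frame. It matters more for short series, where n0 is a sizeable share of N.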

What about stratified sampling for subgroups?

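If you want separate, precise estimates for, say, three social ranks, allocate the sample across strata rather than relying on a single pooled draw. Proportional allocation gives each stratum its share of the frame. Optimal (Neyman) allocation sets each stratum's sample in proportion to its size times its standard deviation, so larger and more variable strata get more records. If reading costs differ between strata, the cost-adjusted version gives fewer records to strata where each record is more expensive to read. Any stratum sampled at a different rate from the others, including one you deliberately oversample, needs weights at analysis time.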
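Here is a minimal sketch of the two allocation rules. It assumes a total sample of 1,200 records and three strata whose names, sizes and standard deviations are invented for illustration. The sizes happen to sum to the 88,400-entry frame only for convenience. Neyman allocation here is the textbook rule n_h proportional to N_h * S_h, with no cost term.

```python
def proportional(n: int, sizes: dict[str, int]) -> dict[str, float]:
    """n_h = n * N_h / N: each stratum gets its share of the frame."""
    total = sum(sizes.values())
    return {h: n * N_h / total for h, N_h in sizes.items()}

def neyman(n: int, sizes: dict[str, int], sds: dict[str, float]) -> dict[str, float]:
    """n_h = n * N_h * S_h / sum_j(N_j * S_j): more sample where strata are big or variable."""
    weights = {h: sizes[h] * sds[h] for h in sizes}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}

# Hypothetical frame: three social ranks with made-up sizes and standard deviations.
sizes = {"gentry": 2_000, "tradesmen": 18_000, "labourers": 68_400}
sds = {"gentry": 0.50, "tradesmen": 0.45, "labourers": 0.30}

for name, alloc in (("proportional", proportional(1_200, sizes)),
                    ("Neyman", neyman(1_200, sizes, sds))):
    print(name, {h: round(v) for h, v in alloc.items()})
```

Rounded, proportional allocation gives about 27, 244 and 929 records, and Neyman allocation about 41, 328 and 831. Rounding can leave the total a record or two off n, so check the sum and adjust by hand. Neither rule gives the small gentry stratum many records. If you need a precise gentry estimate on its own, you would set a floor for that stratum and weight it down at analysis time.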

How do I keep it reproducible?

Set and record a random seed, log the start value, and store the list of drawn record IDs as a file. Anyone should be able to re-run your script and land on the same rows. Note every exclusion — damaged folios, blank pages — because silent exclusions are how a probability sample turns into a biased one.
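The sketch below shows one way to follow this advice. It assumes a 1-in-k systematic draw and treats a JSON file as an acceptable place to store the design. The function names and the file name `sample_draw.json` are hypothetical. Python's documentation only promises that `random()` itself repeats for a given seed across versions, so the sketch saves the drawn IDs as well as the seed.

```python
import json
import random
from pathlib import Path

def draw_systematic(n_records: int, k: int, seed: int) -> dict:
    """Draw a 1-in-k systematic sample and keep everything needed to repeat it."""
    rng = random.Random(seed)   # private generator, unaffected by other random calls
    start = rng.randint(1, k)   # random start in 1..k inclusive
    ids = list(range(start, n_records + 1, k))
    return {"seed": seed, "k": k, "n_records": n_records,
            "start": start, "ids": ids, "exclusions": []}

def save_draw(draw: dict, path: Path) -> None:
    """Write the design, the drawn IDs and the exclusion log as JSON."""
    path.write_text(json.dumps(draw, indent=2), encoding="utf-8")

def check_reproducible(path: Path) -> bool:
    """Re-run the draw from the stored seed and compare it with the saved one."""
    saved = json.loads(path.read_text(encoding="utf-8"))
    again = draw_systematic(saved["n_records"], saved["k"], saved["seed"])
    return again["start"] == saved["start"] and again["ids"] == saved["ids"]

if __name__ == "__main__":
    draw = draw_systematic(88_400, 20, seed=1750)
    # Log an exclusion instead of silently dropping the record.
    draw["exclusions"].append({"id": draw["ids"][3], "reason": "damaged folio"})
    out = Path("sample_draw.json")  # hypothetical file name; keep it beside the data
    save_draw(draw, out)
    print(len(draw["ids"]), "IDs saved; reproducible:", check_reproducible(out))
```

Keeping exclusions in the same file as the drawn IDs means a reader sees, in one place, both what was selected and what was set aside and why.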

Key Takeaways

Separate the population (what you want to describe) from the frame (what you can draw from), and document the gap.
Systematic 1-in-k sampling is the easiest defensible method for physical sources.
Size the sample from your target margin: about 1,067 records for plus or minus 3 percentage points at 95 percent confidence.
Stratify when you need precise estimates for known subgroups, and weight oversampled strata.
Avoid convenience samples; they describe what you read but cannot generalise.
Fix a random seed and store the drawn IDs so the sample is reproducible.
Log every exclusion, because hidden exclusions reintroduce bias.

Frequently Asked Questions

Why sample historical records instead of transcribing everything?

A well-drawn sample of a few thousand records can estimate population quantities almost as precisely as a full transcription that takes years. You trade a little precision for an enormous saving in time and cost.

What sampling method should I use first?

Use systematic sampling: choose a random start and take every kth record. It is easy to apply to bound volumes or microfilm, and it spreads the sample evenly across the source.

How large should a historical sample be?

For estimating a proportion within plus or minus 3 percentage points at 95 percent confidence you need roughly 1,000 to 1,100 records. Halving the margin to plus or minus 1.5 points quadruples that to about 4,300.

What is the danger of a convenience sample?

A convenience sample, such as the records that happen to be indexed or legible, is biased in unknown ways and cannot support inference about the whole population. You can describe what you read but not generalise from it.

Do I need to weight my sample?

You need weights whenever selection probabilities differ, for example if you oversample a rare group. Weighting restores the original population proportions when you compute estimates.
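A minimal weighting sketch follows. It assumes a stratified sample where each stratum's weight is N_h / n_h, the number of frame records each sampled record represents. The counts are invented, with a small "gentry" stratum deliberately oversampled. `weighted_proportion` is a hypothetical helper.

```python
def weighted_proportion(strata: dict[str, tuple[int, int, int]]) -> float:
    """Combine per-stratum results using weights w_h = N_h / n_h.

    strata maps a stratum name to (N_h, n_h, x_h): frame size, sample size,
    and the number of sampled records with the attribute of interest.
    """
    weighted_hits = sum(N_h / n_h * x_h for N_h, n_h, x_h in strata.values())
    total = sum(N_h for N_h, _, _ in strata.values())
    return weighted_hits / total

# Hypothetical counts: the small "gentry" stratum was deliberately oversampled.
strata = {
    "gentry": (2_000, 200, 120),     # 60% of sampled gentry have the attribute
    "others": (86_400, 1_000, 250),  # 25% of sampled others have it
}
unweighted = sum(x for _, _, x in strata.values()) / sum(n for _, n, _ in strata.values())
print(f"unweighted {unweighted:.3f}, weighted {weighted_proportion(strata):.3f}")
```

Here the unweighted share is 0.308 but the weighted estimate is 0.258. The unweighted figure lets each sampled gentry record count as much as each sampled "other" record, although it stands for 10 frame records rather than 86.4.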

How do I document a sampling design?

Record the frame, the method, the sampling interval or fractions, the random seed or start, and any exclusions in a README beside the data. This lets another researcher reproduce or critique your sample.
