DH Project Management

To estimate a digitisation timeline, measure your real end-to-end throughput on a small pilot batch, timing preparation, capture, processing, metadata and quality assurance, and express it as items per hour. Then divide your total item count by that rate and add contingency for downtime, leave and rework. The most common beginner mistake is timing only the scanning step: the full pipeline usually takes 40 to 100% longer than capture alone, and sometimes far more. This guide walks through the core idea with a small worked example you can copy.

What does a digitisation timeline actually measure?

It measures the whole pipeline, not just the moment of image capture. A useful mental model has five stages, each consuming time: prepare (retrieve, unbind, flatten, condition-check), capture (the scan or photograph), process (crop, colour-correct, derive formats), describe (metadata), and QA (check and rework). Beginners see only "capture" and often underestimate by a factor of two or more. Estimate every stage.

How do I find my real capture rate?

Run a pilot. Pick 50 representative items, digitise them properly end to end, and time each stage with a stopwatch or a simple log. Representative is the key word: do not pilot your easiest material.
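If you keep a simple log rather than a stopwatch tally, per-stage totals can be computed from start and end times. The sketch below assumes a hypothetical log format of (stage, start, end) entries with times as "HH:MM" strings on a single day; `stage_minutes` is an illustrative helper, and it refuses to return totals if any of the five stages has no time logged, so a forgotten stage surfaces before you estimate.

```python
from collections import defaultdict
from datetime import datetime

# The five stages of the pipeline model; each should have time logged against it.
STAGES = ("prepare", "capture", "process", "describe", "qa")


def stage_minutes(log):
    """Total minutes per stage from (stage, start, end) entries.

    Hypothetical log format: start and end are "HH:MM" strings on the same day.
    Raises ValueError for an unknown stage, an entry that ends before it starts,
    or a stage with no time logged at all.
    """
    totals = defaultdict(float)
    for stage, start, end in log:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        elapsed = datetime.strptime(end, "%H:%M") - datetime.strptime(start, "%H:%M")
        minutes = elapsed.total_seconds() / 60
        if minutes < 0:
            raise ValueError(f"entry ends before it starts: {stage} {start}-{end}")
        totals[stage] += minutes
    # Membership tests on a defaultdict do not create keys, so this finds gaps.
    missing = [s for s in STAGES if s not in totals]
    if missing:
        raise ValueError("no time logged for: " + ", ".join(missing))
    return dict(totals)


# Hypothetical entries whose totals match the pilot batch figures.
log = [
    ("prepare", "09:00", "09:25"),
    ("capture", "09:25", "10:25"),
    ("process", "10:40", "11:10"),
    ("describe", "11:10", "12:25"),
    ("qa", "13:30", "13:50"),
]
totals = stage_minutes(log)
print(totals, sum(totals.values()), "minutes")
```

A spreadsheet with the same columns works just as well; the point is that every stage gets a row.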

```text
Pilot batch: 50 bound-volume pages, planetary scanner
- prepare:   25 min   (0.5 min/page)
- capture:   60 min   (1.2 min/page)
- process:   30 min   (0.6 min/page)
- describe:  75 min   (1.5 min/page)   <- the surprise
- QA/rework: 20 min   (0.4 min/page)
Total: 210 min for 50 pages = 4.2 min/page = ~14 pages/hour (full pipeline)
```

Note the gap: pure capture was 50 pages/hour, but the full pipeline delivered only 14. Estimating from the capture figure alone would have been 3.5x too optimistic.
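The pilot arithmetic fits in a few lines, which makes it easy to rerun when you pilot a different material type. This is a minimal sketch using the pilot figures above; the variable names are illustrative.

```python
# Minutes per stage from the 50-page pilot.
pilot_minutes = {"prepare": 25, "capture": 60, "process": 30, "describe": 75, "qa": 20}
pilot_pages = 50

total_minutes = sum(pilot_minutes.values())          # 210
minutes_per_page = total_minutes / pilot_pages       # 4.2
full_rate = 60 / minutes_per_page                    # pages/hour through the whole pipeline
capture_rate = pilot_pages / (pilot_minutes["capture"] / 60)  # pages/hour for capture alone
optimism = capture_rate / full_rate                  # how far off a capture-only estimate is

print(f"full pipeline: {full_rate:.1f} pages/hour")
print(f"capture only:  {capture_rate:.1f} pages/hour")
print(f"capture-only estimate is {optimism:.1f}x too optimistic")
```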

A small worked example you can follow

Suppose you have 12,000 pages to digitise and the pilot above gives 14 pages/hour through the full pipeline, with one full-time operator working 6 productive hours a day, 5 days a week.

```text
12,000 pages / 14 pages-per-hour = 857 productive hours
857 hours / 6 hours-per-day      = 143 working days
143 days / 5 days-per-week        = ~29 weeks
+ 25% contingency                 = ~36 weeks (about 8 months)
```

The contingency line is not padding — it absorbs equipment downtime, leave, and the items that fail QA and must be redone. A first-project estimate without it will overrun.
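The same conversion can be wrapped in a small function so you can rerun it as the rate or staffing changes. This is a sketch for one full-time operator; `estimate_weeks` and its defaults (6 productive hours a day, 5 days a week, 25% contingency) are taken from the worked example, not from any standard tool.

```python
import math


def estimate_weeks(total_items, items_per_hour, hours_per_day=6, days_per_week=5, contingency=0.25):
    """Convert an item count into calendar weeks for one operator.

    items_per_hour should be a full-pipeline rate from a pilot, not a capture-only rate.
    contingency is a fraction: 0.25 means +25%.
    """
    productive_hours = total_items / items_per_hour
    working_days = productive_hours / hours_per_day
    weeks = working_days / days_per_week
    return productive_hours, working_days, weeks, weeks * (1 + contingency)


hours, days, weeks, with_buffer = estimate_weeks(12_000, 14)
# Round up only at the end: a schedule should err on the long side.
print(f"{hours:.0f} h, {days:.0f} days, {weeks:.1f} weeks, "
      f"{math.ceil(with_buffer)} weeks with contingency")
```

This reproduces the 857 hours, 143 days and roughly 36 weeks above. Keeping full precision until the final `math.ceil` avoids compounding the step-by-step rounding of the hand calculation.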

Why is my estimate always shorter than reality?

Three predictable reasons, in order of impact:

  1. Metadata was forgotten. Describing items often equals or exceeds capture time.
  2. The estimate assumed an operator's best day. Real productive hours are rarely the full workday; 6 productive hours out of 7.5 is typical once you account for breaks, setup and interruptions.
  3. Rework was ignored. A QA pass that rejects even 5% of images means re-pulling, re-capturing and re-checking those items.
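Rework is easy to put a number on once you have pilot timings. The sketch below assumes, hypothetically, that redoing a rejected page repeats its preparation, capture and QA steps at the pilot's per-page minutes (0.5 + 1.2 + 0.4); `rework_hours` is an illustrative helper, not a standard formula.

```python
def rework_hours(total_items, reject_rate, redo_minutes_per_item):
    """Extra hours spent re-pulling, re-capturing and re-checking items that fail QA."""
    rejected = total_items * reject_rate
    return rejected * redo_minutes_per_item / 60


# Assumption: a rejected page repeats prepare + capture + QA at the pilot's rates.
redo_minutes = 0.5 + 1.2 + 0.4
extra_hours = rework_hours(12_000, 0.05, redo_minutes)
extra_days = extra_hours / 6  # 6 productive hours per day, not the full 7.5-hour workday

print(f"{extra_hours:.0f} extra hours, about {extra_days:.1f} working days")
```

Even a 5% rejection rate adds several working days to the 12,000-page example, and the cost scales linearly with both the rejection rate and the redo time.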

Should I estimate in pages or items?

Use whatever unit you can count and time consistently. The table below shows why averaging across material types misleads: keep a separate rate for each type, convert each type's workload to hours, and sum the hours.

| Material type | Typical full-pipeline rate | Note |
|---|---|---|
| Modern bound volume | 12-20 pages/hour | metadata is light |
| Fragile loose sheets | 4-10 items/hour | careful handling dominates |
| Photographs / objects | 3-8 items/hour | colour and lighting setup |
| Heavily catalogued items | slower | metadata QA adds up |

If a collection mixes types, estimate each block separately and add the resulting hours; a single blended rate hides the slow material that wrecks the schedule.
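Summing per-type blocks is straightforward once each block is converted to hours. The item counts below are hypothetical and the rates are picked from within the table's ranges; the last line shows what happens if the fastest (bound-volume) rate is applied to every item.

```python
# Hypothetical collection: (material type, item count, full-pipeline items per hour).
blocks = [
    ("modern bound volume", 8_000, 16),
    ("fragile loose sheets", 2_000, 6),
    ("photographs / objects", 600, 5),
]

# Convert each block to hours first; sum the hours, never the rates.
hours_by_type = {name: count / rate for name, count, rate in blocks}
total_hours = sum(hours_by_type.values())

# A single-rate estimate that reuses the fastest block's rate for every item.
total_items = sum(count for _, count, _ in blocks)
naive_hours = total_items / max(rate for _, _, rate in blocks)

for name, hours in hours_by_type.items():
    print(f"{name:<24}{hours:7.1f} h")
print(f"{'per-type total':<24}{total_hours:7.1f} h")
print(f"{'single-rate estimate':<24}{naive_hours:7.1f} h")
```

In this made-up mix the single-rate figure (662.5 hours) is about 30% short of the per-type total (about 953 hours), and all of the shortfall comes from the loose sheets and photographs.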

How big should the buffer be?

For a first digitisation project, add 20-30% contingency on top of the pipeline-based figure. Push toward the higher end when the material is fragile, varied, or poorly catalogued, because those conditions multiply preparation and rework. Track actuals against the estimate weekly; after a few weeks your real rate replaces the pilot rate and the remaining estimate sharpens dramatically.
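Tracking actuals can be as simple as re-deriving the rate from completed work. The sketch assumes, hypothetically, that after four weeks 1,500 of the 12,000 pages are finished in 120 logged productive hours, with 30 productive hours available per week (6 a day, 5 days); `reforecast` is an illustrative name.

```python
def reforecast(total_items, items_done, hours_logged, hours_per_week, contingency=0.0):
    """Re-estimate remaining weeks from the full-pipeline rate observed so far."""
    observed_rate = items_done / hours_logged  # replaces the pilot rate
    remaining_hours = (total_items - items_done) / observed_rate
    remaining_weeks = remaining_hours / hours_per_week
    return observed_rate, remaining_weeks * (1 + contingency)


rate, weeks_left = reforecast(12_000, 1_500, 120, 30)
print(f"observed {rate:.1f} pages/hour, about {weeks_left:.0f} weeks remaining")
```

Here the observed rate (12.5 pages/hour) is below the pilot's 14, so the remaining estimate grows; pass a `contingency` fraction if you still want a buffer on the remainder.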

Key Takeaways

  • Estimate the full pipeline — prepare, capture, process, describe, QA — not just scanning.
  • Run a 50-item pilot on representative material to get your true items-per-hour rate.
  • Pure capture rates often run at 2-3x the full-pipeline rate (3.5x in the pilot above); never estimate from capture alone.
  • Metadata is the most underestimated cost and frequently rivals capture time.
  • Convert to productive hours, then days, then weeks, and add 20-30% contingency.
  • Estimate each material type separately and sum; blended averages hide the slow material.

Frequently Asked Questions

How do I estimate a digitisation timeline as a beginner?

Measure your true end-to-end throughput on a small pilot batch (say 50 items), timing every stage from preparation through metadata and QA, and convert it to items per hour. Then divide your total item count by that rate and add 20-30% contingency for downtime and rework. Never estimate from a vendor's peak figure.

What is a realistic capture rate for archival material?

It varies enormously: bound volumes on a planetary scanner might be 150-300 pages an hour, fragile single sheets on a copy stand far fewer, and complex objects only a handful. Always time your own material rather than trusting a generic number.

Why are my real timelines always longer than my estimate?

Because beginners estimate only the capture step. Preparation, metadata, quality assurance and rework typically add 40-100% on top of pure scanning time, and sometimes much more. Estimate the whole pipeline, not just the scanner.

Should I estimate in pages or in items?

Estimate in whatever unit you can actually count and time. For bound volumes, pages-per-hour is natural; for objects, items-per-hour. Keep the unit consistent within each material block, and if a collection mixes units, convert each block to hours before adding them.

How much buffer should I add to a digitisation estimate?

A 20-30% contingency on top of a pipeline-based estimate is sensible for a first project, rising if the material is fragile, varied or poorly catalogued. Buffer covers equipment downtime, sick days and the items that need redoing.

What is the biggest hidden cost in digitisation time?

Metadata. Creating or checking descriptive metadata frequently takes as long as or longer than the imaging itself, yet beginners routinely forget to budget for it.