Digital Preservation

To estimate preservation storage costs, multiply your data volume by the number of copies you keep, project realistic annual growth over your planning horizon, then apply per-terabyte prices for each storage tier and add the overheads that disk price ignores — egress, fixity, media refresh, migration and staff time. The headline disk price is usually the smallest part of the total. A defensible estimate models all the recurring costs over five to ten years, not just today's cost of a single copy.
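
Written as a formula, the estimate looks like this. It is a simplified sketch, not a standard model: storage in year $t$ is charged on that year's compounded volume, each of the $k$ copies sits on a tier with its own annual price per TB, and the symbols are chosen here for illustration.

$$
C_{\text{total}} = \sum_{t=1}^{H} \left[ V_0 (1+g)^{t} \sum_{i=1}^{k} p_i + R_t + F_t + M_t + E_t + S_t \right]
$$

Here $V_0$ is today's volume in TB, $g$ the annual growth rate, $H$ the planning horizon in years, $p_i$ the annual per-TB price of the tier holding copy $i$, and $R_t$, $F_t$, $M_t$, $E_t$, $S_t$ the media refresh, fixity, migration, egress and staff costs in year $t$.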

What goes into a preservation cost, beyond the disk?

Raw storage is one line item among many. A realistic model includes:

  • Multiple copies — 3-2-1 means roughly 3x your data volume.
  • Media refresh — disks and tape are replaced every 3–7 years.
  • Fixity and management — compute and tooling to check integrity.
  • Format migration — periodic conversion of at-risk formats.
  • Egress and retrieval — cloud fees to read data back out.
  • Staff time — often the largest cost of all.

Costing only the cheapest tier of one copy is the classic underestimate.
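
Keeping each component as its own field makes it hard to forget one. A minimal Python sketch, assuming all amounts are annual and in a single currency, and that media refresh is spread evenly over its replacement cycle; `AnnualCosts` and `amortised_refresh` are hypothetical names and the figures are placeholders:

```python
from dataclasses import dataclass


@dataclass
class AnnualCosts:
    """One year's preservation costs; every field uses the same currency."""
    storage: float    # all copies across all tiers
    refresh: float    # media replacement, amortised per year
    fixity: float     # compute and tooling for integrity checks
    migration: float  # converting at-risk formats
    egress: float     # cloud retrieval and egress fees
    staff: float      # people time

    def total(self) -> float:
        return (self.storage + self.refresh + self.fixity
                + self.migration + self.egress + self.staff)

    def storage_share(self) -> float:
        """Fraction of the yearly total that is raw storage."""
        return self.storage / self.total()


def amortised_refresh(hardware_cost: float, cycle_years: int) -> float:
    """Spread one generation of media over its replacement cycle (3-7 years)."""
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return hardware_cost / cycle_years


# Placeholder figures for one year, not real prices.
example = AnnualCosts(storage=1000.0, refresh=amortised_refresh(10_000.0, 5),
                      fixity=200.0, migration=300.0, egress=100.0, staff=5000.0)
print(f"Total {example.total():,.0f}; storage is {example.storage_share():.0%} of it")
```

Reporting storage as a share of the total is a quick guard against the classic underestimate.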

Step by step: building the estimate

Work through a simple model. Suppose you hold 4 TB today, expect 20% annual growth, keep 3 copies, and plan over 5 years.

```text
Year 0 volume          : 4 TB
Annual growth          : 20% compounded
Year 5 volume (1 copy) : 4 * 1.20^5 ≈ 9.95 TB
Copies                 : 3
Year 5 stored volume   : 9.95 * 3 ≈ 29.9 TB across all copies
```
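
The same arithmetic, extended to every year of the horizon, is easy to script. This is a minimal Python sketch; `project_volumes` is a hypothetical helper, and it assumes growth compounds once per year on the whole collection:

```python
def project_volumes(initial_tb: float, growth: float, years: int,
                    copies: int) -> list[tuple[int, float, float]]:
    """Return (year, TB per copy, TB across all copies) for years 0..years.

    Growth is compounded once per year: volume_t = initial_tb * (1 + growth) ** t.
    """
    rows = []
    for year in range(years + 1):
        per_copy = initial_tb * (1 + growth) ** year
        rows.append((year, per_copy, per_copy * copies))
    return rows


# The worked example: 4 TB, 20% growth, 5 years, 3 copies.
for year, per_copy, stored in project_volumes(4, 0.20, 5, 3):
    print(f"Year {year}: {per_copy:6.2f} TB per copy, {stored:6.2f} TB stored")
```

Printing every year, not just the last, shows when new capacity has to be bought, which feeds directly into the media refresh schedule.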

Now apply tiered prices. Mix a working copy on managed disk with two cheaper copies (e.g. cloud cold storage and offline tape), because keeping three copies on premium disk is rarely justified.
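
A tier mix can be priced by listing which tier holds each copy. The sketch below assumes one flat price per TB per year for each tier; the tier names and prices are hypothetical placeholders, not quotes from any provider:

```python
# Placeholder prices per TB per year (NOT market rates); substitute real quotes.
PRICE_PER_TB_YEAR = {
    "managed_disk": 200.0,
    "cloud_cold": 15.0,
    "lto_tape": 10.0,
}


def annual_storage_cost(volume_tb: float, placement: list[str],
                        prices: dict[str, float]) -> float:
    """Annual cost of holding one full copy of volume_tb on each tier in placement."""
    return sum(volume_tb * prices[tier] for tier in placement)


volume = 10.0  # roughly the year-5 single-copy volume from the example
mixed = annual_storage_cost(volume, ["managed_disk", "cloud_cold", "lto_tape"],
                            PRICE_PER_TB_YEAR)
all_disk = annual_storage_cost(volume, ["managed_disk"] * 3, PRICE_PER_TB_YEAR)
print(f"Mixed tiers: {mixed:.0f} per year; three copies on disk: {all_disk:.0f} per year")
```

Keeping placement as a list means adding a fourth copy or trying a different mix is a one-line change.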

How do I compare storage tiers?

| Tier | Indicative storage cost | Retrieval / egress | Best role |
|---|---|---|---|
| Managed disk / NAS | Higher per TB, capital up front | None | Working access copy |
| Cloud hot object store | Moderate per TB | Low–moderate egress | Secondary online copy |
| Cloud cold / archive | Very low per TB | High retrieval + egress | Deep archive copy |
| LTO tape | Low per TB after drive cost | Staff time to load | Offline / offsite copy |

The trap with cold cloud tiers: storage looks almost free, but a single large retrieval or a provider migration can cost more than years of storage. Always model the exit.
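
One way to make the trap visible is to cost each tier over the whole horizon for a given number of full read-backs. This Python sketch assumes read fees scale linearly with volume; `Tier`, `horizon_cost` and every price are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    upfront: float              # capital such as a tape drive or NAS chassis
    storage_per_tb_year: float  # holding cost
    read_per_tb: float          # retrieval + egress fees for reading data back


def horizon_cost(tier: Tier, volume_tb: float, years: int, full_reads: int) -> float:
    """Upfront capital + storage for `years` + `full_reads` complete read-backs."""
    storage = volume_tb * tier.storage_per_tb_year * years
    reads = volume_tb * tier.read_per_tb * full_reads
    return tier.upfront + storage + reads


# Placeholder figures only; tape's read fee is zero here because loading
# effort is staff time, which this sketch does not include.
cold = Tier("cloud cold", upfront=0.0, storage_per_tb_year=12.0, read_per_tb=90.0)
tape = Tier("LTO tape", upfront=3000.0, storage_per_tb_year=8.0, read_per_tb=0.0)

for reads in (0, 1, 3):
    print(f"{reads} full read(s): cold {horizon_cost(cold, 30, 5, reads):.0f}, "
          f"tape {horizon_cost(tape, 30, 5, reads):.0f}")
```

With these made-up numbers, cold cloud is cheaper only if nothing is ever read back in full; a single complete retrieval flips the comparison. Tape's loading effort still belongs in the staff line.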

Why must I model egress and exit costs?

Egress — the fee to move data out of a cloud — is where naive estimates collapse. Storing 30 TB cheaply is fine until you must migrate providers or run a full fixity audit that reads everything back. At typical egress rates, pulling tens of terabytes can dwarf the annual storage bill. Build an "exit cost" line into every cloud estimate: what would it cost to retrieve everything and leave? If that number is frightening, rebalance toward an architecture you can afford to walk away from.
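
The exit line can be computed alongside the storage bill. This is a sketch assuming a flat per-TB egress fee that already includes any retrieval charge, and that a full fixity audit reads every byte out of the provider (audits run on the provider's own compute may avoid egress). The function name and prices are hypothetical:

```python
def exit_cost_summary(volume_tb: float, storage_per_tb_year: float,
                      egress_per_tb: float,
                      full_audits_per_year: int = 0) -> dict[str, float]:
    """Annual storage bill, yearly audit egress, and the one-off cost of leaving."""
    annual_storage = volume_tb * storage_per_tb_year
    exit_cost = volume_tb * egress_per_tb               # retrieve everything once
    audit_egress = exit_cost * full_audits_per_year     # only if audits read data out
    return {
        "annual_storage": annual_storage,
        "annual_audit_egress": audit_egress,
        "exit_cost": exit_cost,
        "exit_in_years_of_storage": exit_cost / annual_storage,
    }


# Placeholder prices for 30 TB, with one full audit read-out per year.
summary = exit_cost_summary(30, storage_per_tb_year=12.0, egress_per_tb=90.0,
                            full_audits_per_year=1)
for key, value in summary.items():
    print(f"{key}: {value:,.1f}")
```

Expressing the exit as years of storage makes it concrete: with these numbers, leaving once costs as much as 7.5 years of holding the data.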

Don't forget growth and staff

Two omissions sink most estimates. First, growth: a static estimate is wrong within a year. Apply a compounded rate (10–30% is common for active collections) across the horizon. Second, staff: the human time to ingest, document, check fixity and run migrations frequently exceeds the storage bill. A cost model that shows storage as the dominant cost is usually a model that forgot the people.
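
Staff effort can be tied to growth so that both omissions are modelled together. A sketch, assuming ingest work scales with newly added terabytes and the remaining tasks take a fixed number of hours each year; the function, parameters and rates are hypothetical:

```python
def staff_cost_by_year(initial_tb: float, growth: float, years: int,
                       ingest_hours_per_tb: float, fixed_hours_per_year: float,
                       hourly_rate: float) -> list[float]:
    """Staff cost for years 1..years.

    Ingest effort scales with the TB added that year (from compounded growth);
    fixity checking, documentation and migration work are a fixed yearly effort.
    """
    costs = []
    previous = initial_tb
    for year in range(1, years + 1):
        current = initial_tb * (1 + growth) ** year
        added_tb = current - previous
        hours = added_tb * ingest_hours_per_tb + fixed_hours_per_year
        costs.append(hours * hourly_rate)
        previous = current
    return costs


# Placeholder effort and rate for the 4 TB, 20% growth, 5-year example.
costs = staff_cost_by_year(4, 0.20, 5, ingest_hours_per_tb=10,
                           fixed_hours_per_year=100, hourly_rate=50)
print(f"Five-year staff cost: {sum(costs):,.0f}")
```

Because ingest hours follow the added volume, staff cost rises each year along with growth even when the fixed workload stays flat.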

A reusable estimation checklist

Capture the model in a spreadsheet with these inputs: current volume, growth rate, planning horizon, number and placement of copies, per-tier storage price, media refresh cycle, expected egress for audits and migration, and annual staff effort. Recompute yearly. Present the result as a range, not a single number, because growth and access patterns are uncertain — a defensible estimate is honest about that uncertainty.
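
The checklist maps onto a small function. This sketch assumes flat yearly refresh, egress and staff figures and varies only the growth rate to produce the range; `Inputs`, `total_cost` and `cost_range` are hypothetical names, and a fuller model would vary other uncertain inputs too:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Inputs:
    volume_tb: float                # current volume (one copy)
    growth: float                   # annual growth rate, e.g. 0.20
    horizon_years: int              # planning horizon
    copy_prices: tuple[float, ...]  # price per TB per year for each copy's tier
    refresh_per_year: float         # amortised media refresh
    egress_per_year: float          # audits and migration read-backs
    staff_per_year: float           # annual staff effort, costed


def total_cost(inp: Inputs) -> float:
    """Storage on each year's grown volume plus flat overheads, summed over the horizon."""
    per_tb_all_copies = sum(inp.copy_prices)
    overheads = inp.refresh_per_year + inp.egress_per_year + inp.staff_per_year
    total = 0.0
    for year in range(1, inp.horizon_years + 1):
        volume = inp.volume_tb * (1 + inp.growth) ** year
        total += volume * per_tb_all_copies + overheads
    return total


def cost_range(inp: Inputs, low_growth: float, high_growth: float) -> tuple[float, float]:
    """Re-run the model at two growth rates to report a range, not a point."""
    return (total_cost(replace(inp, growth=low_growth)),
            total_cost(replace(inp, growth=high_growth)))


# Placeholder inputs; recompute yearly with updated figures.
base = Inputs(volume_tb=4, growth=0.20, horizon_years=5,
              copy_prices=(200.0, 15.0, 10.0),
              refresh_per_year=500.0, egress_per_year=300.0, staff_per_year=5000.0)
low, high = cost_range(base, 0.10, 0.30)
print(f"Five-year estimate: {low:,.0f} to {high:,.0f}")
```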

Key Takeaways

  • Total cost = volume x copies x growth, plus egress, refresh, migration and staff.
  • Disk price is usually the smallest line item; staff time is often the largest.
  • 3-2-1 roughly triples your stored volume before overheads.
  • Always model egress and a full "exit cost" for cloud tiers.
  • Cold cloud storage is cheap to hold but expensive to read back.
  • Apply realistic compounded growth (10–30%) over a 5–10 year horizon.
  • Present the estimate as a range and recompute it annually.

Frequently Asked Questions

How do I estimate preservation storage costs?

Multiply your data volume by your required number of copies, add annual growth, then apply per-terabyte costs for each storage tier plus egress, staff and migration overheads over your planning horizon.

Why is raw storage price not the whole cost?

Preservation cost includes multiple copies, periodic media refresh, fixity checking, format migration, egress and retrieval fees, and staff time; the disk price is often the smallest line item.

How many copies should I budget for?

Budget for at least three copies under the 3-2-1 rule, which roughly triples raw storage volume before you add overheads.

What is egress and why does it matter for cloud preservation?

Egress is the fee cloud providers charge to move data out; large retrievals or a provider migration can cost far more than storage itself, so always model egress and exit costs.

Should I plan for data growth?

Yes. Apply a realistic annual growth rate, often 10 to 30 percent for active collections, compounded over your planning horizon, or your estimate will fall short within a year or two.

Is cloud always cheaper than local storage?

Not necessarily; cold cloud tiers have low storage prices but high retrieval and egress fees, while local storage front-loads hardware cost but avoids egress, so the answer depends on access patterns.