To estimate preservation storage costs, multiply your data volume by the number of copies you keep, project realistic annual growth over your planning horizon, then apply per-terabyte prices for each storage tier and add the overheads that the disk price ignores: egress, fixity, media refresh, migration and staff time. The headline disk price is usually the smallest part of the total. A defensible estimate models all the recurring costs over five to ten years, not just today's cost of a single copy.
What goes into a preservation cost, beyond the disk?
Raw storage is one line item among many. A realistic model includes:
- Multiple copies — 3-2-1 means roughly 3x your data volume.
- Media refresh — disks and tape are replaced every 3–7 years.
- Fixity and management — compute and tooling to check integrity.
- Format migration — periodic conversion of at-risk formats.
- Egress and retrieval — cloud fees to read data back out.
- Staff time — often the largest cost of all.
Costing only the cheapest tier of one copy is the classic underestimate.
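Media refresh is easy to miss because it recurs. The sketch below assumes each copy's media is bought in year 0 and replaced every `cycle_years` years for as long as the plan runs. It counts how many times you pay for media over a planning horizon. The function name `media_purchases` is hypothetical.

```python
import math


def media_purchases(horizon_years: int, cycle_years: int) -> int:
    """Count media purchases per copy over a plan of `horizon_years` years.

    Assumption: the first purchase happens in year 0 and a replacement is
    bought every `cycle_years` years while the plan is still running.
    """
    if horizon_years <= 0 or cycle_years <= 0:
        raise ValueError("horizon and cycle must be positive")
    # Purchases fall in years 0, cycle, 2*cycle, ... strictly before the horizon ends.
    return math.ceil(horizon_years / cycle_years)


for cycle in (3, 5, 7):
    print(f"10-year plan, {cycle}-year refresh: {media_purchases(10, cycle)} purchases per copy")
```

Multiply each count by the number of copies. On a 3-year cycle, a 10-year plan buys media four times per copy, not once.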
Step by step: building the estimate
Work through a simple model. Suppose you hold 4 TB today, expect 20% annual growth, keep 3 copies, and plan over 5 years.
```text
Year 0 volume          : 4 TB
Annual growth          : 20% compounded
Year 5 volume (1 copy) : 4 * 1.20^5 ≈ 9.95 TB
Copies                 : 3
Year 5 stored volume   : ≈ 29.9 TB across all copies
```

Now apply tiered prices. Mix a working copy on managed disk with two cheaper copies (e.g. cloud cold storage and offline tape), because keeping three copies on premium disk is rarely justified.
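The same compounding can be tabulated year by year, which is easier to audit than a single end-of-horizon figure. This is a minimal sketch. The function name `project_volumes` is hypothetical, and its inputs are the worked model's 4 TB, 20% growth, 3 copies and 5 years.

```python
def project_volumes(start_tb: float, growth_rate: float, years: int, copies: int):
    """Return (year, one_copy_tb, all_copies_tb) for years 0..years inclusive."""
    rows = []
    for year in range(years + 1):
        # Compounded growth: the volume is multiplied by (1 + rate) each year.
        one_copy = start_tb * (1 + growth_rate) ** year
        rows.append((year, one_copy, one_copy * copies))
    return rows


for year, one_copy, total in project_volumes(4, 0.20, 5, 3):
    print(f"Year {year}: {one_copy:6.2f} TB per copy, {total:6.2f} TB stored")
```

The year-5 row shows about 9.95 TB per copy and 29.86 TB across all copies, which matches the ≈ 29.9 TB above once rounded.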
How do I compare storage tiers?
| Tier | Indicative storage cost | Retrieval / egress | Best role |
|---|---|---|---|
| Managed disk / NAS | Higher per TB, capital up front | None | Working access copy |
| Cloud hot object store | Moderate per TB | Low–moderate egress | Secondary online copy |
| Cloud cold / archive | Very low per TB | High retrieval + egress | Deep archive copy |
| LTO tape | Low per TB after drive cost | Staff time to load | Offline / offsite copy |
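Applying the table to a mixed placement means pricing each copy on its own tier. The prices below are placeholder per-TB-per-year figures. They are illustrative assumptions, not quotes from any vendor or cloud provider. The tier keys and the function `annual_storage_cost` are also hypothetical.

```python
# Placeholder prices per TB per year. These are ILLUSTRATIVE ASSUMPTIONS,
# not real vendor or cloud prices; replace them with your own quotes.
PLACEHOLDER_PRICE_PER_TB_YEAR = {
    "managed_disk": 60.0,
    "cloud_hot": 25.0,
    "cloud_cold": 12.0,
    "lto_tape": 8.0,
}


def annual_storage_cost(volume_tb: float, placement: list[str],
                        prices: dict[str, float]) -> dict[str, float]:
    """Storage-only cost for one year, one entry per copy in `placement`.

    Retrieval, egress, media refresh and staff are deliberately excluded
    and must be added separately.
    """
    costs = {}
    for i, tier in enumerate(placement):
        costs[f"copy {i + 1} ({tier})"] = volume_tb * prices[tier]
    return costs


# One working copy on managed disk plus two cheaper copies.
placement = ["managed_disk", "cloud_cold", "lto_tape"]
costs = annual_storage_cost(9.95, placement, PLACEHOLDER_PRICE_PER_TB_YEAR)
for name, cost in costs.items():
    print(f"{name:28s} {cost:8.2f}")
print(f"{'total':28s} {sum(costs.values()):8.2f}")
```

Try swapping entries in `placement` to see how the tier choice drives the storage line. None of these figures include retrieval or exit costs yet.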
The trap with cold cloud tiers: storage looks almost free, but a single large retrieval or a provider migration can cost more than years of storage. Always model the exit.
Why must I model egress and exit costs?
Egress — the fee to move data out of a cloud — is where naive estimates collapse. Storing 30 TB cheaply is fine until you must migrate providers or run a full fixity audit that reads everything back. At typical egress rates, pulling tens of terabytes can dwarf the annual storage bill. Build an "exit cost" line into every cloud estimate: what would it cost to retrieve everything and leave? If that number is frightening, rebalance toward an architecture you can afford to walk away from.
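The exit-cost line takes only a few lines of arithmetic, so there is no reason to skip it. This is a minimal sketch. The function names are hypothetical, and every price is a placeholder rather than any provider's real rate.

```python
def exit_cost(stored_tb: float, retrieval_per_tb: float, egress_per_tb: float) -> float:
    """Cost to read every terabyte back out of a cloud tier and leave."""
    return stored_tb * (retrieval_per_tb + egress_per_tb)


def years_of_storage_equivalent(stored_tb: float, storage_per_tb_year: float,
                                retrieval_per_tb: float, egress_per_tb: float) -> float:
    """How many years of storage fees a single full exit costs."""
    annual_storage = stored_tb * storage_per_tb_year
    return exit_cost(stored_tb, retrieval_per_tb, egress_per_tb) / annual_storage


# Placeholder cold-tier prices (ILLUSTRATIVE ASSUMPTIONS, not any provider's rates).
stored = 30.0       # TB held in the cold tier
storage = 12.0      # price per TB per year
retrieval = 20.0    # price per TB restored
egress = 70.0       # price per TB transferred out

print(f"Annual storage: {stored * storage:.2f}")
print(f"Exit cost:      {exit_cost(stored, retrieval, egress):.2f}")
print(f"One exit equals {years_of_storage_equivalent(stored, storage, retrieval, egress):.1f} years of storage")
```

The stored volume cancels out of the ratio, so the break-even depends only on the price structure. In this simple model, a full fixity audit that reads everything back out costs the same as an exit.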
Don't forget growth and staff
Two omissions sink most estimates. First, growth: a static estimate is wrong within a year. Apply a compounded rate (10–30% is common for active collections) across the horizon. Second, staff: the human time to ingest, document, check fixity and run migrations frequently exceeds the storage bill. A cost model that shows storage as the dominant cost is usually a model that forgot the people.
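A quick sanity check for the second omission is to compute each cost's share of the annual total. The sketch below budgets staff as a fraction of a full-time equivalent (FTE) at an assumed loaded annual cost. The function name, the FTE fraction and all amounts are hypothetical placeholders.

```python
def cost_breakdown(storage: float, staff_fte: float, loaded_cost_per_fte: float,
                   other_overheads: float = 0.0) -> dict[str, float]:
    """Split an annual cost into storage, staff and other overheads, with shares."""
    staff = staff_fte * loaded_cost_per_fte
    total = storage + staff + other_overheads
    return {
        "storage": storage,
        "staff": staff,
        "other": other_overheads,
        "storage_share": storage / total,
        "staff_share": staff / total,
    }


# Hypothetical inputs: an annual storage bill, 0.2 FTE of staff time at an
# assumed loaded cost, plus a lump sum for egress and media refresh.
b = cost_breakdown(storage=800.0, staff_fte=0.2, loaded_cost_per_fte=60000.0,
                   other_overheads=500.0)
print(f"Staff share of annual cost: {b['staff_share']:.0%}")
if b["storage_share"] > b["staff_share"]:
    # Mirrors the warning above: storage dominating usually means people were left out.
    print("Storage dominates: check whether staff effort was under-counted.")
```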
A reusable estimation checklist
Capture the model in a spreadsheet with these inputs: current volume, growth rate, planning horizon, number and placement of copies, per-tier storage price, media refresh cycle, expected egress for audits and migration, and annual staff effort. Recompute yearly. Present the result as a range, not a single number, because growth and access patterns are uncertain — a defensible estimate is honest about that uncertainty.
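The checklist also maps onto a small function, if you prefer code to a spreadsheet. This is a minimal sketch under three assumptions. Media refresh is folded into each copy's per-TB-per-year price. Egress is a fixed fraction of one copy read back each year. The horizon covers years 0 through `horizon_years - 1`. `CostInputs`, `total_cost` and `cost_range` are hypothetical names, and every number in the usage lines is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    volume_tb: float                # current volume of one copy
    horizon_years: int              # planning horizon, e.g. 5 to 10
    price_per_tb_year: list[float]  # one price per copy; assumed to include media refresh
    egress_fraction: float          # share of one copy read back out per year (audits, migration)
    egress_per_tb: float            # combined retrieval + egress price per TB
    staff_per_year: float           # annual staff cost


def total_cost(inputs: CostInputs, growth_rate: float) -> float:
    """Sum storage, egress and staff over years 0..horizon_years-1 at one growth rate."""
    total = 0.0
    for year in range(inputs.horizon_years):
        one_copy_tb = inputs.volume_tb * (1 + growth_rate) ** year
        storage = one_copy_tb * sum(inputs.price_per_tb_year)
        egress = one_copy_tb * inputs.egress_fraction * inputs.egress_per_tb
        total += storage + egress + inputs.staff_per_year
    return total


def cost_range(inputs: CostInputs, low_growth: float, high_growth: float) -> tuple[float, float]:
    """Report the estimate as a (low, high) range rather than a single number."""
    return total_cost(inputs, low_growth), total_cost(inputs, high_growth)


# Placeholder inputs for illustration only.
inputs = CostInputs(volume_tb=4, horizon_years=5, price_per_tb_year=[60.0, 12.0, 8.0],
                    egress_fraction=0.25, egress_per_tb=90.0, staff_per_year=12000.0)
low, high = cost_range(inputs, 0.10, 0.30)
print(f"5-year estimate: {low:,.0f} to {high:,.0f}")
```

Running both ends of the 10–30% growth band through the same inputs gives the range the checklist calls for. To recompute each year, update `volume_tb` and the prices.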
Key Takeaways
- Total cost = volume x copies x growth, plus egress, refresh, migration and staff.
- Disk price is usually the smallest line item; staff time is often the largest.
- 3-2-1 roughly triples your stored volume before overheads.
- Always model egress and a full "exit cost" for cloud tiers.
- Cold cloud storage is cheap to hold but expensive to read back.
- Apply realistic compounded growth (10–30%) over a 5–10 year horizon.
- Present the estimate as a range and recompute it annually.
Frequently Asked Questions
How do I estimate preservation storage costs?
Multiply your data volume by your required number of copies, add annual growth, then apply per-terabyte costs for each storage tier plus egress, staff and migration overheads over your planning horizon.
Why is raw storage price not the whole cost?
Preservation cost includes multiple copies, periodic media refresh, fixity checking, format migration, egress and retrieval fees, and staff time; the disk price is often the smallest line item.
How many copies should I budget for?
Budget for at least three copies under the 3-2-1 rule (three copies, on at least two different types of media, with one kept offsite), which roughly triples raw storage volume before you add overheads.
What is egress and why does it matter for cloud preservation?
Egress is the fee cloud providers charge to move data out; large retrievals or a provider migration can cost far more than storage itself, so always model egress and exit costs.
Should I plan for data growth?
Yes. Apply a realistic annual growth rate, often 10 to 30 percent for active collections, compounded over your planning horizon, or your estimate will fall short within a year or two.
Is cloud always cheaper than local storage?
Not necessarily; cold cloud tiers have low storage prices but high retrieval and egress fees, while local storage front-loads hardware cost but avoids egress, so the answer depends on access patterns.