Digital Preservation

For most archives the honest answer is that you should not choose between cloud and local preservation at all: you should run both, with local storage giving you control and fast restore, and cloud giving you geographic separation and a copy under different administrative control. The genuine comparison is not "which is better" but "which copy plays which role in my 3-2-1 strategy, and at what total cost over ten years". This guide gives you a checklist to make that decision consistent and defensible.

What actually differs between cloud and local?

The differences that matter for preservation are rarely raw price per terabyte. They are about control, failure modes, and the shape of the bill over time.

| Factor | Local (disk/tape) | Cloud object store |
| --- | --- | --- |
| Cost shape | Large up-front, low ongoing | Zero up-front, perpetual monthly |
| Restore speed | Fast over the LAN, or slow for cold tape | Fast for hot tiers, hours to days for archive tiers; egress-billed either way |
| Fixity control | You run it; full access | API-dependent; verify deliberately |
| Failure mode | Hardware death, site disaster | Account lockout, provider exit, price hikes |
| Admin control | Total | Shared with provider |
| Exit cost | Move the disks | Egress fees on every byte out |

The asymmetry in the bottom row is the one people miss. Putting data in is usually free; getting it out to leave is not.

How do I compare the true ten-year cost?

Never compare a one-off hardware quote against one month of cloud billing. Model both over the same horizon. A defensible spreadsheet has these rows for each option: acquisition, power and cooling, staff time, media refresh (disks every 4-5 years, LTO every 7-10), and for cloud the monthly storage class fee, request fees, and a one-time full-egress charge to simulate exit.

```text
ten_year_local  = hardware + (refresh_cycles * hardware * 0.7)
                + power_per_year*10 + staff_per_year*10
ten_year_cloud  = (monthly_storage * 120)
                + estimated_requests + full_egress_once
```

Run the egress line item even if you never plan to leave. If a full restore would cost more than re-digitising, that is a preservation risk, not just a financial one.
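The two formulas can be turned into a small shell sketch so the comparison is repeatable from one year to the next. It assumes `awk` for the floating-point arithmetic; the function names and the re-digitisation cost input used by `egress_risk` are hypothetical, every input is your own estimate in a single currency, and the 0.7 refresh factor is kept exactly as in the formula.

```bash
# Hypothetical ten-year cost model mirroring the two formulas above.
# All inputs are your own estimates in one currency.

# args: hardware refresh_cycles power_per_year staff_per_year
ten_year_local() {
  awk -v hw="$1" -v rc="$2" -v pw="$3" -v st="$4" \
    'BEGIN { printf "%.2f\n", hw + rc * hw * 0.7 + pw * 10 + st * 10 }'
}

# args: monthly_storage estimated_requests full_egress_once
# 120 = months in ten years
ten_year_cloud() {
  awk -v ms="$1" -v rq="$2" -v eg="$3" \
    'BEGIN { printf "%.2f\n", ms * 120 + rq + eg }'
}

# args: full_egress_once redigitisation_cost (hypothetical input)
# Prints RISK when a full restore would cost more than re-digitising.
egress_risk() {
  awk -v eg="$1" -v rd="$2" \
    'BEGIN { if (eg + 0 > rd + 0) print "RISK"; else print "OK" }'
}
```

For example, `ten_year_local 10000 2 500 2000` prints `49000.00`; the inputs are illustrative, not benchmarks.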

Does cloud storage satisfy the 3-2-1 rule?

One cloud bucket is one copy on one medium under one administrator. It does not, by itself, satisfy 3-2-1. A robust pattern is: copy 1 on local primary disk, copy 2 on local LTO or at a second site, and copy 3 in cloud storage, for example a cold archive tier such as Glacier Deep Archive or Azure Archive, or a low-cost single-tier store such as Backblaze B2. Crucially, the cloud copy should use different credentials and, ideally, a different provider from every other copy, so that a single compromised account cannot delete every copy.
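These 3-2-1 rules can be checked mechanically against an inventory of copies. This is a minimal sketch, assuming a hypothetical tab-separated file with one copy per line (`id`, `medium`, `site`, `provider`, `credential`). It treats any site different from the first listed copy as off-site and requires at least one copy whose credential appears nowhere else; provider diversity, which the text calls ideal rather than required, is left to a human reviewer.

```bash
# check_321: check a copy inventory against the 3-2-1 rule.
# Hypothetical input on stdin, tab-separated, one copy per line:
#   id  medium  site  provider  credential
# "Off-site" here means any site other than that of the first copy listed.
check_321() {
  awk -F'\t' '
    NF >= 5 {
      n++
      media[$2] = 1
      if (n == 1) home = $3
      if ($3 != home) offsite = 1
      cred_count[$5]++
      cred[n] = $5
    }
    END {
      for (m in media) nm++
      for (i = 1; i <= n; i++) if (cred_count[cred[i]] == 1) isolated = 1
      if (n < 3)     { print "FAIL: fewer than 3 copies"; exit 1 }
      if (nm < 2)    { print "FAIL: fewer than 2 media"; exit 1 }
      if (!offsite)  { print "FAIL: no off-site copy"; exit 1 }
      if (!isolated) { print "FAIL: no copy has its own credentials"; exit 1 }
      print "OK"
    }'
}
```

Run it as `check_321 < copies.tsv`; it prints `OK` or the first failing rule, and exits non-zero on failure so it can gate a scheduled job.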

How do I verify fixity in the cloud?

Treat the provider's durability promise ("eleven nines") as marketing, not evidence. You need your own audit trail. Most object stores expose a stored checksum:

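```bash
# AWS S3: fetch the stored SHA-256 for one object.
# ChecksumSHA256 is only returned with --checksum-mode ENABLED, and only if
# the object was uploaded with a SHA-256 checksum. It is base64-encoded.
# The ETag is not a reliable content hash (multipart or KMS-encrypted objects).
aws s3api head-object --bucket my-archive \
  --key bag-042/manifest-sha256.txt \
  --checksum-mode ENABLED \
  --query '{etag:ETag, checksum:ChecksumSHA256}'

# Verify the local bag against its BagIt manifest (run from the bag root)
sha256sum -c manifest-sha256.txt
```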
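S3 reports `ChecksumSHA256` in base64 while `sha256sum` prints hex, so the two cannot be compared as plain strings. A minimal sketch, assuming `openssl` and the AWS CLI are installed and that the object was uploaded in a single part with a SHA-256 checksum (multipart uploads report a composite "checksum of checksums" that will not match a whole-file hash); `local_sha256_b64` and `compare_s3_checksum` are hypothetical helper names.

```bash
# Compute a local SHA-256 in the same base64 form S3 uses for ChecksumSHA256.
local_sha256_b64() {
  openssl dgst -sha256 -binary "$1" | base64
}

# Usage: compare_s3_checksum <local-file> <bucket> <key>
# Prints MATCH or MISMATCH; returns non-zero on mismatch.
compare_s3_checksum() {
  local local_sum remote_sum
  local_sum=$(local_sha256_b64 "$1")
  remote_sum=$(aws s3api head-object --bucket "$2" --key "$3" \
    --checksum-mode ENABLED --query ChecksumSHA256 --output text)
  if [ "$local_sum" = "$remote_sum" ]; then
    echo "MATCH $3"
  else
    echo "MISMATCH $3 local=$local_sum remote=$remote_sum"
    return 1
  fi
}
```

If the object has no stored SHA-256, the CLI prints `None` and the helper reports a mismatch, which is the safe failure direction.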

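Schedule a sampling audit: each month, download a random 5-10% of objects and re-hash them against your own manifest rather than auditing the whole corpus. This keeps request and egress costs bounded, and because the sample changes every month, silent loss anywhere in the corpus is likely to surface over successive audits.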
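A sampled audit only proves something if it re-reads the bytes; `head-object` returns stored metadata, not the data itself. This sketch downloads a random sample of the objects listed in a BagIt manifest and re-hashes them with `sha256sum -c`. It assumes GNU coreutils (`shuf`, `sha256sum`), the AWS CLI, and a hypothetical bucket layout in which each manifest path is stored under `s3://BUCKET/PREFIX/`; the function names are illustrative.

```bash
# Sampled fixity audit for objects listed in a BagIt payload manifest.
# Hypothetical layout: each manifest path is stored at s3://BUCKET/PREFIX/PATH.

# Print a random sample of manifest lines: percent of the total, rounded up.
sample_manifest() {
  local manifest=$1 percent=${2:-5} total n
  total=$(wc -l < "$manifest")
  n=$(( (total * percent + 99) / 100 ))
  shuf -n "$n" "$manifest"
}

# Download each sampled object into a scratch directory and re-hash it.
# Returns non-zero if any object is missing or fails its checksum.
audit_sample() {
  local manifest=$1 bucket=$2 prefix=$3 percent=${4:-5} work status
  work=$(mktemp -d)
  sample_manifest "$manifest" "$percent" > "$work/sample.txt"
  while read -r _ path; do
    mkdir -p "$work/$(dirname "$path")"
    aws s3 cp --quiet "s3://$bucket/$prefix/$path" "$work/$path"
  done < "$work/sample.txt"
  (cd "$work" && sha256sum -c --quiet sample.txt)
  status=$?
  rm -rf "$work"
  return "$status"
}
```

Rounding up means even a small bag checks at least one file each run; a cron entry such as `audit_sample bag-042/manifest-sha256.txt my-archive bag-042 5` would cover the monthly schedule.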

What about vendor lock-in and exit?

Lock-in for preservation is an existential risk, not an inconvenience. Mitigate it by storing self-describing packages: BagIt bags with embedded manifests, not provider-proprietary archive formats. Then any object store becomes interchangeable. Keep an exit runbook that states exactly how to pull every byte, what it costs, and how long it takes at your egress bandwidth cap.
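The runbook's time and cost lines are simple arithmetic once you know the corpus size, the egress price and your sustained bandwidth. A rough sketch, assuming decimal units (1 TB = 1000 GB = 8,000,000 megabits), a link saturated at the stated rate, and no archive-tier retrieval delay before downloads start; the price you pass in is yours to look up, and `exit_estimate` is a hypothetical name.

```bash
# exit_estimate: rough exit figures for the runbook.
# args: total_tb egress_price_per_gb bandwidth_mbps
# Assumes decimal units and a fully used link; archive tiers add
# restore time before the download can even begin.
exit_estimate() {
  awk -v tb="$1" -v price="$2" -v mbps="$3" 'BEGIN {
    cost = tb * 1000 * price                 # TB -> GB, times price per GB
    days = (tb * 8000000 / mbps) / 86400     # megabits / Mbit/s -> seconds -> days
    printf "cost=%.2f days=%.1f\n", cost, days
  }'
}
```

With illustrative inputs, `exit_estimate 100 0.09 1000` (100 TB at 0.09 per GB over 1 Gbit/s) prints `cost=9000.00 days=9.3`.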

A working checklist

  • Decide the role of each copy before choosing technology (primary, near-line, off-site, dark archive).
  • Model ten-year total cost for both options, including egress and media refresh.
  • Ensure at least one copy shares no credentials, provider, or physical site with any other copy.
  • Store data as self-describing BagIt packages, never provider-native archives.
  • Run scheduled sampled fixity with your own manifests, not the provider's claims.
  • Write and test an exit runbook with real timing and cost figures.
  • Document the whole decision so an auditor or successor can reconstruct your reasoning.
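For the BagIt item in the checklist, the core of the RFC 8493 layout is small: a `bagit.txt` declaration, a `data/` payload directory and a payload manifest. This sketch builds that minimum in place, assuming GNU `find`, `mv -t` and `sha256sum`; `make_bag` is a hypothetical name, and dedicated tools such as the Library of Congress's bagit-python also write `bag-info.txt` and tag manifests.

```bash
# make_bag: turn a directory into a minimal BagIt bag in place.
make_bag() {
  local dir=$1
  mkdir -p "$dir/data"
  # Move every existing top-level entry (except data/ itself) into the payload.
  find "$dir" -mindepth 1 -maxdepth 1 ! -name data -exec mv -t "$dir/data" {} +
  # Required bag declaration.
  printf 'BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n' > "$dir/bagit.txt"
  # Payload manifest: "checksum  data/path" per file, paths relative to the bag root.
  (cd "$dir" && find data -type f -exec sha256sum {} + | sort -k2 > manifest-sha256.txt)
}
```

Because the manifest travels inside the bag, `sha256sum -c manifest-sha256.txt` run from the bag root verifies it on any storage, which is what keeps the packages portable between providers.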

Key Takeaways

  • The real choice is the role each copy plays in 3-2-1, not cloud versus local in the abstract.
  • Cloud is often cheaper to fill and more expensive to leave; always price egress.
  • One cloud bucket is a single copy under a single administrator, not a backup strategy.
  • Verify fixity yourself with your own manifests; never trust silent durability claims.
  • Self-describing BagIt packages keep every option portable and defeat lock-in.
  • A hybrid model usually beats either extreme for control plus geographic separation.

Frequently Asked Questions

Is cloud storage cheaper than local storage for preservation?

Over a 5-10 year horizon, usually not. Cloud egress fees, API request costs and indefinite monthly billing often exceed the amortised cost of LTO tape or local disk arrays for cold, rarely-accessed archives.

Does cloud storage count as a backup for the 3-2-1 rule?

A single cloud bucket counts as one copy on one medium. To satisfy 3-2-1 you still need at least one geographically separate, ideally offline, copy under different administrative control.

What is data egress and why does it matter?

Egress is the fee charged for transferring data out of a cloud provider. It matters because it makes large-scale restores and provider migrations expensive, and can quietly lock you in.

Can I verify fixity on data stored in the cloud?

Yes, but you must do it deliberately. Most object stores return a stored checksum via the API (an ETag is not always a true content hash); compare it to your own manifest rather than trusting the provider's silent durability claims.

Is a hybrid cloud-plus-local model worth the complexity?

For most archives, yes. A hybrid keeps a controllable on-premises copy for fast restore and audit while using cloud as an off-site, geographically separate second or third copy.