Best Practices for Using Fixity and Checksums

Fixity is the practice of generating a checksum for every file, storing those values safely, and periodically re-computing them to prove the bits have not changed. A checksum is the fingerprint; fixity is the routine that turns one fingerprint into ongoing, documented assurance. Use SHA-256, store manifests away from the data, re-check on a schedule, and log every result — that is the whole discipline, and it is what makes silent corruption detectable instead of catastrophic.

What problem does fixity actually solve?

Storage decays quietly. A flipped bit from cosmic radiation, a failing disk sector, or a buggy copy can corrupt a file without any error message — so-called silent data corruption or bit rot. You will not notice until you open the file years later and find a broken TIFF. Fixity catches this early: by comparing today's checksum to the one recorded at ingest, you learn a file changed before the corruption propagates to your backups and overwrites the good copies.

Which algorithm should I choose?

Algorithm	Output length	Status	Use for
MD5	128-bit	Cryptographically broken	Legacy manifest compatibility only
SHA-1	160-bit	Deprecated	Avoid for new work
SHA-256	256-bit	Current default	New manifests, AIPs
SHA-512	512-bit	Strong	Very large or high-assurance collections

SHA-256 is the practical sweet spot. MD5 is fast and adequate for detecting accidental corruption, but because collisions can be engineered it is not suitable as your only tamper-evidence mechanism. Where a legacy system gives you MD5, record both MD5 and SHA-256.
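Recording both algorithms for the same tree can be sketched as a small shell function. This is a minimal illustration, assuming GNU coreutils and findutils (md5sum, sha256sum, sort -z, xargs -0 -r) and an output directory outside the data tree; the function name write_dual_manifests and the manifest filenames are hypothetical.

```bash
# Sketch: write paired MD5 and SHA-256 manifests for one directory tree.
# Assumes GNU tools; the function name and filenames are illustrative only.
# The body runs in a subshell, so the cd does not affect the caller.
write_dual_manifests() (
    data_dir=$1
    mkdir -p "$2" || exit 1
    out_dir=$(cd "$2" && pwd)      # absolute path, so it survives the cd below
    cd "$data_dir" || exit 1
    # A sorted, NUL-delimited file list keeps the manifests reproducible
    # and copes with filenames that contain spaces.
    find . -type f -print0 | sort -z | xargs -0 -r md5sum \
        > "$out_dir/manifest-md5.txt" || exit 1
    find . -type f -print0 | sort -z | xargs -0 -r sha256sum \
        > "$out_dir/manifest-sha256.txt"
)
```

A call might look like write_dual_manifests /path/to/collection /path/to/manifests, where both paths are placeholders. Each manifest can then be checked from inside the data directory with md5sum -c or sha256sum -c.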

How do I generate and verify checksums?

Generate a manifest once, then verify against it forever:

bash

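# Generate a manifest for a whole tree. The manifest itself is excluded:
# the redirect creates it before find runs, so it would otherwise be hashed too.
find . -type f ! -path './manifest-sha256.txt' -exec sha256sum {} + > manifest-sha256.txt

# Later, from the same directory, verify every file against the stored manifest
sha256sum -c manifest-sha256.txt
# "OK" lines pass; "FAILED" lines flag a changed, missing or unreadable file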

For larger operations, dedicated tools add scheduling and logging. AVP's Fixity, the BagIt tools, and Bagger all wrap this in an auditable workflow with timestamps.

Where should checksums live?

Store manifests separately from the data and back them up independently. If the manifest sits in the same folder as the files and that folder is corrupted, you lose both the evidence and the data. A common pattern: keep one manifest inside each BagIt bag for portability and a master copy of all manifests in a separate, versioned location. Protect the master copies from rewrites so a fixity record cannot be silently "fixed" to match corrupted data.
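The master-copy pattern can be sketched as follows. This is a rough illustration rather than a hardened implementation: the function name archive_manifest, the directory layout and the timestamped naming scheme are all assumptions, and removing write permission is only a basic safeguard against accidental rewrites.

```bash
# Sketch: file a copy of a manifest in a separate master location and
# drop write permission on it. All names and paths here are hypothetical.
archive_manifest() (
    manifest=$1      # e.g. the manifest kept inside a bag
    master_dir=$2    # separate location, ideally on different storage
    label=$3         # identifier for the bag or collection
    mkdir -p "$master_dir" || exit 1
    dest="$master_dir/$label-$(date -u +%Y%m%dT%H%M%SZ)-sha256.txt"
    # Never overwrite an existing record; a new run gets a new file.
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        exit 1
    fi
    cp -p "$manifest" "$dest" || exit 1
    chmod a-w "$dest" || exit 1
    echo "$dest"     # report where the master copy was written
)
```

Because each run writes a new timestamped file instead of replacing the old one, earlier baselines stay available for comparison.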

What is a sound re-check schedule?

Fixity is only useful if you actually re-run it. A defensible cadence:

On ingest: generate and record the baseline.
After every move, copy or format migration: re-verify immediately.
Active collections: re-check quarterly.
Cold storage: re-check at least annually.
After any storage incident: full re-verification of affected media.

Log each run — date, tool, files checked, results — in your PREMIS event history so you can prove continuity during an audit.
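A scheduled run that records those fields might look like the sketch below. It assumes GNU sha256sum (for --quiet and --version) and absolute paths for the manifest and log; the function name run_fixity_check and the tab-separated log layout are hypothetical stand-ins, not a PREMIS serialization.

```bash
# Sketch: verify a tree against its manifest and append one log line:
# UTC date, tool version, files checked, failures, outcome.
# The log layout is illustrative; manifest and log paths must be absolute.
run_fixity_check() (
    data_dir=$1
    manifest=$2
    log_file=$3
    cd "$data_dir" || exit 2
    total=$(wc -l < "$manifest" | tr -d ' ')
    # --quiet prints only the files that did not verify
    failures=$(sha256sum -c --quiet "$manifest" 2>/dev/null \
        | grep -c ': FAILED' || true)
    if [ "$failures" -eq 0 ]; then outcome=pass; else outcome=fail; fi
    tool=$(sha256sum --version | head -n 1)
    printf '%s\t%s\t%s\t%s\t%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$tool" "$total" "$failures" "$outcome" \
        >> "$log_file"
    [ "$outcome" = pass ]    # non-zero exit status signals a failed run
)
```

The non-zero exit status on failure lets a scheduler such as cron, or a wrapper script, raise an alert, while the appended log line gives you the raw material for a fixity check event record.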

What happens when a check fails?

A failure is a signal, not a disaster, if you follow the 3-2-1 rule (three copies, on two different media, one off-site). First, confirm the mismatch with a second tool to rule out a flaky reader. Next, identify which copy is bad by comparing every copy against the baseline. Then restore the corrupt file from a verified-good copy, re-verify it, and record a fixity check event plus a replication or restoration event in your metadata. Consistent logging is what makes the recovery defensible later.
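The compare-and-restore step can be sketched for a single file as below. This is an illustration under stated assumptions: every copy lives under its own root directory with the same relative layout, the baseline SHA-256 comes from your stored manifest, and the function name repair_from_good_copy is hypothetical. It does not cover confirming the failure with a second tool or writing the metadata events.

```bash
# Sketch: check one file in several copies against its baseline SHA-256,
# then repair any bad copy from the first copy that still matches.
# Usage: repair_from_good_copy RELATIVE_PATH BASELINE_SHA256 COPY_ROOT...
repair_from_good_copy() (
    rel=$1
    baseline=$2
    shift 2
    good=""
    # Pass 1: find a copy whose current hash still equals the baseline
    for root in "$@"; do
        if [ -f "$root/$rel" ]; then
            current=$(sha256sum < "$root/$rel" | cut -d ' ' -f 1)
            if [ "$current" = "$baseline" ]; then good=$root; break; fi
        fi
    done
    if [ -z "$good" ]; then
        echo "no verified-good copy of $rel; nothing overwritten" >&2
        exit 1
    fi
    # Pass 2: restore each missing or mismatching copy, then re-verify it
    for root in "$@"; do
        current=""
        if [ -f "$root/$rel" ]; then
            current=$(sha256sum < "$root/$rel" | cut -d ' ' -f 1)
        fi
        if [ "$current" != "$baseline" ]; then
            mkdir -p "$(dirname "$root/$rel")" || exit 1
            cp -p "$good/$rel" "$root/$rel" || exit 1
            restored=$(sha256sum < "$root/$rel" | cut -d ' ' -f 1)
            [ "$restored" = "$baseline" ] || exit 1
            echo "restored $root/$rel from $good/$rel"
        fi
    done
)
```

The function refuses to touch anything when no copy matches the baseline, which is the scripted form of "do not panic-overwrite": a bad copy is only replaced from a source that has just been verified.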

Key Takeaways

A checksum is a fingerprint; fixity is the ongoing practice of checking it.
Default to SHA-256; keep MD5 only for legacy compatibility, never alone for tamper-evidence.
Generate a baseline at ingest and re-verify after every move or migration.
Store manifests separately from the data, with independent backups and rewrite protection.
Re-check active collections quarterly, cold storage annually, and after any incident.
Log every fixity run in preservation metadata to keep an auditable trail.
A failure is recoverable when you have verified-good copies; restore, re-verify, and record the event.

Frequently Asked Questions

What is the difference between fixity and a checksum?

A checksum is the value produced by a hash function over a file's bytes; fixity is the broader practice of generating, storing and periodically re-checking those values to prove a file has not changed.

Which checksum algorithm should I use for preservation?

Use SHA-256 as the default for new work; it is collision-resistant and widely supported. Keep MD5 only where legacy manifests already use it, ideally alongside SHA-256.

How often should I re-check fixity?

Re-verify active collections quarterly and cold storage at least annually, and always re-check a file immediately after any move, copy or migration.

Where should I store the checksums?

Store manifests separately from the files they describe, with their own backups, so a single corruption event cannot damage both the data and the evidence of its integrity.

What do I do when a fixity check fails?

Do not panic-overwrite. Confirm the failure on a second tool, identify which copy is bad, restore the affected file from a verified good copy, and record the event in your preservation metadata.

Can checksums detect a malicious change?

Cryptographic hashes like SHA-256 detect any byte change including tampering, but for tamper-evidence you should also protect the manifests themselves from being rewritten.
