Use lossless compression (or no compression at all) for preservation masters, and lossy compression only for access derivatives. The reason is permanence: lossless lets you reconstruct the exact original data and migrate the file indefinitely without cumulative damage, whereas lossy permanently discards information and degrades further every time it is re-encoded. The decision is rarely "which is better" in the abstract; it is "which copy am I making," and that answer settles it.
What actually differs between the two?
Lossless compression (PNG, FLAC, reversible JPEG 2000, ZIP) finds and removes redundancy so the original is perfectly recoverable. Lossy compression (JPEG, MP3, H.264) throws away data the algorithm models as least perceptible, trading fidelity for far smaller files. The crucial property for archives is reversibility: lossless is reversible, lossy is not.
| Property | Lossless | Lossy |
|---|---|---|
| Reconstructs original exactly | Yes | No |
| Typical size reduction | 2–3x | 5–15x+ |
| Safe to re-compress | Yes | No (degrades) |
| Best for | Masters | Access copies |
| Examples | TIFF (LZW), PNG, FLAC | JPEG, MP3, H.264 |
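The reversibility row can be checked directly. The following is a minimal sketch that uses `gzip` as a generic lossless compressor, with a generated text file standing in for a master (the file names and the scratch directory are hypothetical). It compresses, decompresses, and confirms the restored copy is byte-for-byte identical.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)                 # hypothetical scratch directory
trap 'rm -rf "$workdir"' EXIT

# Build a small, repetitive stand-in for a master file.
for ((i = 1; i <= 2000; i++)); do
  printf 'scanline %04d: 0000000000000000\n' "$i"
done > "$workdir/original.dat"

# Lossless round trip: compress, then decompress to a new file.
gzip -c "$workdir/original.dat" > "$workdir/original.dat.gz"
gzip -dc "$workdir/original.dat.gz" > "$workdir/restored.dat"

# cmp -s exits 0 only if the two files are byte-for-byte identical.
if cmp -s "$workdir/original.dat" "$workdir/restored.dat"; then
  roundtrip="identical"
else
  roundtrip="different"
fi

original_bytes=$(( $(wc -c < "$workdir/original.dat") ))
compressed_bytes=$(( $(wc -c < "$workdir/original.dat.gz") ))
echo "round trip: $roundtrip ($original_bytes -> $compressed_bytes bytes)"
```

No equivalent check can pass for a lossy codec: the decoded output differs from the source by design.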
How do I decide for a specific file?
Ask three questions in order:
- Is this a master or a derivative? Master → lossless. Derivative → lossy is fine.
- Will it be migrated again later? If yes, lossless prevents generational loss.
- Is lossless impractical at this scale? Only then consider lossy masters, and document the decision.
In nearly every heritage workflow the first question alone gives the answer.
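The questions above can be sketched as a small shell function. This is an illustrative sketch, not a policy engine: the function name `choose_compression` and its return strings are hypothetical. The second question is not a separate argument here, because for a master it only reinforces the lossless default rather than changing the outcome.

```bash
# choose_compression ROLE LOSSLESS_IMPRACTICAL
#   ROLE:                 "master" or "derivative"
#   LOSSLESS_IMPRACTICAL: "yes" or "no"
choose_compression() {
  local role=$1 lossless_impractical=$2

  case "$role" in
    derivative)
      # Question 1: derivatives are regenerable, so lossy is fine.
      echo "lossy"
      ;;
    master)
      if [ "$lossless_impractical" = "yes" ]; then
        # Question 3: the only route to a lossy master, and it must be documented.
        echo "lossy-master-documented"
      else
        # Questions 1 and 2: lossless also protects against later migrations.
        echo "lossless"
      fi
      ;;
    *)
      echo "unknown role: $role" >&2
      return 2
      ;;
  esac
}

choose_compression master no        # prints: lossless
choose_compression derivative no    # prints: lossy
```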
Why is repeated lossy compression so damaging?
Lossy artefacts compound. Save a JPEG, edit it, save again, migrate it to another lossy format in ten years — each step discards more and adds new artefacts that can never be removed. This generation loss is the archival nightmare lossless masters exist to prevent. You can always make a fresh lossy derivative from a lossless master; you can never recover a lossless original from a lossy file.
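A toy model makes the compounding visible. The sketch below is not real JPEG or MP3 encoding: it assumes that a lossy save can be approximated as rounding a sample value to the nearest multiple of a step size, and that a second lossy format uses a different step. The function `quantize` and all the numbers are hypothetical.

```bash
# quantize VALUE STEP: round a non-negative VALUE to the nearest multiple of STEP.
quantize() {
  local value=$1 step=$2
  echo $(( (value + step / 2) / step * step ))
}

# abs_diff A B: absolute difference between two integers.
abs_diff() {
  local a=$1 b=$2
  echo $(( a > b ? a - b : b - a ))
}

original=37

# One generation: encode the original straight into "format B" (step 6).
once=$(quantize "$original" 6)

# Two generations: encode into "format A" (step 8), then migrate to "format B".
first=$(quantize "$original" 8)
twice=$(quantize "$first" 6)

echo "one generation:  $once (error $(abs_diff "$original" "$once"))"
echo "two generations: $twice (error $(abs_diff "$original" "$twice"))"
```

In this toy case the two-generation error (5) is larger than the error of either step applied on its own to the original (3 for step 8, 1 for step 6), and nothing in the final value says how to undo it.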
A practical workflow with worked commands
The standard pattern: lossless master, lossy access copy, both derived from the same source.
```bash
# Lossless preservation master: TIFF with LZW (or uncompressed)
# (quote the bracketed save options so the shell does not treat them as a glob)
vips copy scan.tif "master/scan.tif[compression=lzw]"

# Lossy access derivative for the web, regenerable from the master
vips thumbnail master/scan.tif access/scan.jpg 2000 \
  --export-profile srgb

# Reversible (lossless) JPEG 2000 master, if you prefer JP2
opj_compress -i scan.tif -o master/scan.jp2 -r 1  # ratio 1 = lossless
```

For audio, FLAC is your lossless master and MP3/AAC the access copy; for video, a lossless or visually-lossless master (e.g. FFV1 in Matroska) with an H.264 access copy.
When is lossy compression genuinely acceptable?
There are honest exceptions:
- Access derivatives — always fine; they are regenerable.
- Born-lossy originals — if the source is a JPEG, re-encoding to lossless gains nothing and wastes space; keep the original.
- Impractical scale — a multi-petabyte aerial survey may justify visually-lossless JPEG 2000 masters, but document the rationale.
- Audio/video access where lossless delivery is impractical.
The rule of thumb: lossy is acceptable for access or when the source was already lossy. Outside a documented scale exception, never use it as a fresh master for born-lossless content.
Does file size justify lossy masters?
Usually no. Storage is cheap relative to the irreplaceability of a master, and a lossless TIFF that is several times larger than a JPEG is a trivial cost against losing the original forever. Make your space savings on derivatives, where you can compress aggressively because you can always regenerate them.
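To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. Every figure in it is an assumption for illustration: a hypothetical collection of 10,000 scans at 100 MB each uncompressed, with reduction ratios of 2.5x (lossless) and 10x (lossy) taken as midpoints of the ranges in the comparison table above and read as reductions relative to uncompressed data.

```bash
scans=10000            # hypothetical number of scans in the collection
uncompressed_mb=100    # hypothetical size of one uncompressed scan, in MB

# Assumed reduction ratios: 2.5x for lossless, 10x for lossy.
# Integer arithmetic: dividing by 2.5 is written as * 10 / 25.
lossless_total_mb=$(( scans * uncompressed_mb * 10 / 25 ))
lossy_total_mb=$(( scans * uncompressed_mb / 10 ))

lossless_total_gb=$(( lossless_total_mb / 1000 ))
lossy_total_gb=$(( lossy_total_mb / 1000 ))
extra_gb=$(( lossless_total_gb - lossy_total_gb ))

echo "lossless masters: ${lossless_total_gb} GB"
echo "lossy masters:    ${lossy_total_gb} GB"
echo "extra storage for lossless masters: ${extra_gb} GB"
```

Under these assumptions, keeping lossless masters costs an extra 300 GB for the whole collection. Substitute your own counts and ratios before drawing conclusions for a real project.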
Key Takeaways
- Masters: lossless or uncompressed. Derivatives: lossy is fine.
- Lossless is reversible to the exact original; lossy is not.
- Repeated lossy saves cause irreversible generation loss.
- JPEG 2000 offers both reversible (lossless) and irreversible (lossy) modes.
- Keep born-lossy originals as-is rather than re-encoding them.
- Spend storage budget on masters; compress derivatives aggressively.
Frequently Asked Questions
Should preservation masters use lossless or lossy compression?
Preservation masters should be lossless (or uncompressed), so no information is discarded and the file can be migrated indefinitely without cumulative quality loss. Reserve lossy compression for access derivatives.
What is the difference between lossless and lossy compression?
Lossless compression reduces file size while allowing exact reconstruction of the original data; lossy compression achieves much smaller files by permanently discarding information the algorithm judges less perceptible.
Is JPEG 2000 lossless or lossy?
JPEG 2000 supports both modes. Reversible (5/3 wavelet) compression is mathematically lossless and suitable for masters; irreversible (9/7 wavelet) is lossy and suited to access copies.
Why is repeated lossy compression dangerous for archives?
Each lossy save discards more data and introduces fresh artefacts, so re-compressing across migrations causes generational degradation that cannot be reversed. Lossless masters avoid this entirely.
When is lossy compression acceptable in a heritage workflow?
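Lossy is acceptable for web and viewing derivatives, for born-lossy originals (which should be kept as they are rather than re-encoded), for very large collections where lossless masters are impractical and the rationale is documented, and for audio/video access copies. For derivatives and access copies, the lossless or original master must be preserved separately.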
How much smaller are lossy files than lossless?
It varies by content, but lossy JPEG often achieves a 5–15x size reduction, while lossless formats like PNG or reversible JPEG 2000 typically achieve only 2–3x.