When you manage raw masters versus derivatives, the rule that prevents most problems is simple: masters are read-only and irreplaceable, derivatives are disposable and regenerable. A raw master is the untouched, highest-quality preservation copy; a derivative is a processed access copy made from it. Almost every issue archivists hit — drifting copies, broken links, accidental edits, ballooning storage — traces back to a violation of that boundary. This guide diagnoses the common failures and gives you concrete fixes.
Why have masters and derivatives at all?
Serving a 300 MB uncompressed TIFF over the web is wasteful and slow; editing your only preservation copy is dangerous. Splitting the two solves both: the master sits locked in cold storage with fixity checks, while lightweight derivatives (JPEG, web-sized PDF, plain-text OCR) do the daily work and can be rebuilt at will. The master is your source of truth; everything else is a projection of it.
Problem: my derivatives have drifted out of sync
The symptom is a derivative that no longer reflects its master — cropped differently, colour-corrected, or based on an older master. The root cause is almost always one of two things: someone edited the derivative directly, or a master was swapped without re-deriving its children.
Fix it by enforcing one-way generation. Never edit a derivative; never derive from another derivative. Re-run from the master:
```bash
# regenerate the access JPEG from the master TIFF, deterministically
vips thumbnail masters/img_0042.tif access/img_0042.jpg 2000 \
  --export-profile srgb
```

Record that exact command in a manifest so the derivative is reproducible.
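To find which derivatives need re-running after a master has been swapped, you can compare modification times. The sketch below is a minimal example, assuming the `masters/<stem>.tif` → `access/<stem>.jpg` naming used in this guide. `list_stale` is a hypothetical helper, not part of vips or any other tool.

```bash
# Sketch: print the stem of every master whose access JPEG is missing or older
# than the master, i.e. derivatives that need regenerating.
# Assumes masters/<stem>.tif -> access/<stem>.jpg. Relies on modification
# times, so it catches a replaced master but not a derivative edited in place.
list_stale() {
  local root="${1:-.}" master stem deriv
  for master in "$root"/masters/*.tif; do
    [ -e "$master" ] || continue            # glob matched nothing
    stem="$(basename "$master" .tif)"
    deriv="$root/access/$stem.jpg"
    if [ ! -e "$deriv" ] || [ "$master" -nt "$deriv" ]; then
      printf '%s\n' "$stem"
    fi
  done
}
```

For example, `list_stale collection | while read -r stem; do vips thumbnail "collection/masters/$stem.tif" "collection/access/$stem.jpg" 2000 --export-profile srgb; done` would re-derive only the stale files. Because the check depends on timestamps, a master copied in with its old modification time (for example via `cp -p`) will not be flagged. The fixity manifest is the stronger check for that case.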
How do I link a derivative back to its master?
Use a deterministic convention plus a manifest. The filename stem should match (img_0042.tif → img_0042.jpg), and a small CSV or JSON manifest should capture parentage and the generation command:
```csv
derivative,master,tool,command
access/img_0042.jpg,masters/img_0042.tif,vips,"thumbnail ... 2000 --export-profile srgb"
```

Now any derivative is auditable and rebuildable from a single row.
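Rows do not have to be typed by hand; they can be generated from the naming convention. The following is a sketch built on several assumptions. `write_manifest` is a hypothetical helper, it hard-codes the vips command from the earlier example, and it requires that stems contain no commas or double quotes, since those would need CSV escaping.

```bash
# Sketch: regenerate manifest.csv from the <stem>.tif -> <stem>.jpg convention.
# Run from the collection root. The tool/command columns mirror the vips
# example above and are assumptions; stems must not contain commas or quotes.
write_manifest() {
  local out="${1:-manifest.csv}" master stem
  printf 'derivative,master,tool,command\n' > "$out"
  for master in masters/*.tif; do
    [ -e "$master" ] || continue            # glob matched nothing
    stem="$(basename "$master" .tif)"
    printf 'access/%s.jpg,%s,vips,"thumbnail %s access/%s.jpg 2000 --export-profile srgb"\n' \
      "$stem" "$master" "$master" "$stem" >> "$out"
  done
}
```

Run it from the collection root, for example `cd collection && write_manifest`. In practice, it is safer to append each row inside the same loop that runs vips. That way the manifest records the command that was actually executed, not just the one you intended to run.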
Problem: someone keeps editing the masters
Accidental edits are the worst failure because masters are irreplaceable. The fix is structural, not a memo asking people to be careful.
- Put masters and derivatives in separate trees: `masters/` and `access/`.
- Set masters read-only at the filesystem level (`chmod -R a-w masters/` on POSIX, or deny-write ACLs on Windows).
- Exclude `masters/` from any tool that writes (image editors, batch scripts).
- Record a SHA-256 fixity manifest for the masters and verify it on a schedule, so any change is detected.
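The lock and the fixity check can both be scripted. This is a minimal sketch that assumes GNU coreutils and findutils (`sha256sum`, `sort -z`, `xargs -r`); on macOS, `shasum -a 256` is the usual substitute. The function names are hypothetical, and the commands run relative to the collection root.

```bash
# Sketch assuming GNU coreutils/findutils; run from the collection root.
make_fixity() {
  # Hash every file under masters/ in a stable (sorted) order.
  # -print0/-z/-0 keep odd filenames intact; -r skips sha256sum if there are none.
  find masters -type f -print0 | sort -z | xargs -0 -r sha256sum > fixity.sha256
}

verify_fixity() {
  # Re-hash and compare. --quiet prints only failures; the exit status is
  # non-zero if any master changed or is missing.
  sha256sum --check --quiet fixity.sha256
}

lock_masters() {
  # Remove write permission for user, group and others. This does not stop root.
  chmod -R a-w masters
}
```

A scheduled job (cron, a systemd timer, or Task Scheduler) can call `verify_fixity` and send an alert on a non-zero exit. Do not re-run `make_fixity` after a failure, because that would simply accept the damaged file as the new reference. Restore the master from backup first.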
What folder layout works in practice?
A clean separation that scales across a collection:
```text
collection/
  masters/          # read-only preservation copies, fixity-checked
    img_0042.tif
  access/           # regenerable web/viewing derivatives
    img_0042.jpg
  ocr/              # text derivatives
    img_0042.txt
  manifest.csv      # derivative -> master + command
  fixity.sha256     # checksums of masters
```

The point is that a glob like `access/*` never touches a master, and a backup policy can treat `masters/` and the rest with different rigour.
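Here is a minimal sketch for creating this layout. `scaffold_collection` is a hypothetical helper, and the file names simply follow the tree above.

```bash
# Sketch: create the collection layout shown above (names follow that tree).
scaffold_collection() {
  local root="$1"
  mkdir -p "$root/masters" "$root/access" "$root/ocr"
  # Write the manifest header only if the file does not exist yet.
  [ -e "$root/manifest.csv" ] ||
    printf 'derivative,master,tool,command\n' > "$root/manifest.csv"
  touch "$root/fixity.sha256"               # empty until masters are checksummed
}
```

Because derivatives are confined to `access/` and `ocr/`, a cleanup such as `rm -f collection/access/*.jpg` has no path that reaches a master.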
Comparing the two roles
| Property | Raw master | Derivative |
|---|---|---|
| Editable? | Never | Never; regenerate from the master instead |
| Quality | Highest, lossless | Reduced for access |
| Storage tier | Cold, durable | Warm/online |
| On corruption | Restore from backup | Regenerate from master |
| Fixity priority | Critical | Useful, recoverable |
Problem: storage is exploding
If your archive is growing faster than expected, you are probably keeping multiple derivative generations. Because derivatives are regenerable, you can safely prune old ones and keep only masters plus the current access set. Audit with a quick size breakdown and delete stale derivative folders — never the masters.
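A quick audit could look like the sketch below. It sizes each top-level folder and labels everything except `masters/` as regenerable. It assumes the layout from this guide. `size_report` is a hypothetical helper, and it only reports; it never deletes anything.

```bash
# Sketch: print "<size in KiB> <folder> <label>" for each top-level folder of a
# collection, largest first. Only reports; it never deletes anything.
size_report() {
  local root="${1:-.}" dir name kb label
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue               # skip if the glob matched nothing
    name="$(basename "$dir")"
    kb="$(du -sk "$dir" | cut -f1)"         # du -sk: total size in KiB
    if [ "$name" = masters ]; then
      label='keep (irreplaceable)'
    else
      label='regenerable'
    fi
    printf '%s\t%s\t%s\n' "$kb" "$name" "$label"
  done | sort -rn
}
```

An old generation folder would show up here as regenerable, for example a hypothetical `access_2019/`. It becomes a candidate for deletion once you have confirmed that the current `access/` set is complete.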
Key Takeaways
- Masters are read-only and irreplaceable; derivatives are disposable and regenerable.
- Keep masters and derivatives in separate directory trees and lock masters read-only.
- Always derive from the master, never from another derivative.
- Record the exact generation command per derivative in a manifest.
- Fixity-check masters on a schedule; a master change should never go unnoticed.
- You can prune and rebuild derivatives to control storage growth.
Frequently Asked Questions
What is the difference between a raw master and a derivative?
A raw master is the highest-quality, untouched preservation copy you never edit or serve; a derivative is a smaller, processed access copy generated from the master for viewing, web delivery or OCR. You preserve masters and regenerate derivatives as needed.
Why are my derivatives drifting out of sync with the masters?
Drift usually means a derivative was edited directly or a master was replaced without regenerating its children. Fix it by treating masters as read-only, recording the generation command, and re-deriving from the master rather than from another derivative.
Should I store masters and derivatives in the same folder?
No. Separate them into distinct directory trees (e.g. masters/ and access/) so masters can be locked read-only and excluded from routine access workflows, reducing the risk of accidental edits or deletion.
How do I link a derivative back to its master?
Use a deterministic naming convention or a manifest that maps each derivative to its parent master plus the exact command used to create it, so any derivative can be regenerated and audited.
Can I delete derivatives to save space?
Yes, derivatives are disposable by design — you can delete and regenerate them from the master at any time. Never delete the master, since it is the only irreplaceable copy.
What checksum strategy works for masters and derivatives?
Compute and store a fixity checksum (SHA-256) for every master and verify it on a schedule. Derivatives need checksums too for integrity in transit, but a mismatch on a derivative is recoverable by regeneration.