Choose embedded metadata when portability and self-description matter most, so that the file explains itself even when separated from your system. Choose a sidecar when the record is large, frequently revised, or in a form the file format cannot validly carry. Most heritage workflows end up doing both: a small embedded core (title, identifier, rights, creator) inside the master for resilience, plus a full sidecar (.xmp or .json) holding the descriptive and technical detail. The hard part is rarely the choice; it is stopping the two copies from silently drifting or vanishing. This guide walks through the failures you will actually hit and how to fix the root cause of each.
Why did my embedded metadata vanish after an edit?
This is the single most common complaint, and the cause is almost always an export step that rewrites the file. Photoshop "Save for Web", many CMS upload pipelines, and the convert command in older ImageMagick builds can all drop XMP and IPTC blocks. Diagnose it directly:
```bash
# before and after an export, compare the metadata footprint
exiftool -a -G1 -s master.tif > before.txt
exiftool -a -G1 -s derivative.jpg > after.txt
diff before.txt after.txt
```
The fix is not to hand-retype; it is to re-inject from an authoritative source after the lossy step:
```bash
# push a rights + identifier core back into the derivative
exiftool -overwrite_original \
  -XMP-dc:Rights="CC BY 4.0" \
  -XMP-dc:Identifier="ark:/12345/x9q2" \
  -IPTC:Credit="County Archive" derivative.jpg
```
How do I diagnose orphaned sidecar files?
A sidecar is metadata-by-convention: photo_0481.tif pairs with photo_0481.xmp only because the basenames match. Renaming, re-foldering, or a copy tool that skips "extra" files breaks that link, and you get an orphan with no error message. Audit pairing before it bites:
```bash
# list masters with no matching .xmp sidecar
for f in *.tif; do
  [ -f "${f%.tif}.xmp" ] || echo "ORPHAN MASTER: $f"
done
```
Run that check in CI or a pre-ingest script so a broken pairing fails loudly rather than shipping silently.
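The loop above only prints; for a CI job to actually fail, the check has to return a non-zero status, and it helps to look in both directions (masters without sidecars and sidecars without masters). A minimal sketch, assuming .tif masters and .xmp sidecars in a single directory; audit_pairs is a hypothetical helper name, not part of any tool:

```bash
# audit_pairs DIR
# Report masters with no sidecar and sidecars with no master.
# Returns 1 if any orphan is found, so a CI step or pre-ingest hook fails.
audit_pairs() {
  local dir="${1:-.}" status=0 f
  for f in "$dir"/*.tif; do
    [ -e "$f" ] || continue    # the glob matched nothing
    [ -f "${f%.tif}.xmp" ] || { echo "ORPHAN MASTER: $f"; status=1; }
  done
  for f in "$dir"/*.xmp; do
    [ -e "$f" ] || continue
    [ -f "${f%.xmp}.tif" ] || { echo "ORPHAN SIDECAR: $f"; status=1; }
  done
  return "$status"
}
```

Called as `audit_pairs ./object_4471 || exit 1`, it stops an ingest script at the first directory with a broken pairing. Other master extensions would need their own loop or a parameter.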
Embedded vs sidecar: which should I default to?
| Concern | Embedded | Sidecar |
|---|---|---|
| Survives file copy/move | Yes | Only if moved together |
| Survives format conversion | Often stripped | Yes |
| Holds large/complex records | Limited | Yes |
| Easy to batch-edit | Harder | Easy (it is just a file) |
| Self-describing if separated | Yes | No |
| Works for CSV/plain formats | No | Yes |
Default: embed the resilient core, sidecar the full record, and write down which one is authoritative.
Why are my two copies disagreeing?
Drift happens when both copies are editable and nobody has declared which one is authoritative. If a cataloguer fixes a date in the sidecar but the embedded copy still holds the old value, downstream tools may read either one. Pick a single source of truth, usually the sidecar or your collection database, and regenerate the embedded core from it on a schedule rather than editing both by hand.
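Drift is easier to prevent if something reports it. The sketch below compares a handful of core tags between a master and its sidecar. It assumes exiftool is installed and can read both files; read_tag and check_drift are hypothetical helper names, and the tag list is whatever core you chose to embed.

```bash
# read_tag TAG FILE
# Print the bare value of one tag (exiftool -s3 prints values only).
read_tag() { exiftool -s3 "-$1" "$2"; }

# check_drift MASTER SIDECAR TAG...
# Compare each named tag in both copies; return 1 if any value differs.
check_drift() {
  local master="$1" sidecar="$2" status=0 tag a b
  shift 2
  for tag in "$@"; do
    a=$(read_tag "$tag" "$master")
    b=$(read_tag "$tag" "$sidecar")
    if [ "$a" != "$b" ]; then
      echo "DRIFT $tag: embedded='$a' sidecar='$b'"
      status=1
    fi
  done
  return "$status"
}
```

A scheduled job might call `check_drift photo_0481.tif photo_0481.xmp XMP-dc:Identifier XMP-dc:Rights` and regenerate the embedded core only when it reports a difference; exiftool's -tagsFromFile option is one way to do that regeneration.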
How do I stop metadata dying in transfer?
Loose sidecars are exactly the files that FTP clients, S3 sync rules and "copy the images" instructions tend to drop. Never transfer masters and sidecars as independent objects. Bag them:
```bash
bagit.py --md5 ./object_4471/   # master + sidecar in one bag
# after transfer, on the receiving side:
bagit.py --validate ./object_4471/
```
The BagIt manifest now fails validation if the sidecar went missing, turning a silent loss into a caught error.
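To see why a manifest catches a dropped sidecar, the idea can be reduced to a plain checksum list. This is not a BagIt implementation, only an illustration of the mechanism; it assumes sha256sum from GNU coreutils, and make_manifest, verify_manifest and the checksums.sha256 filename are hypothetical.

```bash
# make_manifest DIR
# Record a SHA-256 line for every file under DIR (except the manifest itself).
make_manifest() {
  ( cd "$1" &&
    find . -type f ! -name checksums.sha256 -exec sha256sum {} + > checksums.sha256 )
}

# verify_manifest DIR
# Non-zero exit if any listed file is missing or its content has changed.
verify_manifest() {
  ( cd "$1" && sha256sum -c checksums.sha256 )
}
```

Because the sidecar is listed by name, losing it in transit makes verify_manifest fail on the receiving side. Note that a file that was never in the directory when the manifest was made cannot be missed, so audit pairing before packaging as well.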
What about formats that cannot embed at all?
CSV, plain .txt, many GIS shapefiles and most 3D mesh formats have no standard embedded-metadata slot, so the question answers itself: use a sidecar (.json, .xml, or a readme/.cpg companion). For TIFF, JPEG, PNG, PDF, WAV and MP4 you have a real choice, so apply the table above.
Key Takeaways
- Embed a small resilient core; sidecar the full descriptive and technical record.
- Most metadata "loss" is an export step rewriting the file; diagnose with exiftool diffs and re-inject.
- Sidecars are coupled to masters only by naming convention; audit pairing in CI.
- Declare one authoritative copy and regenerate the other to prevent drift.
- Transfer masters and sidecars together inside a BagIt bag and validate after.
- Formats like CSV and shapefiles cannot embed, so they always need a sidecar.
Frequently Asked Questions
What is the difference between embedded and sidecar metadata?
Embedded metadata lives inside the file itself (EXIF, IPTC, XMP, ICC). Sidecar metadata lives in a separate companion file, usually a .xmp, .json or .xml that travels alongside the master.
Why did my embedded metadata disappear after editing?
Most editors and "save for web" exports strip or rewrite metadata chunks by default. Re-inject from your sidecar or a CSV with exiftool, and add a post-export metadata comparison so silent stripping is caught.
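A raw diff of two exiftool dumps is noisy, because file size, dimensions and dates legitimately change on export. One way to catch only the stripping is to list descriptive tags that were present before and are absent after. The sketch below works on two text dumps produced with `exiftool -a -G1 -s`; tags_lost is a hypothetical helper name, and limiting the report to XMP and IPTC groups is an assumption you may want to widen.

```bash
# tags_lost BEFORE AFTER
# Print each "[Group] Tag" key in an XMP-* or IPTC group that appears in
# BEFORE but not in AFTER. Dump lines are expected to look like
#   [XMP-dc]        Rights                          : CC BY 4.0
tags_lost() {
  awk '
    {
      key = $1 " " $2        # group and tag name, whitespace-normalised
      sub(/:$/, "", key)     # in case a name is printed flush against the colon
    }
    FILENAME == ARGV[1] { seen[key] = 1; next }   # first file read: the AFTER dump
    !(key in seen) && key ~ /^\[(XMP|IPTC)/ && !printed[key]++ { print key }
  ' "$2" "$1"
}
```

In a pipeline, `[ -z "$(tags_lost before.txt after.txt)" ] || exit 1` fails the export step whenever any XMP or IPTC tag disappears, which is the cue to re-inject.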
Are sidecar files safe to lose?
No — a sidecar is only as durable as your file-naming discipline. If the basename or folder pairing breaks, the metadata is orphaned, so keep sidecars in the same directory and validate pairing in CI.
Can I have both embedded and sidecar at once?
Yes, and it is common. Embed a small descriptive and rights core for portability, keep the full record in a sidecar, and treat one of them as authoritative to avoid drift.
Which formats support embedded metadata?
TIFF, JPEG, PNG, PDF, WAV and MP4 all carry embedded blocks (XMP/EXIF/IPTC/ID3). Plain text, CSV, many GIS shapefiles and most 3D mesh formats do not, so those need a sidecar.
How do I fix metadata that won't survive transfer?
Package the master and its sidecar together in a BagIt bag or a zip, validate the manifest after transfer, and never rely on a copy tool to preserve loose companion files.