To migrate obsolete file formats safely, first confirm the format is genuinely at risk, define the significant properties you must preserve, pilot a conversion on a representative sample, validate the output, and log every step as a preservation event. Migration is a documented, reversible-where-possible transformation — not a one-off bulk re-save. The originals stay; the migrated copies are evidence you can defend.
How do I confirm a format is really obsolete?
Age alone does not make a format obsolete; loss of renderability does. Identify each file precisely with DROID or Siegfried, then check the PRONOM record and your own tests:
```bash
# identify by signature, not extension
sf -csv ./collection > identification.csv
# count formats present (check the CSV header row first: the field number
# passed to cut must point at the format identifier column)
sf -csv ./collection | cut -d, -f5 | sort | uniq -c | sort -rn
```
A format is a migration candidate when several signals align: software support is shrinking, no maintained spec exists, validators are unavailable, and a registry no longer lists it as acceptable. Document the evidence; that justification is part of your audit trail.
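The format count above depends on which CSV column holds the format identifier, and that position may differ between tool versions. A minimal sketch that looks the column up by name instead, assuming the header row calls it `id` (an assumption; check your own `identification.csv`) and that no field contains an embedded comma. The function name `count_by_column` is hypothetical:

```bash
#!/usr/bin/env bash
# count_by_column FILE COLUMN
# Tally the values of one named column in a simple CSV file.
# Assumes a header row and no quoted fields that contain commas.
count_by_column() {
  local file=$1 column=$2
  awk -F, -v col="$column" '
    NR == 1 {
      # find the position of the requested column in the header
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    { counts[$idx]++ }
    END { for (v in counts) print counts[v], v }
  ' "$file" | sort -rn
}
```

Called as `count_by_column identification.csv id`, it prints one line per distinct value, most frequent first. A CSV-aware parser would be the safer choice for collections whose file names contain commas.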
How do I choose the target format?
Pick a current, open, well-supported target that preserves your declared significant properties. Use community recommendations as a starting point, then test.
| Obsolete source | Common target | Watch for |
|---|---|---|
| WordPerfect .wpd | PDF/A + plain text | Footnotes, reveal-codes formatting |
| Lotus 1-2-3 .wk1 | CSV + ODS | Formulas, multiple sheets |
| Old .psd flattened | TIFF | Layers, adjustment effects |
| .rm / RealMedia | FFV1/MKV or H.264 | Audio sync, frame rate |
| dBASE .dbf | CSV + SIARD | Field types, encoding |
The right target is the one that survives your validation, not the one that is most convenient to produce.
What does a defensible migration workflow look like?
Treat it as a pipeline with checkpoints. A minimal, auditable batch loop:
```bash
# capture the installed tool version once, instead of logging a hand-typed value
tool_version=$(libreoffice --version | head -n 1)
for f in ./obsolete/*.wpd; do
  base=$(basename "$f" .wpd)
  sha256sum "$f" >> source_checksums.txt
  # PDF/A export options vary by LibreOffice version; confirm conformance with a validator
  libreoffice --headless --convert-to pdf:"writer_pdf_Export:PDFAExport=true" \
    --outdir ./migrated "$f"
  sha256sum "./migrated/${base}.pdf" >> target_checksums.txt
  echo "$(date -Is),$f,migrated,$tool_version" >> migration_log.csv
done
```
Every run records the source checksum, the target checksum, the timestamp, and the exact tool and version. That log is what makes the operation reproducible and the result trustworthy.
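Because the loop appends one checksum line per file, the same manifests can be replayed later to confirm that neither the originals nor the migrated copies have changed. A minimal sketch, assuming GNU coreutils `sha256sum` and that it is run from the directory where the manifest was written (the recorded paths are relative). The function name `verify_manifest` is hypothetical:

```bash
#!/usr/bin/env bash
# verify_manifest MANIFEST
# Re-hash every file listed in a sha256sum manifest.
# Returns non-zero if any listed file is missing or its checksum differs.
verify_manifest() {
  local manifest=$1
  # --quiet suppresses the per-file OK lines; failures are still printed
  if sha256sum --check --quiet "$manifest"; then
    echo "OK: all files match $manifest"
  else
    echo "FAILED: at least one file differs from $manifest" >&2
    return 1
  fi
}
```

Running `verify_manifest source_checksums.txt` after a batch is a cheap way to show that the migration did not touch the originals.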
How do I verify nothing important was lost?
Validate against the significant properties you declared, not a vague impression. Combine automated and human checks:
- Text-bearing migrations: extract text from source and target, diff them, and quantify differences.
- Image migrations: compare pixels (`compare -metric AE`) for lossless conversions.
- Structured data: confirm row/column counts, field types and encoding match.
- Human spot-check: review a 5-10% random sample for formatting and meaning that automation cannot judge.
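The text comparison and the spot-check sample in the list above can both be scripted. A rough sketch, assuming the text has already been extracted from source and target into plain-text files by whatever tool suits the format, and that file names contain no newlines. `count_changed_lines` and `pick_sample` are hypothetical helper names, and `shuf` comes from GNU coreutils:

```bash
#!/usr/bin/env bash
# count_changed_lines SOURCE_TXT TARGET_TXT
# Print how many lines diff reports as removed from the source or added in the target.
count_changed_lines() {
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the function's status clean
  diff "$1" "$2" | grep -c '^[<>]' || true
}

# pick_sample PERCENT DIR
# Print a random PERCENT of the regular files directly inside DIR, one path per line.
pick_sample() {
  local percent=$1 dir=$2 total n
  total=$(find "$dir" -maxdepth 1 -type f | wc -l)
  # round up so that small batches still yield at least one file to review
  n=$(( (total * percent + 99) / 100 ))
  find "$dir" -maxdepth 1 -type f | shuf -n "$n"
}
```

For example, `pick_sample 10 ./migrated > review_list.txt` produces the list a reviewer works through, and a non-zero result from `count_changed_lines` flags a file pair for closer inspection rather than proving loss on its own.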
Record the validation outcome in your preservation metadata as a PREMIS-style event with agent, date and result.
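One lightweight way to capture such an event is a line of JSON per validation. The sketch below is not PREMIS XML: the field names are illustrative assumptions loosely modelled on PREMIS event semantics, and `log_event` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# log_event LOGFILE EVENT_TYPE OBJECT AGENT OUTCOME
# Append one JSON object per line to LOGFILE, stamped with the current UTC time.
# Values are written verbatim, so they must not contain double quotes or backslashes.
log_event() {
  local logfile=$1 type=$2 object=$3 agent=$4 outcome=$5
  printf '{"eventType":"%s","eventDateTime":"%s","object":"%s","agent":"%s","outcome":"%s"}\n' \
    "$type" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$object" "$agent" "$outcome" >> "$logfile"
}
```

A call such as `log_event events.jsonl validation ./migrated/report.pdf "J. Reviewer" pass` records who checked which file and with what result; the file and agent names here are placeholders.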
Should I delete the original after migrating?
No — keep it. The migrated copy is a derivative whose fidelity depends on today's tools and your understanding of the format. Retaining the original preserves authenticity, allows re-migration when better tools appear, and protects you if a flaw surfaces later. Storage is cheaper than an irreversible mistake across a whole collection.
Key Takeaways
- Confirm obsolescence with identification (DROID/Siegfried) plus renderability and registry evidence — not age.
- Declare significant properties before migrating; success is measured only against them.
- Pilot on a representative sample, validate, then batch the remainder.
- Log every transformation (source/target checksums, timestamp, tool and version) for reproducibility.
- Verify with automated comparison plus a human spot-check of 5-10% of files.
- Keep the original alongside the migrated copy so you can re-migrate later.
Frequently Asked Questions
How do I know a format is actually obsolete?
Check format-risk signals: declining or removed software support, no maintained specification, few or no validators, and absence from format registries' active recommendations. PRONOM risk notes and your renderability tests matter more than the file's age alone.
Should I migrate or emulate obsolete files?
Migrate when the intellectual content transfers cleanly to a current format and significant properties are preserved. Emulate when behaviour, interactivity or exact rendering matters more than format currency, such as software, games or dynamic documents.
What significant properties must I preserve during migration?
Define them per format before you start: text content and structure, embedded images, formatting that carries meaning, embedded metadata, and any functional behaviour. Migration is judged successful only against the properties you declared essential.
How do I prove a migration did not lose information?
Use automated comparison where possible (text diff, image pixel comparison, structural diff) plus human spot-checks on a sample. Record source and target checksums, the tool and version used, and the validation results as preservation events.
Should I keep the original obsolete file after migrating?
Yes, retain the original alongside the migrated version whenever storage allows. Keeping the source preserves authenticity, lets you re-migrate with better tools later, and protects you if the first migration is found to be flawed.
Can I migrate a whole collection at once?
Pilot on a representative sample first, validate the workflow, then batch the rest. Always log each transformation so the operation is reproducible and auditable across thousands of files.