Digital Preservation

Assess file format obsolescence risk at three moments: at ingest for every new accession, on a regular cycle (usually annually) across the whole repository, and reactively as soon as a registry, vendor or community flags a format as endangered. Obsolescence occurs when the software, hardware or knowledge needed to render a file disappears, leaving readable bits but unreadable content. The point of assessment is triage: concentrating scarce migration effort on fragile, proprietary or niche formats while robust open formats are simply monitored.

When is assessment worth the effort?

Not every collection needs the same scrutiny. Assessment pays off most when your holdings contain proprietary formats (old Word .doc, WordPerfect, vendor CAD files), encrypted or DRM-wrapped files, output from discontinued software, or anything with no published specification. It pays off least for collections that are already normalised to open, well-supported formats like PDF/A, TIFF, plain text or CSV — there, light monitoring beats repeated deep assessment. The honest trade-off: assessment costs staff time, and over-assessing stable formats is wasted effort.

What signals tell me a format is at risk?

Score each format against concrete indicators rather than gut feeling:

| Signal          | Lower risk            | Higher risk                  |
|-----------------|-----------------------|------------------------------|
| Specification   | Open, published       | Proprietary, secret          |
| Tooling         | Many readers, active  | One vendor, discontinued     |
| Adoption        | Widespread            | Niche or abandoned           |
| Dependencies    | Self-contained        | Needs fonts, codecs, plugins |
| Protection      | None                  | Encrypted or DRM             |
| Standardisation | ISO/IETF standard     | Vendor-defined               |

A format scoring "higher risk" on three or more rows belongs near the top of your migration queue.
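The three-row rule can be sketched in a few lines of Python. This is a minimal illustration, not a standard scoring scheme: the signal names simply mirror the six table rows, and treating a missing signal as lower risk is an assumption you may want to reverse for unknown formats.

```python
# Hypothetical signal names mirroring the six rows of the table above.
SIGNALS = (
    "specification", "tooling", "adoption",
    "dependencies", "protection", "standardisation",
)


def high_risk_rows(assessment: dict[str, bool]) -> int:
    """Count the rows marked higher risk (True) for one format.

    Assumption: a signal missing from the assessment counts as lower risk.
    """
    unknown = set(assessment) - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return sum(1 for signal in SIGNALS if assessment.get(signal, False))


def needs_priority(assessment: dict[str, bool], threshold: int = 3) -> bool:
    """Apply the 'three or more higher-risk rows' rule from the text."""
    return high_risk_rows(assessment) >= threshold


# Illustrative assessments, not real registry data.
legacy_db = {"specification": True, "tooling": True, "standardisation": True}
pdf_a = {"dependencies": False}

print(needs_priority(legacy_db))
print(needs_priority(pdf_a))
```

Keeping the threshold as a parameter lets a repository tighten or relax the rule without changing how individual formats are assessed.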

How do I actually identify what I hold?

Never trust file extensions — they lie and can be renamed. Identify by internal signature using PRONOM-backed tools:

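```bash
# Siegfried: fast signature-based identification, outputs PRONOM PUIDs.
# -csv is needed because Siegfried writes YAML by default;
# -multi 16 scans up to 16 files in parallel.
sf -csv -multi 16 /collections/accession-2024-051/ > formats.csv

# DROID equivalent on the command line (the launcher may be droid.sh or
# droid.bat, depending on your install). -R recurses into subfolders.
droid -R -a /collections/accession-2024-051/ -p profile.droid
```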

The resulting PUIDs (e.g. fmt/95 for PDF/A-1a) link directly to PRONOM risk information, turning a folder of mystery files into an evidence-based inventory.
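Once the identification report exists, a short script can turn it into a per-format tally. The sketch below assumes a CSV with one row per file and the PUID in a column named `id`, which matches Siegfried's CSV output as far as I know. Check the header of your own report, since the inline sample here is simplified and its rows are invented for illustration.

```python
import csv
import io
from collections import Counter


def tally_puids(csv_text: str, id_column: str = "id") -> Counter:
    """Count files per PUID in an identification report.

    Assumption: the PUID sits in a column called `id`; pass a different
    `id_column` if your tool names it otherwise. Empty values are
    grouped under "UNKNOWN".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row[id_column] or "UNKNOWN") for row in reader)


# Simplified, invented sample; a real report has more columns.
sample = """filename,filesize,id,format
a.pdf,1024,fmt/95,Acrobat PDF/A - Portable Document Format
b.pdf,2048,fmt/95,Acrobat PDF/A - Portable Document Format
c.tif,4096,fmt/353,Tagged Image File Format
d.bin,512,UNKNOWN,
"""

counts = tally_puids(sample)
for puid, n in counts.most_common():
    print(f"{puid}\t{n}")
```

For a real report, read the file with `open("formats.csv", newline="").read()` and pass the text in. Storing the resulting counts alongside the accession record gives you something to compare against at the next annual profile.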

When should I NOT assess (or act)?

Skip deep assessment when the cost outweighs the benefit: tiny collections already in open formats, content with no enduring value, or files you have just normalised on ingest. And critically, even a high risk score does not always mean migrate now. If the content is low-value, or migration would lose significant properties you cannot yet preserve, the right move may be to document the risk, keep the original, and revisit. Assessment informs the decision; it does not force one.
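The reasoning above amounts to a small decision rule, sketched here in Python. The three boolean inputs and the wording of the outcomes are illustrative assumptions; a real policy will have more nuance than three flags can carry.

```python
def recommend(high_risk: bool, high_value: bool,
              properties_preservable: bool) -> str:
    """Suggest an action for one format, following the text above.

    All three inputs are judgments made by the assessor; the function
    only encodes the order in which they are weighed.
    """
    if not high_risk:
        return "monitor"
    if not high_value or not properties_preservable:
        # High risk, but migrating now is not justified or not yet safe.
        return "document risk, keep original, revisit"
    return "schedule migration"


print(recommend(high_risk=True, high_value=True, properties_preservable=False))
```

The function returns a recommendation rather than triggering anything, which reflects the point that assessment informs the decision without forcing one.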

Turning the assessment into a plan

A workable cadence: profile every accession at ingest and record PUIDs in your metadata; run a repository-wide profile annually and diff it against last year's to catch newly endangered formats; and subscribe to community signals so a vendor's discontinuation notice triggers an ad-hoc review. Rank flagged formats by risk score multiplied by content value, and only then schedule migration for the top of the list.
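The annual diff and the ranking step might look like the following sketch. The assumptions are loud ones: profiles are stored as PUID-to-file-count mappings, risk score is the number of higher-risk signals (0 to 6), content value is a 1 to 5 rating assigned by curators, and `legacy-db` is a placeholder label, not a real PUID.

```python
def diff_profiles(previous: dict[str, int],
                  current: dict[str, int]) -> dict[str, int]:
    """Return the change in file count per PUID, omitting unchanged ones."""
    changes = {}
    for puid in sorted(set(previous) | set(current)):
        delta = current.get(puid, 0) - previous.get(puid, 0)
        if delta != 0:
            changes[puid] = delta
    return changes


def rank(flagged: list[tuple[str, int, int]]) -> list[tuple[str, int]]:
    """Order formats by risk score multiplied by content value, highest first.

    Each entry is (puid, risk_score, content_value); both scales are
    assumptions for illustration.
    """
    scored = [(puid, risk * value) for puid, risk, value in flagged]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented profiles; "legacy-db" stands in for an unidentified format.
last_year = {"fmt/95": 4000, "fmt/353": 600}
this_year = {"fmt/95": 4200, "fmt/353": 600, "legacy-db": 200}
changes = diff_profiles(last_year, this_year)
print(changes)

queue = rank([("fmt/95", 0, 5), ("legacy-db", 3, 5), ("fmt/40", 4, 2)])
print(queue)
```

Note that a diff of counts only shows what has entered or left the repository. Catching formats that became endangered while your holdings stayed the same still depends on re-scoring them against current registry and vendor information.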

A short worked example

You profile a 5,000-file accession and Siegfried reports 4,200 PDF/A, 600 TIFF, and 200 files identified as a discontinued proprietary database format with no current reader. The PDFs and TIFFs are low-risk: monitor them. The 200 database files score high on specification, tooling and standardisation — they go straight onto the migration plan, with the originals retained. You spent your effort exactly where it mattered.

Key Takeaways

  • Assess at ingest, on an annual cycle, and reactively when a format is flagged.
  • Obsolescence leaves bits readable but content unreadable — software and knowledge are the fragile parts.
  • Target effort at proprietary, encrypted, niche or unsupported formats; lightly monitor robust open ones.
  • Identify formats by signature with Siegfried or DROID, not by file extension.
  • Score formats against concrete signals; three-plus high-risk rows means prioritise.
  • A high risk score informs but does not dictate migration — weigh content value and significant properties.
  • Record PUIDs in metadata and diff annual profiles to catch newly endangered formats.

Frequently Asked Questions

What is file format obsolescence?

Format obsolescence is when the software, hardware or knowledge needed to render a file becomes unavailable, so the bits survive but the content can no longer be opened or correctly displayed.

When should I run a format risk assessment?

Run one at ingest for any new accession, on a scheduled cycle (typically annually) for the whole repository, and immediately whenever a vendor discontinues a format or a registry flags it as at-risk.

Is format obsolescence as common a threat as people think?

Wholesale obsolescence is rarer than feared for open, well-documented formats, but proprietary, encrypted, or niche formats remain genuinely high-risk, so assessment targets effort where it matters.

What signals indicate a format is at risk?

Watch for proprietary specifications, dependence on a single vendor or discontinued software, encryption or DRM, lack of a published specification, and very low adoption or no current tooling.

Which tools identify file formats reliably?

Use DROID and Siegfried (both backed by the PRONOM registry) to identify formats by signature, and link results to PRONOM risk information rather than trusting file extensions.

Should I migrate every at-risk file immediately?

No. Assess risk first, then act only where the risk is high and the content is valued; low-risk open formats are usually best left alone and simply monitored.