When format identification goes wrong, the cause is almost always one of three things: a stale signature database, a file matched only by its (unreliable) extension, or a genuinely unknown format the registry doesn't yet describe. Format registries like PRONOM give every format a stable identifier — a PUID such as fmt/353 for TIFF — and tools like DROID and Siegfried match files against them. This guide diagnoses the common failures and gives the fix for each.
Why does a tool return "unknown" for a file?
This is the most commonly reported problem, and it has a short diagnostic ladder. Work down it in order:
- Stale signatures — your tool's PRONOM signature file predates the format. Fix: update it (below). This resolves the majority of cases.
- Damaged header — the file's magic bytes are corrupt, so no signature matches. Fix: inspect the header and check fixity.
- Truly new/rare format — PRONOM has no signature yet. Fix: record it, and consider submitting a sample to The National Archives.
Update first; it is free and fixes most "unknowns":
```bash
# Siegfried: refresh the PRONOM signature database
sf -update
sf -version   # confirm the signature set date
# DROID equivalent: download the latest signature file
# (via the GUI: Tools > Check for signature updates)
```
How do I tell whether a match is real or just by extension?
A match by signature (the file's internal bytes) is trustworthy; a match by extension alone is a guess that fails the moment someone renames a file. Always check how the tool matched.
```bash
sf -json report.pdf | python -m json.tool
# look for "basis": "byte match at 0..." (good)
# versus "basis": "extension match" (weak: verify!)
```
If a .pdf matched only by extension, open the header: a renamed JPEG will start with FFD8FF, not %PDF. Trusting extension matches is the classic root cause of mis-shelved formats in an archive.
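The header check described above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for DROID or Siegfried: the `MAGIC_PREFIXES` table holds only three well-known leading signatures, and the helper names `sniff_header` and `extension_agrees` are hypothetical.

```python
import tempfile
from pathlib import Path

# A tiny, illustrative table of leading magic bytes. Real tools use the
# full PRONOM signature set, which also covers offsets and container formats.
MAGIC_PREFIXES = {
    b"%PDF": "PDF",
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
}

# Hypothetical mapping from extension to the label we expect in the header.
EXPECTED_BY_EXTENSION = {".pdf": "PDF", ".jpg": "JPEG", ".jpeg": "JPEG", ".png": "PNG"}


def sniff_header(path):
    """Return a coarse format label from the first bytes, or None if nothing matches."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for prefix, label in MAGIC_PREFIXES.items():
        if head.startswith(prefix):
            return label
    return None


def extension_agrees(path):
    """True only when the header is recognised and matches the file extension."""
    label = sniff_header(path)
    expected = EXPECTED_BY_EXTENSION.get(Path(path).suffix.lower())
    return label is not None and label == expected


# Demo: a JPEG renamed to .pdf is caught by its header.
with tempfile.TemporaryDirectory() as tmp:
    renamed = Path(tmp) / "report.pdf"
    renamed.write_bytes(b"\xff\xd8\xff\xe0" + b"\x00" * 12)
    print(sniff_header(renamed), extension_agrees(renamed))
```

A `False` from `extension_agrees` is only a prompt to look closer; an unrecognised header may mean a damaged file or simply a format missing from this small table.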
Why do DROID and Siegfried disagree?
When two tools report different PUIDs for the same file, line up three things:
| Check | Why it causes disagreement |
|---|---|
| Signature version | Different PRONOM releases know different formats |
| Match basis | One matched by bytes, the other by extension |
| Priority rules | Ambiguous files resolve to different "best" PUIDs |
The fix is to align both tools to the same PRONOM signature version, then re-run. If they still differ, prefer the byte-signature match and inspect the file by hand. Disagreement is information — it usually flags an ambiguous or container format worth a closer look.
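The "prefer the byte-signature match" rule can be expressed as a small comparison routine. The sketch below assumes each tool's output has already been normalised into a mapping of filename to `(puid, basis)`, with Siegfried-style basis text; that normalisation step, the function names, and the sample results are all hypothetical.

```python
def is_byte_match(basis):
    """Treat any basis text mentioning a byte match as signature evidence."""
    return "byte match" in basis.lower()


def disagreements(results_a, results_b):
    """List files where two tools report different PUIDs.

    Each input maps filename -> (puid, basis). Returns tuples of
    (filename, puid_a, puid_b, preferred), where preferred is None
    when the evidence does not favour either tool.
    """
    rows = []
    for name in sorted(results_a.keys() & results_b.keys()):
        puid_a, basis_a = results_a[name]
        puid_b, basis_b = results_b[name]
        if puid_a == puid_b:
            continue
        a_bytes, b_bytes = is_byte_match(basis_a), is_byte_match(basis_b)
        if a_bytes and not b_bytes:
            preferred = puid_a
        elif b_bytes and not a_bytes:
            preferred = puid_b
        else:
            preferred = None  # both or neither matched on bytes: inspect by hand
        rows.append((name, puid_a, puid_b, preferred))
    return rows


# Made-up results for illustration only.
tool_a = {
    "scan.txt": ("fmt/353", "byte match at 0, 4"),
    "policy.pdf": ("fmt/95", "byte match at 0, 8"),
    "ambiguous.bin": ("fmt/95", "byte match at 0, 8"),
}
tool_b = {
    "scan.txt": ("x-fmt/111", "extension match txt"),
    "policy.pdf": ("fmt/95", "byte match at 0, 8"),
    "ambiguous.bin": ("fmt/353", "byte match at 0, 4"),
}

for row in disagreements(tool_a, tool_b):
    print(row)
```

Rows with `preferred` set to `None` are the ones worth opening by hand, since neither tool has stronger evidence than the other.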
What does a PUID give me that a MIME type doesn't?
Extensions and MIME types are coarse and collide: .dat says nothing, and application/octet-stream covers thousands of formats. A PUID pins down the exact format and version unambiguously and persistently — fmt/95 is PDF/A-1a specifically, not "a PDF." Store the PUID in your technical metadata so that years later you can run a precise risk query ("show me every x-fmt/111 plain-text-with-no-encoding file") rather than guessing from extensions.
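To make the risk-query idea concrete, here is a minimal sketch using an in-memory SQLite database. The table name `technical_metadata`, its columns, and the sample rows are assumptions for illustration; a real repository would use whatever metadata store it already has.

```python
import sqlite3

# In-memory database standing in for a repository's technical metadata store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE technical_metadata ("
    "path TEXT PRIMARY KEY, puid TEXT NOT NULL, mime TEXT)"
)

# Made-up sample rows; the PUIDs are the ones mentioned in this guide.
sample_rows = [
    ("/archive/a/scan.tif", "fmt/353", "image/tiff"),
    ("/archive/b/readme.txt", "x-fmt/111", "text/plain"),
    ("/archive/c/policy.pdf", "fmt/95", "application/pdf"),
    ("/archive/d/notes.txt", "x-fmt/111", "text/plain"),
]
conn.executemany("INSERT INTO technical_metadata VALUES (?, ?, ?)", sample_rows)


def files_with_puid(connection, puid):
    """Return every stored path whose recorded PUID equals the one given."""
    cursor = connection.execute(
        "SELECT path FROM technical_metadata WHERE puid = ? ORDER BY path",
        (puid,),
    )
    return [row[0] for row in cursor]


print(files_with_puid(conn, "x-fmt/111"))
```

Because the query keys on the PUID rather than the extension or MIME type, it stays precise even if files have been renamed since ingest.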
How do I use the registry for format-risk work?
The registry is not just for naming files; it underpins obsolescence planning. A practical workflow:
```bash
# 1. Identify everything, capture PUIDs to CSV
sf -csv /archive > formats.csv
# 2. Tally formats by PUID to see your real profile
#    (skips the header row; assumes the PUID is field 6 and no path contains a comma)
tail -n +2 formats.csv | cut -d, -f6 | sort | uniq -c | sort -rn
```
Now cross-reference the high-count or worrying PUIDs against risk indicators (proprietary, undocumented, no current software). PRONOM records, together with file-format risk registries, help you decide which PUIDs need a migration plan. The registry turns "we have lots of old files" into a ranked, actionable list.
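A CSV-aware tally is a safer variant of the `cut` pipeline when paths may contain commas. The sketch below assumes the PUID lives in a column named `id`, and the sample rows use a trimmed, made-up header; check both against your own tool's output. The `risky` set is hypothetical and would come from your own risk assessment.

```python
import csv
import io
from collections import Counter


def tally_puids(csv_file, id_column="id"):
    """Count rows per PUID; the csv module handles quoted commas in paths."""
    reader = csv.DictReader(csv_file)
    return Counter(row[id_column] for row in reader if row.get(id_column))


def flag_risky(counts, risky_puids):
    """Return (puid, count) pairs for risky PUIDs, most common first."""
    return [(puid, n) for puid, n in counts.most_common() if puid in risky_puids]


# Trimmed, illustrative sample; a real report has more columns.
SAMPLE = io.StringIO(
    "filename,filesize,modified,errors,namespace,id\n"
    '"/archive/a, b/scan.tif",1024,2020-01-01T00:00:00Z,,pronom,fmt/353\n'
    "/archive/readme.txt,12,2020-01-01T00:00:00Z,,pronom,x-fmt/111\n"
    "/archive/notes.txt,34,2020-01-01T00:00:00Z,,pronom,x-fmt/111\n"
    "/archive/policy.pdf,2048,2020-01-01T00:00:00Z,,pronom,fmt/95\n"
)

counts = tally_puids(SAMPLE)
risky = {"x-fmt/111"}  # hypothetical: PUIDs your own risk review has flagged
print(counts.most_common())
print(flag_risky(counts, risky))
```

To run it on a real report, pass an open file instead of `SAMPLE`, for example `open("formats.csv", newline="")`.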
What if PRONOM has no entry for my format?
First, confirm it really is unknown (update signatures, check the header). If it is genuinely absent, do two things: record a local interim identifier and full technical notes so the file is not lost in your system, and submit a sample to The National Archives' PRONOM research process. Their team analyses sample files and creates new signatures, which is how the registry keeps pace with emerging formats — and your submission helps every other archive too.
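One way to record an interim identifier with technical notes is a small structured record per file. This is a sketch under stated assumptions: the `local/...` identifier scheme, the field names, and the `interim_record` helper are all hypothetical, and the function reads the whole file into memory, which suits a sample file rather than very large objects.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def interim_record(path, local_id, notes):
    """Build a technical note for a file that PRONOM does not yet describe."""
    data = Path(path).read_bytes()
    return {
        "local_id": local_id,                        # hypothetical local scheme
        "filename": Path(path).name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # fixity value for the sample
        "header_hex": data[:16].hex(),               # leading bytes, useful for signature work
        "notes": notes,
    }


# Demo with a made-up file standing in for an unidentified format.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "instrument.xyz"
    sample.write_bytes(b"XYZ1\x00\x01" + b"payload")
    record = interim_record(sample, "local/0001", "Output of lab instrument; vendor unknown.")
    print(json.dumps(record, indent=2))
```

Keeping the leading bytes and checksum alongside the notes means the same record can accompany a sample submission later.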
Key Takeaways
- Most "unknown" results come from a stale signature database — update DROID/Siegfried first.
- Trust matches made by byte signature; treat extension-only matches as guesses to verify.
- A PUID identifies a format and version unambiguously and persistently; store it in metadata.
- Tool disagreements usually trace to signature-version, match-basis or priority differences.
- Use captured PUIDs to build a format profile and drive obsolescence/risk planning.
- Inspect file headers (magic bytes) to catch renamed or damaged files.
- Submit genuinely new formats to PRONOM so the registry — and everyone — improves.
Frequently Asked Questions
What is a format registry?
A format registry is an authoritative database of file format definitions and their signatures. PRONOM, run by The National Archives (UK), is the best-known; each format gets a persistent identifier called a PUID, such as fmt/353 for TIFF.
Why does my identification tool return 'unknown' for a file?
Usually the file's signature is absent or doesn't match any in the registry — because the format is new, the file is damaged, or your signature database is out of date. Update DROID/Siegfried's signature file first, then inspect the file's header.
What is a PUID and why does it matter?
A PUID (PRONOM Unique Identifier) is a stable code for a specific format and version. It matters because it is unambiguous across systems and time, unlike file extensions or MIME types, which collide and drift.
How do I keep my signature database current?
DROID and Siegfried download signature updates from PRONOM. Run their update commands on a schedule (for example before each large ingest) so new formats are recognised; stale signatures are the top cause of false 'unknown' results.
Why do two tools disagree about a file's format?
They may be running different PRONOM signature releases, may have matched on different evidence (bytes versus extension), or may apply different priority rules to ambiguous files. Align their signature versions and check which one matched by signature rather than by extension.
Can I submit a new format to PRONOM?
Yes. The National Archives accepts research submissions of new or under-described formats, including sample files and signature information, which is how the registry grows to cover emerging formats.