Born-Digital Archives

When access to born-digital records fails, work the problem in this order: confirm that an access derivative actually exists (never serve the master), check that the reading-room workstation has software that opens the format, verify that the file passes fixity, and confirm that no sensitivity rule is silently blocking access or, worse, that a missing rule is silently leaking data. Most "the file won't open" tickets trace to a missing or broken derivative, not a lost record. Diagnose from the delivery copy backwards to the master.

Why won't a file open at all?

Three causes account for most failures. First, no access copy was ever made — the catalogue points at a dark preservation master the reader's account cannot reach. Second, the workstation lacks the application (a .wpd WordPerfect file on a machine with only Microsoft Office). Third, the format is genuinely obsolete and no current software reads it, which pushes you toward migration or emulation. Identify the format before guessing.

```bash
# What format is this, really?
sf access/item-0042.bin          # Siegfried → PRONOM PUID + match basis
# e.g. fmt/40 = Word 97-2003, x-fmt/394 = WordPerfect 5.1
```

Why does a file open but look wrong?

A file that opens to garbled or empty content is almost always an encoding or derivation fault. Legacy plain text in CP-1252 or Latin-1, read as UTF-8, turns accented characters into replacement characters or mojibake. A half-failed migration can produce a structurally valid but empty PDF. Test, then re-derive from the master.

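```bash
# Detect the real encoding before converting
file -i access/letters.txt                  # → charset=iso-8859-1
iconv -f ISO-8859-1 -t UTF-8 access/letters.txt > delivery/letters.txt

# Validate a delivery PDF that opened blank
jhove -m PDF-hul -h xml delivery/report.pdf | grep -i "status"
# JHOVE checks structure only, so a well-formed PDF can still be empty.
# For a text document, also confirm that words actually extract (pdftotext ships with Poppler).
pdftotext delivery/report.pdf - | wc -w
```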
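For the encoding case, `file` cannot reliably tell CP-1252 from Latin-1, so it can be safer to test whether the bytes already decode as UTF-8 and convert only when they do not. The sketch below does that with `iconv`. The function name `to_utf8` is hypothetical, and the fallback to Windows-1252 is an assumption that should be checked against what is known about the collection's provenance.

```bash
#!/usr/bin/env bash
# Hypothetical helper: to_utf8 SRC DEST
# If SRC already decodes cleanly as UTF-8 it is copied unchanged.
# Otherwise it is converted on the ASSUMPTION that it is Windows-1252.
to_utf8() {
  local src=$1 dest=$2
  # iconv exits non-zero when it meets a byte sequence that is not valid UTF-8
  if iconv -f UTF-8 -t UTF-8 "$src" > /dev/null 2>&1; then
    cp -- "$src" "$dest"
  else
    iconv -f WINDOWS-1252 -t UTF-8 "$src" > "$dest"
  fi
}
```

Usage would look like `to_utf8 access/letters.txt delivery/letters.txt`. Copying valid UTF-8 untouched avoids the double-encoding that a blanket conversion would introduce.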

How do you give access without leaking sensitive data?

Never expose the raw master in the reading room. Serve a checked access derivative, and run a sensitivity scan before anything is delivered. bulk_extractor will surface payment-card numbers, national IDs and email addresses you would otherwise miss in thousands of files.

```bash
bulk_extractor -o features/ /work/sip-088/disk.E01
# Inspect pii.txt, ccn.txt, email.txt, telephone.txt before opening access
```

If a scan flags personal data, redact or restrict that item and regenerate the derivative — do not rely on the reader not to look.
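A simple gate can make that rule mechanical: hold the item whenever a feature file contains hits. The sketch below assumes the usual bulk_extractor feature-file layout (header lines beginning with `#`, then tab-separated offset, feature and context). The function names `feature_hits` and `scan_is_clean` are hypothetical, and the list of files checked is an illustrative choice.

```bash
#!/usr/bin/env bash
# feature_hits FILE: count feature lines (tab-separated, not a '#' header).
feature_hits() {
  awk -F'\t' 'NF >= 2 && $0 !~ /^#/ { n++ } END { print n + 0 }' "$1"
}

# scan_is_clean DIR: succeed only if none of the checked feature files has hits.
# The file list here is an assumption; extend it to match local policy.
scan_is_clean() {
  local dir=$1 status=0 f n
  for f in ccn.txt pii.txt email.txt telephone.txt; do
    [ -f "$dir/$f" ] || continue
    n=$(feature_hits "$dir/$f")
    if [ "$n" -gt 0 ]; then
      printf 'HOLD: %s has %s hit(s)\n' "$f" "$n" >&2
      status=1
    fi
  done
  return "$status"
}
```

A delivery script could then run `scan_is_clean features/ || exit 1` before publishing the derivative. A hit is a prompt for human review, not proof of sensitive content, so the gate only holds the item.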

Master vs access copy: which does the reader get?

| Concern | Preservation master | Reading-room access copy |
| --- | --- | --- |
| Format | Original / normalised | Open, stable (PDF/A, TXT, JPEG) |
| Fixity | Authoritative manifest | Re-verified on delivery |
| Sensitivity | Unredacted | Redacted / restricted |
| Visibility | Dark | Reader-facing |
| Regenerable | No (it is the source) | Yes, from the master |

The rule: the master stays dark and authoritative; the reader only ever touches a copy you can regenerate.
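One way to audit that rule is to list every master that has no access copy at all, since those are the items that will generate "won't open" tickets. This is a minimal sketch under an assumed layout: masters and derivatives live in two flat directories and share a basename (for example `item-0042.wpd` and `item-0042.pdf`). The function name and the layout are hypothetical.

```bash
#!/usr/bin/env bash
# missing_derivatives MASTER_DIR ACCESS_DIR
# Prints the basename (extension removed) of each master with no file of the
# same basename in ACCESS_DIR. Directory layout and naming are assumptions.
missing_derivatives() {
  local master_dir=$1 access_dir=$2 m id
  for m in "$master_dir"/*; do
    [ -f "$m" ] || continue
    id=$(basename "$m")
    id=${id%.*}
    # ls fails when the glob matches nothing, which is the case we report
    if ! ls "$access_dir/$id".* > /dev/null 2>&1; then
      printf '%s\n' "$id"
    fi
  done
}
```

It only reads directory listings, so it can run from an account that sees file names in the dark store without being able to open the masters themselves.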

Why is a checksum mismatch appearing?

A fixity failure at delivery has three usual roots: the stored master degraded (bit rot or a bad replica), the manifest is stale because you re-derived without updating it, or the copy to the workstation was corrupted in transit. Re-verify the master against its authoritative manifest first. If the master is fine, the fault is downstream: refresh the derivative's manifest entry if it is stale, otherwise re-copy to the workstation. Never serve a file that fails fixity.
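The triage order can be sketched as a short script that checks the master first and only then looks downstream. It assumes GNU `sha256sum`, manifests in `sha256sum` format stored alongside the files they describe, and a separate manifest for the access copies. The function names, manifest file names and result labels are all hypothetical.

```bash
#!/usr/bin/env bash
# verify_manifest MANIFEST: check every entry, resolving paths relative to
# the manifest's own directory. Succeeds only if all checksums match.
verify_manifest() {
  ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1")" > /dev/null 2>&1 )
}

# fixity_triage MASTER_MANIFEST ACCESS_MANIFEST STORED_COPY DELIVERED_COPY
# Prints one label naming the first stage at which fixity breaks.
fixity_triage() {
  local master_manifest=$1 access_manifest=$2 stored=$3 delivered=$4
  if ! verify_manifest "$master_manifest"; then
    echo "master-failed"                  # restore from a good replica first
  elif ! verify_manifest "$access_manifest"; then
    echo "access-copy-or-manifest-stale"  # re-derive, then rewrite the manifest
  elif ! cmp -s "$stored" "$delivered"; then
    echo "transit-corruption"             # re-copy to the workstation
  else
    echo "ok"
  fi
}
```

Because each stage runs only when the previous one passes, the label always names the most upstream fault, which is the one to fix first.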

What if the record needs its original software?

Some records (a 1995 HyperCard stack, a database in an extinct application) only make sense inside their original environment. Rather than migrating and losing behaviour, give access through emulation: a hosted service such as EaaSI, or a local QEMU/DOSBox configuration, presented to the reader as a controlled, screen-only session. The reader interacts with the running software, not the underlying files.

Key Takeaways

  • Diagnose from the delivery copy backwards: confirm an access derivative exists before assuming the record is lost.
  • Identify the format with Siegfried before deciding whether the problem is missing software or true obsolescence.
  • Garbled content is usually an encoding mismatch or a failed migration — re-derive from the master and validate with JHOVE.
  • Always scan with bulk_extractor and serve redacted/restricted copies; never expose the raw master.
  • On a fixity mismatch, re-verify the master first, then chase the downstream copy — never serve a file that fails fixity.
  • For records that depend on original software, use emulation and a screen-only session rather than handing over files.

Frequently Asked Questions

Why can't readers open a file that the catalogue says exists?

Usually a derivative or format problem: the access copy was never generated, the reading-room workstation lacks the right application, or the file is in an obsolete format that no current software opens. Check the delivery derivative, not the preservation master.

Why does a file open but show garbled or empty content?

Most often a character-encoding mismatch (legacy text read as UTF-8), a corrupt derivative from a failed migration, or an Office file whose embedded objects or fonts are missing. Re-derive from the master and verify with JHOVE.

How do I give access without leaking sensitive data?

Serve a redacted or restricted access copy, never the raw master. Run a sensitivity scan (bulk_extractor) before opening anything, and apply item- or file-level access rules in your delivery system rather than relying on the reader to behave.

Should readers get the original files or access derivatives?

Access derivatives, in stable open formats (PDF/A, plain text, normalised images). Keep originals and preservation masters dark; expose only copies you have checked, redacted where needed, and can regenerate.

Why is a checksum mismatch appearing on delivery?

The file changed in storage (bit rot or a bad copy), the manifest is stale after a re-derivation, or the transfer to the reading-room workstation corrupted it. Re-verify the master's fixity, refresh a stale manifest if that is the cause, then re-copy; never serve a file that fails fixity.

How do I provide access to a record that needs emulation?

Stand up an emulated environment (EaaSI, or a local QEMU/DOSBox setup) that runs the original software, and give the reader a controlled, screen-only session rather than the raw files.