Born-Digital Archives

Appraising born-digital records means deciding, on a forensic copy, which files to keep, restrict or discard based on their enduring value rather than their sheer volume. You image the media first, reduce obvious clutter automatically, then apply human judgement to what remains. The classic archival ideas of provenance and value carry straight over from paper; what is new is the scale, the duplication and the operating-system noise you must filter out before the real selection begins. This guide walks a small example end to end.

What is appraisal, in plain terms?

Appraisal is the act of selection. Not everything an institution receives has lasting value, and keeping everything forever is neither affordable nor kind to future researchers who must wade through it. For born-digital records you are asking the same question an archivist always asks, "is this worth keeping, and why?", but across thousands of files at once, many of which the creator never consciously made.

Why image before you appraise?

Because appraising means browsing, and browsing the original media changes it. Opening folders on a writable mount updates access times, and any data the operating system writes can overwrite deleted files that were still recoverable. So the order is fixed: capture a forensic image, mount or extract a working copy, and appraise that. The image remains the untouched authority, so any appraisal mistake can be reversed by returning to it.
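The capture step can be sketched in a few shell functions. This is a minimal illustration, not a forensic workflow: it assumes GNU coreutils, a raw dd-style image, and a source already attached through a write blocker, and the function names `capture_image` and `verify_image` and all paths are hypothetical. Dedicated imaging tools such as Guymager or ewfacquire add the logging and error handling this sketch lacks.

```bash
# Capture-and-verify sketch. Assumes GNU coreutils, a raw (dd-style) image,
# and a source already connected through a write blocker. Paths are hypothetical.

# Copy the source bit for bit, check the copy against it, then protect it.
capture_image() {
  local src="$1" img="$2" src_hash img_hash
  dd if="$src" of="$img" bs=4M status=none || return 1
  src_hash=$(sha256sum < "$src" | cut -d' ' -f1)
  img_hash=$(sha256sum < "$img" | cut -d' ' -f1)
  if [[ "$src_hash" != "$img_hash" ]]; then
    echo "hash mismatch: source $src_hash, image $img_hash" >&2
    return 1
  fi
  # Store the fixity value beside the image and make the image read-only.
  printf '%s\n' "$img_hash" > "$img.sha256"
  chmod a-w "$img"
}

# Re-check the image against its stored hash before and after appraisal.
verify_image() {
  local img="$1"
  [[ "$(sha256sum < "$img" | cut -d' ' -f1)" == "$(cat "$img.sha256")" ]]
}

# Example (device and file names are placeholders):
# capture_image /dev/sdX accession-001.img
# verify_image accession-001.img && echo "image intact"
```

Once the hashes agree, a working copy can be extracted from the image or, for an image that holds a single filesystem, mounted read-only (for example `mount -o ro,loop accession-001.img /mnt/work`), and that copy is what gets appraised.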

How do you cut the volume before judging?

A surprising share of a typical disk is duplication and system files with no archival value. Remove that mechanically before spending human attention:

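```bash
# 1. Find duplicate files by content hash. sha256deep prints "<hash>  <path>",
#    so uniq -w64 compares only the hash; -D prints every copy in each group.
sha256deep -r ./working_copy | sort | uniq -w64 -D

# 2. Match against the NSRL known-file set to flag OS/app files
#    (files whose hashes are in the Reference Data Set are published software
#    and can usually be set aside). known_hashes.txt is a placeholder: one hash
#    per line, same algorithm as the listing. grep -v keeps the unmatched files.
sha256deep -r ./working_copy | grep -v -i -F -f known_hashes.txt
```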

De-duplicating and weeding out known system files often removes a large fraction of the raw volume, leaving a far smaller set that actually warrants thought.
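How large that fraction is varies by collection, so it is worth measuring rather than assuming. A minimal sketch, assuming GNU coreutils; `dedupe_savings` is a hypothetical helper that counts the first file seen for each hash as the copy to keep and reports the rest as reclaimable.

```bash
# Estimate how many bytes hash de-duplication would free in a working copy.
# Assumes GNU coreutils (sha256sum). The first file seen per hash is "kept".
dedupe_savings() {
  local dir="$1"
  find "$dir" -type f -print0 |
    while IFS= read -r -d '' f; do
      # One line per file: content hash, then size in bytes.
      printf '%s %s\n' "$(sha256sum < "$f" | cut -d' ' -f1)" "$(wc -c < "$f")"
    done |
    awk '{ total += $2; if (!seen[$1]++) unique += $2 }
         END { printf "total=%d unique=%d reclaimable=%d\n", total, unique, total - unique }'
}

# Example (path is hypothetical):
# dedupe_savings ./working_copy
```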

What criteria decide what stays?

Once the clutter is gone, weigh each remaining group against clear criteria:

| Factor | Keep signal | Discard signal |
|---|---|---|
| Evidential value | Documents decisions, activity | Transient cache, autosave |
| Uniqueness | Only copy, original work | Duplicate of held record |
| Informational value | Rich, reusable content | System-generated noise |
| Risk and cost | Manageable, low risk | High sensitivity, no value |
| Donor agreement | Within scope | Out of scope or restricted |

Most files fall out quickly under uniqueness and evidential value; the hard cases are few, which is exactly why automating the first pass is worth the effort.
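The quick cases can be expressed as a first-pass triage rule that sends everything unclear to a person. This is a hypothetical sketch, not a standard: the `triage_group` name, its yes/no flags and the order of the checks are assumptions loosely following the table, and the flag values themselves still come from human or tool assessment.

```bash
# Hypothetical first-pass triage for one group of files.
# Arguments (each "yes", "no" or "unsure"):
#   duplicate  system_noise  in_scope  evidential  sensitive
triage_group() {
  local duplicate="$1" system_noise="$2" in_scope="$3" evidential="$4" sensitive="$5"
  if [[ "$in_scope" != yes ]]; then
    echo discard   # outside the donor agreement
  elif [[ "$duplicate" == yes || "$system_noise" == yes ]]; then
    echo discard   # a held copy exists, or it is OS/application clutter
  elif [[ "$evidential" != yes ]]; then
    echo review    # value unclear: a person decides
  elif [[ "$sensitive" == yes ]]; then
    echo restrict  # keep, but limit access
  else
    echo keep
  fi
}
```

For instance, `triage_group no no yes yes yes` (unique, in scope, evidential, sensitive) returns `restrict`, the same outcome as the tax archive in the worked example.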

A small worked example

Suppose you accession a disk image of a writer's laptop containing 80,000 files. Hash de-duplication collapses repeated copies, such as backups and saved email attachments, leaving 52,000. Filtering out NSRL-known operating-system and application files leaves 18,000. Weeding browser caches and thumbnail databases leaves 9,000. Now a human looks: drafts, correspondence and research notes are kept; a folder of pirated films is out of scope and discarded; a tax archive is kept but restricted. The 80,000-file problem became a 9,000-file judgement, then a handful of policy calls.
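The arithmetic of that funnel is easy to reproduce for your own accessions. A small sketch using the illustrative counts above; `reduction_funnel` is a hypothetical helper that takes `label=count` pairs in processing order, with awk supplying the decimal percentages.

```bash
# Print what each reduction stage leaves, given "label=count" pairs in order.
# The counts below are the illustrative figures from the example, not data.
reduction_funnel() {
  local start="" prev="" stage label count
  for stage in "$@"; do
    label=${stage%%=*}   # text before the first "="
    count=${stage#*=}    # number after the first "="
    if [[ -z "$start" ]]; then
      start=$count
      printf '%s: %d files\n' "$label" "$count"
    else
      # awk does the decimal arithmetic that bash integers cannot.
      awk -v l="$label" -v p="$prev" -v c="$count" -v s="$start" 'BEGIN {
        printf "%s: %d files (-%d, %.1f%% of the original remain)\n", l, c, p - c, 100 * c / s
      }'
    fi
    prev=$count
  done
}

reduction_funnel "accessioned=80000" "after de-duplication=52000" \
  "after NSRL filtering=18000" "after weeding caches=9000"
```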

How do you record the decision?

Write a short appraisal note while the reasoning is fresh: the criteria applied, the tools and reference sets used, what was kept, separated or restricted, and who approved it. This is what converts a personal call into an institutional, defensible decision, and it lets a successor understand why the collection looks the way it does.
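A note like this can be generated as a plain-text file alongside the accession. A minimal sketch, assuming GNU coreutils; `write_appraisal_note`, its field names and the layout are assumptions, and recording the image's SHA-256 is an added suggestion that ties the decision to one exact capture.

```bash
# Write a plain-text appraisal note tied to a specific image by its hash.
# Field names and layout are illustrative; adapt them to local policy.
write_appraisal_note() {
  local note="$1" img="$2" criteria="$3" tools="$4" outcome="$5" approver="$6"
  {
    printf 'Date: %s\n' "$(date -u +%Y-%m-%d)"
    printf 'Image SHA-256: %s\n' "$(sha256sum < "$img" | cut -d' ' -f1)"
    printf 'Criteria applied: %s\n' "$criteria"
    printf 'Tools and reference sets: %s\n' "$tools"
    printf 'Outcome: %s\n' "$outcome"
    printf 'Approved by: %s\n' "$approver"
  } > "$note"
}

# Example with placeholder values:
# write_appraisal_note appraisal-note.txt accession-001.img \
#   "evidential value, uniqueness, donor scope" \
#   "sha256deep; NSRL RDS (record the version used)" \
#   "kept drafts and correspondence; restricted tax archive; discarded out-of-scope films" \
#   "<approver name>"
```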

Key Takeaways

  • Appraisal is selection by value, not by volume.
  • Image first; appraise the copy so the original is never altered.
  • Reduce duplication and system files mechanically before judging.
  • Use clear criteria: evidence, uniqueness, information, risk, donor scope.
  • Separate rather than destroy until the appraisal is reviewed.
  • Document criteria, tools, outcomes and approval to make it defensible.

Frequently Asked Questions

What does appraisal mean for born-digital records?

Appraisal is deciding what to keep, what to restrict and what to discard, based on a record's value rather than its volume. For born-digital material it happens on a working copy, after imaging, so the original is never altered.

Should I appraise before or after making a disk image?

Always image first, then appraise the copy. Browsing the source to appraise it would alter timestamps and could destroy deleted content, so the capture has to come before any selection decision.

How is digital appraisal different from paper appraisal?

The principles of provenance and value are the same, but scale, duplication and system files are new. You appraise gigabytes, not boxes, and you must filter out application and operating-system clutter that has no archival value.

What are quick wins for reducing volume?

De-duplicate identical files by hash, remove known system and application files using a reference set such as the NSRL, and weed obvious caches and temporary folders. These steps often cut raw volume substantially before any judgement calls.
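A first weeding pass can be as simple as a `find` that lists likely clutter for review. This sketch only prints candidates; the name patterns (Windows `Thumbs.db` and `desktop.ini`, macOS `.DS_Store`, `*.tmp` files, and directories named `cache` or `.cache`) are examples rather than an authoritative list, and `list_clutter` is a hypothetical helper. `-iname` is supported by GNU and BSD find but is not POSIX.

```bash
# List (never delete) common cache and thumbnail clutter in a working copy.
# The name patterns are examples only; extend them for the systems you hold.
list_clutter() {
  local dir="$1"
  # Matching files are printed; matching cache directories are printed once
  # and -prune stops find from descending into them.
  find "$dir" \( -type f \( -name 'Thumbs.db' -o -name '.DS_Store' \
      -o -name 'desktop.ini' -o -name '*.tmp' \) \
    -o -type d \( -iname 'cache' -o -name '.cache' \) -prune \) -print
}

# Example (path is hypothetical):
# list_clutter ./working_copy > clutter-candidates.txt
```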

Do I delete the files I appraise out?

Usually you separate rather than destroy, at least until the appraisal is reviewed. Because the disk image is kept, a rejected file can always be revisited, so deleting anything from the preserved master is a late and deliberate act.
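Separating can be done by moving a rejected file out of the working copy into a holding area and logging the move. A hypothetical sketch assuming GNU coreutils; `separate_file`, the holding-area layout and the tab-separated log format are illustrative choices, and the disk image is never touched.

```bash
# Move an appraised-out file from the working copy into a separate holding
# area, keeping its relative path, and log the time, hash and path.
# Directory names are hypothetical; the disk image itself is never touched.
separate_file() {
  local working="$1" rel="$2" held="$3" digest
  [[ -f "$working/$rel" ]] || { echo "not found: $rel" >&2; return 1; }
  digest=$(sha256sum < "$working/$rel" | cut -d' ' -f1)
  mkdir -p "$held/$(dirname "$rel")" || return 1
  mv "$working/$rel" "$held/$rel" || return 1
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$digest" "$rel" \
    >> "$held/separated.log"
}

# Example (paths are hypothetical):
# separate_file ./working_copy "Videos/film.avi" ./separated
```

Undoing a decision then means moving the file back; the log records what left the working copy and its hash at the time.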

What should I document about my appraisal?

Record the criteria you applied, the tools and reference sets used, what was kept or restricted and why, and who approved it. That note turns a personal judgement into a defensible institutional decision.