Handling sensitive records responsibly means controlling who can see what, for how long, and on what documented basis — before a single page reaches a researcher. The responsible workflow has four stages: identify sensitivity, decide on closure or redaction, apply access controls technically, and log every decision. Get the order right and you protect people without sealing history away forever.
What makes a record sensitive in the first place?
Sensitivity is about potential harm to living people and communities, not about how old or interesting a document is. The standard high-risk categories are medical and mental-health files, criminal-justice and police records, adoption and child-welfare papers, asylum and immigration cases, personnel and disciplinary files, and anything revealing sexual orientation, religion or political affiliation under a hostile regime. A useful triage question: if this record were published with a name attached today, could it embarrass, endanger, or legally expose someone alive?
How do I run a sensitivity review?
Treat it as a structured pass, not a vibe check. Walk each series with a checklist and record the outcome.
```text
[ ] Does it name or identify a living/recently deceased person?
[ ] Special-category data (health, ethnicity, sexuality, beliefs)?
[ ] Legal duty of confidence (medical, legal, safeguarding)?
[ ] Community-sensitive (sacred, ceremonial, traumatic)?
[ ] Risk if combined with other open datasets?
-> decision: OPEN / REDACT / CLOSE (+ review date)
```

Assign a named reviewer and a review date to every closure so nothing stays shut by inertia.
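If you keep review outcomes in a script or database rather than on paper, the checklist can be captured as a small record that refuses to save an incomplete decision. This is only a sketch: the class name, field names, and the choice to require a review date for `CLOSE` decisions only are illustrative assumptions, not part of any standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

DECISIONS = {"OPEN", "REDACT", "CLOSE"}


@dataclass
class SensitivityReview:
    """One checklist pass over a series (all field names are hypothetical)."""
    series_id: str
    names_living_person: bool      # living or recently deceased person identified
    special_category: bool         # health, ethnicity, sexuality, beliefs
    duty_of_confidence: bool       # medical, legal, safeguarding
    community_sensitive: bool      # sacred, ceremonial, traumatic
    combination_risk: bool         # risky when joined with other open datasets
    decision: str                  # OPEN / REDACT / CLOSE
    reviewer: str                  # the named reviewer
    review_date: Optional[date] = None

    def __post_init__(self) -> None:
        if self.decision not in DECISIONS:
            raise ValueError(f"unknown decision: {self.decision!r}")
        if not self.reviewer.strip():
            raise ValueError("every review needs a named reviewer")
        # Assumption: only full closures must carry a review date.
        if self.decision == "CLOSE" and self.review_date is None:
            raise ValueError("a closure needs a review date")


review = SensitivityReview(
    series_id="SERIES-001",        # hypothetical reference code
    names_living_person=True,
    special_category=True,
    duty_of_confidence=False,
    community_sensitive=False,
    combination_risk=False,
    decision="CLOSE",
    reviewer="J. Reviewer",
    review_date=date(2071, 1, 1),
)
print(review.decision, review.review_date)
```

Rejecting the record at creation time means a closure without a reviewer or review date never reaches the catalogue in the first place.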
When should I close a file versus redact it?
Closing the whole file is the blunt instrument; redaction is the scalpel. Prefer redaction when the sensitive content is a small, identifiable fraction. Keep an unredacted master in a restricted store and serve a redacted derivative to readers.
| Approach | When to use | Cost |
|---|---|---|
| Full closure | sensitivity is pervasive; review is impractical | researcher loses everything |
| Field redaction | a few identifiers in otherwise open records | medium; needs careful masking |
| Pseudonymisation | quantitative reuse where identity is irrelevant | low access cost, key must be secured |
| Anonymised derivative | publishing aggregate data | high prep, lowest risk |
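To make the pseudonymisation row concrete, one common technique is keyed hashing: the same person always maps to the same token, but the mapping cannot be rebuilt without the key. The sketch below uses HMAC-SHA256 from the Python standard library. The function name `pseudonym`, the `P-` prefix, the sample rows and the placeholder key are all assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonym(identifier: str, key: bytes, length: int = 12) -> str:
    """Map an identifier to a stable token using a keyed hash (HMAC-SHA256)."""
    # Normalise spacing and case so trivial variants get the same token.
    normalised = " ".join(identifier.split()).casefold()
    digest = hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:length]


# Placeholder only: in practice the key lives in the restricted store.
key = b"replace-with-a-secret-loaded-from-a-secure-store"

rows = [
    {"name": "Jane Example", "year": 1962},
    {"name": "jane  example", "year": 1965},
]
safe_rows = [{"person": pseudonym(r["name"], key), "year": r["year"]} for r in rows]
print(safe_rows)
```

Both sample rows receive the same token, so counts and linkages survive while the name does not. Truncating the digest keeps tokens readable but raises the chance of collisions in very large datasets, so lengthen it if that matters. Anyone holding the key can recompute the tokens from a list of candidate names, which is why the key has to be protected as carefully as the unredacted master.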
How do I apply closure technically?
Encode the closure in your metadata so the access system enforces it, rather than relying on a sticky note. In a rights or access field, record the status, the legal basis and the review date.
```xml
<accessrestrict>
  <p>Closed under data-protection rules until 2071.</p>
  <date normal="2071-01-01" type="review"/>
  <legalstatus>special-category personal data</legalstatus>
</accessrestrict>
```

For born-digital material, separate the storage tiers: an open/ tree the public can reach and a restricted/ tree behind authentication, with file permissions that match the metadata. Run a tool such as bulk_extractor to flag overlooked identifiers like national-insurance or credit-card numbers before anything goes online.
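As a rough illustration of letting the access system read the restriction rather than a person remembering it, the sketch below parses an `<accessrestrict>` element like the one above and reports its status for a given date. Two assumptions are mine, not the metadata standard's: a record with no parseable review date stays restricted (fail closed), and reaching the review date flags the record for review instead of opening it automatically. The function name `access_status` is hypothetical.

```python
from datetime import date
import xml.etree.ElementTree as ET

SAMPLE = """<accessrestrict>
  <p>Closed under data-protection rules until 2071.</p>
  <date normal="2071-01-01" type="review"/>
  <legalstatus>special-category personal data</legalstatus>
</accessrestrict>"""


def access_status(accessrestrict_xml: str, today: date) -> str:
    """Return 'restricted' or 'review-due' for one <accessrestrict> element."""
    root = ET.fromstring(accessrestrict_xml)
    review = root.find("date[@type='review']")
    if review is None:
        return "restricted"            # no review date recorded: fail closed
    try:
        review_date = date.fromisoformat(review.get("normal", ""))
    except ValueError:
        return "restricted"            # unreadable date: fail closed
    return "restricted" if today < review_date else "review-due"


print(access_status(SAMPLE, date(2030, 6, 1)))
```

A delivery system could call a check like this before serving anything from the restricted tree, and a scheduled job could list every record whose status has become `review-due`.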
How do I serve a redacted derivative safely?
Never redact by drawing a black box over a PDF that still contains the text layer — searchable text leaks straight through. Flatten the image, strip the text layer, then re-OCR the visible page. A minimal pipeline:
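```bash
# rasterise to remove the underlying text layer (one PNG per page, named page-<n>.png)
pdftoppm -png -r 300 master.pdf page
# (apply visual redaction to the PNGs here)
# recombine the redacted page images, then OCR only what is still visible
img2pdf page-*.png -o redacted_images.pdf
ocrmypdf --force-ocr redacted_images.pdf redacted_derivative.pdf
```

Verify by searching the derivative for a name you redacted; it should return nothing.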
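The verification step can be scripted so that it runs on every derivative rather than when someone remembers. The sketch below assumes poppler's `pdftotext` is installed (it ships alongside `pdftoppm`) and that you keep a list of the terms you redacted. The function names are hypothetical.

```python
import re
import subprocess


def extract_text(pdf_path: str) -> str:
    """Pull the text layer out of a PDF with pdftotext ('-' writes to stdout)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def leaked_terms(text: str, redacted_terms: list[str]) -> list[str]:
    """Return every redacted term that still appears in the extracted text."""
    # Collapse whitespace and ignore case so line breaks do not hide a match.
    flat = re.sub(r"\s+", " ", text).casefold()
    return [
        term for term in redacted_terms
        if re.sub(r"\s+", " ", term).casefold() in flat
    ]


# Typical use, once the derivative exists:
#   leaks = leaked_terms(extract_text("redacted_derivative.pdf"), ["Jane Example"])
#   if leaks: stop and do not publish
sample_text = "Patient: Jane\nExample, admitted 1962"
print(leaked_terms(sample_text, ["jane example", "John Doe"]))
```

An empty list is necessary but not sufficient: this only checks the text layer, so it will not catch a name that is still legible in the page image or tucked into document metadata.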
Do I need to log who accesses sensitive records?
Yes — an access log is both an accountability tool and, in many jurisdictions, a legal requirement. Capture reader identity, the items consulted, the date, and the stated research purpose. Keep the log itself secured, since it is personal data about your readers. When a breach or complaint arrives, this log is the difference between a confident answer and a guess.
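A minimal way to capture those four fields is an append-only file with one JSON object per line. The sketch below is an assumption about how you might structure it, not a prescribed format: the field names, the file name and the sample reference code are hypothetical, and the owner-only file mode applies on POSIX systems.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def log_access(log_path, reader_id, items, purpose, when=None):
    """Append one access event (who, what, when, why) as a JSON line."""
    entry = {
        "timestamp": (when or datetime.now(timezone.utc)).isoformat(),
        "reader_id": reader_id,
        "items": list(items),
        "purpose": purpose,
    }
    # Create the file readable and writable by its owner only, since the
    # log is itself personal data about readers.
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    with os.fdopen(fd, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


# Demonstration against a throwaway directory.
demo_path = os.path.join(tempfile.mkdtemp(), "access_log.jsonl")
log_access(demo_path, "reader-0042", ["HOSP/3/12"], "family history research")
with open(demo_path, encoding="utf-8") as fh:
    print(fh.read().strip())
```

Appending rather than rewriting keeps earlier entries intact, and one object per line makes the log easy to filter by reader or item when a complaint arrives.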
Key Takeaways
- Sensitivity is about harm to living people and communities, not the age of the document.
- Run a structured sensitivity review with a named officer and a review date; never decide ad hoc.
- Prefer redaction or pseudonymisation over full closure when the sensitive content is a small fraction.
- Encode closure in metadata so the access system enforces it automatically.
- Redact by flattening and re-OCRing — never just overlay a box on a live text layer.
- Log access to sensitive material for accountability and legal compliance.
Frequently Asked Questions
What counts as a sensitive record in an archive?
Anything that could harm a living or recently deceased person, breach a legal duty, or expose a community to risk — medical, criminal-justice, adoption, asylum, and personnel files are typical categories. Sensitivity is contextual, so the same document can be open in one jurisdiction and closed in another.
How long should a closure period last?
Common defaults are 75 to 100 years for records about identifiable individuals, or the lifetime of the subject plus a margin. Always check the governing statute, because UK, EU and institutional rules differ.
Can I redact instead of closing a whole file?
Often yes — partial redaction opens most of a file while masking the sensitive fields. Keep an unredacted master under restricted access and serve a redacted derivative.
Who decides whether a record is sensitive?
A named officer decides, through a documented sensitivity review that is ideally guided by a written access policy and, for community material, by the source community itself. Avoid ad hoc decisions made at the reading-room desk.
Do I need to log access to sensitive material?
Yes. An access log showing who viewed what and when supports accountability, helps with breach investigations, and is often a legal or audit requirement.