Ethics, Bias & Sensitivity

Handling sensitive records responsibly means controlling who can see what, for how long, and on what documented basis — before a single page reaches a researcher. The responsible workflow has four stages: identify sensitivity, decide on closure or redaction, apply access controls technically, and log every decision. Get the order right and you protect people without sealing history away forever.

What makes a record sensitive in the first place?

Sensitivity is about potential harm to living people and communities, not about how old or interesting a document is. The standard high-risk categories are medical and mental-health files, criminal-justice and police records, adoption and child-welfare papers, asylum and immigration cases, personnel and disciplinary files, and anything revealing sexual orientation, religion or political affiliation under a hostile regime. A useful triage question: if this record were published with a name attached today, could it embarrass, endanger, or legally expose someone alive?

How do I run a sensitivity review?

Treat it as a structured pass, not a vibe check. Walk each series with a checklist and record the outcome.

text
[ ] Does it name or identify a living/recently deceased person?
[ ] Special-category data (health, ethnicity, sexuality, beliefs)?
[ ] Legal duty of confidence (medical, legal, safeguarding)?
[ ] Community-sensitive (sacred, ceremonial, traumatic)?
[ ] Risk if combined with other open datasets?
-> decision: OPEN / REDACT / CLOSE  (+ review date)

Assign a named reviewer and a review date to every closure so nothing stays shut by inertia.
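
One way to make the checklist outcome enforceable is to store it as a small record that refuses a closure lacking a reviewer or a review date. The sketch below is illustrative only: the class name, field names and the `MH/12` style reference are hypothetical, and it assumes a review date is mandatory for CLOSE but optional for REDACT.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

DECISIONS = {"OPEN", "REDACT", "CLOSE"}


@dataclass(frozen=True)
class ReviewOutcome:
    series_ref: str                     # hypothetical series identifier
    decision: str                       # OPEN, REDACT or CLOSE
    reviewer: Optional[str] = None      # the named reviewer
    review_date: Optional[date] = None  # when the decision is revisited

    def __post_init__(self):
        if self.decision not in DECISIONS:
            raise ValueError(f"unknown decision: {self.decision!r}")
        # A closure without an owner and a date would stay shut by inertia.
        if self.decision == "CLOSE" and not (self.reviewer and self.review_date):
            raise ValueError("CLOSE needs a named reviewer and a review date")


def due_for_review(outcomes, today):
    """Return restricted outcomes whose review date has arrived."""
    return [
        o for o in outcomes
        if o.decision != "OPEN"
        and o.review_date is not None
        and o.review_date <= today
    ]
```

Running `due_for_review` on a schedule gives the reviewer a worklist instead of relying on memory.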

When should I close a file versus redact it?

Closing the whole file is the blunt instrument; redaction is the scalpel. Prefer redaction when the sensitive content is a small, identifiable fraction. Keep an unredacted master in a restricted store and serve a redacted derivative to readers.

| Approach | When to use | Cost |
| --- | --- | --- |
| Full closure | sensitivity is pervasive; review is impractical | researcher loses everything |
| Field redaction | a few identifiers in otherwise open records | medium; needs careful masking |
| Pseudonymisation | quantitative reuse where identity is irrelevant | low access cost, key must be secured |
| Anonymised derivative | publishing aggregate data | high prep, lowest risk |
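
For structured records, field redaction and pseudonymisation can be sketched in a few lines. This is a minimal illustration, not a production design: the field names and sample record are hypothetical, and a keyed HMAC is one common way to derive pseudonyms, chosen here as an assumption rather than a requirement of any standard.

```python
import hashlib
import hmac

MASK = "[REDACTED]"


def redact_fields(record, sensitive_fields):
    """Return a masked copy of the record; the master dict is left untouched."""
    return {k: (MASK if k in sensitive_fields else v) for k, v in record.items()}


def pseudonymise(value, key):
    """Derive a stable pseudonym from value with a secret key (HMAC-SHA256).

    The same value and key always give the same pseudonym, so records can
    still be linked, but the pseudonym cannot be recomputed without the key.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:12]


# Hypothetical master record held in the restricted store.
master = {"name": "Jane Example", "admitted": "1962-03-04", "ward": "3"}

# Redacted derivative for the reading room.
derivative = redact_fields(master, {"name"})

# Pseudonymised derivative for quantitative reuse; the key stays restricted.
research_copy = dict(master, name=pseudonymise(master["name"], b"keep-this-key-secret"))
```

The truncated digest keeps the pseudonym readable; a longer slice lowers the chance of two people sharing one.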

How do I apply closure technically?

Encode the closure in your metadata so the access system enforces it, rather than relying on a sticky note. In a rights or access field, record the status, the legal basis and the review date.

xml
<accessrestrict>
  <p>Closed under data-protection rules until <date normal="2071-01-01" type="review">2071</date>.</p>
  <legalstatus>special-category personal data</legalstatus>
</accessrestrict>
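
To show how an access system might read that restriction rather than trust a human to remember it, here is a small sketch using Python's standard XML parser. It assumes the fragment has no XML namespace and that the review date is carried in a `date` element with `type="review"` and an ISO date in `normal`, wherever that element sits inside the fragment. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date


def review_date(xml_text):
    """Return the review date recorded in an access fragment, or None."""
    root = ET.fromstring(xml_text)
    for node in root.iter("date"):
        if node.get("type") == "review" and node.get("normal"):
            return date.fromisoformat(node.get("normal"))
    return None


def review_due(xml_text, today):
    """True once the recorded review date has been reached."""
    due = review_date(xml_text)
    return due is not None and today >= due
```

Note that a due review is a prompt for a reviewer, not an automatic opening: the record stays restricted until someone changes its status.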

For born-digital material, separate the storage tiers: an open/ tree the public can reach and a restricted/ tree behind authentication, with file permissions that match the metadata. Run a tool such as bulk_extractor to flag overlooked identifiers like national-insurance or credit-card numbers before anything goes online.
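
The idea behind that kind of scan can be illustrated with a short script. This is a sketch of the principle and not a substitute for a dedicated forensic tool: the national-insurance pattern is deliberately simplified (it does not exclude unissued prefixes), and candidate card numbers are filtered with the Luhn checksum to cut false positives.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Simplified UK national-insurance shape: two letters, six digits, suffix A-D.
NINO_RE = re.compile(r"\b[A-Z]{2} ?\d{2} ?\d{2} ?\d{2} ?[A-D]\b")


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_identifiers(text):
    """Return (kind, matched_text) pairs for likely identifiers in text."""
    hits = []
    for m in CARD_RE.finditer(text):
        if luhn_ok(re.sub(r"\D", "", m.group())):
            hits.append(("card", m.group()))
    for m in NINO_RE.finditer(text):
        hits.append(("nino", m.group()))
    return hits
```

Anything the scan flags should go back to the reviewer rather than being masked automatically, since pattern matches can be wrong in both directions.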

How do I serve a redacted derivative safely?

Never redact by drawing a black box over a PDF that still contains the text layer, because searchable text leaks straight through. Rasterise each page to an image (which discards the text layer), burn the redaction into the pixels, then re-OCR only what remains visible. A minimal pipeline:

bash
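# rasterise every page; the resulting page-*.png files carry no text layer
pdftoppm -png -r 300 master.pdf page
# (apply visual redaction to the page-*.png files here)
# recombine the redacted images into an image-only PDF (assumes img2pdf is installed)
img2pdf page-*.png -o flattened.pdf
# OCR only what is still visible on the redacted pages
ocrmypdf flattened.pdf redacted_derivative.pdf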

Verify by searching the derivative for a name you redacted; it should return nothing.
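
That check is easy to script once the derivative's text has been extracted (for example with a PDF text-extraction tool of your choice; the extraction step is assumed and not shown). The helper below is a hypothetical sketch that normalises case and whitespace so a name split across a line break is still caught.

```python
import re


def _normalise(s):
    """Lower-case and collapse whitespace so line breaks cannot hide a match."""
    return re.sub(r"\s+", " ", s).strip().lower()


def leaked_terms(extracted_text, redacted_terms):
    """Return the redacted terms that still appear in the derivative's text."""
    haystack = _normalise(extracted_text)
    return [
        term for term in redacted_terms
        if _normalise(term) and _normalise(term) in haystack
    ]
```

An empty list is the passing result; any returned term means the derivative must not be released.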

Do I need to log who accesses sensitive records?

Yes — an access log is both an accountability tool and, in many jurisdictions, a legal requirement. Capture reader identity, the items consulted, the date, and the stated research purpose. Keep the log itself secured, since it is personal data about your readers. When a breach or complaint arrives, this log is the difference between a confident answer and a guess.
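
A log entry with those four fields might be structured as follows. This is a sketch under stated assumptions: the field names, the `R-0042` style reader identifier and the JSON-lines output are illustrative choices, and where the lines are stored (and who may read them) is left to your own access controls.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    reader_id: str     # reader identity (hypothetical ID scheme)
    items: tuple       # references of the items consulted
    purpose: str       # stated research purpose
    accessed_at: str   # ISO 8601 timestamp in UTC


def new_event(reader_id, items, purpose, now=None):
    """Build an access event, refusing entries with no stated purpose."""
    if not purpose.strip():
        raise ValueError("a stated research purpose is required")
    when = now or datetime.now(timezone.utc)
    return AccessEvent(reader_id, tuple(items), purpose.strip(), when.isoformat())


def to_log_line(event):
    """Serialise the event as one JSON line for an append-only log."""
    return json.dumps(asdict(event), sort_keys=True)
```

Writing one JSON object per line keeps the log easy to append to and to search when a breach or complaint has to be investigated.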

Key Takeaways

  • Sensitivity is about harm to living people and communities, not the age of the document.
  • Run a structured sensitivity review with a named reviewer and a review date rather than making ad hoc decisions.
  • Prefer redaction or pseudonymisation over full closure when the sensitive content is a small fraction.
  • Encode closure in metadata so the access system enforces it automatically.
  • Redact by flattening and re-OCRing — never just overlay a box on a live text layer.
  • Log access to sensitive material for accountability and legal compliance.

Frequently Asked Questions

What counts as a sensitive record in an archive?

Anything that could harm a living or recently deceased person, breach a legal duty, or expose a community to risk — medical, criminal-justice, adoption, asylum, and personnel files are typical categories. Sensitivity is contextual, so the same document can be open in one jurisdiction and closed in another.

How long should a closure period last?

Common defaults are 75 to 100 years for records about identifiable individuals, or the lifetime of the subject plus a margin. Always check the governing statute, because UK, EU and institutional rules differ.

Can I redact instead of closing a whole file?

Often yes — partial redaction opens most of a file while masking the sensitive fields. Keep an unredacted master under restricted access and serve a redacted derivative.

Who decides whether a record is sensitive?

A named officer, working through a documented sensitivity review and ideally guided by a written access policy and, for community material, by the source community itself. Avoid ad hoc decisions made at the reading-room desk.

Do I need to log access to sensitive material?

Yes. An access log showing who viewed what and when supports accountability, helps with breach investigations, and is often a legal or audit requirement.