How to apply FAIR principles to humanities data

Applying FAIR to humanities data means making your sources Findable, Accessible, Interoperable and Reusable, tackled roughly in that order of effort and payoff. In practice you do four concrete things: deposit your dataset where it gets a persistent identifier, attach discovery metadata, choose open formats with a community vocabulary, and write enough provenance documentation that a stranger could reuse it. FAIR is a gradient, not a pass/fail test, so aim to move each principle forward rather than achieve perfection.

What do the four FAIR principles actually require?

The acronym hides 15 sub-principles published by Wilkinson et al. in 2016. For humanities work the practical reading is:

Principle	Core requirement	Humanities default
Findable	Persistent ID + rich metadata indexed somewhere	DOI from Zenodo, DataCite metadata
Accessible	Retrievable by an open protocol; metadata stays available even if the data is restricted or removed	HTTPS download or a documented access request route
Interoperable	Open formats, shared vocabularies	UTF-8 CSV/TEI, Dublin Core, CIDOC-CRM
Reusable	Clear licence, detailed provenance	CC BY, a README and data dictionary

A common misreading is that FAIR demands open data. It does not. A dataset of oral-history interviews under embargo can still be FAIR if its metadata is openly findable and the access conditions are explicit.

How do I make a humanities dataset findable?

Findability is the cheapest win. Deposit the dataset in a repository that mints a DOI, then make sure the metadata is descriptive enough to surface in a search.

text

1. Pick a repository (Zenodo, your institutional repo, or a domain one).
2. Upload the data + documentation.
3. Fill in: title, creators with ORCIDs, date, subject keywords,
   geographic and temporal coverage, related works.
4. Publish to mint the DOI — e.g. 10.5281/zenodo.1234567.

The temporal and spatial coverage fields matter enormously in history and are routinely left blank. Filling them is what lets a researcher find "trade records, Baltic, 1650–1700" rather than your specific title.
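As a minimal sketch of step 3, the check below flags discovery fields that are still empty before you publish. The field names (`spatial_coverage`, `temporal_coverage` and the rest) and the sample record are illustrative assumptions, not the schema of Zenodo or any other repository; map them to whatever your repository's deposit form actually calls them.

```python
# Hypothetical field names for illustration; not any repository's real schema.
REQUIRED_FIELDS = [
    "title",
    "creators",
    "date",
    "keywords",
    "spatial_coverage",
    "temporal_coverage",
]


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Made-up sample record; the ORCID is a placeholder, not a real identifier.
record = {
    "title": "Baltic trade records",
    "creators": [{"name": "Example, Author", "orcid": "0000-0000-0000-0000"}],
    "date": "2024-01-01",
    "keywords": ["trade", "customs records"],
    "spatial_coverage": "Baltic Sea region",
    "temporal_coverage": "",  # left blank: the routine omission
}

for field in missing_fields(record):
    print(f"Fill in before publishing: {field}")
```

Running a check like this before deposit is cheap, and it catches the blank coverage fields that a repository form will usually accept without complaint.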

Which formats keep humanities data interoperable?

Interoperability is where humanities projects stumble, because so much material is bespoke. The rule is: prefer open, text-based, well-supported formats and reuse an existing vocabulary instead of inventing fields.

text

Tabular     -> UTF-8 CSV (not .xlsx as the archival copy)
Text/markup -> TEI P5 XML, plain UTF-8 with documented encoding
Spatial     -> GeoJSON or GeoPackage (not a multi-file Esri shapefile bundle)
Images      -> TIFF or JPEG 2000 with IIIF manifests
Vocabulary  -> Dublin Core, schema.org, CIDOC-CRM, Getty AAT

If you keep a free-text column such as place, map each value to an authority record, such as a GeoNames entry or another gazetteer ID, so the value is resolvable rather than just a string.
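One way to do that mapping is sketched below: each row gains a `place_uri` column, and unmatched values are left empty for manual review instead of being guessed. The `PLACE_AUTHORITY` lookup, the column names and the sample rows are hypothetical, and any identifier should be verified against GeoNames itself before it goes into an archival copy.

```python
import csv
import io

# Hypothetical lookup for illustration; verify every ID against GeoNames.
PLACE_AUTHORITY = {
    "London": "https://www.geonames.org/2643743",
    "Paris": "https://www.geonames.org/2988507",
}


def add_place_uri(rows):
    """Yield each row with a place_uri column; empty when no match is known."""
    for row in rows:
        uri = PLACE_AUTHORITY.get(row["place"].strip(), "")
        yield {**row, "place_uri": uri}


# Made-up sample rows.
rows = [
    {"letter_id": "L001", "place": "London"},
    {"letter_id": "L002", "place": "Paris"},
    {"letter_id": "L003", "place": "Lutetia"},  # unmatched: needs manual review
]

# Written to an in-memory buffer here so the sketch has no side effects.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["letter_id", "place", "place_uri"])
writer.writeheader()
writer.writerows(add_place_uri(rows))
print(buffer.getvalue())
```

To produce the archival file, swap the buffer for `open(path, "w", encoding="utf-8", newline="")` so the CSV is written as UTF-8 with consistent line endings.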

How do I make interpretive data reusable?

Reusability is the principle FAIR-sceptics in the humanities care about most, and it is mostly about context, not technology. Three things make qualitative data reusable:

An explicit machine-readable licence (CC BY for open work, or a clear custom statement).
Rich provenance: where each source came from, how it was transformed, who decided what.
A codebook that documents every coding scheme and interpretive choice.

A spreadsheet that classifies letters as "supportive" or "hostile" is useless without the rubric that defined those categories. Write the rubric down.
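A small consistency check can keep the rubric and the data in step. The sketch below assumes the codebook is a simple mapping from code to definition and reports any code that appears in the data without one; the `CODEBOOK` entries, the `stance` column and the sample rows are all made up for illustration.

```python
# Hypothetical codebook for illustration; the definitions are the rubric.
CODEBOOK = {
    "supportive": "Writer explicitly endorses the recipient's position.",
    "hostile": "Writer explicitly rejects or attacks the recipient's position.",
}

# Made-up coded data.
coded_letters = [
    {"letter_id": "L001", "stance": "supportive"},
    {"letter_id": "L002", "stance": "hostile"},
    {"letter_id": "L003", "stance": "ambivalent"},  # used but never defined
]


def undefined_codes(rows, column, codebook):
    """Return, sorted, the codes used in the data but missing from the codebook."""
    return sorted({row[column] for row in rows} - set(codebook))


print(undefined_codes(coded_letters, "stance", CODEBOOK))  # prints ['ambivalent']
```

Depositing the codebook as its own file next to the data, in an open format such as CSV or JSON, lets a later user run the same check.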

What does "Accessible" mean for restricted material?

Accessible does not mean unrestricted. The principle requires that data is retrievable by a standardised, open protocol and that metadata persists even when the data is removed. For sensitive records:

Keep the metadata record live and openly findable.
State the access protocol explicitly: who may request, by what route, under what terms.
Use tombstone pages so that the DOI of a withdrawn dataset still resolves and explains why the data is gone.
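The three points above can be sketched as a record that always keeps its metadata, spells out the access route, and adds a tombstone note once the data is withdrawn. Every field name here (`who_may_request`, `route`, `terms`, `withdrawal_reason`) is a hypothetical choice for illustration, not part of any repository's schema, and the sample values are invented.

```python
ACCESS_KEYS = ("who_may_request", "route", "terms")


def access_gaps(metadata: dict) -> list:
    """Return the parts of the access statement that are missing or empty."""
    access = metadata.get("access", {})
    return [key for key in ACCESS_KEYS if not access.get(key)]


def landing_record(metadata: dict, data_available: bool) -> dict:
    """Build what a landing page would show; the metadata is always kept."""
    record = {"metadata": metadata, "data_available": data_available}
    if not data_available:
        # Tombstone: the data is gone, but the record still explains why.
        record["tombstone"] = metadata.get(
            "withdrawal_reason", "Data withdrawn; no reason recorded."
        )
    return record


# Made-up sample metadata for an embargoed collection.
metadata = {
    "title": "Oral-history interviews (embargoed)",
    "access": {
        "status": "restricted",
        "who_may_request": "Researchers with ethics approval",
        "route": "Written request to the depositing archive",
        "terms": "No redistribution; anonymise quotations",
    },
    "withdrawal_reason": "Withdrawn at an interviewee's request.",
}

print(access_gaps(metadata))
print(landing_record(metadata, data_available=False)["tombstone"])
```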

How do I check my FAIR score?

Run an automated evaluator and treat the result as a to-do list, not a verdict.

bash

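# F-UJI evaluates a dataset by its DOI/URL against FAIR metrics.
# Assumes a self-hosted F-UJI server on its default local port;
# FUJI_USER and FUJI_PASS are placeholders for the credentials in its config.
curl -X POST http://localhost:1071/fuji/api/v1/evaluate \
  -u "$FUJI_USER:$FUJI_PASS" \
  -H "Content-Type: application/json" \
  -d '{"object_identifier":"https://doi.org/10.5281/zenodo.1234567"}'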

The GO FAIR self-assessment and the FAIR Maturity Indicators give a manual alternative. Expect a first deposit to score "moderate" — the gaps are usually missing licences and unmapped vocabularies.
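Turning a result into a to-do list might look like the sketch below. The shape of `results` is a made-up simplification for illustration, not the real output format of F-UJI or any other evaluator, so adapt the keys to whatever your tool actually returns.

```python
# Hypothetical, simplified results; not the real output format of any tool.
results = [
    {"principle": "F", "check": "persistent identifier", "passed": True},
    {"principle": "R", "check": "machine-readable licence", "passed": False},
    {"principle": "I", "check": "vocabulary mapped to an authority", "passed": False},
    {"principle": "A", "check": "metadata retrievable over HTTPS", "passed": True},
]


def todo_list(results):
    """Group failed checks by FAIR principle, in F-A-I-R order."""
    todo = {principle: [] for principle in "FAIR"}
    for item in results:
        if not item["passed"]:
            todo[item["principle"]].append(item["check"])
    # Drop principles with nothing left to fix.
    return {p: checks for p, checks in todo.items() if checks}


for principle, checks in todo_list(results).items():
    for check in checks:
        print(f"[{principle}] fix: {check}")
```

Grouping by principle keeps the focus on moving each one forward instead of chasing a single overall score.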

Pitfalls to avoid

Treating FAIR as binary. Move each principle forward incrementally.
Confusing FAIR with Open, and so withholding even the metadata for sensitive data.
Archiving an Excel file as the canonical copy instead of CSV.
Inventing bespoke metadata fields nobody else uses.
Leaving temporal/spatial coverage blank, killing discoverability.

Key Takeaways

FAIR is a gradient: aim to improve each principle, not to "pass".
The highest-payoff first move is a DOI-minting repository deposit.
FAIR is not the same as Open; restricted data can still be FAIR.
Interoperability comes from open formats plus reused community vocabularies.
Reusability of interpretive data depends on provenance and codebooks, not file type.
Automated tools like F-UJI score datasets, but no body certifies FAIRness.

Frequently Asked Questions

Do humanities datasets really need to be machine-readable to be FAIR?

The Interoperable and Reusable principles assume some machine processing, but FAIR is a gradient, not a checkbox. A well-documented CSV with a persistent identifier and a clear licence is far more FAIR than a richly described PDF that no tool can parse.

What is the single most impactful FAIR step for a small project?

Deposit in a repository that mints a DOI. A persistent identifier instantly improves Findability, gives you a citable handle, and forces you to attach minimal metadata and a licence at the same time.

Does FAIR mean my data must be open access?

No. FAIR and Open are distinct. Sensitive or rights-restricted data can be FAIR if its metadata is openly findable and the access conditions are explicit and machine-readable, captured in a clear access protocol.

Which metadata standard should humanities data use to be Interoperable?

There is no single answer, but Dublin Core or DataCite covers discovery metadata, while domain vocabularies like TEI headers, CIDOC-CRM or schema.org add semantic depth. Reusing a community vocabulary beats inventing your own.

How do I make qualitative or interpretive data Reusable?

Document provenance and your interpretive decisions in a codebook or data dictionary, attach an explicit licence, and explain coding schemes. Reusability for qualitative data is largely about transparent context, not file format.

Are FAIR principles a standard I can be certified against?

FAIR is a set of guiding principles, not a certification. Tools like the FAIR self-assessment and F-UJI evaluator give indicative scores, but no body formally certifies a dataset as FAIR.
