When to Expose Archival Description as LOD

Expose archival description as Linked Open Data (LOD) when your finding aids contain entities (people, places, organisations, functions) that researchers want to query across collections and institutions, and when you can commit to stable URIs for the long term. If your only goal is helping a reader discover one fonds through a single finding aid, a well-marked-up HTML page with Schema.org markup and a sitemap delivers nearly all the value at a fraction of the cost. LOD pays off through linkability and reuse, not through publication itself.

This is a decision, not a default. Below are the trade-offs, the cost drivers, and the concrete signals that tell you whether archival LOD fits your sources.

What does "archival description as LOD" actually mean?

Traditionally you describe a collection in EAD — an XML tree of <c> components nested under <archdesc>, optimised for hierarchical, narrative description. LOD takes the same facts and re-expresses them as RDF triples with globally resolvable URIs, so that "the agent who created this series" can be the same URI a museum and a library also point to.

turtle

@prefix rico: <https://www.ica.org/standards/RiC/ontology#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix : <https://archive.example.org/id/> .

:fonds-GB-0123 a rico:RecordSet ;
    rico:title "Papers of Ada Lovelace" ;
    rico:hasRecordSetType :fonds ;
    rico:hasProvenance :agent-lovelace .

:agent-lovelace a rico:Person ;
    rico:name "Ada Lovelace" ;
    owl:sameAs <http://www.wikidata.org/entity/Q7259> .

The owl:sameAs link to Wikidata is the whole point: it turns local description into a node in a global graph.
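To make the "node in a global graph" idea concrete, here is a minimal sketch of how a consumer might merge identities across institutions by following owl:sameAs links. The museum and library URIs and the function name are hypothetical; only the archive URI and the Wikidata URI come from the example above. It uses a plain union-find rather than an RDF library, so treat it as an illustration of the principle, not a reasoner.

```python
from collections import defaultdict

# owl:sameAs pairs as (subject, object). All example.org URIs are hypothetical.
SAME_AS = [
    ("https://archive.example.org/id/agent-lovelace",
     "http://www.wikidata.org/entity/Q7259"),
    ("https://museum.example.org/agent/42",
     "http://www.wikidata.org/entity/Q7259"),
    ("https://library.example.org/auth/77",
     "https://archive.example.org/id/agent-royal-society"),
]


def cluster_by_same_as(links):
    """Group URIs into identity clusters using union-find over sameAs pairs."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right

    clusters = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return list(clusters.values())


clusters = cluster_by_same_as(SAME_AS)
for cluster in clusters:
    print(sorted(cluster))
```

The archive and the museum never reference each other, yet they land in the same cluster because both point at the same Wikidata entity. That shared hub is what makes cross-institution queries possible.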

When is exposing archival LOD genuinely worth it?

Reach for LOD when several of these hold:

Your descriptions name recurring entities that appear across multiple collections (a prolific correspondent, a recurring place, a parent body).
Researchers ask cross-collection questions ("show me every fonds touching the Royal Society between 1830 and 1870").
You already maintain authority records (ISAAR(CPF)) you can mint URIs for.
You participate in an aggregator (Archives Portal Europe, a national hub) that consumes RDF.
You can guarantee persistent identifiers — a w3id.org redirect or institutional PID service.

If three or more apply, the linkage value is real and compounding.

When should you NOT do it?

Be honest about disqualifying signals. LOD adds cost without adding reuse when:

Description is thin or item-level only, with no entities worth reconciling.
There is no PID strategy — broken URIs are worse than no URIs.
The collection is single and closed, with no cross-collection demand.
You lack staff time for reconciliation and re-publishing.
Stakeholders only need a searchable finding aid PDF.

A dead SPARQL endpoint and rotting URIs damage trust more than a plain HTML finding aid ever would.
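The two checklists can be turned into a rough self-assessment. The sketch below is only one way to encode them: the signal names are hypothetical labels for the bullets above, the "three or more" threshold comes from the text, and treating any single disqualifier as a veto is an assumption rather than a rule stated here.

```python
# Hypothetical labels for the "worth it" signals listed above.
PROCEED_SIGNALS = {
    "recurring_entities",
    "cross_collection_questions",
    "authority_records",
    "aggregator_participation",
    "persistent_identifiers",
}

# Hypothetical labels for the disqualifying signals listed above.
DISQUALIFIERS = {
    "thin_description",
    "no_pid_strategy",
    "single_closed_collection",
    "no_staff_time",
    "pdf_is_enough",
}


def assess(observed):
    """Return (recommendation, supporting_signals) for a set of observed signals."""
    observed = set(observed)
    unknown = observed - PROCEED_SIGNALS - DISQUALIFIERS
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")

    blockers = sorted(observed & DISQUALIFIERS)
    positives = sorted(observed & PROCEED_SIGNALS)

    if blockers:
        # Assumption: one disqualifier is enough to pause the project.
        return "hold", blockers
    if len(positives) >= 3:
        return "proceed", positives
    return "html-first", positives


print(assess({"recurring_entities", "authority_records", "persistent_identifiers"}))
```

A result of "html-first" corresponds to the fallback described at the top of this page: a well-marked-up HTML finding aid with a sitemap.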

Which ontology should you map to?

Option	Best for	Trade-off
RiC-O (Records in Contexts)	Native archival semantics: records, agents, activities, levels	Newer; tooling still maturing
CIDOC-CRM	Cross-domain heritage integration with museums	Event-centric modelling is verbose for archives
Schema.org `ArchiveComponent`	Search-engine discovery, low effort	Coarse; loses archival nuance
Dublin Core / EDM	Aggregator ingest (Europeana)	Flattens hierarchy

For most archives the answer is RiC-O for the rich graph, Schema.org alongside it for discovery. See ontology vs vocabulary for why you usually need both.
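As a rough illustration of the "Schema.org alongside" half of that pairing, the following sketch projects a local description onto a coarse Schema.org ArchiveComponent expressed as JSON-LD. The input record shape and the function name are assumptions made for this example. The output deliberately drops archival nuance such as level of arrangement, which is the trade-off the table describes.

```python
import json


def to_schema_org(record):
    """Project a local description onto a Schema.org ArchiveComponent (JSON-LD)."""
    doc = {
        "@context": "https://schema.org",
        "@type": "ArchiveComponent",
        "@id": record["uri"],
        "name": record["title"],
    }
    creator = record.get("creator")
    if creator:
        doc["creator"] = {
            "@type": "Person",
            "@id": creator["uri"],
            "name": creator["name"],
            "sameAs": creator.get("same_as", []),
        }
    return doc


# Hypothetical input shape; the URIs mirror the Turtle example earlier on this page.
record = {
    "uri": "https://archive.example.org/id/fonds-GB-0123",
    "title": "Papers of Ada Lovelace",
    "creator": {
        "uri": "https://archive.example.org/id/agent-lovelace",
        "name": "Ada Lovelace",
        "same_as": ["http://www.wikidata.org/entity/Q7259"],
    },
}

print(json.dumps(to_schema_org(record), indent=2))
```

The resulting JSON-LD could be embedded in the HTML finding aid for search engines, while the richer RiC-O graph is served separately.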

How do you publish without overcommitting?

Start with the cheapest tier that delivers reuse and only escalate on demand:

bash

# Tier 1 — static RDF, no server logic
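# Convert EAD -> RiC-O Turtle (e.g. RiC-O Converter / XSLT)
# Illustrative invocation only: the jar name, subcommand and arguments depend on the converter and version you use
java -jar rico-converter.jar ead2rico finding_aids/ rdf/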

# Serve Turtle + JSON-LD with content negotiation behind a CDN
# /id/fonds-GB-0123  -> 303 redirect to /id/fonds-GB-0123.ttl or .jsonld

Add a full data dump and a sitemap.xml. Only stand up a triplestore (compared here) when external developers actually need ad-hoc federated queries.
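The content negotiation step in Tier 1 can be sketched in a few lines. The following is a minimal illustration, assuming a hypothetical `negotiate` helper that an edge function or small web handler would call. The file extensions follow the comment in the shell snippet above, and defaulting to Turtle for wildcard requests is an assumption.

```python
# Media types this sketch can serve, in order of preference for wildcards.
SUPPORTED = {
    "text/turtle": ".ttl",
    "application/ld+json": ".jsonld",
}


def parse_accept(header):
    """Return (media_range, q) pairs from an Accept header, highest q first."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(ranges, key=lambda item: item[1], reverse=True)


def _matches(media_range, media_type):
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(path, accept_header):
    """Map an /id/ path and Accept header to a (status, location) pair."""
    # A missing Accept header is treated as "anything goes".
    for media_range, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        for media_type, extension in SUPPORTED.items():
            if _matches(media_range, media_type):
                return 303, path + extension
    return 406, None


print(negotiate("/id/fonds-GB-0123", "application/ld+json"))
```

In a real deployment you would probably also route `text/html` to the human-readable finding aid instead of returning 406. That branch is left out here to keep the sketch focused on the RDF serialisations.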

What does it really cost to maintain?

The one-off conversion is the easy part. The recurring costs are:

URI governance — every minted URI is a forever promise.
Reconciliation — re-matching agents to Wikidata/VIAF as records change (reconcile to LOD).
Re-publishing — your finding aids will be revised; your RDF must follow.
Validation — SHACL or quality checks to stop silent rot.

Budget staff time, not just hosting.
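URI governance in particular lends itself to an automated check at re-publishing time. The sketch below, under the assumption that you keep a list of every URI published in the previous release and a map of deliberate redirects, flags URIs that have silently disappeared. The function name and the data shapes are hypothetical.

```python
def check_uri_continuity(previous, current, redirects=None):
    """Return previously published URIs that no longer lead to a current URI."""
    redirects = redirects or {}
    current = set(current)
    broken = []
    for uri in sorted(set(previous)):
        target = uri
        seen = set()
        # Follow the redirect chain, guarding against redirect loops.
        while target not in current and target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        if target not in current:
            broken.append(uri)
    return broken


# Hypothetical release data.
previous_release = [
    "https://archive.example.org/id/fonds-GB-0123",
    "https://archive.example.org/id/agent-lovelace",
    "https://archive.example.org/id/agent-old-1",
]
current_release = [
    "https://archive.example.org/id/fonds-GB-0123",
    "https://archive.example.org/id/agent-lovelace-ada",
]
redirects = {
    "https://archive.example.org/id/agent-lovelace":
        "https://archive.example.org/id/agent-lovelace-ada",
}

print(check_uri_continuity(previous_release, current_release, redirects))
```

Running a check like this before each re-publish turns the "forever promise" into something a build can fail on, instead of something a researcher discovers through a dead link.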

Key Takeaways

LOD value comes from cross-collection linkage and reuse, not from publishing alone.
Strong signals to proceed: recurring entities, authority records, aggregator participation, and a real PID strategy.
Disqualifying signals: thin item-level metadata, no persistent identifiers, a single closed collection, no maintenance capacity.
RiC-O gives native archival semantics; pair it with Schema.org for search discovery.
You do not need a SPARQL endpoint — static Turtle/JSON-LD with content negotiation covers most cases.
The lasting cost is the URI commitment and ongoing reconciliation, not the conversion.
A neglected endpoint with broken URIs is worse than a solid HTML finding aid.

Frequently Asked Questions

Should I publish my finding aids as LOD?

Publish as LOD when your descriptions carry reusable entities (people, places, organisations) that researchers want to query across collections, and when you can maintain stable URIs. If your goal is only human discovery of one collection, a good HTML finding aid plus a sitemap is usually enough.

What is the difference between EAD and LOD for archives?

EAD is an XML container for a single hierarchical finding aid, optimised for narrative description. LOD models the same facts as RDF triples with global URIs, so entities can be linked and queried across institutions. The Records in Contexts ontology (RiC-O) is the bridge between the two.

Is Records in Contexts required to publish archival LOD?

No, but RiC-O is the purpose-built ontology for archival description and gives you record-, agent- and activity-level classes out of the box. You can also map to CIDOC-CRM or Schema.org, but you lose archival-specific semantics like provenance and levels of arrangement.

How much does it cost to maintain archival LOD?

Budget for a triplestore or static RDF hosting, persistent URI governance, periodic reconciliation against Wikidata or VIAF, and re-publishing when finding aids change. The recurring cost is the URI commitment, not the one-off conversion.

Can I expose LOD without running a SPARQL endpoint?

Yes. Static JSON-LD or Turtle files with content negotiation, plus a data dump and a sitemap, deliver most discovery value at a fraction of the cost. A live SPARQL endpoint is only worth it when external developers need ad-hoc federated queries.

What signals say archival LOD is NOT worth it?

Thin item-level metadata, no persistent identifier strategy, no staff time for reconciliation, a single closed collection, or stakeholders who only need a searchable PDF finding aid. In those cases LOD adds cost without adding reuse.
