Best Practices to Design URIs for heritage data

Q: Should I encode semantics like the object type into the URI path?

Keep it light. A short type segment such as `/object/` or `/person/` aids routing and human readability, but avoid deep taxonomies in the path because reclassification would otherwise force you to break identifiers.

A good heritage URI is short, opaque, owned by you, and never reused: design it so the string itself carries no information that can later become wrong. The single most consequential decision is choosing a domain and path pattern you can keep resolving for decades, because once a URI is published and cited it must not change. Everything else, content negotiation, type segments, accession numbers, is detail layered on top of that permanence guarantee.

What makes a URI "cool" for heritage data?

Tim Berners-Lee's rule still governs: cool URIs do not change. For a museum, archive or library this means the identifier must outlive the current collection management system, the current hosting provider, and probably the current staff. Concretely:

Use a domain you control long-term, or delegate persistence to w3id.org, a PURL, or an ARK resolver (n2t.net).
Keep the path free of technology hints like /cgi-bin/, /wp-content/, query strings such as ?id=42, or session tokens.
Avoid embedding the storage location, server name, or product brand.

A pattern that ages well looks like https://data.example.org/object/00041523. It is human-scannable, has a clear type segment, and the local part is a stable opaque key.

How should I separate the thing from its description?

A physical chalice and the HTML page describing it are different resources, and conflating them produces broken inference. There are two standard mechanisms:

Approach	Mechanism	Best for	Cost
303 redirect	Thing URI returns `303 See Other` to a document URI	Large per-record datasets	One extra HTTP round-trip
Hash URI	`…/vocab#term` names the thing within a document	Small vocabularies, SKOS schemes	Whole document must load

For a collection of 200,000 objects, use 303 redirects so each record can be served independently. For a 60-term controlled vocabulary, hash URIs are simpler and faster.

text

# 303 pattern
GET /object/00041523            -> 303 See Other
Location: /object/00041523.ttl  (or .html via content negotiation)

Should accession numbers go into the URI?

They can, and reusing your existing stable local identifier saves a mapping table. The rules: percent-encode anything outside the unreserved set, and treat the value as immutable.

python

from urllib.parse import quote
acc = "1998.21/a-b"
local = quote(acc, safe="")          # 1998.21%2Fa-b
uri = f"https://data.example.org/object/{local}"

If your accession numbers are messy or get reissued, mint fresh opaque keys instead and store the accession number as a dcterms:identifier literal.

How do I handle multiple formats from one URI?

Do not bake .json, .rdf or .xml into the canonical identifier. Use HTTP content negotiation so the abstract URI redirects to whatever serialisation the client asked for.

http

GET /object/00041523 HTTP/1.1
Accept: text/turtle
-> 303 -> /object/00041523.ttl

This keeps one stable citation target while you are free to add or retire serialisations later.

A working checklist before you mint anything

Domain you can hold for 20+ years, or a resolver (w3id, PURL, ARK).
Short path with a light type segment (/object/, /person/, /place/).
Opaque, immutable local key; never reissue a retired one.
No file extensions, query strings, or technology tokens in the canonical form.
Decide 303 vs hash per dataset, not globally.
Separate URIs for the conceptual object and each surrogate (IIIF manifest, scan).
Document the scheme in a public README so successors do not reinvent it.

What goes wrong most often?

Teams encode classification into the path (/paintings/dutch/17c/...), then reclassify an object and either break the URI or live with a lie. Others publish https://oldserver.museum/db.php?rec=42, migrate, and lose every external link. A third trap is case-sensitivity drift, where /Object/ and /object/ resolve differently across systems. Pick one casing, lowercase by convention, and enforce it.

Key Takeaways

Permanence is the prime directive: choose a domain or resolver you can keep alive for decades.
Keep the local key opaque and immutable; never reuse a retired identifier.
Distinguish the real-world thing from its description with 303 redirects (large data) or hash URIs (small vocabularies).
Strip extensions, query strings and product names from the canonical URI; use content negotiation for formats.
Reuse stable accession numbers only after percent-encoding and freezing them.
Mint distinct URIs for conceptual objects versus their digital surrogates.
Publish your URI scheme so the policy is documented and defensible across the whole collection.

Frequently Asked Questions

Should heritage URIs include the file extension like .json or .rdf?

No. Keep the identifier opaque and put format selection in HTTP content negotiation, redirecting from the abstract URI to a format-specific document URI. Extensions in the canonical identifier leak implementation detail and break if you change serialisations.

What is the difference between a 303 redirect and a hash URI for things?

A 303 (See Other) redirect distinguishes a real-world thing from the document describing it and suits large per-record datasets. Hash URIs (#this) are cheaper to serve because no redirect is needed, but they force the whole containing document to load, so reserve them for small vocabularies.

Can I use a museum accession number directly inside a URI?

Yes, if you percent-encode unsafe characters and freeze the value. Accession numbers are stable local identifiers, but slashes, spaces and ampersands must be encoded, and you should never reissue a retired number to a different object.

How long should heritage URIs stay live?

Treat them as permanent. Cool URIs do not change, so plan for decades using a domain you control or a resolver service like w3id.org or a PURL, and keep redirects in place even after a system migration.

Do I need a separate URI for every digital surrogate?

Yes when the surrogate is a distinct resource you want to cite, such as a IIIF manifest or a specific scan. Mint a URI for the conceptual object and separate ones for its representations, then link them explicitly.

Should I encode semantics like the object type into the URI path?

Keep it light. A short type segment such as /object/ or /person/ aids routing and human readability, but avoid deep taxonomies in the path because reclassification would otherwise force you to break identifiers.

What makes a URI "cool" for heritage data? ​

How should I separate the thing from its description? ​

Should accession numbers go into the URI? ​

How do I handle multiple formats from one URI? ​

A working checklist before you mint anything ​

What goes wrong most often? ​

Key Takeaways ​

Frequently Asked Questions ​

Should heritage URIs include the file extension like .json or .rdf? ​

What is the difference between a 303 redirect and a hash URI for things? ​

Can I use a museum accession number directly inside a URI? ​

How long should heritage URIs stay live? ​

Do I need a separate URI for every digital surrogate? ​

Should I encode semantics like the object type into the URI path? ​

Related reading ​