Best Practices to Choose an RDF vocabulary

To choose an RDF vocabulary well, reuse established, maintained, openly licensed vocabularies before inventing your own, and record every decision in a metadata application profile. The best vocabulary is the one that already describes your domain, has visible governance and a dated version, and is understood by the systems you want to link to. Minting bespoke terms is a last resort, not a starting point.

What makes a vocabulary trustworthy?

Before adopting any namespace, run it through a trust check. A trustworthy vocabulary has:

A dereferenceable namespace where each term URI returns a definition.
A named maintaining body (W3C, Getty, DCMI, a standards group).
A dated or versioned IRI so you can pin to a release.
An open licence on the term definitions.
Live adoption by datasets you respect.

A vocabulary failing two or more of these is a future migration headache. Glamour or completeness does not compensate for abandonment.

How do you avoid inventing terms you do not need?

Search published vocabularies before minting anything. Useful registries:

Linked Open Vocabularies (LOV) at lov.linkeddata.es, searchable by term.
Getty AAT, TGN, ULAN for art, place and agent concepts.
Library of Congress authorities and the BIBFRAME vocabulary.
schema.org and Dublin Core Terms for general description.

If LOV returns a property meaning what you mean, use it. Only when nothing fits do you mint a local term, and even then you describe it properly:

turtle

@prefix myo: <https://data.myarchive.org/ontology/> .
myo:bindingStyle a rdf:Property ;
    rdfs:label "binding style"@en ;
    rdfs:comment "Local term: the bookbinding technique observed on a volume."@en ;
    rdfs:isDefinedBy <https://data.myarchive.org/ontology/> .

Which vocabularies fit which heritage need?

Need	Recommended	Notes
Generic description	Dublin Core Terms	Universal, low barrier
Web discoverability	schema.org	Indexed by search engines
Concepts / thesauri	SKOS	For your controlled lists
Events, provenance	CIDOC CRM	Powerful, heavy to model
Bibliographic	BIBFRAME, FRBRoo	Library-grade granularity
Agents, names	FOAF, schema.org	Reconcile to VIAF, Wikidata
Geography	GeoSPARQL, GeoNames	Coordinates and topology

Combining several is the norm. The skill is choosing the smallest combination that covers your data.

How do you test a vocabulary against your real data?

Do not adopt on paper. Map ten real records and look for friction:

Take ten representative records from your collection.
Map each field to a candidate property.
Note every field that has no clean match.
Decide per gap: different property, different vocabulary, or local term.

If half your fields have no home, the vocabulary is wrong for your domain. If two or three do, you have found exactly where local terms are justified.

Why does versioning matter so much?

Vocabularies change. Terms get deprecated; ranges narrow. Pin to a version IRI where one exists so your data does not silently shift meaning:

turtle

<https://data.myarchive.org/dataset/2025>
    dct:conformsTo <http://www.cidoc-crm.org/cidoc-crm/7.1.1/> .

Recording the exact version you modelled against is what keeps a dataset defensible years later.

Where do all these decisions live?

In a metadata application profile (MAP): a single document that lists every class and property you use, its source vocabulary, whether it is mandatory or optional, cardinality, and a worked example. The MAP is what lets a new colleague encode the next ten thousand records consistently. Without it, every cataloguer reinvents your choices.

Key Takeaways

Reuse established, maintained, openly licensed vocabularies before minting your own.
Search Linked Open Vocabularies and the Getty and Library of Congress registries first.
Mix vocabularies deliberately; combining several is normal heritage practice.
Test any candidate by mapping ten real records and counting the unmatched fields.
Pin to a dated version IRI so meaning does not silently drift.
Record every choice in a metadata application profile as the team's single source of truth.

Frequently Asked Questions

Should I reuse an existing vocabulary or build my own?

Reuse first, almost always. Mint your own terms only for genuinely local concepts that no published vocabulary covers, and document each one with an rdfs:comment and a stable URI.

How do I check whether a vocabulary is actively maintained?

Look for a dated version IRI, a public issue tracker, a named maintaining organisation, and recent commits or releases. An undated namespace with no governance is a long-term risk.

Can I mix vocabularies in one dataset?

Yes, and you should. Most heritage datasets combine Dublin Core for description, SKOS for concepts, schema.org for web visibility and a domain ontology for events. Mixing is normal and expected.

What licence does a vocabulary need?

An open one such as CC0 or CC BY for the term definitions, so you can reproduce labels and link freely. Check the namespace document; closed or unstated licensing should give you pause.

How many vocabularies is too many?

There is no hard cap, but each one is a dependency you must document. Aim for the smallest set that covers your data well, and record every namespace in an application profile.

Where do I record my vocabulary decisions?

In a metadata application profile: a document listing every property and class you use, its source vocabulary, its obligation and an example. It is the single source of truth for your team.

What makes a vocabulary trustworthy? ​

How do you avoid inventing terms you do not need? ​

Which vocabularies fit which heritage need? ​

How do you test a vocabulary against your real data? ​

Why does versioning matter so much? ​

Where do all these decisions live? ​

Key Takeaways ​

Frequently Asked Questions ​

Should I reuse an existing vocabulary or build my own? ​

How do I check whether a vocabulary is actively maintained? ​

Can I mix vocabularies in one dataset? ​

What licence does a vocabulary need? ​

How many vocabularies is too many? ​

Where do I record my vocabulary decisions? ​

Related reading ​