A controlled vocabulary is an agreed list of approved terms that everyone on a project uses to describe the same thing the same way. Instead of one cataloguer typing "photograph," another "photo," and a third "b/w print," all three pick the same approved term, ideally one with a stable identifier. The payoff is consistency, findability and a path into linked data. This guide introduces the idea in plain language and walks through a small worked example.
Why does free text cause problems?
Imagine a thousand records of historical photographs. Without a controlled vocabulary you get "photograph," "photo," "Photograph," "albumen print," "b&w photo" — all meaning roughly the same thing. A search for one misses the rest. Controlled vocabularies fix this by forcing a choice from a list, so the concept "photographs" is recorded identically every time.
What is a controlled vocabulary, really?
At its simplest it is three things:
- A finite list of approved terms.
- A rule that you must pick from the list.
- Ideally, a stable identifier (URI) for each term.
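The three parts listed above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended implementation: the names `APPROVED_TERMS` and `choose_term` are hypothetical, and the `example.org` URI is a placeholder. Only the AAT URI comes from this guide.

```python
# Hypothetical mini-vocabulary: approved label -> stable identifier.
# The AAT URI is the one used in this guide; the example.org URI is a placeholder.
APPROVED_TERMS = {
    "photographs": "http://vocab.getty.edu/aat/300046300",
    "estate maps": "http://example.org/vocab/estate-map",
}


def choose_term(label: str) -> dict:
    """Return the approved term and its URI, or refuse anything off the list."""
    if label not in APPROVED_TERMS:
        allowed = ", ".join(sorted(APPROVED_TERMS))
        raise ValueError(f"{label!r} is not an approved term; choose from: {allowed}")
    return {"label": label, "uri": APPROVED_TERMS[label]}


print(choose_term("photographs"))
try:
    choose_term("photo")  # free-text variant, not on the list
except ValueError as err:
    print(err)
```

Rejecting off-list values outright is the strictest form of the rule; a real cataloguing tool might instead offer the closest approved term.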
A richer version, a thesaurus, also records relationships — which term is preferred, which are synonyms, and which terms are broader or narrower. The Getty Art & Architecture Thesaurus (AAT) is a good example: "photographs" has a preferred label, alternate labels, and a place in a hierarchy under "visual works."
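To make the thesaurus idea concrete, here is a small Python sketch of a concept with a preferred label, alternate labels and a broader term. The `Concept` class, the helper names and the alternate labels are assumptions made for illustration; they are not copied from the AAT, and the hierarchy is simplified to a single level.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    broader: Optional[str] = None  # preferred label of the parent concept


# Illustrative entries only; not copied from the AAT.
THESAURUS = {
    c.pref_label: c
    for c in [
        Concept("visual works"),
        Concept("photographs", alt_labels=["photograph", "photo"], broader="visual works"),
    ]
}


def preferred_label(label: str) -> Optional[str]:
    """Map a preferred or alternate label to the preferred label (case-insensitive)."""
    wanted = label.strip().lower()
    for concept in THESAURUS.values():
        known = [concept.pref_label.lower()] + [a.lower() for a in concept.alt_labels]
        if wanted in known:
            return concept.pref_label
    return None


def broader_chain(pref_label: str) -> list[str]:
    """Walk up the hierarchy from a concept to its top-level ancestor."""
    chain = []
    current = THESAURUS[pref_label].broader
    while current is not None:
        chain.append(current)
        current = THESAURUS[current].broader
    return chain


print(preferred_label("Photo"))      # photographs
print(broader_chain("photographs"))  # ['visual works']
```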
Which vocabularies should beginners reach for?
You rarely need to invent one. Heritage work has well-established options:
| Vocabulary | Use it for | Example concept |
|---|---|---|
| Getty AAT | Object types, materials, techniques | photographs, lithographs |
| Getty TGN | Places (historical and modern) | Ipswich, Constantinople |
| Getty ULAN | Artists, makers, firms | John Constable |
| LCSH | Subjects / topics | Enclosure of common lands |
| LC NAF | Personal & corporate names | Royal Society (Great Britain) |
Each provides a stable URI per concept, which is what makes your metadata link-ready.
A small worked example
Suppose you are cataloguing a 1903 street photograph. Without a vocabulary you might write `type: photo`. With AAT you record both the label and the identifier:
```yaml
type:
  label: "photographs"
  uri: "http://vocab.getty.edu/aat/300046300"
subject:
  label: "Street life"
  scheme: "LCSH"
place:
  label: "Ipswich (England)"
  uri: "http://vocab.getty.edu/tgn/7011361"
```

Now any system can resolve aat/300046300 to the exact concept, regardless of how the label is spelled or translated. That is the whole game: the word may vary, the identifier does not.
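A quick check can flag fields that still lack an identifier. The Python sketch below rewrites the record above as a plain dict so that no YAML library is needed; the function name `fields_missing_uri` is hypothetical, and the check is deliberately simple.

```python
# The record from the worked example, written as a plain dict.
record = {
    "type": {"label": "photographs", "uri": "http://vocab.getty.edu/aat/300046300"},
    "subject": {"label": "Street life", "scheme": "LCSH"},
    "place": {"label": "Ipswich (England)", "uri": "http://vocab.getty.edu/tgn/7011361"},
}


def fields_missing_uri(rec: dict) -> list[str]:
    """Return the names of fields that carry a label but no identifier yet."""
    return [name for name, value in rec.items() if "label" in value and not value.get("uri")]


print(fields_missing_uri(record))  # ['subject']
```

Here the subject names its scheme but not a URI, so it is the one field a reviewer would still need to link.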
Why is the URI better than the word?
The word "photographs" is ambiguous across languages and spellings; the URI aat/300046300 is not. It identifies one concept globally. Recording the URI alongside the human-readable label means a French interface can show "photographies" while still matching an English record — they share the identifier. It also lets your data join the linked-data web, where other datasets reference the same Getty concept.
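The following Python sketch shows one way this could work in practice, assuming labels are stored per language and keyed by the concept URI. The record ids, the `LABELS` table and both helper functions are hypothetical; the French label is the one mentioned above.

```python
PHOTOGRAPHS = "http://vocab.getty.edu/aat/300046300"

# Display labels per language, keyed by concept URI.
LABELS = {PHOTOGRAPHS: {"en": "photographs", "fr": "photographies"}}

# Two records whose labels differ but whose identifier is the same.
records = [
    {"id": "rec-1", "type": {"label": "photographs", "uri": PHOTOGRAPHS}},
    {"id": "rec-2", "type": {"label": "photographies", "uri": PHOTOGRAPHS}},
]


def display_label(uri: str, lang: str, fallback: str = "en") -> str:
    """Pick the label for the requested language, falling back to English."""
    names = LABELS[uri]
    return names.get(lang, names[fallback])


def find_by_concept(uri: str) -> list[str]:
    """Match records on the identifier, ignoring how the label is written."""
    return [r["id"] for r in records if r["type"]["uri"] == uri]


print(display_label(PHOTOGRAPHS, "fr"))  # photographies
print(find_by_concept(PHOTOGRAPHS))      # ['rec-1', 'rec-2']
```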
When should you build your own vocabulary?
Only when nothing established fits — perhaps a local classification of estate records with no external equivalent. If you do build one:
- Keep a master list with definitions (scope notes) so terms are applied consistently.
- Mark one preferred label per concept and list non-preferred variants.
- Publish it as SKOS (Simple Knowledge Organization System) so others can reuse it:
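```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://example.org/vocab/> .  # placeholder namespace; substitute your own

ex:estate-map a skos:Concept ;
    skos:prefLabel "estate maps"@en ;
    skos:altLabel "manorial maps"@en ;
    skos:broader ex:maps .
```

How do controlled vocabularies pay off?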
- Search recall and precision both rise — synonyms collapse onto one term.
- Faceted browse becomes possible — a clean term list powers filters.
- Interoperability — shared vocabularies let collections aggregate cleanly.
- Future linked data — URIs are the on-ramp.
The cost is a little discipline at cataloguing time. It is almost always worth it.
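The search and facet benefits can be seen in a toy Python example. The variant mapping below is an assumption made for illustration; in practice it would come from your vocabulary's non-preferred labels.

```python
from collections import Counter

# Free-text values as different cataloguers might have typed them.
raw_values = ["photograph", "photo", "Photograph", "b&w photo", "lithograph"]

# Hypothetical variant -> approved term mapping, keyed on lower-cased text.
VARIANTS = {
    "photograph": "photographs",
    "photo": "photographs",
    "b&w photo": "photographs",
    "lithograph": "lithographs",
}


def normalise(value: str) -> str:
    """Map a free-text value to its approved term; leave unknown values for review."""
    return VARIANTS.get(value.strip().lower(), value)


# Exact-match search on the free text finds only one spelling.
free_text_hits = [v for v in raw_values if v == "photograph"]

# After normalisation, one approved term covers every variant.
controlled = [normalise(v) for v in raw_values]
controlled_hits = [v for v in controlled if v == "photographs"]

# The clean term list doubles as facet counts for a browse filter.
facets = Counter(controlled)

print(len(free_text_hits), len(controlled_hits))  # 1 4
print(facets.most_common())  # [('photographs', 4), ('lithographs', 1)]
```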
Key Takeaways
- A controlled vocabulary is an agreed term list you must choose from, ending free-text drift.
- A thesaurus is a richer controlled vocabulary with preferred labels and term relationships.
- Reach for established vocabularies first: AAT, TGN, ULAN, LCSH, LC NAF.
- Record the term URI as well as the label — the identifier is what disambiguates and links.
- Build your own only when nothing fits, and publish it as SKOS.
- Controlled vocabularies raise search recall and precision and enable faceted browse.
- They are the practical on-ramp to linked open data.
Frequently Asked Questions
What is a controlled vocabulary?
A controlled vocabulary is an agreed, finite list of approved terms used to describe things consistently, so that everyone records the same concept the same way. It replaces free text with a chosen term and often a stable identifier.
What is the difference between a controlled vocabulary and a thesaurus?
A simple controlled vocabulary is just an approved term list. A thesaurus adds relationships between terms (broader, narrower, related) and preferred-versus-non-preferred labels, so it is a richer, structured kind of controlled vocabulary.
Which controlled vocabularies should heritage projects use?
Common choices are the Getty AAT for object types and materials, TGN for places, ULAN for people and organisations, Library of Congress Subject Headings (LCSH) for subjects, and the Library of Congress Name Authority File (LC NAF) for personal and corporate names.
Why use a term URI instead of just the word?
A URI is unambiguous and language-independent: it identifies the exact concept even if the label is spelled differently or translated, and it lets your metadata link into the wider linked-data web.
Do I have to use an external vocabulary or can I make my own?
You can build a local vocabulary when no external one fits, but prefer an established standard where one exists so your data is interoperable. If you must build your own, document it and consider publishing it as SKOS.
How do controlled vocabularies improve search?
They collapse synonyms and spelling variants onto one term, so a search for that concept finds every record regardless of how a cataloguer phrased it. This raises both recall and precision.