Linking a collection to linked open data (LOD) means swapping fragile free-text values (a creator's name typed slightly differently in every record) for stable URIs that point to shared, authoritative entities on the open web. The payoff is that your catalogue stops being an island: a researcher who knows an artist by their Wikidata or VIAF identifier can find your objects, and you inherit dates, alternate names and relationships from the authority instead of maintaining them yourself. You can start today in a spreadsheet; you do not need a triplestore or RDF to begin.
What is linked open data, in plain language?
Three plain ideas underlie LOD:
- Things get URIs. Vermeer is not the string "Vermeer" but the identifier https://www.wikidata.org/entity/Q41264.
- URIs resolve. Put that URI in a browser and you get data back about the thing.
- Data links to other data. That Vermeer record points onward to his works, his city, his dates, all by URI.
Your collection joins this web the moment your records reference the same URIs.
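"URIs resolve" also works from code, not just a browser: a client can ask the server for machine-readable data instead of an HTML page through HTTP content negotiation. The sketch below is a minimal illustration using only Python's standard library. It assumes the server honours an `Accept: text/turtle` header (Wikidata's entity URIs do content negotiation of this kind); the helper names and the `User-Agent` string are hypothetical.

```python
import urllib.request

# Hypothetical identifying User-Agent; public LOD services often ask clients to name themselves.
USER_AGENT = "collection-linking-example/0.1 (contact@example.org)"


def rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks the server for Turtle rather than an HTML page."""
    return urllib.request.Request(
        uri, headers={"Accept": "text/turtle", "User-Agent": USER_AGENT}
    )


def fetch_turtle(uri: str, timeout: float = 10.0) -> str:
    """Dereference a URI and return the RDF the server sends back.

    urllib follows redirects automatically and carries the Accept header along.
    """
    with urllib.request.urlopen(rdf_request(uri), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_turtle("https://www.wikidata.org/entity/Q41264")` needs network access and should return Turtle statements about Vermeer. Which formats a given server offers varies, so check each authority's documentation.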
How do I start linking without any new software?
Add a column. If you have a "Creator" column with names, add a "Creator URI" column and fill it with the authority identifier for each person.
```text
Title         | Creator          | Creator URI
View of Delft | Johannes Vermeer | https://www.wikidata.org/entity/Q41264
The Milkmaid  | Johannes Vermeer | https://www.wikidata.org/entity/Q41264
Self-portrait | Rembrandt        | https://www.wikidata.org/entity/Q5598
```

That is genuine linking. The values are now stable identifiers any other dataset can match against. Tools like OpenRefine automate filling this column through reconciliation, but the concept is just a column of URIs.
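If the spreadsheet is exported as CSV, adding the URI column is a small scripted step. The following is a minimal sketch, not a replacement for OpenRefine reconciliation: it assumes a hand-checked lookup table (`CREATOR_URIS`, hypothetical) and matches names exactly, leaving unmatched names blank for review. The "Untitled sketch" row is a hypothetical example of a variant spelling.

```python
import csv
import io

# Hypothetical, hand-checked lookup from catalogue names to authority URIs.
# Reconciliation in OpenRefine (or a curator) would normally produce this mapping.
CREATOR_URIS = {
    "Johannes Vermeer": "https://www.wikidata.org/entity/Q41264",
    "Rembrandt": "https://www.wikidata.org/entity/Q5598",
}


def add_creator_uri_column(csv_text: str, lookup: dict[str, str]) -> str:
    """Return the CSV with a 'Creator URI' column appended.

    Names are matched exactly (after trimming spaces). Unmatched names get an
    empty cell, so variant spellings are flagged for review instead of guessed.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[*reader.fieldnames, "Creator URI"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row["Creator URI"] = lookup.get(row["Creator"].strip(), "")
        writer.writerow(row)
    return out.getvalue()


catalogue = """Title,Creator
View of Delft,Johannes Vermeer
The Milkmaid,Johannes Vermeer
Self-portrait,Rembrandt
Untitled sketch,J. Vermeer
"""
print(add_creator_uri_column(catalogue, CREATOR_URIS))
```

Leaving "J. Vermeer" blank is deliberate: a wrong URI is worse than a missing one, so uncertain matches go back to a person.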
Which authorities should a beginner link to?
| Entity type | Good first authority | Why |
|---|---|---|
| People | Wikidata, VIAF, Library of Congress | Wide coverage, onward links |
| Places | GeoNames, World Historical Gazetteer | Coordinates and variants |
| Concepts | Getty AAT, LCSH | Standard vocabularies |
| Works | Wikidata | Connects to creators and events |
Pick whichever your discipline already cites so your links interoperate with peers.
A small worked example
Say you have three painting records. Reconcile the creators to Wikidata, store the URIs, then express each link explicitly. In Turtle, the link for one record reads almost like a sentence:
```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .

<https://data.example.org/object/00041523>
    dcterms:title "View of Delft" ;
    dcterms:creator <https://www.wikidata.org/entity/Q41264> .
```

The object is your record; the creator is a shared, resolvable entity. You have linked your collection to the wider graph in three lines.
What is owl:sameAs and when do I use it?
Sometimes you have already minted your own URI for an entity and later discover that the authority has one too. owl:sameAs declares the two identical, so consumers can merge what each dataset says about the entity.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<https://data.example.org/person/p042>
    owl:sameAs <https://www.wikidata.org/entity/Q41264> .
```

Use it only when you are certain the two URIs denote exactly the same thing. A careless sameAs spreads bad data to every dataset that trusts yours.
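Why a careless sameAs does so much damage becomes clearer if you model how a consumer merges records. The sketch below is a simplified illustration, not how any particular RDF toolkit or reasoner works: it treats sameAs as transitive, groups URIs with a union-find structure, and pools every statement in a group. Predicates are written as prefixed-name strings for brevity, and the labels are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def merge_same_as(
    triples: list[Triple], same_as: list[tuple[str, str]]
) -> dict[str, set[tuple[str, str]]]:
    """Merge statements about URIs declared identical by owl:sameAs.

    sameAs is treated as transitive: if A sameAs B and B sameAs C, all three merge.
    Each merged group is keyed by its alphabetically smallest URI.
    """
    parent: dict[str, str] = {}

    def find(uri: str) -> str:
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps lookups short
            uri = parent[uri]
        return uri

    def union(a: str, b: str) -> None:
        root, child = sorted((find(a), find(b)))
        if root != child:
            parent[child] = root

    for a, b in same_as:
        union(a, b)

    merged: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for subject, predicate, obj in triples:
        merged[find(subject)].add((predicate, obj))
    return dict(merged)


LOCAL = "https://data.example.org/person/p042"
VERMEER = "https://www.wikidata.org/entity/Q41264"
REMBRANDT = "https://www.wikidata.org/entity/Q5598"

triples = [
    (LOCAL, "rdfs:label", "Vermeer, Johannes (local record)"),
    (VERMEER, "rdfs:label", "Johannes Vermeer"),
    (REMBRANDT, "rdfs:label", "Rembrandt"),
]

careful = merge_same_as(triples, [(LOCAL, VERMEER)])
careless = merge_same_as(triples, [(LOCAL, VERMEER), (VERMEER, REMBRANDT)])
print(len(careful))   # 2 groups: Vermeer (merged) and Rembrandt
print(len(careless))  # 1 group: Rembrandt's label now sits on "Vermeer"
```

One wrong pair is enough to fuse two painters, and because sameAs is transitive, every dataset that links to either URI inherits the confusion.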
Why link instead of copy?
Copying a birth date from Wikidata freezes a snapshot that goes stale and loses its provenance. Linking stores a pointer, so the authority stays the source of truth and corrections upstream flow to you. Linking is lighter to maintain and honest about where facts come from.
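The difference between a pointer and a snapshot can be shown in a few lines. In this minimal sketch, a Python dictionary stands in for the authority (a real workflow would dereference the URI), and the label change simulates a hypothetical upstream correction.

```python
# Hypothetical in-memory stand-in for an authority record.
VERMEER = "https://www.wikidata.org/entity/Q41264"
authority = {VERMEER: {"label": "Vermeer"}}

# Copy: the value is duplicated into our record at import time.
copied = {"title": "View of Delft", "creator_label": authority[VERMEER]["label"]}
# Link: only the URI is stored; the label is looked up when needed.
linked = {"title": "View of Delft", "creator": VERMEER}


def creator_label(record: dict, source: dict) -> str:
    """Resolve a linked record's creator through the authority at read time."""
    return source[record["creator"]]["label"]


# Simulate a hypothetical correction made upstream in the authority.
authority[VERMEER]["label"] = "Johannes Vermeer"

print(copied["creator_label"])           # prints "Vermeer": the stale snapshot
print(creator_label(linked, authority))  # prints "Johannes Vermeer": the correction flows through
```

In practice many systems cache resolved labels for speed; the point is that the cache can be refreshed from the URI, whereas a copied value has nothing to refresh from.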
What is the smallest next step after a spreadsheet?
Once your URI columns exist, you can:
- Export to CSV and keep using it as enriched tabular data, or
- Convert to RDF/Turtle when you want to run graph queries, or
- Publish a small dataset so others can link to you.
None of these is mandatory on day one. Linking proves its value while still in a spreadsheet.
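When you do reach the "convert to RDF/Turtle" step, the conversion from a URI-enriched table can be mechanical. The sketch below is one possible approach using only Python's standard library; an RDF library would also validate IRIs, which this code does not. It assumes a hypothetical "Object ID" column and reuses the example.org object base from the worked example above; object IDs other than 00041523 are hypothetical. Rows without a creator URI keep their title but get no creator link.

```python
import csv
import io

# Hypothetical base for the collection's own object URIs.
OBJECT_BASE = "https://data.example.org/object/"


def turtle_literal(text: str) -> str:
    """Quote a plain string as a Turtle literal, escaping characters Turtle reserves."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def csv_to_turtle(csv_text: str) -> str:
    """Convert rows with 'Object ID', 'Title' and 'Creator URI' columns into Turtle."""
    lines = ["@prefix dcterms: <http://purl.org/dc/terms/> .", ""]
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{OBJECT_BASE}{row['Object ID'].strip()}>"
        statements = [f"dcterms:title {turtle_literal(row['Title'])}"]
        creator = (row.get("Creator URI") or "").strip()
        if creator:  # unreconciled rows keep their title but get no creator link
            statements.append(f"dcterms:creator <{creator}>")
        lines.append(subject)
        lines.append("    " + " ;\n    ".join(statements) + " .")
        lines.append("")
    return "\n".join(lines)


catalogue = """Object ID,Title,Creator URI
00041523,View of Delft,https://www.wikidata.org/entity/Q41264
00041524,The Milkmaid,https://www.wikidata.org/entity/Q41264
00041525,Untitled sketch,
"""
print(csv_to_turtle(catalogue))
```

Because the spreadsheet remains the source, you can regenerate the Turtle whenever the URI columns improve instead of editing RDF by hand.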
Key Takeaways
- Linking replaces free-text values with stable, resolvable URIs for shared entities.
- You can start in a spreadsheet by adding a URI column; no triplestore needed.
- Choose authorities your discipline cites: Wikidata, VIAF, GeoNames, Getty AAT.
- Linking keeps the authority as the source of truth; copying freezes stale snapshots.
- Use owl:sameAs only when two URIs truly denote the same thing.
- RDF conversion is a later, optional step, not a prerequisite for linking.
- The goal is to make your collection findable and connected, not isolated.
Frequently Asked Questions
What does it actually mean to link a collection to LOD?
It means replacing free-text values like a creator's name or a place with stable URIs that point to authoritative records on the open web, so your data connects to other datasets instead of sitting in isolation.
Do I need a triplestore to start linking?
No. You can begin in a spreadsheet by adding a column of URIs next to your existing values. A triplestore becomes useful later when you want to query the graph or publish a SPARQL endpoint, not on day one.
Which authority should I link people and places to first?
Start with Wikidata for broad coverage and onward links, VIAF or the Library of Congress authorities for people, and GeoNames or a historical gazetteer for places. Pick the one your community already cites.
What is a sameAs link and when do I use it?
owl:sameAs asserts that two URIs denote the exact same thing, letting consumers merge information across datasets. Use it only when you are confident the entities are identical, because a wrong sameAs propagates errors widely.
How is linking different from just copying data from Wikidata?
Linking stores a pointer (a URI) and lets the authoritative source stay the source of truth, so updates flow through. Copying freezes a snapshot that goes stale and detaches from its provenance.
Do I have to convert everything to RDF before I can link?
No. Linking and RDF are separable. You can add authority URIs to tabular data first, prove the value, and convert to RDF only when you need graph queries or formal publication.