Model heritage data as RDF triples: A Practical Guide

To model heritage data as RDF triples, you break each catalogue record into atomic subject-predicate-object statements: the subject is the record's URI, the predicate is a property reused from an established vocabulary, and the object is either a literal value or another URI. One spreadsheet row becomes many triples that all share the same subject. The skill lies in deciding which values stay as text and which become links.

What does one record look like as triples?

Take a single manuscript row with title, date, place and creator. Each field is its own statement about the same subject URI:

turtle

@prefix dct: <http://purl.org/dc/terms/> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://data.myarchive.org/item/MS-0421> a schema:Manuscript ;
    dct:title "Letter to John Aubrey"@en ;
    dct:created "1685"^^xsd:gYear ;
    dct:spatial <https://sws.geonames.org/2654675/> ;
    dct:creator <https://data.myarchive.org/person/p-0093> .

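That is five triples: the type statement (a is Turtle shorthand for rdf:type) and one for each field. Each semicolon carries the subject over to the next line, so read every line as its own sentence about the item, and you have read the model.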
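The same row-to-triples expansion can be sketched in plain Python, with each triple held as a three-item tuple. This is a minimal illustration, not a full converter: the row contents and the column-to-predicate mapping are assumptions made for the example, and every object is left as a plain string.

```python
# Hypothetical row and column-to-predicate mapping, for illustration only.
DCT = "http://purl.org/dc/terms/"
BASE = "https://data.myarchive.org/item/"

row = {"id": "MS-0421", "title": "Letter to John Aubrey", "year": "1685", "place": ""}

# Which predicate each spreadsheet column maps to.
COLUMN_TO_PREDICATE = {
    "title": DCT + "title",
    "year": DCT + "created",
    "place": DCT + "spatial",
}


def row_to_triples(row):
    """Return one (subject, predicate, object) tuple per populated column."""
    subject = BASE + row["id"]
    return [
        (subject, predicate, row[column])
        for column, predicate in COLUMN_TO_PREDICATE.items()
        if row.get(column)  # skip empty cells rather than emit empty literals
    ]


triples = row_to_triples(row)
for s, p, o in triples:
    print(s, p, o)
```

The empty place cell produces no triple, so this row yields two statements, both with the same subject URI.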

How do you decide literal versus URI?

This is the central modelling judgement. The rule of thumb:

Literal when the value is descriptive text or a raw measurement: titles, scope notes, dimensions, transcribed text.
URI when the value is an entity others might also reference: people, places, subjects, organisations, concepts.

The moment you write the same string twice and would want them to mean the same thing, that string should become a URI. "Bristol" in fifty records should resolve to one GeoNames URI, not fifty identical literals.
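One way to apply that rule during conversion is a small authority lookup: strings that match a known entity become a URI, and everything else stays a literal until someone reconciles it. The sketch below assumes a hand-maintained table called PLACE_AUTHORITY, a hypothetical name; the Bristol entry reuses the GeoNames URI from the manuscript record above.

```python
# Hypothetical authority table: normalised place name -> shared URI.
PLACE_AUTHORITY = {
    "bristol": "https://sws.geonames.org/2654675/",
}


def place_object(value):
    """Return ("uri", ...) for a known entity, otherwise ("literal", ...)."""
    key = value.strip().lower()
    if key in PLACE_AUTHORITY:
        return ("uri", PLACE_AUTHORITY[key])
    # Unmatched strings stay as text until they are reconciled.
    return ("literal", value.strip())


cells = ["Bristol", " bristol ", "BRISTOL", "Unidentified village"]
objects = [place_object(cell) for cell in cells]

# All three spellings of Bristol collapse to a single URI.
distinct_uris = {obj for kind, obj in objects if kind == "uri"}
print(distinct_uris)
```

Normalising case and whitespace before the lookup is a design choice for this sketch; real reconciliation usually needs fuzzier matching and human review.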

Modelling people and agents without losing yourself

A creator is not a string; it is an entity with its own properties. Give people their own subject URIs and describe them once:

turtle

<https://data.myarchive.org/person/p-0093> a schema:Person ;
    schema:name "John Aubrey" ;
    schema:birthDate "1626"^^xsd:gYear ;
    schema:sameAs <http://www.wikidata.org/entity/Q353702> .

The schema:sameAs link to Wikidata is what turns isolated data into linked data. Now any tool that knows Aubrey from Wikidata can connect to your holdings.

How do you handle dates, ranges and uncertainty?

Never bury structure inside a string like "circa 1680-1690". Decompose it:

turtle

<https://data.myarchive.org/item/MS-0510>
    schema:startDate "1680"^^xsd:gYear ;
    schema:endDate   "1690"^^xsd:gYear ;
    dct:temporal     "circa, approximate"@en .

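Machines can now filter by date range, while the human-facing qualifier survives. Strictly speaking, dct:temporal describes the period a resource covers, so treat its use here as a stand-in for whichever qualifier property your own profile adopts. For deep event modelling, CIDOC CRM offers time-spans, but that is a later step.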
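The decomposition itself can be automated for the common patterns. The following is a minimal sketch using a regular expression; the accepted spellings of "circa" and the separators it recognises are assumptions, and anything it cannot parse is returned as None for manual review rather than guessed at.

```python
import re

# Matches strings such as "circa 1680-1690", "c. 1680 to 1690" or "1685".
# The accepted spellings of "circa" are an assumption; extend them for your data.
DATE_PATTERN = re.compile(
    r"^\s*(?P<circa>circa|ca\.?|c\.)?\s*(?P<start>\d{4})"
    r"(?:\s*(?:[-\u2013]|to)\s*(?P<end>\d{4}))?\s*$",
    re.IGNORECASE,
)


def decompose_date(text):
    """Split a catalogue date string into start year, end year and an approximate flag."""
    match = DATE_PATTERN.match(text)
    if match is None:
        return None  # leave unparseable strings for manual review
    start = match.group("start")
    end = match.group("end") or start  # a single year is a one-year range
    return {
        "start": start,
        "end": end,
        "approximate": match.group("circa") is not None,
    }


print(decompose_date("circa 1680-1690"))
# {'start': '1680', 'end': '1690', 'approximate': True}
```

The start and end values can then be emitted as xsd:gYear literals, with the approximate flag driving the separate qualifier statement.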

When are blank nodes acceptable?

Blank nodes (anonymous resources) are fine for internal structure with no external identity, such as a measurement bundle:

turtle

<https://data.myarchive.org/item/MS-0421>
    schema:height [ a schema:QuantitativeValue ;
                    schema:value 21 ; schema:unitText "cm" ] .

But anything you might link to from outside, a person, a place, a work, deserves a real URI. Blank nodes do not dereference and cannot be referenced across datasets.

Choosing a serialisation: which and when?

Serialisation	Best for	Why
Turtle	Authoring, review	Human-readable, compact prefixes
N-Triples	Bulk load, diffs	One triple per line, line-stable in git
JSON-LD	Web apps, APIs	Native to JavaScript and schema.org consumers
RDF/XML	Legacy interchange	Verbose; avoid unless a tool demands it

They carry identical information. Convert with rapper (from raptor2) or rdflib:

bash

rapper -i turtle -o ntriples collection.ttl > collection.nt

A reusable Python sketch

For repeatable conversion from CSV, rdflib keeps the mapping in code you can version:

python

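import csv

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, XSD

BASE = Namespace("https://data.myarchive.org/item/")

g = Graph()
g.bind("dct", DCTERMS)  # emit the same prefix the Turtle examples use

# Assumes a file named collection.csv with id, title and year columns.
with open("collection.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        s = BASE[row["id"]]  # subject URI shared by every triple for this row
        g.add((s, DCTERMS.title, Literal(row["title"], lang="en")))
        if row["year"]:  # skip blank cells rather than emit empty literals
            g.add((s, DCTERMS.created, Literal(row["year"], datatype=XSD.gYear)))

g.serialize(destination="collection.ttl", format="turtle")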

Key Takeaways

One record becomes many triples that share a subject URI.
Turn reusable entities (people, places, subjects) into URIs; keep raw text as literals.
Give people and agents their own URIs and link them with schema:sameAs.
Decompose dates and ranges into typed start/end values, not strings.
Use blank nodes only for internal structure with no external identity.
Author in Turtle, load N-Triples, serve JSON-LD; they are interchangeable.

Frequently Asked Questions

What exactly is an RDF triple?

A triple is a statement of subject, predicate, object, for example 'item MS-0421' (subject) 'was created in' (predicate) 'the year 1685' (object). Every fact in your dataset becomes one triple.

Should a spreadsheet row become one triple or many?

Many. A single catalogue row typically explodes into one triple per populated column: title, date, creator, place, subject and so on, all sharing the same subject URI.

When should an object be a literal versus a URI?

Use a literal for free text and raw values like titles or notes. Use a URI whenever the value is an entity you might link to or reuse, such as a person, place, subject or agent.

How do I handle a date range like 'circa 1680 to 1690'?

Model it with explicit start and end properties (for example schema:startDate and schema:endDate) typed as xsd:gYear, and record uncertainty separately rather than burying 'circa' in a string.

Do I need blank nodes?

Sometimes, for intermediate structures like a measurement or an address that has no independent identity. Where the thing could ever be referenced externally, mint a real URI instead of a blank node.

Which serialisation should I write triples in?

Turtle for human authoring and review, N-Triples for bulk loading and diffing, and JSON-LD when feeding web applications. They are interchangeable; pick by task.
