Linked Open Data

To publish a linked open dataset, you make your RDF available at stable, resolvable URIs, attach an open licence such as CC0, provide a downloadable dump, and add a small description so catalogues and machines can find it. You do not need heavy infrastructure to start: a static Turtle file with a licence on a web host already counts as published linked open data. This guide walks a tiny dataset from spreadsheet to public, queryable resource.

What does "published" actually mean here?

Four things make a dataset genuinely published as LOD:

  1. Resolvable URIs that return data when opened in a browser.
  2. An open licence so others may legally reuse it.
  3. A downloadable dump (a Turtle or JSON-LD file).
  4. Discoverability through a catalogue and a machine-readable description.

A SPARQL endpoint is a welcome fifth element but is not required for your first release. Many respected datasets begin as a licensed file on Zenodo.

The five-star model: how good is good enough?

Tim Berners-Lee's five-star scale is the standard yardstick:

Stars    What it means
★        Open licence, any format (even a PDF)
★★       Structured, machine-readable (Excel)
★★★      Non-proprietary format (CSV)
★★★★     Uses URIs so others can link to it (RDF)
★★★★★    Links out to other datasets

Heritage projects should target four or five stars. The jump from four to five, adding outbound links to Wikidata or GeoNames, is what makes your data part of the wider web rather than an island.
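To see how close a dataset is to that fifth star, you can measure how many records carry at least one outbound link. The following is a minimal sketch, not a standard tool: the function name and the triple representation (plain tuples of strings, with URIs recognised by their http:// or https:// prefix) are assumptions made for illustration. It skips rdf:type statements, since pointing at a vocabulary class is not a link to another dataset.

```python
from urllib.parse import urlparse

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def outbound_link_coverage(triples, own_host):
    """Return the share of subjects with at least one object URI on another host.

    Assumption: triples are (subject, predicate, object) tuples of strings,
    and any object starting with http:// or https:// is a URI, not a literal.
    """
    subjects = set()
    linked = set()
    for subject, predicate, obj in triples:
        subjects.add(subject)
        if predicate == RDF_TYPE:
            continue  # a class from a vocabulary is not a dataset link
        if obj.startswith(("http://", "https://")):
            if urlparse(obj).netloc != own_host:
                linked.add(subject)
    return len(linked) / len(subjects) if subjects else 0.0
```

A result of 1.0 means every record links out at least once; a result near 0.0 suggests the dataset is still an island.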

A worked example: from spreadsheet to triples

Start with a three-column spreadsheet of postcards: id, title, place. Convert it to Turtle with outbound links already in place:

turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix schema: <http://schema.org/> .

<https://data.myarchive.org/card/PC-001> a schema:Photograph ;
    dct:title "Harbour at dawn"@en ;
    dct:spatial <https://sws.geonames.org/2654675/> .

The dct:spatial link to GeoNames is your fifth star. Even one outbound link per record transforms the dataset's value.
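If the spreadsheet is exported as CSV, the conversion can be scripted. The sketch below uses only the Python standard library and writes Turtle by hand; it assumes the three columns are named id, title and place, that every title is in English, and that you supply your own lookup from place names to GeoNames URIs. The function names and the lookup are hypothetical, chosen for illustration.

```python
import csv
import io
from urllib.parse import quote

BASE = "https://data.myarchive.org/card/"  # URI pattern from the example above

PREFIXES = (
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
    "@prefix schema: <http://schema.org/> .\n"
)


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for a Turtle string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_turtle(csv_text, place_uris):
    """Turn id,title,place rows into Turtle; place_uris maps place names to URIs."""
    blocks = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines = [
            f"<{BASE}{quote(row['id'], safe='')}> a schema:Photograph ;",
            f'    dct:title "{escape_literal(row["title"])}"@en',
        ]
        place_uri = place_uris.get(row["place"])
        if place_uri:  # only emit a link that has actually been reconciled
            lines[-1] += " ;"
            lines.append(f"    dct:spatial <{place_uri}>")
        lines[-1] += " ."
        blocks.append("\n".join(lines) + "\n")
    return "\n".join(blocks)
```

Records whose place is missing from the lookup are still written, just without a dct:spatial link, so you can publish first and reconcile the remaining places later.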

How do you actually put it online?

For a first release, no server is needed:

bash
# validate before you publish
riot --validate collection.ttl

# then deposit the file plus a LICENSE on Zenodo or a web host

riot (from Apache Jena) catches malformed Turtle before it embarrasses you. Upload the validated file and a LICENSE stating CC0. That alone is a licensed, downloadable dataset of at least four stars, and five if your records carry outbound links like the dct:spatial one above.

To add the optional SPARQL endpoint later, load the same file into Fuseki:

bash
fuseki-server --file=collection.ttl /ds
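Once Fuseki is running, anyone can query the data over HTTP. As a rough sketch, assuming Fuseki listens on its default port 3030 on your own machine and exposes the /ds dataset's query service at /ds/sparql, a query for card titles could be sent like this. The helper names are illustrative, and the query is only an example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: local Fuseki on the default port, dataset mounted at /ds.
ENDPOINT = "http://localhost:3030/ds/sparql"

QUERY = """\
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?card ?title WHERE { ?card dct:title ?title } LIMIT 10
"""


def build_request(endpoint, query):
    """Build a GET request that asks for SPARQL results as JSON."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint, query):
    """Send the query and return the list of result bindings."""
    with urlopen(build_request(endpoint, query), timeout=10) as response:
        return json.load(response)["results"]["bindings"]
```

Calling run_query(ENDPOINT, QUERY) returns one dictionary per result row; each row's title is at row["title"]["value"] and its URI at row["card"]["value"].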

What licence makes it truly open?

Use CC0 wherever you can. It waives copyright and related rights as far as the law allows, which effectively places the data in the public domain, and it is the de facto norm of the LOD cloud, so reusers face minimal friction. CC BY is acceptable when attribution genuinely matters. Avoid non-commercial or no-derivatives clauses entirely; they break interoperability and disqualify your data from much of the open ecosystem.

How will anyone find it?

Discoverability is a publishing step, not a hope. Three concrete actions:

  • Deposit in a repository such as Zenodo that mints a DOI; re3data, a registry of research data repositories, can help you find a suitable one.
  • Add a VoID description, a small RDF file stating the dataset's size, vocabularies, licence and example resources.
  • Link to it from your institution's website and any related dataset.

A minimal VoID stub:

turtle
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dct: <http://purl.org/dc/terms/> .
<https://data.myarchive.org/dataset/postcards>
    a void:Dataset ;
    dct:title "Postcard collection as LOD"@en ;
    dct:license <https://creativecommons.org/publicdomain/zero/1.0/> ;
    void:exampleResource <https://data.myarchive.org/card/PC-001> .
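A VoID file is easier to keep current if it is generated at release time. The sketch below renders the same stub from parameters and adds a void:triples count, which is the VoID property for dataset size. It assumes you have an N-Triples serialisation of the dump, where each statement sits on its own line; the function names are hypothetical.

```python
def count_statements(ntriples_text):
    """Count triples in N-Triples text: one per non-blank, non-comment line."""
    return sum(
        1
        for line in ntriples_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


def void_description(dataset_uri, title, licence_uri, example_uri, triple_count):
    """Render a small VoID description as Turtle (title assumed to be English)."""
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "@prefix void: <http://rdfs.org/ns/void#> .\n"
        "@prefix dct: <http://purl.org/dc/terms/> .\n"
        "\n"
        f"<{dataset_uri}>\n"
        "    a void:Dataset ;\n"
        f'    dct:title "{safe_title}"@en ;\n'
        f"    dct:license <{licence_uri}> ;\n"
        f"    void:triples {int(triple_count)} ;\n"
        f"    void:exampleResource <{example_uri}> .\n"
    )
```

Regenerating the description on every release keeps the stated size honest; validate the output with riot like any other Turtle file before you upload it.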

Key Takeaways

  • Publishing means resolvable URIs, an open licence, a downloadable dump and discoverability.
  • You can start without a server: a validated, licensed Turtle file on Zenodo counts.
  • Aim for four or five stars; outbound links to Wikidata or GeoNames earn the fifth.
  • Validate with riot before release, then add a SPARQL endpoint as an upgrade.
  • Use CC0 by default; avoid non-commercial and no-derivatives clauses.
  • Make it findable with a catalogue DOI and a small VoID description.

Frequently Asked Questions

What does it mean to 'publish' a linked open dataset?

It means making your RDF available at stable URIs that resolve in a browser, providing a downloadable dump, attaching an open licence, and ideally offering a SPARQL endpoint so others can query it directly.

Do I need a server to publish LOD?

Not necessarily. You can publish a static Turtle or JSON-LD file with a licence on any web host or repository like Zenodo. A SPARQL endpoint is a useful upgrade, not a requirement for step one.

What licence should I attach?

CC0 is the strongest signal for reuse and the norm for the LOD cloud. CC BY is acceptable if attribution matters to you, but avoid non-commercial or no-derivatives terms, which break the 'open' part.

How do people find my dataset once it is published?

Deposit it in a repository such as Zenodo (re3data lists repositories if you need to find one), register it with the Linked Open Data cloud, add a VoID description, and link to it from your institution's site. Discoverability is a publishing step, not an afterthought.

What are the five stars of linked open data?

Tim Berners-Lee's scale: one star for open-licensed data in any format, rising to five stars for RDF that links out to other datasets. Most heritage projects should aim for at least four stars.

What is a VoID file?

VoID (Vocabulary of Interlinked Datasets) is a small RDF description of your dataset: its size, vocabularies, licence, example resources and links. It helps machines and catalogues understand what you published.