How to license your own heritage data

To license your own heritage data, first confirm that you actually hold the rights, then choose a licence that matches the data type: CC0 for factual metadata, CC BY for original datasets where attribution matters. Publish a machine-readable licence URI alongside a human-readable statement. The biggest practical decision is the trade-off between frictionless reuse and attribution, and for most catalogue metadata the choice that maximises reuse is CC0.

Step 1 — Confirm you have the rights to license

You can only license what you own. Audit your dataset for three layers: the underlying objects (often not yours), your original metadata and transcriptions (usually yours), and any database right in the compilation. Strip out third-party material or describe it separately before applying a blanket licence: you cannot grant rights you do not hold, and asserting rights over public-domain material is copyfraud.
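The audit can be sketched in code. This is a minimal illustration, assuming each record already carries a rights-holder label from a manual review; the `Record` class, its field names, the `us` label and the sample identifiers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical label for material your own institution created.
OWN = "us"


@dataclass
class Record:
    identifier: str
    layer: str          # "object", "metadata", or "compilation"
    rights_holder: str  # who holds the rights, as recorded in your audit


def split_by_ownership(records, owner=OWN):
    """Return (licensable, held_back): records you hold rights in, and the rest."""
    licensable = [r for r in records if r.rights_holder == owner]
    held_back = [r for r in records if r.rights_holder != owner]
    return licensable, held_back


records = [
    Record("reg-001-transcription", "metadata", "us"),
    Record("reg-001-photo", "object", "county-archive"),
    Record("register-index", "compilation", "us"),
]
licensable, held_back = split_by_ownership(records)
print("Safe to license:", [r.identifier for r in licensable])
print("Strip or describe separately:", [r.identifier for r in held_back])
```

Only the first list should go under a blanket licence; the second needs its own rights description or removal.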

Step 2 — Choose the right licence for the data type

The choice hinges on what kind of data it is and how much friction you can accept.

| Data type | Recommended licence | Why |
| --- | --- | --- |
| Catalogue metadata, factual fields | CC0 | Weak copyright; max aggregation |
| Original research dataset, curated | CC BY 4.0 | Attribution, still very open |
| Code / scripts shipped with data | MIT or Apache-2.0 | Standard, permissive |
| Images you hold copyright in | CC BY / decide per use | Visual works are more original |

For metadata destined for Europeana or Wikidata, CC0 is effectively required: both publish their metadata under CC0, so BY and NC variants are rejected or stripped on ingest, and stacking attribution across millions of aggregated records is unworkable.
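The decision rule can be written as a small lookup. This is a sketch only: the data-type keys and the `for_aggregator` flag are hypothetical names, the identifiers are SPDX short forms of the licences named above, and picking MIT over Apache-2.0 and CC BY 4.0 for images are assumptions made to keep the example short.

```python
# Recommendations taken from the table above; keys are hypothetical labels.
RECOMMENDED = {
    "catalogue_metadata": "CC0-1.0",
    "research_dataset": "CC-BY-4.0",
    "code": "MIT",          # Apache-2.0 is an equally standard choice
    "images": "CC-BY-4.0",  # the table says to decide per use; this is a default
}

# Data types that aggregators such as Europeana or Wikidata ingest as metadata.
# Treating research datasets this way is an assumption of this sketch.
AGGREGATED_TYPES = {"catalogue_metadata", "research_dataset"}


def recommend_licence(data_type, for_aggregator=False):
    """Return an SPDX identifier for the recommended licence."""
    if data_type not in RECOMMENDED:
        raise ValueError(f"unknown data type: {data_type!r}")
    if for_aggregator and data_type in AGGREGATED_TYPES:
        return "CC0-1.0"
    return RECOMMENDED[data_type]


print(recommend_licence("research_dataset"))
print(recommend_licence("research_dataset", for_aggregator=True))
```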

Step 3 — Handle copyright and database right together

In the UK and EU your compilation may carry a sui generis database right on top of copyright. The good news: both CC0 1.0 and the CC 4.0 licences cover it. A single CC0 1.0 waiver or CC BY 4.0 grant deals with copyright and database right in one move, so you don't need a bespoke instrument.

Step 4 — Make the licence machine-readable

A licence buried in prose can't be harvested. Publish it in three places:

```text
dataset/
  LICENSE                # full licence text + SPDX-style ID (CC0-1.0)
  README.md              # human statement: "Metadata: CC0. Images: see item records."
  datapackage.json       # "licenses":[{"name":"CC0-1.0","path":"https://..."}]
```

```json
{
  "name": "parish-registers-1700-1850",
  "licenses": [
    { "name": "CC0-1.0", "path": "https://creativecommons.org/publicdomain/zero/1.0/" }
  ]
}
```

The URI is what harvesters and DCAT catalogues read; the README is what humans read.
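A quick pre-publication check can confirm that the descriptor really exposes a harvestable URI. The function below is a sketch, not part of any data package tooling: `licence_problems` is a hypothetical helper that only inspects the `licenses` array shown above.

```python
from urllib.parse import urlparse


def licence_problems(descriptor):
    """List reasons a harvester could not read a licence URI from the descriptor."""
    licences = descriptor.get("licenses", [])
    if not licences:
        return ["no licence declared"]
    problems = []
    for entry in licences:
        parsed = urlparse(entry.get("path", ""))
        # A harvestable licence needs an absolute http(s) URI, not just a name.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            name = entry.get("name", "unnamed")
            problems.append(f"{name}: no absolute licence URI")
    return problems


descriptor = {
    "name": "parish-registers-1700-1850",
    "licenses": [
        {"name": "CC0-1.0",
         "path": "https://creativecommons.org/publicdomain/zero/1.0/"}
    ],
}
print(licence_problems(descriptor))  # an empty list means nothing was flagged
```

Running a check like this in a release script catches the case where the README says "CC0" but the descriptor carries no URI.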

Step 5 — License layers separately when needed

Rarely is one licence right for everything. Split metadata from media explicitly:

```yaml
metadata_licence: "https://creativecommons.org/publicdomain/zero/1.0/"   # CC0
image_rights:     "http://rightsstatements.org/vocab/InC/1.0/"           # per item
note: "All catalogue fields are CC0. Image reuse governed by per-item rights."
```
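In code, the split means the rights answer depends on which layer a reuser asks about. The sketch below assumes each item is a dictionary with an optional `image_rights` key that overrides the dataset-wide value; the function name, the item structure and the sample identifiers are hypothetical.

```python
METADATA_LICENCE = "https://creativecommons.org/publicdomain/zero/1.0/"
DEFAULT_IMAGE_RIGHTS = "http://rightsstatements.org/vocab/InC/1.0/"


def rights_for(item, layer):
    """Return the rights URI that applies to one layer of one item."""
    if layer == "metadata":
        # Catalogue fields are always CC0, whatever the image status is.
        return METADATA_LICENCE
    if layer == "image":
        # Per-item value wins; otherwise fall back to the dataset default.
        return item.get("image_rights", DEFAULT_IMAGE_RIGHTS)
    raise ValueError(f"unknown layer: {layer!r}")


in_copyright = {"id": "reg-001"}
own_photo = {"id": "reg-002",
             "image_rights": "https://creativecommons.org/licenses/by/4.0/"}

print(rights_for(in_copyright, "image"))
print(rights_for(own_photo, "image"))
print(rights_for(own_photo, "metadata"))
```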

What are the common pitfalls?

Applying CC0 to a dataset that contains third-party copyrighted material — clean it first.
Using NC or ND on metadata — it blocks aggregation and educational reuse and rarely helps.
Forgetting the database right, which CC0 1.0 and the CC 4.0 licences cover but earlier CC versions may not.
Shipping only a prose statement — add the machine-readable URI.
Treating a citation request as a binding term — keep the courtesy ask separate from the legal grant.
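Two of these pitfalls can be caught mechanically from the licence identifier alone. The following is a rough sketch that assumes licences are recorded as SPDX-style identifiers such as `CC-BY-NC-4.0`; the function name and warning texts are hypothetical, and the check is no substitute for reading the licence.

```python
def metadata_licence_warnings(spdx_id):
    """Flag NC/ND terms and pre-4.0 CC BY versions on a metadata licence."""
    warnings = []
    parts = spdx_id.upper().split("-")
    if "NC" in parts or "ND" in parts:
        warnings.append("NC/ND on metadata blocks aggregation")
    # CC0-1.0 splits into ["CC0", "1.0"], so it never enters this branch.
    if parts[:2] == ["CC", "BY"] and "4.0" not in parts:
        warnings.append("pre-4.0 CC licence: check database right coverage")
    return warnings


for candidate in ("CC0-1.0", "CC-BY-4.0", "CC-BY-NC-4.0", "CC-BY-3.0"):
    print(candidate, metadata_licence_warnings(candidate))
```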

Key Takeaways

License only what you own — strip or separate third-party material first.
CC0 is the default for factual metadata; CC BY 4.0 when attribution genuinely matters.
For Europeana/Wikidata aggregation, CC0 is effectively required.
CC0 1.0 waives, and CC BY 4.0 licenses, copyright and the database right in one instrument.
Publish a machine-readable licence URI plus a human-readable README.
License media and metadata separately when their rights differ.
A citation request alongside CC0 is a courtesy, not an enforceable term.

Frequently Asked Questions

What is the recommended default licence for heritage metadata?

CC0 is the widely recommended default for structured metadata, because facts and short factual records attract weak or no copyright and CC0 removes all friction for aggregation into Europeana, Wikidata and research datasets.

Should I use CC0 or CC BY for my data?

Use CC0 for metadata and factual datasets where you want maximum reuse and minimal litigation risk; use CC BY when attribution genuinely matters and the data is original enough to carry copyright, accepting that BY adds friction for downstream aggregators.

Can I license data that contains a database right?

Yes. CC0 1.0 and the CC 4.0 licences explicitly cover the EU/UK sui generis database right, so a single CC0 waiver or CC BY 4.0 grant can waive or license both copyright and database right together.

How do I license images and metadata differently in one dataset?

Apply separate, clearly stated licences per layer: for example CC0 for the metadata fields and a RightsStatements.org value or CC BY for the images. Document the scope of each so reusers are not confused.

Where should the licence live so machines can read it?

Put a machine-readable licence URI in the dataset metadata (a data package descriptor, DCAT record, or a LICENSE file with an SPDX-style identifier) and a human-readable statement in the README, so both people and harvesters can act on it.

Can I add a non-binding request alongside CC0?

Yes. You can publish a community norm or citation request next to a CC0 dataset asking for attribution, but it is a courtesy, not an enforceable term — keep the legal grant and the request clearly separate.
