To license your own heritage data, first confirm you actually hold the rights, then choose a licence that matches the data type (CC0 for factual metadata, CC BY for original datasets where attribution matters) and publish a machine-readable licence URI alongside a human-readable statement. The biggest practical decision is openness versus attribution friction, and for most catalogue metadata the answer that maximises reuse is CC0.
Step 1 — Confirm you have the rights to license
You can only license what you own. Audit your dataset for three layers: the underlying objects (often not yours), your original metadata and transcriptions (usually yours), and any database right in the compilation. Strip out third-party material or describe it separately before applying a blanket licence: a grant over material you don't own has no legal effect and misleads reusers, and claiming rights over public-domain material is copyfraud.
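The audit above can be sketched as a simple partition of records by rights holder. This is an illustration only, not a legal test: the `rights_holder` field, the `OWN_HOLDER` value and the sample records are hypothetical, and a real audit would rest on your own provenance documentation.

```python
# Hypothetical sketch: separate records you can license from those that
# need separate handling. Field names and values are assumptions.
OWN_HOLDER = "Example Archive"  # hypothetical institution name


def partition_by_rights(records, own_holder=OWN_HOLDER):
    """Return (licensable, third_party) lists based on a rights_holder field."""
    licensable, third_party = [], []
    for record in records:
        # Records with no recorded rights holder are treated as unresolved,
        # so they stay out of the blanket licence.
        if record.get("rights_holder") == own_holder:
            licensable.append(record)
        else:
            third_party.append(record)
    return licensable, third_party


records = [
    {"id": "r1", "rights_holder": "Example Archive"},
    {"id": "r2", "rights_holder": "Another Library"},
    {"id": "r3"},  # unknown provenance
]
licensable, third_party = partition_by_rights(records)
print([r["id"] for r in licensable])   # ['r1']
print([r["id"] for r in third_party])  # ['r2', 'r3']
```

Treating unknown provenance as third-party is a deliberately cautious choice: it keeps unresolved material out of the grant until someone has checked it.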
Step 2 — Choose the right licence for the data type
The choice hinges on what kind of data it is and how much friction you can accept.
| Data type | Recommended | Why |
|---|---|---|
| Catalogue metadata, factual fields | CC0 | Weak copyright; max aggregation |
| Original research dataset, curated | CC BY 4.0 | Attribution, still very open |
| Code / scripts shipped with data | MIT or Apache-2.0 | Standard, permissive |
| Images you hold copyright in | CC BY / decide per use | Visual works are more original |
For metadata destined for Europeana or Wikidata, CC0 is effectively required: Europeana publishes partner metadata under CC0 and Wikidata's structured data is CC0, so BY and NC variants get rejected or stripped by aggregators, and attribution stacking is unworkable at scale.
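The table and the aggregator rule can be encoded as a small lookup. This is a minimal sketch: the dictionary keys, the function names and the choice of SPDX-style identifiers as values are assumptions made for illustration, not part of any standard tool.

```python
# SPDX-style identifiers for the recommendations in the table above.
# The dictionary keys are hypothetical labels chosen for this sketch.
RECOMMENDED = {
    "catalogue_metadata": "CC0-1.0",
    "research_dataset": "CC-BY-4.0",
    "code": "MIT",           # Apache-2.0 is the listed alternative
    "images": "CC-BY-4.0",   # assumption: the table also allows deciding per use
}

AGGREGATORS = {"europeana", "wikidata"}


def recommend_licence(data_type):
    """Look up the recommended licence identifier for a data type."""
    if data_type not in RECOMMENDED:
        raise ValueError(f"no recommendation for data type: {data_type!r}")
    return RECOMMENDED[data_type]


def fits_targets(licence_id, targets):
    """Return False when an aggregator is targeted with a non-CC0 licence."""
    wants_aggregation = any(t.lower() in AGGREGATORS for t in targets)
    return licence_id == "CC0-1.0" or not wants_aggregation


print(recommend_licence("catalogue_metadata"))   # CC0-1.0
print(fits_targets("CC-BY-4.0", ["Europeana"]))  # False
print(fits_targets("CC0-1.0", ["Wikidata"]))     # True
```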
Step 3 — Handle copyright and database right together
In the UK and EU your compilation may carry a sui generis database right on top of copyright. The good news: both CC0 1.0 and the CC 4.0 licences cover it. A single CC0 1.0 waiver or CC BY 4.0 grant deals with copyright and database right in one move, so you don't need a bespoke instrument.
Step 4 — Make the licence machine-readable
A licence buried in prose can't be harvested. Publish it in three places:
```text
dataset/
  LICENSE           # full licence text + SPDX-style ID (CC0-1.0)
  README.md         # human statement: "Metadata: CC0. Images: see item records."
  datapackage.json  # "licenses":[{"name":"CC0-1.0","path":"https://..."}]
```

```json
{
  "name": "parish-registers-1700-1850",
  "licenses": [
    { "name": "CC0-1.0", "path": "https://creativecommons.org/publicdomain/zero/1.0/" }
  ]
}
```

The URI is what harvesters and DCAT catalogues read; the README is what humans read.
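If the descriptor is generated rather than hand-written, a short script can build it and refuse a licence entry that lacks an absolute URI. This is a sketch using only the Python standard library; `build_descriptor` is a hypothetical helper, and the descriptor carries only the two fields shown in the example above.

```python
import json

CC0_URI = "https://creativecommons.org/publicdomain/zero/1.0/"


def build_descriptor(name, licence_name, licence_uri):
    """Build a minimal descriptor dict with one machine-readable licence."""
    # Harvesters need a resolvable URI, so reject relative paths or prose.
    if not licence_uri.startswith(("http://", "https://")):
        raise ValueError("licence path must be an absolute http(s) URI")
    return {
        "name": name,
        "licenses": [{"name": licence_name, "path": licence_uri}],
    }


descriptor = build_descriptor("parish-registers-1700-1850", "CC0-1.0", CC0_URI)
text = json.dumps(descriptor, indent=2)
print(text)
```

Writing `text` to `datapackage.json` next to `LICENSE` and `README.md` gives the layout shown in the directory listing.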
Step 5 — License layers separately when needed
Rarely is one licence right for everything. Split metadata from media explicitly:
```yaml
metadata_licence: "https://creativecommons.org/publicdomain/zero/1.0/"  # CC0
image_rights: "http://rightsstatements.org/vocab/InC/1.0/"  # per item
note: "All catalogue fields are CC0. Image reuse governed by per-item rights."
```

What are the common pitfalls?
- Applying CC0 to a dataset that contains third-party copyrighted material — clean it first.
- Using NC or ND on metadata — it blocks aggregation and educational reuse and rarely helps.
- Forgetting the database right: confirm your instrument covers it (CC0 1.0 and the CC 4.0 licences do).
- Shipping only a prose statement — add the machine-readable URI.
- Treating a citation request as a binding term — keep the courtesy ask separate from the legal grant.
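Two of these pitfalls, a missing licence URI and NC or ND terms on metadata, can be caught mechanically before publication. The check below is a rough sketch: `lint_metadata_licence` is a hypothetical helper, it assumes a descriptor shaped like the `datapackage.json` example with SPDX-style licence names, and it cannot detect third-party material or database-right issues.

```python
import re

# Matches the NC and ND elements in SPDX-style CC identifiers,
# for example CC-BY-NC-4.0 or CC-BY-NC-ND-4.0.
RESTRICTIVE = re.compile(r"-(NC|ND)(-|$)")


def lint_metadata_licence(descriptor):
    """Return a list of human-readable problems found in the licence entries."""
    problems = []
    licences = descriptor.get("licenses", [])
    if not licences:
        problems.append("no machine-readable licence entry")
    for entry in licences:
        name = entry.get("name", "")
        if not entry.get("path"):
            problems.append(f"{name or 'unnamed licence'}: missing licence URI")
        if RESTRICTIVE.search(name):
            problems.append(f"{name}: NC/ND blocks aggregation of metadata")
    return problems


good = {"licenses": [{"name": "CC0-1.0",
                      "path": "https://creativecommons.org/publicdomain/zero/1.0/"}]}
bad = {"licenses": [{"name": "CC-BY-NC-4.0"}]}

print(lint_metadata_licence(good))  # []
print(lint_metadata_licence(bad))
```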
Key Takeaways
- License only what you own — strip or separate third-party material first.
- CC0 is the default for factual metadata; CC BY 4.0 when attribution genuinely matters.
- For Europeana/Wikidata aggregation, CC0 is effectively required.
- CC0 1.0 waives, and CC BY 4.0 licenses, copyright and the database right in one instrument.
- Publish a machine-readable licence URI plus a human-readable README.
- License media and metadata separately when their rights differ.
- A citation request alongside CC0 is a courtesy, not an enforceable term.
Frequently Asked Questions
What is the recommended default licence for heritage metadata?
CC0 is the widely recommended default for structured metadata, because facts and short factual records attract weak or no copyright and CC0 removes all friction for aggregation into Europeana, Wikidata and research datasets.
Should I use CC0 or CC BY for my data?
Use CC0 for metadata and factual datasets where you want maximum reuse and the least legal uncertainty for reusers; use CC BY when attribution genuinely matters and the data is original enough to carry copyright, accepting that BY adds friction for downstream aggregators.
Can I license data that contains a database right?
Yes. CC0 1.0 and the CC 4.0 licences explicitly cover the EU/UK sui generis database right, so a single CC0 waiver or CC BY 4.0 grant can waive or license both copyright and database right together.
How do I license images and metadata differently in one dataset?
Apply separate, clearly stated licences per layer: for example CC0 for the metadata fields and a RightsStatements.org value or CC BY for the images. Document the scope of each so reusers are not confused.
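One way to keep the layers unambiguous in code is to resolve rights per item and per layer, letting an item-level value override the dataset default. The sketch below reuses the two keys from the Step 5 example; `rights_for` and the sample item are hypothetical.

```python
# Dataset-level defaults, using the keys from the Step 5 example.
DATASET_RIGHTS = {
    "metadata_licence": "https://creativecommons.org/publicdomain/zero/1.0/",
    "image_rights": "http://rightsstatements.org/vocab/InC/1.0/",
}


def rights_for(item, layer, defaults=DATASET_RIGHTS):
    """Return the rights URI for a layer; a per-item value wins over the default."""
    if layer not in defaults:
        raise KeyError(f"unknown rights layer: {layer!r}")
    return item.get(layer, defaults[layer])


# Hypothetical item whose image is released under CC BY 4.0.
item = {"id": "reg-0042",
        "image_rights": "https://creativecommons.org/licenses/by/4.0/"}

print(rights_for(item, "metadata_licence"))  # dataset default (CC0)
print(rights_for(item, "image_rights"))      # per-item override (CC BY 4.0)
```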
Where should the licence live so machines can read it?
Put a machine-readable licence URI in the dataset metadata (a data package descriptor, DCAT record, or a LICENSE file with an SPDX-style identifier) and a human-readable statement in the README, so both people and harvesters can act on it.
Can I add a non-binding request alongside CC0?
Yes. You can publish a community norm or citation request next to a CC0 dataset asking for attribution, but it is a courtesy, not an enforceable term — keep the legal grant and the request clearly separate.