When volunteers transcribe, tag or describe your historical material, you need to settle two separate rights layers before launch: the rights in the original document (which the project must clear) and the rights in the volunteer's added contribution (which a contributor agreement handles). The simplest, most reusable default is to ask contributors to release their work under CC0, so the data is free for everyone — but you must say so before anyone contributes, not after. This guide walks a complete beginner through why, and shows a worked example.
Why does crowdsourced data even need a licence?
Because contributions can be copyrightable. A purely mechanical transcription often has too little originality to attract new copyright — but the moment a volunteer adds a description, a translation, a tag taxonomy, or editorial notes, judgement enters and copyright can subsist. If a thousand people each hold rights in fragments of your dataset, reuse becomes a nightmare. A clear licence collapses that thousand-way uncertainty into one predictable answer.
What are the two layers, plainly?
Think of every crowdsourced record as a sandwich:
```text
[ Volunteer's contribution ]  <- governed by your contributor agreement
[ The original document    ]  <- governed by the source's own rights
```
A CC0 contributor licence frees the top layer. It does not magically free the bottom layer: if the underlying manuscript is still in copyright, the transcription does not unlock it. Beginners often conflate these; keep them separate.
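The two-layer idea can be sketched as a small data model. This is only an illustration, not a legal test: the names `CrowdRecord`, `source_cleared` and `contribution_licence` are hypothetical, and the one rule it encodes is that both layers must be settled before a record can be reused.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrowdRecord:
    """One crowdsourced record with its two rights layers (hypothetical model)."""
    record_id: str
    source_cleared: bool                 # bottom layer: are the original's rights settled?
    contribution_licence: Optional[str]  # top layer: e.g. "CC0-1.0", or None if no terms


def is_reusable(record: CrowdRecord) -> bool:
    """A record is reusable only when BOTH layers are settled."""
    return record.source_cleared and record.contribution_licence is not None


# A CC0 contribution sitting on an uncleared source is still blocked.
blocked = CrowdRecord("page-0001", source_cleared=False, contribution_licence="CC0-1.0")
ready = CrowdRecord("page-0002", source_cleared=True, contribution_licence="CC0-1.0")

print(is_reusable(blocked), is_reusable(ready))  # False True
```

Keeping the two flags separate in the data makes it harder to conflate the layers later, for example when filtering records for export.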
What is the safest default licence?
For transcription and tagging projects, CC0 is the workhorse. It removes "attribution stacking" (the problem of crediting thousands of people) and maximises reuse, which is why many library and archive crowdsourcing programmes adopt it. If your community strongly values credit, CC BY is the next step, paired with collective attribution.
| Licence | Volunteers get credit? | Reuse friction | Typical use |
|---|---|---|---|
| CC0 | No (community thanked, not legally required) | Lowest | Mass transcription |
| CC BY 4.0 | Yes, collectively | Low | Credit-valuing communities |
| CC BY-SA 4.0 | Yes + share-alike | Medium | Wiki-style ecosystems |
A small worked example
Imagine a project transcribing 5,000 WWI diary pages. Here is the minimal setup:
- Clear the source. The diaries were written in 1916 and the author died in 1969, so UK copyright (the author's life plus 70 years) runs to the end of 2039. You secure a licence from the depositing family to publish transcriptions.
- Write the contributor terms. One short paragraph, shown at sign-up.
- Show it at the point of contribution, with an explicit "I agree" step.
```text
Contributor terms (shown before first task):
"By contributing transcriptions, tags or notes to this project you
agree to release your contribution under a Creative Commons CC0 1.0
public-domain dedication. You confirm your contribution is your own
work. The original diaries remain the property of their depositors
and are published here under separate permission."
```
That single paragraph, plus an unticked checkbox the volunteer must tick, is enough to make the contribution layer clean. The diary text itself is handled by the first step, clearing the source.
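The date arithmetic used when clearing the source can be checked in a few lines. This sketch assumes the standard UK term of the author's life plus 70 years, running to the end of the calendar year; the function names are hypothetical, and special cases (unpublished works, Crown copyright, unknown authors) are out of scope.

```python
TERM_YEARS = 70  # assumption: standard UK term, life of the author plus 70 years


def copyright_expiry_year(death_year: int, term_years: int = TERM_YEARS) -> int:
    """Return the calendar year at the end of which copyright expires."""
    return death_year + term_years


def in_copyright(death_year: int, current_year: int) -> bool:
    """True while the work is protected (through 31 December of the expiry year)."""
    return current_year <= copyright_expiry_year(death_year)


print(copyright_expiry_year(1969))  # 2039
print(in_copyright(1969, 2025))     # True
print(in_copyright(1969, 2040))     # False
```

A result of `True` means the source layer needs permission, as in the diary example; it says nothing about the contribution layer.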
How do I record who agreed, and when?
Keep a lightweight provenance trail so the licence is enforceable later:
```text
contribution_id, user_id (or anon-hash), task, timestamp, terms_version
```
Storing terms_version matters: if you ever revise the terms, you can prove which version each contributor accepted. This is also why you must set terms before launch: you cannot retroactively impose CC0 on contributions someone already made under no stated terms.
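One way to write such a provenance row is sketched below, using only the Python standard library. The column names come from the trail above; the helper names, the salted-hash scheme and the `demo-salt` value are assumptions for illustration, not a recommended privacy design.

```python
import csv
import hashlib
import io
from datetime import datetime, timezone

FIELDS = ["contribution_id", "user_id", "task", "timestamp", "terms_version"]


def anon_hash(username: str, salt: str) -> str:
    """Pseudonymise a username with a salted SHA-256 digest (hypothetical scheme)."""
    return hashlib.sha256((salt + username).encode("utf-8")).hexdigest()[:16]


def provenance_row(contribution_id, username, task, terms_version,
                   salt="demo-salt", now=None):
    """Build one log row recording who agreed to which terms version, and when."""
    now = now or datetime.now(timezone.utc)
    return {
        "contribution_id": contribution_id,
        "user_id": anon_hash(username, salt),
        "task": task,
        "timestamp": now.isoformat(),
        "terms_version": terms_version,
    }


# Write the header and one row to an in-memory CSV (a real project would use a file or database).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
row = provenance_row("c-000123", "alice", "transcribe", "1.0",
                     now=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
writer.writerow(row)
print(buffer.getvalue())
```

Because `terms_version` is stored per contribution rather than per user, a volunteer who contributes before and after a terms revision is recorded correctly for both.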
What about attribution if I do not use CC0?
Use collective attribution. Rather than listing 1,200 names against every record, credit the community as a body — "Transcribed by volunteers of the Diaries Project" — and maintain a single contributors page you can link to. State this collective approach explicitly in the terms so volunteers know what credit to expect. Done well, this satisfies CC BY while staying practical at scale. For deeper guidance see crediting volunteers.
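A collective credit line is easy to generate consistently. The sketch below is illustrative: the function name, the sentence layout and the `example.org` URL are assumptions, and the wording should match whatever your contributor terms actually promise.

```python
def collective_attribution(project: str, contributors_url: str,
                           licence: str = "CC BY 4.0") -> str:
    """Build one credit line for the whole volunteer community."""
    return (
        f"Transcribed by volunteers of the {project}. "
        f"Contributors: {contributors_url}. "
        f"Licence: {licence}."
    )


credit = collective_attribution("Diaries Project", "https://example.org/contributors")
print(credit)
```

Attaching the same generated string to every record keeps the credit uniform, while the linked contributors page carries any individual names.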
Common beginner mistakes
- Launching first, deciding the licence later (you cannot retro-license).
- Assuming a CC0 transcription frees an in-copyright original.
- Listing every individual for attribution instead of crediting collectively.
- Not versioning the terms, so you cannot prove what people agreed to.
- Exporting data without its licence statement. Downstream users rely on the licence, so it must travel with every export.
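The last mistake can be avoided by embedding the licence in every export. The following is a minimal sketch assuming a JSON export: the envelope layout and the function name are hypothetical, while `CC0-1.0` is the SPDX identifier for CC0 1.0 and the URL is the Creative Commons deed.

```python
import json


def export_with_licence(records, licence_id="CC0-1.0",
                        licence_url="https://creativecommons.org/publicdomain/zero/1.0/",
                        terms_version="1.0"):
    """Wrap exported records in an envelope that states the contribution-layer licence."""
    envelope = {
        "licence": {
            "id": licence_id,
            "url": licence_url,
            "terms_version": terms_version,
            # The licence covers the volunteers' contributions only, not the source documents.
            "scope": "contribution layer only",
        },
        "records": records,
    }
    return json.dumps(envelope, indent=2)


exported = export_with_licence([{"contribution_id": "c-000123", "text": "Dear Mother, ..."}])
print(exported)
```

Stating the scope in the export also guards against the second mistake in the list, since a reader of the file can see that the licence does not reach the original document.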
Key Takeaways
- Crowdsourced records have two rights layers: the original and the contribution.
- A contributor agreement only governs the contribution layer.
- CC0 is the safest, lowest-friction default; CC BY with collective attribution is the credit-friendly alternative.
- Decide and display terms before anyone contributes — you cannot retro-license.
- Record user, timestamp and terms_version to keep the licence enforceable.
- A CC0 transcription does not free an underlying in-copyright work.
Frequently Asked Questions
Do volunteer transcribers own copyright in their transcriptions?
It depends. A faithful, mechanical transcription of someone else's text usually has too little originality to attract fresh copyright, but tags, notes, descriptions and translations that involve judgement often do. Because the line is fuzzy, projects use a contributor agreement to settle it up front.
What is the safest default licence for crowdsourced data?
CC0 is the most common default for transcription projects because it removes attribution-stacking headaches and makes the data freely reusable. If your community values credit, CC BY is the next most common choice, with collective rather than per-person attribution.
Can I change the licence after the project has started?
Only prospectively, and only with care. You cannot retroactively re-license contributions people already made under different terms unless your original agreement allowed it, so set the terms before volunteers contribute.
How do I attribute thousands of anonymous volunteers under CC BY?
Use collective attribution: credit the project and community as a whole (for example, 'Transcribed by volunteers of the X Project') rather than listing every individual, and link to a contributors page. State this approach in your contributor terms.
What if a volunteer transcribes a document that is still in copyright?
The transcription cannot grant rights the underlying document never gave up. You must clear or confirm the source's rights separately; the contributor licence only governs the volunteer's added layer, not the original work.