Beginner's Guide to Making Datasets Citable

To make a dataset citable you give it three things: a persistent identifier (almost always a DOI), stable descriptive metadata, and a frozen archived version that will not change underneath the citation. The simplest path for a beginner is to deposit the dataset in a free repository such as Zenodo, which mints the DOI and generates the citation string for you. After that, anyone — including future you — can cite the exact data you used, and you get credit when they do.

Why should a dataset be citable at all?

Two reasons, one scholarly and one practical. Scholarly: citing data makes research reproducible — a reader can fetch the precise dataset behind a claim. Practical: citation is how data creators get credit. A well-cited dataset is a line on your CV and, increasingly, something funders count. An uncited spreadsheet emailed around as an attachment is invisible to both.

What are the three ingredients of a citable dataset?

Think of citability as a small recipe.

Ingredient	What it does	How you get it
Persistent identifier	A link that never breaks	DOI from a repository
Stable metadata	Describes who, what, when	Fill the deposit form
Fixed version	Stops silent changes	Repository freezes on publish

Miss any one and the citation weakens: a URL without a DOI rots, metadata without a frozen version becomes a moving target.
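The recipe can be expressed as a quick self-check. The sketch below is a minimal illustration, assuming a dataset is described by a plain dictionary; the field names (`doi`, `creators`, `title`, `year`, `version_frozen`) are hypothetical and not taken from any repository's schema.

```python
def missing_ingredients(record: dict) -> list[str]:
    """Return the citability ingredients that a record still lacks."""
    missing = []
    # Ingredient 1: a persistent identifier (here, any non-empty DOI string).
    if not record.get("doi"):
        missing.append("persistent identifier")
    # Ingredient 2: stable metadata covering who, what, and when.
    if not all(record.get(key) for key in ("creators", "title", "year")):
        missing.append("stable metadata")
    # Ingredient 3: a fixed version that the repository has frozen.
    if not record.get("version_frozen"):
        missing.append("fixed version")
    return missing


# A draft that has been described but not yet deposited anywhere.
draft = {
    "title": "Gloucestershire Baptisms, 1813-1837",
    "creators": ["Reed, Elara"],
    "year": 2025,
}
print(missing_ingredients(draft))  # ['persistent identifier', 'fixed version']
```

An empty list means all three ingredients are present under these assumed field names.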

How do I make my first dataset citable? (A worked example)

Follow this from a standing start with Zenodo, which is free and needs no institutional account.

text

1. Sign in at zenodo.org (ORCID login is fine).
2. New upload -> drag in your CSV and README.
3. Fill in:
     Title:    Gloucestershire Baptisms, 1813-1837
     Creators: Reed, Elara  (add your ORCID)
     Type:     Dataset
     Licence:  CC BY 4.0
     Keywords: parish registers; baptisms; Gloucestershire; 1810s
4. Reserve a DOI (optional) or just Publish.
5. Zenodo mints, e.g., 10.5281/zenodo.7654321 and freezes the files.

That is it. Your data now has a permanent, citable identity.
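Because the repository freezes the files on publish, it can help to record a checksum of each file before uploading, so that a later download can be compared against what you deposited. The following is a minimal sketch using Python's standard `hashlib`; the function name `sha256_of_file` and the file name `baptisms.csv` are illustrative assumptions, and the checksum algorithm a given repository displays may differ.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large CSV files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical file name; replace it with the file you plan to upload.
    target = Path("baptisms.csv")
    if target.exists():
        print(target.name, sha256_of_file(str(target)))
```

Keeping the printed digest in your README gives readers a way to confirm that their copy matches yours.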

What does the finished citation look like?

Zenodo, like most repositories, hands you a formatted citation. It broadly follows the pattern DataCite recommends:

text

Reed, E. (2025). Gloucestershire Baptisms, 1813-1837 (v1.0)
[Data set]. Zenodo. https://doi.org/10.5281/zenodo.7654321

Creators, year, title, version, resource type, repository, and DOI are all there, so you can copy the string straight into your reference list. You rarely need to assemble it by hand.
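To see how the pieces fit together, here is a minimal sketch that builds the same string from its parts. The function `format_citation` and its parameters are hypothetical, written only to mirror the example above, and joining several creators with a semicolon is an assumption. The citation your repository generates remains the authoritative one.

```python
def format_citation(creators, year, title, version, repository, doi):
    """Assemble a data citation in the order used in the example above."""
    # Assumption: multiple creators are separated by "; ".
    names = "; ".join(creators)
    return (
        f"{names} ({year}). {title} ({version}) "
        f"[Data set]. {repository}. https://doi.org/{doi}"
    )


citation = format_citation(
    creators=["Reed, E."],
    year=2025,
    title="Gloucestershire Baptisms, 1813-1837",
    version="v1.0",
    repository="Zenodo",
    doi="10.5281/zenodo.7654321",
)
print(citation)
```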

How do I add machine-readable citation to a project?

If your data lives in a Git repository, add a CITATION.cff file so tools can offer a one-click citation.

yaml

cff-version: 1.2.0
message: "If you use this dataset, please cite it as below."
title: "Gloucestershire Baptisms, 1813-1837"
authors:
  - family-names: Reed
    given-names: Elara
    orcid: "https://orcid.org/0000-0000-0000-0000"
version: 1.0.0
doi: 10.5281/zenodo.7654321
date-released: 2025-02-11
license: CC-BY-4.0
type: dataset

When the file sits in the root of the repository's default branch, GitHub reads it automatically and shows a "Cite this repository" button.
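If you would rather generate the file than type it, the sketch below produces an equivalent CITATION.cff using only string formatting, so it needs no YAML library. The helper `build_citation_cff` is hypothetical, the ORCID is the same placeholder as above, and the `message` line is included because the CFF 1.2.0 schema lists it as a required field. The sketch does not escape quotes or other special characters in the values.

```python
def build_citation_cff(title, family, given, orcid, version, doi, released, license_id):
    """Return the text of a minimal CITATION.cff describing a dataset."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this dataset, please cite it as below."',
        f'title: "{title}"',
        "authors:",
        f"  - family-names: {family}",
        f"    given-names: {given}",
        f'    orcid: "{orcid}"',
        f"version: {version}",
        f"doi: {doi}",
        f"date-released: {released}",
        f"license: {license_id}",
        "type: dataset",
    ]
    return "\n".join(lines) + "\n"


cff_text = build_citation_cff(
    title="Gloucestershire Baptisms, 1813-1837",
    family="Reed",
    given="Elara",
    orcid="https://orcid.org/0000-0000-0000-0000",  # placeholder ORCID
    version="1.0.0",
    doi="10.5281/zenodo.7654321",
    released="2025-02-11",
    license_id="CC-BY-4.0",
)
# Save this text as CITATION.cff in the repository root.
print(cff_text)
```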

Can I cite one specific version?

Yes, and you should when reproducibility matters. Versioning repositories mint a DOI per version plus an overarching concept DOI.

text

Concept DOI 10.5281/zenodo.7654320  -> "always the latest"
  v1.0       10.5281/zenodo.7654321  -> frozen exact state
  v1.1       10.5281/zenodo.7654399  -> after corrections

Cite the version DOI in a paper so the reader gets exactly what you analysed; cite the concept DOI on a project page so readers find the newest release.
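The choice between the two kinds of DOI can be written down as a small lookup. This is a sketch only: the `DatasetRecord` class and its `doi_for` method are hypothetical names, and the DOIs are the example values from the listing above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetRecord:
    concept_doi: str
    version_dois: dict[str, str] = field(default_factory=dict)

    def doi_for(self, version: Optional[str] = None) -> str:
        """Return a version DOI if a version is named, else the concept DOI."""
        if version is None:
            # No version requested: point at "always the latest".
            return self.concept_doi
        # An unknown version raises KeyError rather than guessing.
        return self.version_dois[version]


record = DatasetRecord(
    concept_doi="10.5281/zenodo.7654320",
    version_dois={
        "v1.0": "10.5281/zenodo.7654321",
        "v1.1": "10.5281/zenodo.7654399",
    },
)
print(record.doi_for("v1.0"))  # for a paper: 10.5281/zenodo.7654321
print(record.doi_for())        # for a project page: 10.5281/zenodo.7654320
```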

Common beginner mistakes

Sharing a Google Sheets link and calling it a citation — links rot, sheets change.
Forgetting the licence: without one, others have no clear legal permission to reuse the data, even when it is openly downloadable.
Leaving keywords and coverage blank, making the dataset hard to find.
Citing a moving version when the paper needs a frozen one.
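Several of these mistakes can be caught with a simple pre-publication check. The sketch below is illustrative only: `lint_dataset` is a hypothetical helper, and treating any link whose host is not `doi.org` as a plain URL is a simplifying assumption.

```python
from urllib.parse import urlparse


def lint_dataset(link: str, licence: str, keywords: list[str]) -> list[str]:
    """Return warnings for some of the beginner mistakes listed above."""
    warnings = []
    host = urlparse(link).netloc.lower()
    # Assumption: a DOI link resolves through doi.org; anything else is a plain URL.
    if host != "doi.org":
        warnings.append("link is not a DOI and may rot")
    if not licence.strip():
        warnings.append("no licence given")
    if not any(keyword.strip() for keyword in keywords):
        warnings.append("no keywords given")
    return warnings


# A shared spreadsheet link with no licence and no keywords trips every check.
print(lint_dataset("https://docs.google.com/spreadsheets/d/example", "", []))
```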

Key Takeaways

Citability needs three things: a persistent identifier, stable metadata, and a frozen version.
Depositing in Zenodo or similar mints a free DOI and writes the citation for you.
A DOI is permanent where a URL rots — that permanence is what citation requires.
A data citation lists creators, year, title, version, repository and DOI.
Use version DOIs for reproducibility and the concept DOI to point at the latest.
A CITATION.cff file makes citation machine-readable and enables one-click citing.

Frequently Asked Questions

What makes a dataset citable?

A dataset becomes citable when it has a persistent identifier (usually a DOI), stable metadata describing it, and a fixed, archived version that will not silently change. Together these let others reference the exact thing you made.

Do I have to pay to get a DOI for my data?

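Usually not as an individual researcher. Depositing in Zenodo, Figshare or most institutional repositories mints a DOI for free. Some repositories, such as Dryad, may charge a data publishing fee unless an institution or journal covers it, so check before you deposit. In either case the repository, not you, deals with DataCite for the DOI infrastructure.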

What is the difference between a DOI and a URL?

A URL points at a location that can move or vanish. A DOI is a persistent identifier that resolves to the current location, and the repository that registers it commits to keeping it working when that location changes. That permanence is exactly what citation needs.
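In practice a DOI is turned into a link by prefixing it with the doi.org resolver. The sketch below normalises the common ways a DOI gets written; the function name `doi_to_url` and the loose pattern used to sanity-check the DOI are assumptions for illustration, not an official validation rule.

```python
import re

# Loose shape check: "10.", a registrant code of 4 to 9 digits, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly paste along with the DOI itself.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(raw: str) -> str:
    """Return the https://doi.org/ link for a DOI written in a common form."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {raw!r}")
    return f"https://doi.org/{doi}"


print(doi_to_url("doi:10.5281/zenodo.7654321"))
```

A plain web address such as a spreadsheet link fails the check and raises `ValueError`, which reflects the distinction drawn above.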

What should a data citation include?

Creators, year, title, version, publisher or repository, and the DOI. Most repositories generate a ready-made citation string for you when you deposit, so you rarely have to assemble it by hand.

Can I cite a specific version of a dataset?

Yes. Versioning repositories mint a separate DOI per version plus a concept DOI for the whole. Cite the version DOI for exact reproducibility and the concept DOI when you want readers to follow the latest.

What is a CITATION.cff file?

CITATION.cff is a small plain-text file in a project or repository that states how the work should be cited in a machine-readable format. GitHub and other tools read it to offer a one-click citation.
