A data paper is a short, peer-reviewed publication whose subject is a dataset: it describes how the data was created, what it contains and how to reuse it, without arguing a thesis about the phenomena. You write one to make your data findable and citable and to earn formal academic credit for curation work that would otherwise go unrecognised. Think of it as the manual and certificate of authenticity for a dataset that lives, separately, in a repository.
What exactly is a data paper?
Where a normal article says "here is what the evidence means," a data paper says "here is the evidence, how I built it, and how you can trust and reuse it." It is deliberately argument-free. The dataset is deposited in a repository, which assigns it a DOI; the paper is published in a journal and cites that DOI. The two are linked but live in different places, a division that lets the data be updated and re-versioned independently of the prose.
Why bother writing one?
Three concrete payoffs:
- Credit. Years assembling a corpus or gazetteer become a citable, peer-reviewed output on your CV.
- Reuse. A documented dataset is far more likely to be found and reused; an undocumented Zenodo upload rarely is.
- Quality. Peer review of the data catches gaps in documentation you would never spot alone.
Where do humanities data papers get published?
| Journal | Scope | Notes |
|---|---|---|
| Journal of Open Humanities Data | Humanities datasets | Open access, short format |
| Research Data Journal for the Humanities and Social Sciences | HSS data | Brill, peer-reviewed |
| Data in Brief | Cross-disciplinary | Strict template, fast |
| Journal of Cultural Analytics | Data + light analysis | Accepts dataset descriptions |
Each links the published paper to the deposited dataset by DOI, so deposit first, then write.
What goes in a data paper?
The structure is fairly standard, and short — most run 1,500–3,000 words because the dataset, not the prose, carries the weight:
```text
1. Title and abstract - what the dataset is, in one paragraph
2. Background / context - why it exists, what gap it fills
3. Methods - how it was collected, transcribed, cleaned
4. Data description - files, formats, fields, record counts
5. Reuse potential - who could use it and how
6. Availability statement - repository, DOI, licence
```
A small worked example
Suppose you have transcribed 1,200 Victorian charity-bazaar advertisements into a CSV. A reuse and availability block might read:
```markdown
## Reuse potential
This corpus supports research on Victorian philanthropy, women's
associational culture, and the language of charitable appeal. Each
record carries a date, place, organising body and verbatim text,
enabling both quantitative trend analysis and close reading.

## Availability
- Repository: Zenodo
- DOI: 10.5281/zenodo.1234567
- Licence: CC BY 4.0
- Format: UTF-8 CSV with an accompanying data dictionary (data_dictionary.md)
```
Notice it states the licence and points to a data dictionary; reviewers will check both.
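One way to keep the availability statement in step with the repository record is to generate it from a single metadata dictionary. The Python sketch below is only an illustration: the function name `availability_block` and the dictionary keys are assumptions, and the DOI is the placeholder from the example, not a real deposit. The only external fact it relies on is that a DOI resolves through `https://doi.org/`.

```python
def availability_block(meta):
    """Render an Availability section from a metadata dictionary.

    The keys used here (repository, doi, licence, format) are an
    assumed layout for illustration, not a journal requirement.
    """
    lines = [
        "## Availability",
        f"- Repository: {meta['repository']}",
        # A DOI becomes a resolvable link when prefixed with https://doi.org/
        f"- DOI: {meta['doi']} (https://doi.org/{meta['doi']})",
        f"- Licence: {meta['licence']}",
        f"- Format: {meta['format']}",
    ]
    return "\n".join(lines)


# Placeholder values taken from the worked example above.
meta = {
    "repository": "Zenodo",
    "doi": "10.5281/zenodo.1234567",
    "licence": "CC BY 4.0",
    "format": "UTF-8 CSV with an accompanying data dictionary (data_dictionary.md)",
}
block = availability_block(meta)
print(block)
```

Generating the block this way means a change of licence or a new DOI version is edited in one place only.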
How do I describe the data clearly?
Give counts, formats and field meanings, not adjectives. A reader should be able to open your files knowing what to expect:
```markdown
## Data description
- `bazaar_ads.csv`: 1,200 rows, 5 columns, UTF-8.
  - `id` : unique record identifier
  - `date` : ISO 8601 (YYYY-MM-DD), exact where printed
  - `place` : settlement name, modern spelling
  - `body` : organising charity, as printed
  - `text` : verbatim advertisement
- `data_dictionary.md`: full field definitions and value domains.
- Source: British Newspaper Archive, 1850–1890.
```
Avoid vague phrases like "various dates"; state the range and the encoding.
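The counts and formats in a data description can be read off the file itself rather than typed by hand. Here is a minimal Python sketch, assuming a CSV with a `date` column as in the example above. The function name `describe_csv` and the two sample rows are invented for illustration and do not come from any real corpus.

```python
import csv
import io
from datetime import date


def describe_csv(handle):
    """Return the row count, column names and number of unparseable dates."""
    reader = csv.DictReader(handle)
    rows = 0
    bad_dates = 0
    for record in reader:
        rows += 1
        try:
            # Accepts YYYY-MM-DD; a missing value becomes "" and is counted as bad.
            date.fromisoformat(record.get("date") or "")
        except ValueError:
            bad_dates += 1
    return {"rows": rows, "columns": reader.fieldnames, "bad_dates": bad_dates}


# In-memory stand-in for bazaar_ads.csv; both rows are made up.
sample = io.StringIO(
    "id,date,place,body,text\n"
    "1,1851-05-03,Leeds,Example Relief Society,A bazaar will be held\n"
    "2,3 May 1851,York,Example Orphan Fund,A fancy fair will be held\n"
)
summary = describe_csv(sample)
print(summary)
```

For a real file, you would pass `open("bazaar_ads.csv", newline="", encoding="utf-8")` instead of the in-memory sample. A non-zero `bad_dates` count is a cue to either clean the column or document the exception in the data dictionary.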
Common beginner mistakes
- Treating it like a research article and sneaking in conclusions — reviewers will ask you to remove them.
- Forgetting the licence or leaving the dataset "all rights reserved".
- Describing fields in prose instead of a structured dictionary.
- Depositing the data after submission, so the DOI is missing at review.
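Several of these mistakes can be caught mechanically before submission. The sketch below assumes the data dictionary lists fields on lines shaped like "- `name` : meaning", as in the data description example, and that the DOI and licence sit in a simple metadata dictionary. The names `documented_fields` and `presubmission_problems` are hypothetical, and the check is a starting point rather than a substitute for a journal's own checklist.

```python
import re

# Matches dictionary lines such as: - `date` : ISO 8601 (YYYY-MM-DD)
FIELD_LINE = re.compile(r"^\s*-\s*`([^`]+)`\s*:\s*(.+)$")


def documented_fields(dictionary_text):
    """Collect the field names that have an entry in the data dictionary."""
    fields = set()
    for line in dictionary_text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            fields.add(match.group(1))
    return fields


def presubmission_problems(columns, dictionary_text, metadata):
    """List undocumented columns and missing DOI or licence values."""
    problems = []
    known = documented_fields(dictionary_text)
    missing = [name for name in columns if name not in known]
    if missing:
        problems.append("undocumented fields: " + ", ".join(missing))
    for key in ("doi", "licence"):
        if not metadata.get(key):
            problems.append("missing " + key)
    return problems


# Illustrative inputs: the dictionary omits `text` and the DOI is still empty.
columns = ["id", "date", "place", "body", "text"]
dictionary_text = """\
- `id` : unique record identifier
- `date` : ISO 8601 (YYYY-MM-DD), exact where printed
- `place` : settlement name, modern spelling
- `body` : organising charity, as printed
"""
metadata = {"doi": "", "licence": "CC BY 4.0"}
problems = presubmission_problems(columns, dictionary_text, metadata)
print(problems)
```

An empty list means the structural checks pass. Whether the prose sneaks in conclusions still needs a human reader.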
Key Takeaways
- A data paper documents and credits a dataset; it does not argue a thesis.
- Deposit the data with its own DOI first, then write the paper that cites it.
- Target a dedicated journal such as the Journal of Open Humanities Data.
- Keep it short — methods, data description and reuse, around 1,500–3,000 words.
- State the licence and link a data dictionary; reviewers check both.
- Describe data with counts, formats and field definitions, not adjectives.
- It is peer-reviewed and citable, so it counts toward your academic record.
Frequently Asked Questions
What is a data paper?
A data paper is a peer-reviewed publication that describes a dataset rather than arguing a thesis about it. Its purpose is to make the data findable, citable and reusable, and to give the people who created it formal academic credit.
How is a data paper different from a normal research article?
A research article makes an argument and interprets findings; a data paper documents how a dataset was made, what it contains and how to reuse it. A data paper deliberately avoids hypotheses and conclusions about the phenomena studied.
Where do I publish a humanities data paper?
Dedicated journals such as the Journal of Open Humanities Data, Research Data Journal for the Humanities and Social Sciences, and Data in Brief accept them. The dataset itself is deposited in a repository and the paper links to it by DOI.
Do I deposit the data and the paper separately?
Yes. The dataset goes to a repository like Zenodo or the UK Data Service and receives its own DOI, while the data paper is published in a journal and cites that DOI. The two objects are linked but distinct.
How long is a typical data paper?
Most are short, around 1,500 to 3,000 words, because the dataset carries the substance. The text focuses on methods, structure and reuse rather than extended discussion.
Does a data paper count for my academic record?
Increasingly yes. It is peer-reviewed and citable, so it appears in your publication list and gives credit for data work that a traditional article would bury in a footnote.