Appearance
A data management plan (DMP) is a short, structured document — usually one to three pages — that states what data your project will create, how you will store and document it during the work, and how you will preserve and share it afterwards. For historians and archivists the fastest route is to start in DMPonline or DMPTool with your funder's template, then answer six questions concretely: what data, what formats, how documented, how stored, what ethics and rights, and where it will live long-term. Write the first draft before you collect anything.
What goes in a humanities DMP?
Funders vary, but every DMP answers the same six areas. Here is a working skeleton with humanities-specific defaults.
text
1. Data description - sources, types, volumes, formats produced
2. Documentation - metadata standard, README, data dictionary
3. Storage & backup - 3-2-1 rule, where, who has access
4. Ethics & legal - copyright, GDPR, sensitive records, consent
5. Selection & retain - what to keep, what to discard, for how long
6. Sharing & preserve - repository, DOI, licence, embargoThe single most common weakness reviewers flag is vagueness. "Data will be stored securely" scores nothing. "Working files held in the institutional OneDrive with nightly backup, plus a quarterly archival copy to the institutional repository" is a plan.
How do I describe my data before it exists?
Estimate, but be specific. State the source material, the digital outputs, expected volumes and formats.
The project will transcribe approximately 4,000 pages of nineteenth-century parish registers (Gloucestershire Archives, D2052) using Transkribus. Outputs: PAGE-XML transcriptions (~120 MB), a derived CSV of baptism records (~5 MB), and TEI P5 editions of selected pages. Raw scans (TIFF, ~40 GB) remain with the archive.
This paragraph alone answers volume, format, provenance and the boundary of what you will and will not steward.
Which storage and backup model should I commit to?
Commit to the 3-2-1 rule explicitly: three copies, on two media types, one off-site. Spell out the actual locations.
text
Copy 1: working files -> institutional cloud (OneDrive/SharePoint)
Copy 2: local mirror -> encrypted external SSD, weekly
Copy 3: off-site/archive-> institutional repository, quarterly
Integrity: SHA-256 fixity check on the archival copy at each depositNaming the cadence ("weekly", "quarterly") and the integrity check signals competence to reviewers and protects you from your own optimism.
How do I handle ethics and rights in the plan?
This is where humanities DMPs differ most from STEM ones. Address three things:
- Copyright of the source material — who owns the originals, what reuse the archive permits.
- GDPR / personal data — living individuals named in twentieth-century records trigger data-protection duties.
- Sensitivity — records of marginalised groups, medical or criminal records may need restricted access even when out of copyright.
State your legal basis and your access decision per category. "Records naming individuals born after 1925 will be embargoed for 100 years from creation" is the kind of concrete rule reviewers want.
Where will the data live, and under what licence?
Close with the preservation and sharing commitment. Name the repository, the persistent identifier, the licence, and any embargo.
text
Deposit: Zenodo (mints DOI) at project end
Licence: CC BY 4.0 for derived data; raw scans remain rights-restricted
Embargo: none on aggregate data; personal-data subset embargoed
Format: UTF-8 CSV + TEI XML as archival mastersHow do I cost it into the grant?
Reviewers want a defensible figure. Itemise rather than guess.
| Item | Typical cost |
|---|---|
| Repository deposit (Zenodo, institutional) | usually free |
| Cloud/working storage | often institutional, otherwise small |
| Curation/cleaning staff time | the real cost — defend it in days |
| Transcription credits (e.g. Transkribus) | per-page, budget explicitly |
| Software licences | GIS, OCR, etc. if not open source |
Staff time for documentation and cleaning is the line most often cut and most often regretted.
What tools speed this up?
- DMPonline (DCC, UK/EU) and DMPTool (US) — funder templates and live guidance.
- machine-actionable DMPs (maDMP) via the RDA standard, if your funder accepts them.
- Your funder's own template — always check first; AHRC, ESRC, Wellcome and Horizon Europe differ.
Key Takeaways
- A DMP is short (1–3 pages) but must be specific — name tools, repositories and cadences.
- Write the first draft before collecting data, then revise at milestones.
- Use DMPonline/DMPTool to load your exact funder template.
- The 3-2-1 backup rule with named locations beats "stored securely".
- Ethics, copyright and GDPR are where humanities DMPs need the most detail.
- Cost curation staff time honestly — it is the line most often regretted.
Frequently Asked Questions
How long should a data management plan be?
Most funders cap a DMP at one to three pages or around 1,000 words. The goal is a specific, credible plan, not an exhaustive one, so name concrete tools, repositories and formats rather than padding with generalities.
Do I need a DMP if I have no funding?
It is not mandatory without a funder, but a lightweight DMP is still worth writing. It forces early decisions about formats, backups and licensing that are painful to retrofit once data exists.
What is DMPonline and should I use it?
DMPonline (and the US-based DMPTool) is a free web tool that gives you funder-specific templates and guidance. It is the fastest way to start because it pre-loads the exact questions your funder will assess.
When in a project should I write the DMP?
Write the first version before data collection begins, ideally at the proposal stage. A DMP is a living document, so revise it at major milestones and again before deposit.
What is the difference between a DMP and a data dictionary?
A DMP is a forward-looking plan about how you will handle, store and share data across a project. A data dictionary is documentation of the fields and values within a specific dataset, written as the data takes shape.
How do I cost data management in a grant budget?
Itemise storage, repository deposit fees, any transcription or curation staff time, and software licences. Most archival deposit is free or low-cost, but staff time for cleaning and documentation is the real expense to defend.