Appearance
Plan project documentation at kick-off, before any data exists — that is the moment when capturing decisions is cheapest and reconstruction is impossible to fake later. The exception is a genuinely throwaway experiment, where a single README suffices. For anything funded, collaborative, or meant to outlive one person, a lightweight documentation plan written up front pays for itself many times over by the time of handoff, publication, or preservation.
When does planning documentation pay off, and when does it not?
The deciding variable is reuse intent. If a future stranger — a co-author, a repository curator, or you in three years — must rerun or reinterpret your work, documentation is load-bearing infrastructure. If nobody, including you, will ever touch the output again, elaborate documentation is waste. Most DH projects fall on the reuse side because grants, repositories and the discipline itself increasingly demand it.
| Signal | Plan documentation | Minimal README is fine |
|---|---|---|
| Funded / grant-reportable | yes | — |
| More than one contributor | yes | — |
| Data intended for a repository | yes | — |
| One-off exploratory script | — | yes |
| Lifespan beyond the team | yes | — |
| Personal sandbox, no sharing | — | yes |
Why is kick-off the cheapest moment?
Documentation captured while work happens costs a fraction of documentation reconstructed afterwards. The reason is memory decay: the rationale for choosing gazetteer A over gazetteer B, or why 1,200 ambiguous records were excluded, lives in someone's head for weeks, not years. Capture it as a decision log entry the day you make the call:
markdown
## 2024-11-12 — excluded undated charters
Decision: drop the 1,184 charters with no inferable date from the network graph.
Why: edges need a temporal ordering; undated nodes distorted centrality.
Alternative considered: impute dates from cartulary position — rejected, too speculative.
Owner: E. ReedA log like this takes ninety seconds per entry and answers the "why did you do it this way?" question that reviewers and successors always ask.
What should a documentation plan actually contain?
Keep it to one page. A workable plan names five things: what artefacts get documented (data, code, decisions, provenance), in what format, where they live, who owns each, and when they are reviewed.
yaml
# documentation-plan.yml
artefacts:
readme: { format: markdown, location: repo root, owner: PI }
data_dictionary: { format: yaml, location: /docs, owner: curator }
decision_log: { format: markdown, location: /docs, owner: curator }
provenance: { format: PROV-O / prose, location: /docs, owner: curator }
review_cadence: monthly
done_definition: "a stranger can rerun the pipeline from the README alone"The "done definition" matters most: it converts a vague aspiration into a testable bar.
Is a documentation plan the same as a data management plan?
No, and conflating them causes gaps. A DMP is the funder-facing promise about storage, backup, sharing and preservation — often a fixed-format deliverable for the grant. A documentation plan is the internal, day-to-day agreement about what gets recorded and by whom. The DMP says "data will be deposited in a repository with a DOI"; the documentation plan ensures that when you deposit, the data is actually understandable. You generally need both, and the documentation plan is what makes the DMP's promises deliverable.
How do you avoid over-documenting?
Over-documentation is real and it has a cost: it slows the work and, worse, goes stale, becoming actively misleading. The discipline is to document the non-obvious. Do not narrate what the code already states; do record what the code cannot — why a threshold was set, which records were judged unreliable, what a column means in archival terms. Auto-generate everything you can (API docs from docstrings, schema from the database) and hand-write only the human judgement layer.
Who owns it, and how do you keep it alive?
Assign documentation to a named person as a CRediT Data Curation role with real time budgeted, not "the team" — shared ownership means none. Keep it in version control next to the data so it travels with the work and shows up in code review. Review it at a set cadence; documentation that is never reviewed is documentation nobody trusts.
Key Takeaways
- Plan documentation at kick-off — captured-as-you-go costs a fraction of reconstructed-after-the-fact.
- The deciding signal is reuse intent: funded, collaborative or long-lived work needs a plan; throwaway experiments do not.
- A one-page plan names artefacts, formats, locations, owners and a testable "done definition".
- A documentation plan is not a DMP — you usually need both, and the plan makes the DMP deliverable.
- Document the non-obvious (decisions, rationale, archival meaning); auto-generate the rest.
- Give documentation a single named owner with budgeted time, and review it on a cadence.
Frequently Asked Questions
When is the right moment to plan documentation?
At project kick-off, before any data is created. The cheapest documentation is captured as work happens; reconstructing it months later costs several times more and is often impossible because the decision-makers have moved on.
Is it ever right NOT to write a documentation plan?
For a genuinely throwaway one-week experiment with no reuse intent, a single README is enough. Anything funded, collaborative or intended to outlive a single person warrants a real plan.
How much documentation is too much?
When writing it slows the work more than it de-risks it. The test is reuse: document anything a future stranger would need to rerun or reinterpret your results, and skip narration that the code or data already makes obvious.
What is the difference between a DMP and a documentation plan?
A data management plan is the funder-facing promise about storage, sharing and preservation. A documentation plan is the internal working agreement about what gets recorded, by whom and where, day to day. You usually need both.
Who should own the documentation plan?
A named person, not 'the team'. Shared ownership of documentation reliably means no ownership. Assign it as a CRediT Data Curation role with explicit time allocated.
What is the cheapest documentation that still works?
A README plus a decision log committed alongside the data. Together they answer 'what is this?' and 'why did we do it this way?' — the two questions future users ask most.