Reproducible Humanities Research

A research README fails most often for one reason: it documents the author's tacit knowledge instead of a reproducible procedure. The fix is to write the README as a script a stranger can execute top to bottom — description, environment, data, the one command that reproduces your headline result, then citation and licence — and to test it on a clean machine. If you only fix one thing today, copy your install-and-run steps into a fresh container and watch where they break.

The five errors that break a research README

Most broken READMEs share the same root causes. Diagnose yours against this list:

  1. Hidden environment: it assumes packages or a conda env you never wrote down.
  2. Absolute paths: C:\Users\elara\corpus\ works only on your laptop.
  3. Missing data step: the code needs data/letters.csv but nothing says how to get it.
  4. No single run command: the reader cannot tell which of nine notebooks produces Figure 2.
  5. Stale instructions: the README describes last year's pipeline.
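
Error 2 is the easiest one to catch mechanically. The sketch below is one possible way to do it, not a standard tool: it greps source files for strings that look like machine-specific paths. The function names, the file suffixes and the two patterns (Windows drive letters, and /home/ or /Users/ directories) are assumptions for illustration; the check is a heuristic and will also flag any such paths you quote deliberately in your docs.

```python
import re
from pathlib import Path

# Heuristic patterns (assumptions, extend for your own setup):
# a Windows drive letter followed by a backslash, or a Unix home directory.
WINDOWS_PATH = r"(?<![A-Za-z])[A-Za-z]:\\[^\s\"']+"
UNIX_HOME_PATH = r"/(?:home|Users)/[^\s\"']+"
ABSOLUTE_PATH = re.compile(WINDOWS_PATH + "|" + UNIX_HOME_PATH)


def find_absolute_paths(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_path) for every suspicious path in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in ABSOLUTE_PATH.finditer(line):
            hits.append((number, match.group(0)))
    return hits


def scan_project(root: Path, suffixes=(".py", ".md", ".ipynb")) -> dict[str, list[tuple[int, str]]]:
    """Scan files under root and report the ones that contain absolute paths."""
    report = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits = find_absolute_paths(text)
            if hits:
                report[str(path)] = hits
    return report


if __name__ == "__main__":
    for filename, hits in scan_project(Path(".")).items():
        for line_number, found in hits:
            print(f"{filename}:{line_number}: {found}")
```

Run it from the repository root; every line it prints is a path that will not exist on a reader's machine.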

Why does my README work for me but not for anyone else?

Because your machine carries state the README does not. The diagnostic is brutal but reliable: spin up a clean container and follow your own instructions verbatim.

bash
docker run --rm -it -v "$PWD":/proj -w /proj python:3.11-slim bash
# now run ONLY what your README says, nothing from memory

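Every command you have to improvise is a missing line in the README. Replace absolute paths with relative ones or a configurable DATA_DIR, and keep secret keys out of the repository: read them from environment variables, and commit a documented .env.example that lists the variable names with placeholder values only.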
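
A configurable DATA_DIR can be as small as the following sketch. It assumes your README tells readers to run commands from the repository root and that the default data folder is data/; the helper name data_dir is hypothetical, not something your project already has.

```python
import os
from pathlib import Path
from typing import Optional


def data_dir(project_root: Optional[Path] = None) -> Path:
    """Resolve the data directory.

    Uses the DATA_DIR environment variable when it is set; otherwise falls
    back to <project_root>/data, where project_root defaults to the current
    working directory (assumed to be the repository root).
    """
    override = os.environ.get("DATA_DIR")
    if override:
        return Path(override).expanduser().resolve()
    root = project_root if project_root is not None else Path.cwd()
    return root / "data"


if __name__ == "__main__":
    # Build paths from the helper instead of hard-coding them.
    print(data_dir() / "letters.csv")
```

A reader who keeps the corpus on another drive then runs `DATA_DIR=/mnt/corpus make all` without editing any code, and everyone else gets the relative default.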

What is the minimal structure that actually reproduces results?

Order the file so a reader reaches "it works" as fast as possible:

markdown
# Project title
One sentence on what this is and what it produces.

## Install
    uv sync --frozen          # or: pip install -r requirements.txt

## Get the data
Download from <DOI/url>, place in `data/`. Licence: CC-BY-4.0.

## Reproduce the main result
    make all                  # writes figures/ and tables/ in ~6 min

## Cite
See CITATION.cff. Code under MIT, prose under CC-BY-4.0.

The "Reproduce" block is the heart: one command, an expected runtime, and where the outputs land so the reader can verify them.
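
If your project has no Makefile, the same contract (one command, a runtime, verifiable outputs) can be sketched as a small Python entry point. Everything named here is a placeholder: the step names, the step functions and the output paths are hypothetical and stand in for your own pipeline stages.

```python
import time
from pathlib import Path
from typing import Callable, Iterable


def reproduce(
    steps: Iterable[tuple[str, Callable[[], None]]],
    expected_outputs: Iterable[str],
    root: Path = Path("."),
) -> list[Path]:
    """Run every step in order, report the runtime, return missing outputs."""
    started = time.perf_counter()
    for name, step in steps:
        print(f"[{name}] running")
        step()
    elapsed = time.perf_counter() - started
    print(f"finished in {elapsed:.1f} s")
    # An empty list means every promised output landed where the README says.
    return [root / rel for rel in expected_outputs if not (root / rel).exists()]


if __name__ == "__main__":
    # Hypothetical stages; replace the lambdas with your real functions.
    missing = reproduce(
        steps=[("clean", lambda: None), ("figures", lambda: None)],
        expected_outputs=["figures/figure2.png", "tables/table1.csv"],
    )
    if missing:
        raise SystemExit(f"missing outputs: {[str(p) for p in missing]}")
```

Exiting with an error when an output is missing means the README's promise ("writes figures/ and tables/") is checked by the run itself rather than by the reader's eyes.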

How do I document data without committing it?

Never commit large or sensitive corpora to the repo. Instead, give the reader a fetch path and a checksum so they can confirm they got the right file:

bash
curl -fL -o data/corpus.zip https://zenodo.org/record/XXXX/files/corpus.zip   # -f: fail on HTTP errors instead of saving an error page
sha256sum -c data/corpus.sha256   # fails loudly if the download is wrong

State the licence and any access restrictions (for example, archival material under copyright) right there, so a reader does not discover a legal blocker three hours in.
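
Not every reader has sha256sum installed (it is missing from a stock macOS or Windows setup, for example). A portable fallback can be sketched with Python's standard hashlib module; the function names are hypothetical, and the sketch assumes the .sha256 file uses the usual "hash, two spaces, filename" layout that sha256sum writes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large corpora do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> None:
    """Raise ValueError if the file's SHA-256 differs from the expected hash."""
    actual = sha256_of(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(f"{path}: expected {expected_hex}, got {actual}")


if __name__ == "__main__":
    # First whitespace-separated field of the checksum file is the hash.
    expected = Path("data/corpus.sha256").read_text().split()[0]
    verify(Path("data/corpus.zip"), expected)
    print("checksum OK")
```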

How do I stop the README going stale?

Wire the quick-start into continuous integration. A tiny GitHub Actions job that installs the environment and runs a fast smoke target (here make smoke, a subset of make all on sample data) will fail the moment an instruction rots:

yaml
- run: uv sync --frozen
- run: make smoke   # a fast subset that exercises the documented path

Then make it a rule that any commit changing behaviour updates the README in the same commit. Reviewers should reject a pipeline change whose docs were not touched.
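
The review rule can be backed by a small automated check. The sketch below is one possible approach under stated assumptions: the watched paths (src/ and Makefile) and the function names are hypothetical, and it compares a whole branch against a base rather than inspecting each commit, so it approximates the same-commit rule rather than enforcing it exactly.

```python
import subprocess
from typing import Iterable


def readme_needs_update(
    changed_files: Iterable[str],
    watched_prefixes: Iterable[str] = ("src/", "Makefile"),
    readme: str = "README.md",
) -> bool:
    """True when pipeline files changed but the README was not touched."""
    changed = set(changed_files)
    prefixes = tuple(watched_prefixes)
    touches_pipeline = any(path.startswith(prefixes) for path in changed)
    return touches_pipeline and readme not in changed


def changed_files_from_git(base: str = "origin/main") -> list[str]:
    """List files that differ between the base branch and HEAD."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


if __name__ == "__main__":
    if readme_needs_update(changed_files_from_git()):
        raise SystemExit("Pipeline changed but README.md was not updated.")
```

Treat a failure as a prompt for the reviewer, not a verdict: some pipeline changes genuinely need no documentation change.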

Key Takeaways

  • A research README is a runnable procedure, not a description; test it on a clean machine.
  • Cover five things: description, install, data, one reproduce command, citation/licence.
  • Kill absolute paths and hidden environment state — the usual reasons others can't run it.
  • Don't commit data; give a fetch URL, a DOI and a checksum instead.
  • Lead with a fast quick-start; push deep detail to linked docs.
  • Test the quick-start in CI and update the README in the same commit as behaviour changes.

Frequently Asked Questions

What must a research README contain at minimum?

A one-line description, how to install the environment, how to get the data, the exact command to reproduce the main result, and a citation plus licence. If a stranger cannot run your headline output from those five things, the README is incomplete.

Why does my README work for me but not for collaborators?

Almost always because it documents your tacit setup rather than a clean one. Test it by following it yourself on a fresh machine or container, with no hidden environment variables, absolute paths or undocumented data files.

Where should the README live and what should it be called?

Put README.md at the repository root so GitHub, GitLab and Zenodo render it automatically. Use Markdown, not a Word document, so it stays diff-able and renders everywhere.

How long should a research README be?

Long enough that someone reproduces your result, short enough that they actually read it. A single screen of quick-start plus links to deeper docs beats a 4000-word wall of text.

Should the README include the data itself?

No. Describe where the data lives, its licence and how to fetch it, and link to a deposited copy with a DOI. Large or sensitive data does not belong in the repository.

How do I keep the README from going stale?

Test the quick-start in CI so a broken instruction fails the build, and update the README in the same commit that changes the behaviour it documents.