Make automates a research pipeline by recording, for each output file, the input files it depends on and the command that produces it. It then runs only the steps whose inputs have changed, in the correct order. Instead of remembering "run clean.py, then count.py, then plot.py", you type make and the tool works out the order from your rules and skips work that is already done. For a historian wrangling OCR, cleaning and analysis scripts, this turns a fragile folder of scripts into one reproducible command.
What is a Makefile made of?
A Makefile is a plain text file called Makefile containing rules. Each rule has three parts:
```makefile
target: prerequisites
	command
```

- target: the file you want to build (e.g. figures/timeline.png).
- prerequisites: the files it needs (e.g. data/clean.csv and the script).
- command: the shell line that creates the target, indented with a tab.
That single shape — output, inputs, command — is the whole mental model. Everything else is convenience.
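Because the tab is invisible, it is easily lost when a rule is copied from a web page. As a minimal sketch, the shell lines below write a one-rule Makefile with printf, whose \t escape produces a literal tab. The scratch directory and the hello.txt target are hypothetical names chosen only for illustration.

```shell
# Scratch directory so no real Makefile is overwritten (hypothetical demo).
demo_dir=$(mktemp -d)

# \t in the printf format becomes a literal tab in front of the command.
printf 'hello.txt:\n\techo hello > hello.txt\n' > "$demo_dir/Makefile"

# Count the lines that start with a tab.
tab=$(printf '\t')
grep -c "^${tab}" "$demo_dir/Makefile"
# prints 1
```

Running make -C "$demo_dir" afterwards should create hello.txt in that directory.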
A small worked example you can follow
Imagine three steps: clean a raw CSV, count word frequencies, plot them. Here is the entire pipeline:
```makefile
# all and clean are commands, not files.
.PHONY: all clean

# Default goal: the final figure.
all: figures/freq.png

# Step 1: clean the raw CSV.
data/clean.csv: data/raw.csv scripts/clean.py
	python scripts/clean.py data/raw.csv $@

# Step 2: count word frequencies.
results/counts.csv: data/clean.csv scripts/count.py
	python scripts/count.py $< $@

# Step 3: plot the counts.
figures/freq.png: results/counts.csv scripts/plot.py
	python scripts/plot.py $< $@

# Remove every generated file.
clean:
	rm -f data/clean.csv results/counts.csv figures/freq.png
```

Run make and it builds figures/freq.png, automatically running the clean and count steps first because they are prerequisites. Here $@ means "the target" and $< means "the first prerequisite"; both are automatic variables that save typing.
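To see how an order can fall out of the rules alone, the sketch below feeds the data-file dependencies from the rules above to the standard tsort utility, which prints a topological order. This only illustrates the ordering idea: Make does not call tsort, and the scripts are left out of the pairs to keep the example short.

```shell
# Each line is "prerequisite target", taken from the three rules
# (scripts omitted for brevity; this is an illustration, not how Make works internally).
order=$(tsort <<'EOF'
data/raw.csv data/clean.csv
data/clean.csv results/counts.csv
results/counts.csv figures/freq.png
EOF
)

# Print one file per line, earliest step first.
printf '%s\n' "$order"
# prints:
# data/raw.csv
# data/clean.csv
# results/counts.csv
# figures/freq.png
```

Because the chain is linear, only one order is possible, which is the order in which Make runs the steps.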
Why does Make skip steps, and is that safe?
Make compares timestamps. If data/clean.csv is newer than data/raw.csv and scripts/clean.py, the clean step is "up to date" and Make skips it. Edit clean.py and only the affected steps re-run. This is safe as long as your rule lists every real input — including the script itself, which beginners often forget. A rule that omits its script will not rebuild when you fix a bug.
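The timestamp rule can be sketched in a few lines of shell. The needs_rebuild function below is a hypothetical helper, not part of Make, and it is deliberately simplified: real Make also brings each prerequisite up to date before comparing. It relies on the -nt ("newer than") file test, which bash, dash and most other shells provide.

```shell
# needs_rebuild TARGET PREREQ...
# Succeeds (exit 0) when TARGET is missing or any PREREQ is newer than it.
needs_rebuild() {
    target=$1
    shift
    [ -e "$target" ] || return 0
    for prereq in "$@"; do
        [ "$prereq" -nt "$target" ] && return 0
    done
    return 1
}

# Demo on scratch files with explicit timestamps (touch -t CCYYMMDDhhmm).
demo_dir=$(mktemp -d)
touch -t 202401010900 "$demo_dir/raw.csv"
touch -t 202401011000 "$demo_dir/clean.csv"   # output newer than input

if needs_rebuild "$demo_dir/clean.csv" "$demo_dir/raw.csv"; then
    echo "stale"
else
    echo "up to date"
fi
# prints "up to date"

touch -t 202401011100 "$demo_dir/raw.csv"     # input edited later
needs_rebuild "$demo_dir/clean.csv" "$demo_dir/raw.csv" && echo "stale"
# prints "stale"
```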
```bash
make                        # build everything stale
make -n                     # dry run: print what WOULD run, change nothing
touch data/raw.csv && make  # force the chain to re-run
```

Make versus a plain shell script
| Concern | Shell script | Make |
|---|---|---|
| Re-runs unchanged steps | Always | Only stale ones |
| Documents inputs/outputs | Implicit | Explicit per rule |
| Parallelism | Manual | make -j4 for free |
| Partial rebuild after one edit | Re-run all | Just the affected branch |
For a pipeline that takes seconds, the difference is small; for one that OCRs for an hour, skipping done work is the whole point.
What trips up beginners?
Three things, in order of frequency:
- Spaces instead of a tab under a rule, giving *** missing separator. Configure your editor to keep tabs in Makefiles.
- Forgetting the script as a prerequisite, so edits don't trigger rebuilds.
- Recipes that don't actually write the named target, which makes Make rebuild every time. Each command must create exactly the file named as its target.
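The first problem can often be spotted before running Make. As a rough sketch, the grep below lists lines that begin with spaces; the broken Makefile it inspects is a hypothetical example written to a scratch directory. Treat matches as candidates, not proof, because some space-indented lines (indented variable assignments, for instance) are legitimate.

```shell
# Write a deliberately broken Makefile: the recipe is indented with four spaces.
demo_dir=$(mktemp -d)
printf 'out.txt: in.txt\n    cp in.txt out.txt\n' > "$demo_dir/Makefile"

# List lines that begin with one or more spaces followed by other text.
grep -n '^ \{1,\}[^ ]' "$demo_dir/Makefile"
# prints: 2:    cp in.txt out.txt
```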
When you outgrow Make — wildcards over hundreds of pages, a compute cluster, conda integration per step — move up to Snakemake or Nextflow, which keep the same dependency-graph idea with more power.
Key Takeaways
- Make builds a dependency graph from rules and runs only the steps whose inputs changed.
- A rule is just target, prerequisites and a tab-indented command.
- Always list the script itself as a prerequisite so fixes trigger rebuilds.
- Use $@ and $< to avoid repeating filenames, and .PHONY for non-file targets like all/clean.
- make -n previews actions; make -j4 parallelises independent steps.
- Indentation must be a literal tab; spaces cause the classic "missing separator" error.
Frequently Asked Questions
What is Make and why use it for research?
Make is a small, decades-old tool that runs commands in the right order based on which files have changed. For research it turns a folder of ad-hoc scripts into one reproducible command, make, that rebuilds your results from raw data.
Do I need to know programming to use Make?
Not really. A Makefile is a list of rules: each says which output file it builds, which input files it needs, and the shell command to run. If you can write a shell command, you can write a rule.
Why does Make care about file timestamps?
Make rebuilds a target only when the target is missing or at least one of its inputs is newer than it. That is what lets it skip work that is already up to date and re-run only the steps a change actually affects.
What is the difference between Make and just a shell script?
A shell script runs everything top to bottom every time. Make builds a dependency graph and runs only the stale parts, and it documents the inputs and outputs of every step explicitly.
Why do my Makefile recipes fail with 'missing separator'?
Because the command lines under a rule must be indented with a real tab, not spaces. This is the single most common Make error for beginners.
Is Make still a good choice in 2025, or should I use Snakemake?
Make is well suited to small and medium-sized digital humanities (DH) pipelines and is installed almost everywhere. Reach for Snakemake or Nextflow when you need wildcards over hundreds of files, a cluster, or richer Python integration.