Beginner's Guide to Pipelines with Make

Q: What is Make and why use it for research?

Make is a small, decades-old tool that runs commands in the right order based on which files have changed. For research it turns a folder of ad-hoc scripts into one reproducible command, `make`, that rebuilds your results from raw data.

Q: Do I need to know programming to use Make?

Not really. A Makefile is a list of recipes: each says which output file it builds, which input files it needs, and the shell command to run. If you can write a shell command, you can write a rule.

Q: Why does Make care about file timestamps?

Make rebuilds a target when the target is missing or when any of its prerequisites is newer than it. Comparing timestamps is what lets it skip work that is already up to date and re-run only the steps a change actually affects.

Q: What is the difference between Make and just a shell script?

A shell script runs everything top to bottom every time. Make builds a dependency graph and runs only the stale parts, and it documents the inputs and outputs of every step explicitly.

Q: Why do my Makefile recipes fail with 'missing separator'?

Because the command lines under a rule must be indented with a real tab, not spaces. This is the single most common Make error for beginners.

Q: Is Make still a good choice in 2025, or should I use Snakemake?

Make is a good fit for small to medium digital humanities (DH) pipelines and is available on almost every Unix-like system. Reach for Snakemake or Nextflow when you need wildcards over hundreds of files, a cluster, or richer Python integration.

Make automates a research pipeline by recording, for each output file, which input files it depends on and the command that produces it. Running `make` then executes only the steps whose inputs have changed, in the correct order. Instead of remembering "run clean.py, then count.py, then plot.py", you type `make` and the tool works out the order from your rules and skips work that is already done. For a historian wrangling OCR, cleaning and analysis scripts, this turns a fragile folder of scripts into one reproducible command.

What is a Makefile made of?

A Makefile is a plain text file called Makefile containing rules. Each rule has three parts:

```makefile
target: prerequisites
	command
```

target — the file you want to build (e.g. figures/timeline.png).
prerequisites — the files it needs (e.g. data/clean.csv and the script).
command — the shell line that creates the target, indented with a tab.

That single shape — output, inputs, command — is the whole mental model. Everything else is convenience.
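To make the three-part shape concrete, here is a minimal Python sketch that assembles the text of one rule. The function name `format_rule`, the script name `scripts/plot.py` and the command line are hypothetical and not part of Make; the only point is that the command line begins with a literal tab character (`\t`).

```python
def format_rule(target, prerequisites, command):
    """Return Makefile text for one rule: output, inputs, tab-indented command."""
    header = f"{target}: {' '.join(prerequisites)}"
    # Make requires the recipe line to start with a literal tab, not spaces.
    return f"{header}\n\t{command}\n"


# Hypothetical example values, chosen only for illustration.
rule_text = format_rule(
    "figures/timeline.png",
    ["data/clean.csv", "scripts/plot.py"],
    "python scripts/plot.py data/clean.csv figures/timeline.png",
)
print(rule_text, end="")
```

Writing Makefiles by hand is the normal workflow; the sketch is only a way of seeing where the tab sits.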

A small worked example you can follow

Imagine three steps: clean a raw CSV, count word frequencies, plot them. Here is the entire pipeline:

```makefile
.PHONY: all clean

all: figures/freq.png

data/clean.csv: data/raw.csv scripts/clean.py
	python scripts/clean.py data/raw.csv $@

results/counts.csv: data/clean.csv scripts/count.py
	python scripts/count.py $< $@

figures/freq.png: results/counts.csv scripts/plot.py
	python scripts/plot.py $< $@

clean:
	rm -f data/clean.csv results/counts.csv figures/freq.png
```

Run make and it builds figures/freq.png, automatically running the clean and count steps first because they are prerequisites. Here $@ means "the target", $< means "the first prerequisite" — automatic variables that save typing.
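How the order falls out of the prerequisites, and what `$@` and `$<` turn into, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Make is implemented: the `RULES` dictionary restates the three file-producing rules from the worked example, `expand` and `build_plan` are hypothetical helper names, and the sketch ignores timestamps, so it always lists every step.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The three file-producing rules: target -> (prerequisites, command).
RULES = {
    "data/clean.csv": (
        ["data/raw.csv", "scripts/clean.py"],
        "python scripts/clean.py data/raw.csv $@",
    ),
    "results/counts.csv": (
        ["data/clean.csv", "scripts/count.py"],
        "python scripts/count.py $< $@",
    ),
    "figures/freq.png": (
        ["results/counts.csv", "scripts/plot.py"],
        "python scripts/plot.py $< $@",
    ),
}


def expand(command, target, prerequisites):
    """Substitute the two automatic variables used in this guide."""
    return command.replace("$@", target).replace("$<", prerequisites[0])


def build_plan(rules):
    """List expanded commands so each prerequisite is built before its dependents."""
    graph = {target: prereqs for target, (prereqs, _) in rules.items()}
    plan = []
    for node in TopologicalSorter(graph).static_order():
        if node in rules:  # source files such as data/raw.csv have no rule
            prereqs, command = rules[node]
            plan.append(expand(command, node, prereqs))
    return plan


plan = build_plan(RULES)
for line in plan:
    print(line)
```

The printed commands appear in the order clean, count, plot, because each step's output is a prerequisite of the next.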

Why does Make skip steps, and is that safe?

Make compares timestamps. If data/clean.csv is newer than data/raw.csv and scripts/clean.py, the clean step is "up to date" and Make skips it. Edit clean.py and only the affected steps re-run. This is safe as long as your rule lists every real input — including the script itself, which beginners often forget. A rule that omits its script will not rebuild when you fix a bug.

```bash
make            # build everything stale
make -n         # dry run: print what WOULD run, change nothing
touch data/raw.csv && make   # force the chain to re-run
```
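The timestamp comparison itself is simple enough to approximate in Python. The sketch below is an illustration, not Make's actual code: `is_stale` is a hypothetical helper, it assumes every prerequisite already exists, and the demonstration sets modification times explicitly with `os.utime` so the result does not depend on how fast the files are created.

```python
import os
import tempfile


def is_stale(target, prerequisites):
    """Rebuild if the target is missing or older than any of its inputs."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Assumes every prerequisite exists; a missing one raises an error here.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


# Demonstration in a throwaway directory with explicit timestamps.
with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "raw.csv")
    clean = os.path.join(tmp, "clean.csv")

    open(raw, "w").close()
    missing_target = is_stale(clean, [raw])   # target does not exist yet

    open(clean, "w").close()
    os.utime(raw, (1_000, 1_000))             # input older than output
    os.utime(clean, (2_000, 2_000))
    up_to_date = is_stale(clean, [raw])

    os.utime(raw, (3_000, 3_000))             # same effect as `touch` on the input
    after_touch = is_stale(clean, [raw])

print(missing_target, up_to_date, after_touch)  # prints: True False True
```

The last case is why `touch data/raw.csv` forces a rebuild: nothing in the file changed, but its timestamp is now newer than the output's.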

Make versus a plain shell script

| Concern | Shell script | Make |
| --- | --- | --- |
| Re-runs unchanged steps | Always | Only stale ones |
| Documents inputs/outputs | Implicit | Explicit per rule |
| Parallelism | Manual | `make -j4` for free |
| Partial rebuild after one edit | Re-run all | Just the affected branch |

For a pipeline that takes seconds, the difference is small; for one that OCRs for an hour, skipping done work is the whole point.
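What "just the affected branch" means can be sketched by walking the dependency graph forward from the file that changed. This is a simplified model using the prerequisites of the worked example; `affected_targets` is a hypothetical helper name, and real Make reaches the same result through timestamps rather than an explicit search.

```python
# Prerequisites for each target in the worked example.
PREREQS = {
    "data/clean.csv": ["data/raw.csv", "scripts/clean.py"],
    "results/counts.csv": ["data/clean.csv", "scripts/count.py"],
    "figures/freq.png": ["results/counts.csv", "scripts/plot.py"],
}


def affected_targets(changed_file, prereqs):
    """Return every target that depends, directly or indirectly, on changed_file."""
    affected = set()
    frontier = [changed_file]
    while frontier:
        current = frontier.pop()
        for target, inputs in prereqs.items():
            if current in inputs and target not in affected:
                affected.add(target)
                frontier.append(target)  # its dependents are affected too
    return affected


after_count_edit = affected_targets("scripts/count.py", PREREQS)
after_raw_edit = affected_targets("data/raw.csv", PREREQS)
print(sorted(after_count_edit))
print(sorted(after_raw_edit))
```

Editing `scripts/count.py` touches only the count and plot steps, while changing `data/raw.csv` invalidates all three outputs.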

What trips up beginners?

Three things, in order of frequency:

1. Spaces instead of a tab under a rule, giving `*** missing separator`. Configure your editor to keep tabs in Makefiles.
2. Forgetting the script as a prerequisite, so edits don't trigger rebuilds.
3. Recipes that don't actually write the named target, which makes Make rebuild every time. Each command must create exactly the file named as its target.
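The first of these problems is easy to check for mechanically. The following Python sketch is a rough heuristic, not a Makefile parser: `find_space_indented_recipes` is a hypothetical helper that flags any line inside a rule whose first character is a space, and it treats any unindented line containing a colon as a rule header.

```python
def find_space_indented_recipes(makefile_text):
    """Return 1-based numbers of lines in a rule that start with a space, not a tab."""
    problems = []
    in_rule = False
    for number, line in enumerate(makefile_text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored
        if line.startswith("\t"):
            continue  # correctly indented recipe line
        if line.startswith(" "):
            if in_rule:
                problems.append(number)
            continue
        # Unindented line: assume a rule header if it contains a colon.
        in_rule = ":" in line and not line.startswith("#")
    return problems


sample = (
    "data/clean.csv: data/raw.csv scripts/clean.py\n"
    "    python scripts/clean.py data/raw.csv $@\n"  # four spaces: the error case
    "\n"
    "results/counts.csv: data/clean.csv scripts/count.py\n"
    "\tpython scripts/count.py $< $@\n"              # real tab: fine
)
print(find_space_indented_recipes(sample))  # prints: [2]
```

A check like this could run before `make`, though setting the editor to preserve tabs in Makefiles prevents the problem in the first place.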

When you outgrow Make — wildcards over hundreds of pages, a compute cluster, conda integration per step — move up to Snakemake or Nextflow, which keep the same dependency-graph idea with more power.

Key Takeaways

- Make builds a dependency graph from rules and runs only the steps whose inputs changed.
- A rule is just target, prerequisites and a tab-indented command.
- Always list the script itself as a prerequisite so fixes trigger rebuilds.
- Use `$@` and `$<` to avoid repeating filenames, and `.PHONY` for non-file targets like `all` and `clean`.
- `make -n` previews actions; `make -j4` parallelises independent steps.
- Indentation must be a literal tab; spaces cause the classic "missing separator" error.

