Choose a notebook when you are exploring, explaining or teaching, and a script when you need a step to run the same way every time, unattended and in order. Notebooks interleave code, narrative and output in cells you run interactively, which is ideal for discovery and showing your reasoning. Scripts are plain files that run start to finish, which is ideal for repeatable pipelines. Most real projects use both: explore in notebooks, then harden the stable parts into scripts.
This is a beginner's orientation with a small worked example, so you can feel the difference rather than just read about it.
What exactly is a notebook, and what is a script?
A Jupyter notebook is a document of cells. Each cell holds code or prose; you run a code cell and its output appears right below it. You can run cells in any order, tweak, and re-run — perfect for "what does this corpus look like?" A script, by contrast, is a single .py file that runs top to bottom when you type python clean.py. No interactivity, no surprises about order. The notebook is a workbench; the script is a machine.
When should I reach for a notebook?
Reach for a notebook when the work is exploratory or explanatory. You are sampling a newly OCR'd corpus, plotting the distribution of letter lengths, sanity-checking a date parser, or writing a tutorial where readers should see each step's output. The killer feature is the tight feedback loop: change a line, see the new chart instantly, and keep your prose, code and results in one shareable document.
```python
# A notebook cell: look at the data and react
import pandas as pd

letters = pd.read_csv("data/processed/letters.csv")
letters["year"].value_counts().sort_index().plot.bar()
```
When should I reach for a script?
Reach for a script when a step must be repeatable and reliable. Cleaning a dataset, running the same OCR post-processing across 800 files, or any task you will run again next month belongs in a script. Scripts run in order every time, take arguments, slot into automation, and, given the same inputs and installed package versions, produce the same results on a fresh machine.
```python
# clean.py: runs the same way every time
import sys

import pandas as pd


def clean(infile, outfile):
    df = pd.read_csv(infile)
    df["surname"] = df["surname"].str.strip().str.title()
    df.to_csv(outfile, index=False)


if __name__ == "__main__":
    clean(sys.argv[1], sys.argv[2])
```
How do notebooks and scripts compare at a glance?
| Dimension | Notebook | Script |
|---|---|---|
| Best for | exploration, teaching | repeatable pipelines |
| Execution | any order, interactive | top to bottom, one shot |
| Output | embedded, visible | written to files/console |
| Git diffs | noisy JSON | clean text |
| Automation | awkward | natural |
| Reproducibility risk | out-of-order cells | low |
Why do notebooks trip people up on reproducibility?
The notebook's strength — running cells in any order — is also its trap. You can end up with output on screen that no clean run could reproduce, because you ran cell 7, edited cell 3, and never re-ran cell 5. The discipline is simple: before you trust or share a notebook, choose Restart kernel and run all. If it produces the same result from a clean state, it is reproducible; if it errors, you just caught a hidden dependency on execution order.
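The stale-state trap can be simulated in plain Python. The sketch below is illustrative only: it models cells as code strings run with exec against a shared dictionary that stands in for the kernel. The cell numbers, variable names and data are hypothetical.

```python
# Model notebook cells as code strings; the shared dict stands in for the kernel.
def run(cells, order, namespace):
    """Execute the given cells in the given order against one namespace."""
    for number in order:
        exec(cells[number], namespace)
    return namespace


cells = {
    1: "lengths = [120, 450, 80, 300]",
    3: "threshold = 100",
    5: "kept = [n for n in lengths if n > threshold]",
    7: "result = len(kept)",
}

# Interactive session: run everything once, in order.
kernel = run(cells, [1, 3, 5, 7], {})

# Edit cell 3, re-run it and cell 7, but forget to re-run cell 5.
cells[3] = "threshold = 400"
run(cells, [3, 7], kernel)
stale_result = kernel["result"]  # still computed from the old threshold

# "Restart kernel and run all": fresh namespace, every cell, top to bottom.
clean_result = run(cells, [1, 3, 5, 7], {})["result"]

print(stale_result, clean_result)
```

The two numbers differ because the interactive session kept a value of kept that the current code can no longer produce, which is exactly what a restart and full run exposes.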
The second gotcha is Git. Notebooks save outputs and metadata as JSON, so commits become noisy and merges painful. Install nbstripout as a Git filter (run nbstripout --install inside the repository) so that outputs are stripped before they reach a commit, and your history stays readable.
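To see why notebook diffs are noisy, and roughly what stripping removes, here is a minimal sketch using only the standard library. It is not how nbstripout is implemented: the function name strip_outputs and the tiny in-memory notebook are assumptions for illustration, laid out like a version 4 notebook file.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs removed."""
    stripped = json.loads(json.dumps(notebook))  # simple deep copy via JSON
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# A hypothetical one-cell notebook, as it would be stored on disk.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 12,
            "metadata": {},
            "source": ["print(len(letters))"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["800\n"]}
            ],
        }
    ],
}

clean = strip_outputs(notebook)
print(json.dumps(clean["cells"][0], indent=1))
```

Every re-run changes the execution count and the stored output even when the source line is untouched, which is the noise that ends up in a diff.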
How do I combine both in one project?
The mature pattern is a graduation pipeline. Explore in a notebook; once a chunk of logic is stable, lift it into a .py module and import it back into the notebook. The notebook stays a thin narrative layer calling tested functions, while the heavy, repeatable work lives in scripts that automation can run. You get the explanatory power of notebooks and the reliability of scripts at once.
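As a rough sketch of that graduation step, the surname clean-up from clean.py could be lifted into a small function. The module name textclean.py and the function name normalise_surname are hypothetical, chosen only for illustration.

```python
# textclean.py (hypothetical module name): logic lifted out of a notebook.
def normalise_surname(raw: str) -> str:
    """Trim surrounding whitespace and title-case a surname."""
    return raw.strip().title()


# In the notebook, a cell would import the function instead of redefining it:
#     from textclean import normalise_surname

# A quick check that can later become a unit test.
samples = ["  smith ", "VAN DYKE"]
print([normalise_surname(s) for s in samples])
```

Once the function lives in a module, the same code serves the notebook, the script and any tests. Note that title-casing is a blunt rule: a name such as McDonald comes out as Mcdonald, which is the kind of edge case worth exploring in the notebook before hardening the function further.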
Key Takeaways
- Notebooks suit exploration and explanation; scripts suit repeatable, unattended pipelines.
- Notebooks run cells in any order interactively; scripts run top to bottom every time.
- Always Restart and run all before trusting a notebook, to catch out-of-order dependencies.
- Strip notebook outputs with nbstripout to keep Git diffs clean.
- The mature pattern graduates stable logic into scripts that notebooks import.
- Beginners should start in a notebook for fast visible feedback, then harden routine steps into scripts.
Frequently Asked Questions
What is the basic difference between a notebook and a script?
A notebook mixes code, prose and output in interactive cells you run in any order. A script is a plain text file that runs top to bottom in one go. Notebooks suit exploration; scripts suit repeatable pipelines.
Are notebooks bad for reproducibility?
Not inherently, but out-of-order execution can produce results you cannot rebuild. Always restart the kernel and run all cells before trusting a notebook's output.
Can I use both in one project?
Yes, and most mature projects do. Explore and explain in notebooks, then move stable logic into scripts or a small module the notebooks import.
Why do notebooks cause messy Git diffs?
Notebooks store output and metadata as JSON, so diffs are noisy. Strip outputs before committing with a tool like nbstripout to keep history readable.
Which should a beginner start with?
Start with a notebook. The immediate, visible feedback makes learning far easier; graduate parts to scripts once a step becomes routine and needs to run unattended.