Python for Historians

The core best practice for managing Python environments is one isolated environment per project, with every package version pinned in a committed requirements.txt and the environment folder itself ignored by Git. That isolation is what lets a historical analysis run the same way next year or on a colleague's machine. Skip it, and sooner or later a routine package upgrade is likely to break an analysis you depend on.

Here is the working checklist, plus the reasoning behind each rule.

Why does a historian need virtual environments?

Without isolation, every project shares one global set of packages. Upgrade pandas for a new project and an older script — say, your census analysis — may silently change its output or stop running. A virtual environment gives each project its own package versions, so projects cannot interfere with each other and old work stays rerunnable.

bash
python -m venv .venv          # create
source .venv/bin/activate     # activate (macOS/Linux)
.venv\Scripts\activate        # activate (Windows)
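
A script can also check for itself that an environment is active before it touches any data. The sketch below is one possible guard, not part of the standard workflow; the function names are hypothetical. It relies on the fact that inside a venv `sys.prefix` points at the environment folder while `sys.base_prefix` points at the Python installation it was created from. A conda environment would not pass this particular check, so treat it as a venv-only convenience.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix is the environment folder and
    # sys.base_prefix is the installation the venv was created from.
    return sys.prefix != sys.base_prefix


def environment_warning() -> str:
    """Return a warning message, or an empty string if a venv is active."""
    if in_virtual_environment():
        return ""
    return "No virtual environment is active; activate .venv before running."


# Report which situation applies without stopping the script.
print(environment_warning() or f"Using environment at {sys.prefix}")
```

Placed at the top of an analysis script, a check like this turns a silent "ran against the global packages" mistake into a visible message.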

How do I pin and record exact versions?

The difference between "it works on my machine" and reproducible research is pinned versions. Install what you need, then freeze:

bash
pip install pandas geopy pdfplumber
pip freeze > requirements.txt   # records exact pinned versions

A collaborator then runs pip install -r requirements.txt and gets the identical set. Record the Python version too, because behaviour differs across 3.10, 3.11, and 3.12.
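
One low-effort way to record the Python version is to let the interpreter describe itself and paste the result into your README. This is a minimal sketch using only the standard library; the function name and the wording of the note are illustrative choices, not a convention.

```python
import platform
import sys
from datetime import date


def environment_note() -> str:
    """Build a one-line note describing the interpreter that ran the analysis."""
    return (
        f"Python {platform.python_version()} "
        f"({platform.python_implementation()}) on {sys.platform}, "
        f"recorded {date.today().isoformat()}"
    )


# Copy this line into the README next to requirements.txt.
print(environment_note())
```

Run it inside the activated environment, so the version reported is the one the project actually uses.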

venv, conda, or uv: which should I choose?

Tool        | Best for                        | Trade-off
venv + pip  | Text, data, web, APIs           | Standard, always present; manual locking
conda       | Geospatial (GDAL, GEOS)         | Heavier; solves binary dependencies cleanly
uv          | Fast installs, modern lockfiles | Newer; same workflow, far quicker

For most historians doing text and tabular work, venv plus pip is the right default. Reach for conda only when a compiled stack like GDAL fights you, and try uv when install speed matters across many projects.

What makes an environment truly reproducible?

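A plain pip freeze records every installed package, transitive dependencies included, but it does not distinguish the packages you chose from the ones they pulled in, and it also captures anything stray that happens to be installed. A hand-written requirements.txt that lists only your direct packages has the opposite problem: its transitive dependencies can drift between installs. For defensible, archivable results, list your direct dependencies in requirements.in and compile the full tree from it: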

bash
pip install pip-tools
# requirements.in lists only your direct dependencies, one per line (e.g. pandas)
pip-compile requirements.in   # writes a fully pinned requirements.txt

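This records not just pandas but every library pandas itself pulls in, each annotated with the package that required it, so the same versions can be reinstalled years later, provided they are still available from the package index. That matters when you cite the analysis in a publication. Adding --generate-hashes to pip-compile also pins the hash of each downloaded file, which is as close to bit-for-bit as pip gets.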
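
A quick way to confirm that a requirements file really is fully pinned is to scan it for lines without an exact `==` version. The function below is a rough sketch under simplifying assumptions: it treats `#` as the start of a comment, skips pip option lines such as `-r` or `--hash`, and does not try to parse every form the requirements format allows. The sample file contents and version numbers are made up for illustration.

```python
def unpinned_lines(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    problems = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            problems.append(line)
    return problems


# Illustrative file contents; the version numbers are examples only.
sample = """\
pandas==2.2.2
geopy>=2.4        # a range, not a pin
pdfplumber
"""
print(unpinned_lines(sample))  # → ['geopy>=2.4', 'pdfplumber']
```

An empty list means every package has an exact version; anything it returns is a line that could install a different release next year.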

Should I commit the environment folder to Git?

No. The .venv folder holds machine-specific compiled binaries that do not transfer between computers and bloat the repository. Ignore it and commit only the recipe:

# .gitignore
.venv/
__pycache__/

Commit requirements.txt (or the lock file) — that is the reproducible part. The folder is disposable; the version list is the artefact.

How many environments and when do I rebuild?

One environment per project. A single shared environment is a common cause of "it used to work": adding a dependency for a new project upgrades a library an old analysis relied on. When an environment misbehaves, rebuild it from scratch rather than patching:

bash
rm -rf .venv && python -m venv .venv
source .venv/bin/activate         # re-activate so pip targets the new environment
pip install -r requirements.txt

Because the recipe is committed, throwing the folder away and recreating it is cheap, and a clean rebuild that still runs your analysis is good evidence that your pins are complete.
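
After a rebuild, you can compare what is actually installed against the pins. The sketch below uses the standard library's `importlib.metadata` to look up installed versions; `find_drift` and `installed_version` are hypothetical helper names, and the parsing is deliberately simple (it compares version strings literally and ignores environment markers after `;`).

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_drift(requirements_text: str, lookup=installed_version) -> dict:
    """Map each pinned package to (pinned, installed) where the two differ."""
    drift = {}
    for raw in requirements_text.splitlines():
        # Drop comments and environment markers, then surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line or line.startswith("-"):
            continue  # not an exact pin, or a pip option line
        name, rest = line.split("==", 1)
        name = name.split("[", 1)[0].strip()  # drop extras such as pkg[extra]
        parts = rest.split()                  # ignore trailing "\" or --hash
        if not parts:
            continue
        pinned = parts[0]
        found = lookup(name)
        if found != pinned:
            drift[name] = (pinned, found)
    return drift
```

In a project you might call `find_drift(open("requirements.txt").read())` inside the activated environment; an empty dictionary means the installed versions match the file. The `lookup` parameter exists only so the function can be exercised without installing anything.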

The working checklist

  • [ ] One virtual environment per project, never global.
  • [ ] Environment folder in .gitignore.
  • [ ] Exact versions pinned in requirements.txt.
  • [ ] Python version recorded in the README.
  • [ ] Transitive dependencies locked with pip-tools or uv for publications.
  • [ ] Environment rebuildable from scratch with one command.

Key Takeaways

  • Keep one isolated environment per project so upgrades never break old analyses.
  • Pin exact versions with pip freeze into a committed requirements.txt.
  • Use venv + pip by default; conda for geospatial stacks; uv for speed.
  • Lock transitive dependencies with pip-tools for publication-grade reproducibility.
  • Never commit the environment folder — .gitignore it and commit only the recipe.
  • Record the Python version, and rebuild from scratch to prove your pins are complete.

Frequently Asked Questions

Why do I need a virtual environment at all?

A virtual environment isolates each project's packages so that upgrading a library for one project cannot silently break another. For reproducible history research, isolation is what lets an analysis still run years later.

Should I use venv, conda, or something newer like uv?

Use the built-in venv plus pip for most text-and-data work; it is standard and always available. Choose conda when you need compiled geospatial stacks like GDAL, and consider uv if you want much faster installs with the same workflow.

What is the difference between pip freeze and a requirements file?

pip freeze prints the exact versions installed in the active environment; redirecting that output into requirements.txt records them. The requirements file is what others install from, so a file of exact pins is what makes the environment reproducible.

How do I make my environment reproducible for someone else?

Pin exact versions in requirements.txt, record the Python version, and ideally lock transitive dependencies with a tool like pip-tools or uv. Commit these files so a collaborator recreates the identical environment.

Should I commit the virtual environment folder to Git?

No. Add the environment folder to .gitignore and commit only the requirements or lock file. The environment contains machine-specific binaries that do not transfer and bloat the repository.

How many environments should a historian keep?

One per project, not one global setup. With a shared environment, upgrading a package for a new project is likely, sooner or later, to break an old analysis you need to rerun.