The core best practice for managing Python environments is one isolated environment per project, with every package version pinned in a committed requirements.txt and the environment folder itself ignored by Git. That isolation is what lets a historical analysis run the same way next year or on a colleague's machine. Skip it, and a routine package upgrade can break an analysis you depend on.
Here is the working checklist, plus the reasoning behind each rule.
Why does a historian need virtual environments?
Without isolation, every project shares one global set of packages. Upgrade pandas for a new project and an older script — say, your census analysis — may silently change its output or stop running. A virtual environment gives each project its own package versions, so projects cannot interfere with each other and old work stays rerunnable.
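The isolation is visible from inside Python itself. The following is a minimal sketch using only the standard library; the function name `in_virtual_environment` is illustrative, and the check targets environments created with venv (other tools, such as conda, may not be detected this way).

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("packages install under:", sys.prefix)
```

Running the script before and after activating an environment should show the install location switching from the system Python to the project folder.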
```bash
python -m venv .venv          # create the environment in a .venv folder
source .venv/bin/activate     # activate (macOS/Linux)
# .venv\Scripts\activate      # activate (Windows; run this instead of the line above)
```

How do I pin and record exact versions?
The difference between "it works on my machine" and reproducible research is pinned versions. Install what you need, then freeze:
```bash
pip install pandas geopy pdfplumber
pip freeze > requirements.txt   # records exact pinned versions
```

A collaborator then runs pip install -r requirements.txt and gets the identical set. Record the Python version too, because behaviour differs across 3.10, 3.11, and 3.12.
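Whether a requirements file really pins everything can be checked mechanically. The sketch below assumes a simple file of one requirement per line; the helper name `unpinned_requirements` and the sample versions are illustrative, not taken from a real project. It also prints the running Python version, which is the value worth copying into the README.

```python
import platform


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    loose = []
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith(("#", "-")):
            continue  # skip blanks, comment lines, and pip options such as -r
        if "==" not in line:
            loose.append(line)
    return loose


# Made-up sample contents, for illustration only.
example = """\
pandas==2.2.2
geopy>=2.4
pdfplumber
"""

print(unpinned_requirements(example))  # ['geopy>=2.4', 'pdfplumber']
print("Python", platform.python_version())
```

An empty list means every entry carries an exact version; anything it returns can still drift between installs.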
venv, conda, or uv: which should I choose?
| Tool | Best for | Trade-off |
|---|---|---|
| venv + pip | Text, data, web, APIs | Standard, always present; manual locking |
| conda | Geospatial (GDAL, GEOS) | Heavier; solves binary dependencies cleanly |
| uv | Fast installs, modern lockfiles | Newer; same workflow, far quicker |
For most historians doing text and tabular work, venv plus pip is the right default. Reach for conda only when a compiled stack like GDAL fights you, and try uv when install speed matters across many projects.
What makes an environment truly reproducible?
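A plain pip freeze does record transitive dependencies, but only as a flat snapshot of whatever happens to be installed, with no record of which packages you chose and which were pulled in by them. A hand-written requirements.txt that pins only your direct packages has the opposite problem: the libraries underneath them can drift. For defensible, archivable results, list your direct dependencies in requirements.in and compile the full tree from it:
```bash
pip install pip-tools
pip-compile requirements.in   # writes a fully pinned requirements.txt
```

This records not just pandas but every library pandas itself pulls in, each annotated with the package that required it, so the same versions can be reinstalled years later. That matters when you cite the analysis in a publication.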
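To see why the full tree matters, you can ask the active environment what a package declares as its own dependencies. This is a small sketch built on the standard library's `importlib.metadata`; the helper name `declared_dependencies` is illustrative, and it only reports packages that are actually installed.

```python
from importlib import metadata


def declared_dependencies(package: str) -> list[str]:
    """List the requirement strings an installed package declares.

    Returns an empty list when the package declares none or is not installed.
    """
    try:
        return list(metadata.requires(package) or [])
    except metadata.PackageNotFoundError:
        return []


# Each line printed here is a transitive dependency of a project that uses pandas.
for requirement in declared_dependencies("pandas"):
    print(requirement)
```

Every entry it prints is a library you never asked for by name but whose version still affects your results.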
Should I commit the environment folder to Git?
No. The .venv folder holds machine-specific compiled binaries that do not transfer between computers and bloat the repository. Ignore it and commit only the recipe:
```
# .gitignore
.venv/
__pycache__/
```

Commit requirements.txt (or the lock file). That is the reproducible part. The folder is disposable; the version list is the artefact.
How many environments and when do I rebuild?
One environment per project. A single shared environment is the most common cause of "it used to work": adding a dependency for a new project upgrades a library an old analysis relied on. When an environment misbehaves, rebuild it from scratch rather than patching:
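```bash
rm -rf .venv && python -m venv .venv   # delete and recreate the environment
source .venv/bin/activate              # activate the fresh environment (macOS/Linux)
pip install -r requirements.txt        # reinstall from the committed pins
```

Because the recipe is committed, throwing the folder away and recreating it is cheap, and a clean rebuild that still runs your analysis shows that your pins are complete.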
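After a rebuild, it can be worth confirming that what was installed matches what was pinned. The following is a hedged sketch rather than a full requirements parser: it handles only simple `name==version` lines, and the names `mismatched_pins`, `sample`, and `fake_env` (a stand-in for a real environment, with made-up versions) are illustrative.

```python
from importlib import metadata
from typing import Callable, Optional


def _installed_version(name: str) -> Optional[str]:
    """Look up the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def mismatched_pins(
    requirements_text: str,
    lookup: Callable[[str], Optional[str]] = _installed_version,
) -> dict[str, tuple[str, Optional[str]]]:
    """Map each 'name==version' pin to (pinned, installed) where the two differ."""
    problems = {}
    for raw in requirements_text.splitlines():
        # Drop comments, environment markers, and line-continuation backslashes.
        line = raw.split("#", 1)[0].split(";", 1)[0].rstrip(" \\").strip()
        if "==" not in line or line.startswith("-"):
            continue  # only simple exact pins are checked in this sketch
        name, pinned = (part.strip() for part in line.split("==", 1))
        name = name.split("[", 1)[0]  # ignore extras such as package[extra]
        installed = lookup(name)
        if installed != pinned:
            problems[name] = (pinned, installed)
    return problems


# Made-up pins and a fake environment, so the example runs anywhere.
sample = "pandas==2.2.2\ngeopy==2.4.1\n"
fake_env = {"pandas": "2.2.2", "geopy": "2.3.0"}
print(mismatched_pins(sample, fake_env.get))  # {'geopy': ('2.4.1', '2.3.0')}
```

Called with only the text of requirements.txt, the function falls back to inspecting the active environment; an empty result means every simple pin is satisfied.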
The working checklist
- [ ] One virtual environment per project, never global.
- [ ] Environment folder in .gitignore.
- [ ] Exact versions pinned in requirements.txt.
- [ ] Python version recorded in the README.
- [ ] Transitive dependencies locked with pip-tools or uv for publications.
- [ ] Environment rebuildable from scratch with one command.
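Some of these items can be checked from the files alone. The sketch below is one possible audit, assuming the environment folder is named .venv and the pins live in requirements.txt at the project root; the function name `audit_project` is illustrative, and the demonstration runs against a throwaway folder rather than a real project.

```python
import tempfile
from pathlib import Path


def audit_project(root: Path, env_dir: str = ".venv") -> dict[str, bool]:
    """Check the checklist items that can be verified from files alone."""
    gitignore = root / ".gitignore"
    patterns = gitignore.read_text().splitlines() if gitignore.is_file() else []
    return {
        "requirements.txt present": (root / "requirements.txt").is_file(),
        "environment folder listed in .gitignore": any(
            pattern.strip().strip("/") == env_dir for pattern in patterns
        ),
    }


# Demonstration on a temporary folder standing in for a project.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "requirements.txt").write_text("pandas==2.2.2\n")
    (project / ".gitignore").write_text(".venv/\n__pycache__/\n")
    report = audit_project(project)

for item, ok in report.items():
    print("[x]" if ok else "[ ]", item)
```

The remaining items, such as recording the Python version and rebuilding from scratch, still need a human to confirm.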
Key Takeaways
- Keep one isolated environment per project so upgrades never break old analyses.
- Pin exact versions with pip freeze into a committed requirements.txt.
- Use venv + pip by default; conda for geospatial stacks; uv for speed.
- Lock transitive dependencies with pip-tools for publication-grade reproducibility.
- Never commit the environment folder: .gitignore it and commit only the recipe.
- Record the Python version, and rebuild from scratch to prove your pins are complete.
Frequently Asked Questions
Why do I need a virtual environment at all?
A virtual environment isolates each project's packages so that upgrading a library for one project cannot silently break another. For reproducible history research, isolation is what lets an analysis still run years later.
Should I use venv, conda, or something newer like uv?
Use the built-in venv plus pip for most text-and-data work; it is standard and always available. Choose conda when you need compiled geospatial stacks like GDAL, and consider uv if you want much faster installs with the same workflow.
What is the difference between pip freeze and a requirements file?
pip freeze is a command that lists the exact versions currently installed; requirements.txt is the file you redirect that list into. The requirements file is what others install from, so freezing exact versions into it is what makes an environment reproducible.
How do I make my environment reproducible for someone else?
Pin exact versions in requirements.txt, record the Python version, and ideally lock transitive dependencies with a tool like pip-tools or uv. Commit these files so a collaborator recreates the identical environment.
Should I commit the virtual environment folder to Git?
No. Add the environment folder to .gitignore and commit only the requirements or lock file. The environment contains machine-specific binaries that do not transfer and bloat the repository.
How many environments should a historian keep?
One per project, not one global setup. With a shared environment, upgrading a package for a new project can silently change or break an old analysis you need to rerun.