To set up Python for historical research, install the current stable CPython from python.org, create one virtual environment per project, and add the small toolkit most historians actually use: pandas, requests, lxml, openpyxl and jupyterlab. That combination handles spreadsheets, archive APIs, TEI/XML and exploratory notebooks without drowning you in machine-learning dependencies you will never touch. The whole installation takes about twenty minutes and, done once correctly, saves you days of "it worked yesterday" confusion later.
What exactly do you need to install?
For a working historian, the essentials are deliberately short:
- Python itself: the interpreter, from python.org. Tick "Add Python to PATH" on Windows.
- A virtual environment tool: `venv` ships with Python, so there is nothing extra to install.
- An editor or notebook: VS Code (free) or JupyterLab.
- A handful of libraries: installed per project, not globally.
Resist the urge to install everything you read about. A clean base plus targeted, per-project additions ages far better than a global pile of half-remembered packages.
How do you create a virtual environment?
A virtual environment isolates one project's packages so an upgrade for your gazetteer project cannot break your census-analysis project. Create one inside each project folder:
```bash
# inside your project folder
python -m venv .venv

# activate it
# macOS / Linux:
source .venv/bin/activate
# Windows PowerShell:
.venv\Scripts\Activate.ps1

# install your toolkit
pip install pandas requests lxml openpyxl jupyterlab
```

Once activated, your prompt shows `(.venv)`. Everything you `pip install` now lives only in this project. To leave, type `deactivate`.
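If you want to confirm from inside Python that the environment is really active, one option is to compare `sys.prefix` with `sys.base_prefix`, which differ inside an environment created by `venv`. A minimal sketch; the function name `in_virtualenv` is illustrative, not part of any library:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("packages install under:", sys.prefix)
```

Running this with and without the environment activated should flip the first line between `True` and `False`.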
Should you record your dependencies?
Yes — this is the single habit that separates reproducible research from a future headache. After installing, freeze the exact versions:
```bash
pip freeze > requirements.txt
```

A colleague, a reviewer, or future-you can then rebuild the identical environment with `pip install -r requirements.txt`. For a 1641 Depositions project I rebuilt three years later, this one file meant the analysis ran first time on a new laptop.
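To check whether an environment still matches its frozen file, a small script can compare the pinned versions with what is installed. The sketch below uses the standard library's `importlib.metadata`. The helper names `parse_pins` and `find_mismatches` are hypothetical, and it only understands plain `name==version` lines, which is what `pip freeze` normally writes for packages installed from PyPI:

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def parse_pins(text: str) -> dict[str, str]:
    """Parse 'name==version' lines; blanks, comments and other forms are skipped."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Map each package whose installed version differs to (pinned, installed)."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        pins = parse_pins(req.read_text(encoding="utf-8"))
        for name, (pinned, installed) in find_mismatches(pins).items():
            print(f"{name}: pinned {pinned}, installed {installed}")
```

If the script prints nothing, every pinned package is installed at the version recorded in the file.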
Anaconda or plain Python: which for historians?
| Factor | Plain Python + venv | Miniconda |
|---|---|---|
| Install size | ~100 MB | ~400 MB+ |
| Geospatial libs (GDAL, geopandas) | Can be fiddly | Pre-built, easy |
| Speed to first script | Faster | Slower |
| Mixes with system Python | Cleanly | Separate ecosystem |
Pick plain Python unless your work is heavily GIS-based. Mixing pip and conda carelessly is a classic source of broken environments, so commit to one per project.
What folder structure should you use?
Consistency beats cleverness. A reliable starting layout:
```
mycorrespondence-project/
    .venv/
    data/
        raw/          # never edited by hand
        processed/
    notebooks/
    src/
    requirements.txt
    README.md
```

Treat `data/raw/` as read-only: your analysis scripts read from it and write derived files to `data/processed/`. That discipline means you can always re-run from sources if a transformation goes wrong.
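If you start projects often, the layout can be created by a short script rather than by hand. This is a sketch using `pathlib`. The function name `scaffold` is an assumption for illustration, and `.venv/` is deliberately left out because `python -m venv` creates it:

```python
from pathlib import Path

# Folders and files from the layout described in this section.
FOLDERS = ["data/raw", "data/processed", "notebooks", "src"]
FILES = ["requirements.txt", "README.md"]


def scaffold(project_root: Path) -> None:
    """Create the project layout; the contents of existing files are not changed."""
    for folder in FOLDERS:
        # parents=True also creates project_root and data/ when missing
        (project_root / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates an empty file, or leaves an existing one's content alone
        (project_root / name).touch(exist_ok=True)
```

Calling `scaffold(Path("mycorrespondence-project"))` from the parent folder builds the tree. Running it a second time is harmless.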
How do you check the install actually works?
Run a three-line smoke test before trusting anything:
```python
import pandas as pd
df = pd.read_csv("data/raw/sample.csv", encoding="utf-8")
print(df.shape, df.columns.tolist())
```

If that prints a sensible row/column count, your interpreter, your packages and your file paths all agree. Encoding errors here are the most common first stumble. Historical sources are full of accented names and old code pages, so pass `encoding="utf-8"` (or `latin-1`) explicitly rather than relying on the default.
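When you do not know a file's encoding in advance, one hedged approach is to try a short list of candidates in order. The sketch below uses only the standard library rather than pandas. The helper `read_rows` and the candidate list are assumptions for illustration. Note that `latin-1` accepts any byte sequence, so a successful decode is not proof of the right encoding; always eyeball a few accented names afterwards.

```python
from __future__ import annotations

import csv
import io
from pathlib import Path


def read_rows(
    path: Path,
    encodings: tuple[str, ...] = ("utf-8-sig", "cp1252", "latin-1"),
) -> tuple[str, list[list[str]]]:
    """Return (encoding_used, rows), trying each encoding until one decodes."""
    raw = path.read_bytes()
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
        # newline="" leaves line endings untranslated so the csv module
        # can handle line breaks inside quoted fields itself
        rows = list(csv.reader(io.StringIO(text, newline="")))
        return enc, rows
    raise ValueError(f"none of {encodings} could decode {path}")
```

`utf-8-sig` is listed first because it reads ordinary UTF-8 and also strips the byte-order mark that some spreadsheet exports add.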
What pitfalls trip up beginners most?
- Installing globally instead of into a venv, then watching one project break another.
- Spaces and accents in folder paths, which confuse some tools — keep paths plain ASCII.
- Editing raw data in Excel and silently mangling dates or leading zeros in catalogue references.
- Chasing the newest Python the week it ships, before libraries have wheels.
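One way to notice that a raw file has been edited by hand (the third pitfall) is to record a checksum for every file in `data/raw/` and compare later. This is an illustrative sketch, not a step the workflow above requires. The helper names `checksum`, `snapshot` and `changed_files` are hypothetical:

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(raw_dir: Path) -> dict[str, str]:
    """Map each file under raw_dir (as a relative POSIX path) to its checksum."""
    return {
        p.relative_to(raw_dir).as_posix(): checksum(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List files that were added, removed or modified between two snapshots."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))
```

Taking a snapshot when the data first arrives and comparing it with a fresh one before each analysis run would flag a spreadsheet that was opened and re-saved, even when the change is only a dropped leading zero.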
Key Takeaways
- Use the current stable CPython from python.org; skip the bleeding-edge release for a few weeks.
- One virtual environment per project keeps work isolated and reproducible.
- A starter toolkit of `pandas`, `requests`, `lxml`, `openpyxl` and `jupyterlab` covers most archival tasks.
- Freeze versions with `pip freeze > requirements.txt` from day one.
- Choose Miniconda only when you need painful geospatial binaries.
- Keep `data/raw/` read-only and put your code under Git.
- Always pass an explicit `encoding` when reading historical text.
Frequently Asked Questions
Which Python version should a historian install?
Install the current stable CPython (3.11 or 3.12 at the time of writing). Hold off on a brand-new feature release (a new 3.x) for a few weeks until your key libraries publish compatible wheels.
Do I need Anaconda or can I use plain Python?
Plain Python from python.org plus a virtual environment is lighter and fully sufficient for most archival work. Choose Miniconda only if you need GDAL, geopandas or other geospatial binaries that are painful to compile.
Should I learn the command line first?
Learn five commands: cd, ls/dir, python, pip and activating a virtual environment. That is enough to follow almost every tutorial; deeper shell skills can wait.
Where should I keep my research code and data?
Keep one folder per project containing a code folder, a raw data folder you never edit by hand, and a requirements file. Back the whole thing up and put the code under Git.
What is a virtual environment and why does it matter?
A virtual environment is an isolated folder with its own Python executable (or a link to one) and its own set of installed packages for one project. It stops a library upgrade for one project from silently breaking another and makes your work reproducible.
Is Jupyter or VS Code better for beginners?
Start in Jupyter for exploratory analysis where you want to see results inline. Move to VS Code or scripts once your code grows past a few hundred lines or needs to be rerun reliably.