How to set up Python for historical research

To set up Python for historical research, install the current stable CPython from python.org, create one virtual environment per project, and add the small toolkit most historians actually use: pandas, requests, lxml, openpyxl and jupyterlab. That combination handles spreadsheets, archive APIs, TEI/XML and exploratory notebooks without drowning you in machine-learning dependencies you will never touch. The whole installation takes about twenty minutes and, done once correctly, saves you days of "it worked yesterday" confusion later.

What exactly do you need to install?

For a working historian, the essentials are deliberately short:

Python itself: the interpreter, from python.org. On Windows, tick "Add python.exe to PATH" in the installer (older installers label it "Add Python 3.x to PATH").
A virtual environment tool: venv ships with Python, so there is nothing extra to install.
An editor or notebook: VS Code (free) or JupyterLab.
A handful of libraries: installed per project, not globally.

Resist the urge to install everything you read about. A clean base plus targeted, per-project additions ages far better than a global pile of half-remembered packages.

How do you create a virtual environment?

A virtual environment isolates one project's packages so an upgrade for your gazetteer project cannot break your census-analysis project. Create one inside each project folder:

bash

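# inside your project folder
# (on macOS / Linux the command may be python3 rather than python)
python -m venv .venv

# activate it
# macOS / Linux:
source .venv/bin/activate
# Windows PowerShell:
# (if script execution is blocked, first run:
#  Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser)
.venv\Scripts\Activate.ps1

# install your toolkit into the active environment
pip install pandas requests lxml openpyxl jupyterlab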

Once activated, your prompt shows (.venv). Everything you pip install now lives only in this project. To leave, type deactivate.
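If the prompt marker is hidden or you are inside a notebook, you can ask Python itself whether it is running from a virtual environment. The sketch below uses only the standard library; the function name in_virtual_environment is illustrative, not part of any tool mentioned here.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the underlying Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtual_environment())
```

The interpreter path should sit inside your project's .venv folder. Note that this check is specific to venv-style environments; a conda environment may report False even when it is active.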

Should you record your dependencies?

Yes — this is the single habit that separates reproducible research from a future headache. After installing, freeze the exact versions:

bash

pip freeze > requirements.txt

A colleague, a reviewer, or future-you can then reinstall the same package versions with pip install -r requirements.txt. The file does not record which Python version you used, so note that in README.md as well. For a 1641 Depositions project I rebuilt three years later, this one file meant the analysis ran first time on a new laptop.
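To confirm that an environment still matches its frozen file, you can compare the pinned versions against what is actually installed. This is a minimal sketch, assuming a UTF-8 requirements.txt made of simple name==version lines as pip freeze writes them; parse_pins and find_mismatches are hypothetical helper names, and other line types (editable installs, URLs) are simply skipped.

```python
from importlib import metadata
from pathlib import Path
from typing import Dict, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Extract {name: version} from simple 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blanks, options such as -e, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (pinned, installed)} where the two differ.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    req = Path("requirements.txt")  # assumed to sit in the current folder
    if req.exists():
        pins = parse_pins(req.read_text(encoding="utf-8"))
        for name, (pinned, installed) in find_mismatches(pins).items():
            print(f"{name}: pinned {pinned}, installed {installed or 'missing'}")
```

Run it from the project folder with the environment activated. No output means every pinned package is installed at the recorded version.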

Anaconda or plain Python: which for historians?

Factor	Plain Python + venv	Miniconda
Install size	~100 MB	~400 MB+
Geospatial libs (GDAL, geopandas)	Can be fiddly	Pre-built, easy
Speed to first script	Faster	Slower
Mixes with system Python	Cleanly	Separate ecosystem

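Pick plain Python unless your work is heavily GIS-based; in that case use Miniconda, the minimal conda installer, rather than the much larger full Anaconda distribution. Mixing pip and conda carelessly is a classic source of broken environments, so commit to one per project.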

What folder structure should you use?

Consistency beats cleverness. A reliable starting layout:

mycorrespondence-project/
  .venv/
  data/
    raw/          # never edited by hand
    processed/
  notebooks/
  src/
  requirements.txt
  README.md

Treat data/raw/ as read-only — your analysis scripts read from it and write derived files to data/processed/. That discipline means you can always re-run from sources if a transformation goes wrong.
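If you start projects often, the layout above can be created by a short script instead of by hand. This is one possible sketch using only pathlib; the scaffold function and the SUBFOLDERS list are illustrative names, and the .venv folder is deliberately left for python -m venv to create.

```python
from pathlib import Path
from typing import List

# Subfolders from the layout above; .venv is created separately by venv.
SUBFOLDERS = ["data/raw", "data/processed", "notebooks", "src"]


def scaffold(project_root: Path) -> List[Path]:
    """Create missing folders and empty starter files; return what was created."""
    created = []
    for sub in SUBFOLDERS:
        folder = project_root / sub
        if not folder.exists():
            folder.mkdir(parents=True)
            created.append(folder)
    for name in ("requirements.txt", "README.md"):
        file = project_root / name
        if not file.exists():
            file.touch()  # empty placeholder, never overwrites existing files
            created.append(file)
    return created


if __name__ == "__main__":
    for path in scaffold(Path.cwd()):
        print("created", path)
```

Because existing folders and files are skipped, running the script a second time changes nothing, so it is safe to use on a project that is already under way.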

How do you check the install actually works?

Run a three-line smoke test before trusting anything (it assumes you have saved a small CSV file as data/raw/sample.csv):

python

import pandas as pd
df = pd.read_csv("data/raw/sample.csv", encoding="utf-8")
print(df.shape, df.columns.tolist())

If that prints a sensible row/column count, your interpreter, your packages and your file paths all agree. Encoding errors here are the most common first stumble: historical sources are full of accented names and old code pages. pandas assumes UTF-8 when no encoding is given, but Python's built-in open() falls back to a platform-dependent default, so pass encoding="utf-8" (or latin-1 for older files) explicitly wherever you read text.
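When you do not know how a file was encoded, one pragmatic approach is to try a short list of likely encodings in order and record which one worked. The sketch below is an assumption about a reasonable order (UTF-8 first, then Windows-1252, then Latin-1), not a detection algorithm; read_text_with_fallback is a hypothetical helper name.

```python
from pathlib import Path

# Tried in order. "utf-8-sig" reads plain UTF-8 and also strips a leading
# byte-order mark if one is present.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def read_text_with_fallback(path, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding_used), using the first encoding that decodes."""
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes; try the next
    raise ValueError(f"None of {encodings} could decode {path}")
```

The returned encoding name can then be passed on, for example as pd.read_csv(path, encoding=enc). Latin-1 accepts any byte sequence, so a result decoded that way is only a guess: check a few accented names by eye before trusting it.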

What pitfalls trip up beginners most?

Installing globally instead of into a venv, then watching one project break another.
Spaces and accents in folder paths, which confuse some tools — keep paths plain ASCII.
Editing raw data in Excel and silently mangling dates or leading zeros in catalogue references.
Chasing the newest Python the week it ships, before libraries have wheels.
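The leading-zero problem in the third pitfall is easy to reproduce. The example below uses made-up catalogue references and hypothetical column names (reference, year) to show that the zeros survive only while the values are kept as text.

```python
import csv
import io

# Made-up sample data standing in for a catalogue export.
SAMPLE = "reference,year\n00412,1641\n00087,1642\n"

# csv.DictReader never converts types, so every cell stays a string.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
references = [row["reference"] for row in rows]

# Converting to numbers, which spreadsheet software and type-guessing
# readers may do automatically, drops the leading zeros.
as_numbers = [int(ref) for ref in references]

print(references)  # ['00412', '00087']
print(as_numbers)  # [412, 87]
```

In pandas, the equivalent precaution is to pass dtype=str to read_csv so that identifier columns are not converted to integers.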

Key Takeaways

Use the current stable CPython from python.org; skip the bleeding-edge release for a few weeks.
One virtual environment per project keeps work isolated and reproducible.
A starter toolkit of pandas, requests, lxml, openpyxl and jupyterlab covers most archival tasks.
Freeze versions with pip freeze > requirements.txt from day one.
Choose Miniconda only when you need geospatial binaries that are painful to build yourself.
Keep data/raw/ read-only and put your code under Git.
Always pass an explicit encoding when reading historical text.

Frequently Asked Questions

Which Python version should a historian install?

Install the current stable CPython (3.11 or 3.12 at the time of writing). Avoid a brand-new feature release (a new 3.x version) for a few weeks until your key libraries publish compatible wheels.
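A script can guard against being run on an older interpreter than the one you developed with. This is a small sketch; the floor of 3.11 is an assumption taken from the versions named above, and python_is_recent_enough is an illustrative name.

```python
import sys

MINIMUM = (3, 11)  # assumed floor; change it to match your own project


def python_is_recent_enough(version=None, minimum=MINIMUM) -> bool:
    """Compare a (major, minor, ...) version against the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_recent_enough():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Python {MINIMUM[0]}.{MINIMUM[1]}+ expected, found {found}")
```

Placing a check like this at the top of a project's main script turns a confusing syntax or import error into a clear message.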

Do I need Anaconda or can I use plain Python?

Plain Python from python.org plus a virtual environment is lighter and fully sufficient for most archival work. Choose Miniconda only if you need GDAL, geopandas or other geospatial binaries that are painful to compile.

Should I learn the command line first?

Learn five commands: cd, ls/dir, python, pip and activating a virtual environment. That is enough to follow almost every tutorial; deeper shell skills can wait.

Where should I keep my research code and data?

Keep one folder per project containing a code folder, a raw data folder you never edit by hand, and a requirements file. Back the whole thing up and put the code under Git.

What is a virtual environment and why does it matter?

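A virtual environment is a self-contained folder with its own Python command and its own set of packages for one project; it reuses your installed interpreter rather than duplicating all of Python. It stops a library upgrade for one project from silently breaking another and makes your work reproducible.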

Is Jupyter or VS Code better for beginners?

Start in Jupyter for exploratory analysis where you want to see results inline. Move to VS Code or scripts once your code grows past a few hundred lines or needs to be rerun reliably.
