Beginner's Guide to Historical Occupations

To standardise historical occupations you map each verbatim job title — however it was spelled or abbreviated — onto a controlled classification, most commonly HISCO, the Historical International Standard Classification of Occupations. This turns a chaotic free-text column into codes you can count, compare across sources, and link to social class. The core idea is simple: one occupation, one code, with the original wording preserved for audit.

Why bother standardising at all?

Imagine a census column with cordwainer, shoe maker, boot & shoe mkr, and snob (a slang term for a shoemaker). Counted raw, that is four occupations of one person each; standardised, it is one occupation of four people. Without this step every cross-tabulation, every map, every comparison between two parishes is distorted by spelling and dialect rather than by real social structure.
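The arithmetic is easy to check. Below is a minimal sketch that uses only the standard library's `collections.Counter`. The `to_hisco` mapping is an illustrative assumption: it sends all four spellings to 80110, the shoemaker code in the table below.

```python
from collections import Counter

# The four verbatim spellings from the census column described above.
raw_titles = ["cordwainer", "shoe maker", "boot & shoe mkr", "snob"]

# Illustrative mapping: every variant resolves to the shoemaker code 80110.
to_hisco = {title: 80110 for title in raw_titles}

raw_counts = Counter(raw_titles)                         # four "occupations", one person each
coded_counts = Counter(to_hisco[t] for t in raw_titles)  # one occupation, four people

print(len(raw_counts), "raw titles ->", len(coded_counts), "coded occupation")
print(coded_counts)
```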

What is HISCO and why use it?

HISCO is a five-digit international scheme covering occupational titles from about 1690 to 1970. Because the scheme is shared, data you code with it become comparable with other HISCO-coded projects and can feed derived social-class schemes. A few example codes:

Verbatim title	HISCO code	HISCO label
Shoemaker / cordwainer	80110	Shoemaker, general
Agricultural labourer	62105	Field crop farm worker
Schoolmaster	13320	Teacher, primary
Domestic servant	54020	Servant, domestic
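The five digits are not an arbitrary label. HISCO inherits the nested layout of ISCO-68, the classification it was built from, so the leading digits identify progressively broader groups. The helper below is a sketch that assumes the usual reading: the major group is the first digit, the minor group is the first two, and the unit group is the first three. `hisco_levels` is a hypothetical name. The helper also shows why codes are safer stored as zero-padded strings than as plain integers, because a code whose first digit is 0 would otherwise lose it.

```python
def hisco_levels(code: int) -> dict[str, str]:
    """Split a five-digit HISCO code into its nested levels.

    Assumes the standard layout: major group = first digit,
    minor group = first two digits, unit group = first three digits.
    """
    if not 0 <= code <= 99999:
        raise ValueError(f"not a five-digit HISCO code: {code}")
    padded = f"{code:05d}"  # restore a leading zero lost by integer storage
    return {
        "major": padded[0],
        "minor": padded[:2],
        "unit": padded[:3],
        "occupation": padded,
    }

print(hisco_levels(80110))
```

For the shoemaker code, this places 80110 in unit group 801 within major group 8.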

What does a small worked example look like?

Start with a tiny lookup dictionary and apply it, lowercasing and trimming first.

```python
import pandas as pd

# Tiny lookup: cleaned (lowercased, trimmed) title -> HISCO code.
lookup = {
    "cordwainer": 80110, "shoemaker": 80110, "boot maker": 80110,
    "ag lab": 62105, "agricultural labourer": 62105,
    "schoolmaster": 13320,
}

df = pd.DataFrame({"raw": ["Cordwainer", "Ag Lab", " shoemaker ", "Mercer"]})
df["clean"] = df["raw"].str.lower().str.strip()        # "raw" stays untouched for audit
df["hisco"] = df["clean"].map(lookup).astype("Int64")  # nullable ints; unmatched -> <NA>
print(df)
```

Mercer comes back unmatched. That is exactly the residue you hand-code next; once coded, add it to the dictionary so it matches automatically from then on.
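The residue loop can be sketched in plain Python. This is a minimal sketch, not a prescribed tool: the `decisions` list and its `note` field are assumptions about how you might record choices. It hand-codes "snob" and "shoe maker" as shoemakers (80110), following the shoemaker example earlier in this guide. It deliberately leaves "mercer" in the residue rather than guessing a code.

```python
# Lookup keyed on cleaned titles, as in the pandas example above.
lookup = {
    "cordwainer": 80110, "shoemaker": 80110, "boot maker": 80110,
    "ag lab": 62105, "agricultural labourer": 62105,
    "schoolmaster": 13320,
}

raw = ["Cordwainer", "Snob", "Ag Lab", "shoe maker ", "Mercer"]
clean = [r.lower().strip() for r in raw]

# 1. Collect the residue: distinct cleaned titles with no dictionary entry.
residue = sorted({c for c in clean if c not in lookup})

# 2. Hand-code what you can, recording why. "mercer" is left uncoded here
#    rather than guessed, so it stays in the residue.
decisions = [
    {"title": "snob", "hisco": 80110, "note": "slang for shoemaker"},
    {"title": "shoe maker", "hisco": 80110, "note": "spelling variant of shoemaker"},
]

# 3. Grow the dictionary, refusing to silently overwrite an earlier rule.
for d in decisions:
    previous = lookup.get(d["title"])
    if previous is not None and previous != d["hisco"]:
        raise ValueError(f"conflicting rule for {d['title']!r}")
    lookup[d["title"]] = d["hisco"]

still_unmatched = [c for c in clean if c not in lookup]
print("residue before:", residue)
print("residue after:", still_unmatched)
```

After step 3, rerunning the match leaves only "mercer" for the next round of hand-coding.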

What is the practical workflow?

Extract the verbatim occupation strings to their own column.
Normalise case, whitespace, and obvious abbreviations.
Auto-match against a HISCO dictionary (the open histauto/OCCHISCO resources help here).
Hand-code the unmatched residue, recording each decision.
Grow the dictionary with every new decision so coverage improves over time.
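The normalise and auto-match steps can be sketched as two small functions. This sketch rests on stated assumptions. `ABBREVIATIONS` is a hypothetical table that you would build from your own sources. Expanding "ag" and "lab" blindly is only safe if those tokens mean nothing else in your data.

```python
# Hypothetical abbreviation table; build yours from the titles in your own sources.
ABBREVIATIONS = {"ag": "agricultural", "lab": "labourer", "mkr": "maker", "&": "and"}

def normalise(title: str) -> str:
    """Lowercase, treat full stops as spaces, collapse whitespace, expand abbreviations."""
    tokens = title.lower().replace(".", " ").split()  # split() drops runs of whitespace
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)

def auto_match(titles, dictionary):
    """Split titles into dictionary matches and an unmatched residue."""
    matches, residue = {}, []
    for raw in titles:
        key = normalise(raw)
        if key in dictionary:
            matches[raw] = dictionary[key]  # keyed on the verbatim string
        else:
            residue.append(raw)             # verbatim form kept for hand-coding
    return matches, residue

dictionary = {"agricultural labourer": 62105, "boot and shoe maker": 80110, "schoolmaster": 13320}
matches, residue = auto_match(["Ag. Lab.", "Boot & Shoe Mkr", "  Schoolmaster ", "Mercer"], dictionary)
print(matches)
print(residue)
```

Keying `matches` on the verbatim string keeps the original wording beside each code, which the audit step depends on.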

In most large censuses, dictionary matching alone resolves the bulk of entries, leaving a manageable tail for human judgement.
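How big that tail looks depends on what you count. The sketch below reports coverage in two ways: as a share of entries and as a share of distinct titles. The toy entries exist only to exercise the function. The two figures tend to diverge because common titles get into the dictionary first, so entry coverage runs ahead of title coverage.

```python
from collections import Counter

def coverage(entries, dictionary):
    """Return (share of entries matched, share of distinct titles matched).

    Assumes `entries` are already normalised title strings.
    """
    counts = Counter(entries)
    if not counts:
        raise ValueError("no entries to measure")
    matched = [t for t in counts if t in dictionary]
    entry_share = sum(counts[t] for t in matched) / sum(counts.values())
    title_share = len(matched) / len(counts)
    return entry_share, title_share

# Toy data: the frequent titles are in the dictionary, the rare ones are not.
entries = ["agricultural labourer"] * 6 + ["shoemaker"] * 2 + ["mercer", "snob"]
dictionary = {"agricultural labourer": 62105, "shoemaker": 80110}

entry_share, title_share = coverage(entries, dictionary)
print(f"entries matched: {entry_share:.0%}, distinct titles matched: {title_share:.0%}")
```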

How do I get social class from the codes?

HISCO is occupational, not social. To analyse mobility or inequality, translate codes with a companion scheme:

HISCLASS sorts HISCO codes into twelve social-class groups.
HISCAM assigns each code a continuous status score based on patterns of social interaction, which is handy for regression.

Because these are deterministic crosswalks, you code once into HISCO and derive class for free.
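Because the crosswalk is a fixed lookup, deriving class amounts to a join on the HISCO code. Below is a minimal sketch that uses the standard library's `csv` module. It assumes a two-column file with the hypothetical headers `hisco` and `hisclass`. The class values in the inline sample are placeholders, not real HISCLASS assignments, so load the published crosswalk in their place. The useful part is the check: any code the crosswalk lacks is reported instead of being silently dropped.

```python
import csv
import io

def load_crosswalk(fh):
    """Read a CSV with columns hisco,hisclass into a dict, rejecting duplicate codes."""
    crosswalk = {}
    for row in csv.DictReader(fh):
        code = int(row["hisco"])
        if code in crosswalk:
            raise ValueError(f"duplicate HISCO code in crosswalk: {code}")
        crosswalk[code] = row["hisclass"]
    return crosswalk

def derive_class(codes, crosswalk):
    """Map each HISCO code to its class (None if absent); also list the absent codes."""
    classes = [crosswalk.get(code) for code in codes]
    missing = sorted({code for code in codes if code not in crosswalk})
    return classes, missing

# Placeholder crosswalk: these are NOT real HISCLASS values.
sample = io.StringIO("hisco,hisclass\n80110,PLACEHOLDER_A\n62105,PLACEHOLDER_B\n")
crosswalk = load_crosswalk(sample)

classes, missing = derive_class([80110, 62105, 13320, 80110], crosswalk)
print(classes)
print("codes missing from crosswalk:", missing)
```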

What mistakes do beginners make?

Overwriting the original. Always keep the verbatim string; you cannot audit or recode without it.
Coding to false precision. "Labourer" with no qualifier should go to a general code, not be guessed into "agricultural".
Ignoring gendered and informal work. Women's and unpaid labour are under-recorded; note this rather than reading silence as absence.
Inconsistent rules across a project. Write your coding decisions down so a second coder reproduces them.
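One way to test whether written rules are reproducible is to have a second coder code the same sample independently, then compare the results. Below is a minimal sketch of simple percent agreement. Chance-corrected measures such as Cohen's kappa are stricter and are not shown. Both coders' dictionaries are hypothetical, and `GENERAL_LABOURER` is a placeholder for whatever general code your rules assign to an unqualified "labourer".

```python
# Placeholder for the general code your written rules assign to an
# unqualified "labourer"; substitute the real HISCO code.
GENERAL_LABOURER = "GENERAL_LABOURER_CODE"

# Hypothetical independent codings of the same sample. Coder A guessed
# "agricultural" for a bare "labourer"; coder B followed the written rule.
coder_a = {"cordwainer": 80110, "schoolmaster": 13320, "ag lab": 62105, "labourer": 62105}
coder_b = {"cordwainer": 80110, "schoolmaster": 13320, "ag lab": 62105, "labourer": GENERAL_LABOURER}

def agreement(a, b):
    """Return (share of shared titles coded identically, sorted disputed titles)."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("the coders have no titles in common")
    disputed = sorted(t for t in shared if a[t] != b[t])
    return 1 - len(disputed) / len(shared), disputed

share, disputed = agreement(coder_a, coder_b)
print(f"agreement: {share:.0%}; disputed: {disputed}")
```

Each disputed title is a prompt to tighten the written rule, here the one covering unqualified labourers.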

Key Takeaways

Standardising maps messy job titles to one controlled code per occupation.
HISCO is the standard five-digit scheme for historical occupations, covering roughly 1690 to 1970.
Auto-match the common titles with a dictionary; hand-code only the residue.
Always keep the verbatim string beside the assigned code for auditing.
Derive social class from HISCO using HISCLASS or HISCAM crosswalks.
Avoid false precision and record your coding rules for consistency.

Frequently Asked Questions

What does it mean to standardise occupations?

Standardising means mapping the many ways a job was written down to a single controlled code, so that 'cordwainer', 'shoemaker', and 'boot maker' all resolve to one category you can count and compare.

What is HISCO?

HISCO, the Historical International Standard Classification of Occupations, is a five-digit scheme covering occupational titles from roughly 1690 to 1970 across many countries. It is the standard target for coding historical job titles.

Why not just count the raw job titles?

Raw titles are inconsistent in spelling, language, and specificity, so counting them fragments one occupation into dozens of tiny groups. Standardising lets you compare across sources, places, and time.

Can I derive social class from HISCO codes?

Yes. Schemes like HISCLASS and HISCAM translate HISCO codes into social-class groups or a continuous status scale, so you can analyse mobility and stratification.

Do I have to code every title by hand?

No. Start with an automated dictionary lookup that handles the common titles, then hand-code only the residue that does not match. This typically resolves most entries automatically.

What should I keep alongside the code?

Always keep the original verbatim string next to its assigned code. You will need it to audit decisions and to recode if the classification or your rules change.
