Beginner's Guide to a Prosopographical Database

A prosopographical database is a structured collection of people defined by a shared characteristic — a profession, a place, an institution — recording their attributes and relationships so you can study the group as a whole. To build one as a beginner, start with three linked tables (persons, sources, and factoids), record each historical claim as a single evidence-linked factoid, and grow from there. You can stand this up in SQLite in an afternoon.

Prosopography is collective biography. Rather than writing the life of one bishop, you ask what all the bishops of a diocese over two centuries had in common: where they trained, who their patrons were, how long they served. The database is the instrument that makes those patterns visible.

What goes in a prosopographical database?

Three things, and keeping them distinct is the whole trick:

Persons — one record per real individual, with a stable local identifier.
Sources — the documents your claims come from.
Factoids — individual assertions that connect a person, a source, and a fact.

A factoid is a sentence like "the 1582 visitation records that Richard Hooker was rector of Drayton". It binds the claim to its evidence, which is what separates scholarship from a rumour list.

Why is the factoid model better than a spreadsheet?

A spreadsheet wants one row per person, but people are not one-dimensional. One person holds several offices, appears in several sources, and attracts conflicting claims. Cram that into a single row and you either lose information or create a tangle of office1, office2, office3 columns. The factoid model lets a person have any number of factoids, each independently sourced.

sql

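-- SQLite checks REFERENCES only when this is on; set it on every connection.
PRAGMA foreign_keys = ON;

CREATE TABLE person (
    person_id    TEXT PRIMARY KEY,   -- local stable id, e.g. p0001
    display_name TEXT NOT NULL,
    viaf         TEXT,               -- optional authority link
    wikidata     TEXT                -- optional authority link
);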

CREATE TABLE source (
    source_id   TEXT PRIMARY KEY,
    citation    TEXT NOT NULL
);

CREATE TABLE factoid (
    factoid_id  INTEGER PRIMARY KEY,
    person_id   TEXT REFERENCES person(person_id),
    source_id   TEXT REFERENCES source(source_id),
    type        TEXT,    -- office, kinship, residence, event
    value       TEXT,
    date_edtf   TEXT,    -- e.g. 1582 or 1580~
    note        TEXT
);
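
SQLite leaves foreign-key checking off unless a connection asks for it, so a factoid could point at a person or source that does not exist. The sketch below uses Python's built-in sqlite3 module as a runner for the schema, which is an assumption of this example rather than a requirement of the guide. The helper name open_db is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS person (
    person_id    TEXT PRIMARY KEY,
    display_name TEXT NOT NULL,
    viaf         TEXT,
    wikidata     TEXT
);
CREATE TABLE IF NOT EXISTS source (
    source_id TEXT PRIMARY KEY,
    citation  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS factoid (
    factoid_id INTEGER PRIMARY KEY,
    person_id  TEXT REFERENCES person(person_id),
    source_id  TEXT REFERENCES source(source_id),
    type       TEXT,
    value      TEXT,
    date_edtf  TEXT,
    note       TEXT
);
"""


def open_db(path=":memory:"):
    """Open a database, switch on foreign-key checks, and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # must be set per connection
    conn.executescript(SCHEMA)
    return conn


conn = open_db()
try:
    # Neither person 'p9999' nor source 's99' exists, so this should fail.
    conn.execute(
        "INSERT INTO factoid (person_id, source_id, type, value) "
        "VALUES ('p9999', 's99', 'office', 'curate')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("orphan factoid rejected:", orphan_rejected)
```

Passing a file name such as "prosopography.db" instead of the default keeps the data on disk between sessions.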

A small worked example

Suppose two sources mention a "Thomas Wode" and you are satisfied that they refer to the same man. Record one person and two factoids:

sql

INSERT INTO person VALUES ('p0001', 'Thomas Wode', NULL, NULL);
INSERT INTO source VALUES ('s01', 'Visitation of Kent, 1574');
INSERT INTO source VALUES ('s02', 'Parish register, Cranbrook');
INSERT INTO factoid VALUES (1,'p0001','s01','office','curate','1574','');
INSERT INTO factoid VALUES (2,'p0001','s02','kinship','married Joan Pell','1576','');

Now a single query — "list every office and its source for Thomas Wode" — returns evidence-backed facts, not a guess.
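
One way to write that query is a join from factoid to source, filtered on the person and on the factoid type. This is a minimal sketch run through Python's built-in sqlite3 module (an assumption of this example); the schema and rows are the ones from the worked example, and the name OFFICES_FOR_PERSON is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id TEXT PRIMARY KEY, display_name TEXT NOT NULL,
                     viaf TEXT, wikidata TEXT);
CREATE TABLE source (source_id TEXT PRIMARY KEY, citation TEXT NOT NULL);
CREATE TABLE factoid (factoid_id INTEGER PRIMARY KEY,
                      person_id TEXT REFERENCES person(person_id),
                      source_id TEXT REFERENCES source(source_id),
                      type TEXT, value TEXT, date_edtf TEXT, note TEXT);
INSERT INTO person VALUES ('p0001', 'Thomas Wode', NULL, NULL);
INSERT INTO source VALUES ('s01', 'Visitation of Kent, 1574');
INSERT INTO source VALUES ('s02', 'Parish register, Cranbrook');
INSERT INTO factoid VALUES (1,'p0001','s01','office','curate','1574','');
INSERT INTO factoid VALUES (2,'p0001','s02','kinship','married Joan Pell','1576','');
""")

# Every office factoid for one person, each paired with its citation.
OFFICES_FOR_PERSON = """
SELECT f.value, f.date_edtf, s.citation
FROM factoid AS f
JOIN source  AS s ON s.source_id = f.source_id
WHERE f.person_id = ? AND f.type = 'office'
ORDER BY f.date_edtf
"""

offices = conn.execute(OFFICES_FOR_PERSON, ("p0001",)).fetchall()
for value, date, citation in offices:
    print(f"{value} ({date}), per {citation}")
```

The kinship factoid is left out because the query filters on type; dropping that condition returns everything recorded about the person.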

How do I handle people I am not sure are the same?

Do not force a merge. Keep them as two person records and add a relationship note such as "possibly identical with p0001, same parish, no date conflict". When stronger evidence appears, you merge and the factoids follow. This is gentler and more honest than collapsing them early and discovering the mistake later.
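
The three-table schema has no obvious home for a "possibly identical" note, so one option is a small link table between person records. The sketch below assumes such a table, here called person_link, along with a merge_persons helper; both names are hypothetical, as is the second person "Thomas Wood" and his residence factoid, which are invented for illustration. It is run through Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id TEXT PRIMARY KEY, display_name TEXT NOT NULL,
                     viaf TEXT, wikidata TEXT);
CREATE TABLE source (source_id TEXT PRIMARY KEY, citation TEXT NOT NULL);
CREATE TABLE factoid (factoid_id INTEGER PRIMARY KEY,
                      person_id TEXT REFERENCES person(person_id),
                      source_id TEXT REFERENCES source(source_id),
                      type TEXT, value TEXT, date_edtf TEXT, note TEXT);
-- Hypothetical extra table: one row per suspected identity.
CREATE TABLE person_link (
    person_a TEXT REFERENCES person(person_id),
    person_b TEXT REFERENCES person(person_id),
    relation TEXT NOT NULL,          -- e.g. 'possibly_identical'
    note     TEXT
);
INSERT INTO person VALUES ('p0001', 'Thomas Wode', NULL, NULL);
INSERT INTO person VALUES ('p0002', 'Thomas Wood', NULL, NULL);  -- invented
INSERT INTO source VALUES ('s01', 'Visitation of Kent, 1574');
INSERT INTO source VALUES ('s02', 'Parish register, Cranbrook');
INSERT INTO factoid VALUES (1,'p0001','s01','office','curate','1574','');
INSERT INTO factoid VALUES (2,'p0002','s02','residence','Cranbrook','1580~','');
INSERT INTO person_link VALUES
    ('p0001', 'p0002', 'possibly_identical', 'same parish, no date conflict');
""")


def merge_persons(conn, keep_id, drop_id):
    """Fold drop_id into keep_id: move its factoids, then retire the record."""
    with conn:  # one transaction: every step is committed, or none is
        conn.execute("UPDATE factoid SET person_id = ? WHERE person_id = ?",
                     (keep_id, drop_id))
        # Remove the link between the pair; links to third persons are not
        # repointed in this simplified sketch.
        conn.execute("DELETE FROM person_link "
                     "WHERE ? IN (person_a, person_b) AND ? IN (person_a, person_b)",
                     (keep_id, drop_id))
        conn.execute("DELETE FROM person WHERE person_id = ?", (drop_id,))


# Later, once stronger evidence shows the two records are one man:
merge_persons(conn, keep_id="p0001", drop_id="p0002")

people = conn.execute("SELECT person_id FROM person").fetchall()
owners = conn.execute(
    "SELECT factoid_id, person_id FROM factoid ORDER BY factoid_id").fetchall()
print(people, owners)
```

Because every factoid keeps its own source, the merge changes only who the claim is about, never where it came from. A project that has already published its identifiers might prefer to keep the retired id as a redirect rather than delete it.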

How does conflicting evidence get stored?

This is where the model earns its keep. If one source says a person died in 1610 and another says 1612, you store both as death factoids, each pointing at its source. The database represents the disagreement faithfully; your analysis or footnotes can then weigh them. Hiding the conflict by choosing one date silently is exactly what scholarly databases must avoid.
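
As a sketch of how that looks in the tables, the example below stores two death factoids and then asks the database which people have more than one distinct death date. The source citations are placeholders and the query name DEATH_CONFLICTS is hypothetical; the check is limited to death because a type such as office can legitimately carry several dates. It is run through Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id TEXT PRIMARY KEY, display_name TEXT NOT NULL,
                     viaf TEXT, wikidata TEXT);
CREATE TABLE source (source_id TEXT PRIMARY KEY, citation TEXT NOT NULL);
CREATE TABLE factoid (factoid_id INTEGER PRIMARY KEY,
                      person_id TEXT REFERENCES person(person_id),
                      source_id TEXT REFERENCES source(source_id),
                      type TEXT, value TEXT, date_edtf TEXT, note TEXT);
INSERT INTO person VALUES ('p0001', 'Thomas Wode', NULL, NULL);
-- Placeholder citations, invented for this illustration.
INSERT INTO source VALUES ('s03', 'Placeholder source A');
INSERT INTO source VALUES ('s04', 'Placeholder source B');
-- Both claims are kept; neither overwrites the other.
INSERT INTO factoid VALUES (3,'p0001','s03','death',NULL,'1610','');
INSERT INTO factoid VALUES (4,'p0001','s04','death',NULL,'1612','');
""")

# List each competing death date with its source, but only for people
# whose sources disagree.
DEATH_CONFLICTS = """
SELECT f.person_id, f.date_edtf, s.citation
FROM factoid AS f
JOIN source  AS s ON s.source_id = f.source_id
WHERE f.type = 'death'
  AND f.person_id IN (
      SELECT person_id
      FROM factoid
      WHERE type = 'death'
      GROUP BY person_id
      HAVING COUNT(DISTINCT date_edtf) > 1
  )
ORDER BY f.person_id, f.date_edtf
"""

conflicts = conn.execute(DEATH_CONFLICTS).fetchall()
for person_id, date, citation in conflicts:
    print(f"{person_id}: died {date}, per {citation}")
```

A report like this gives you a worklist of disagreements to weigh in a footnote, without altering the stored claims.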

What tools should a beginner reach for?

Start with SQLite — it is a single file, needs no server, and the schema above runs as-is. The free DB Browser for SQLite gives you a graphical view. When the project outgrows a single file — many editors, web access, a public interface — migrate to PostgreSQL, a Django-based factoid framework like those behind major national prosopographies, or a graph database such as Neo4j if relationships dominate your questions.

How do I connect my database to the wider scholarly web?

Leave optional viaf and wikidata columns on the person table. Fill them only for individuals who genuinely have an authority record; mint local identifiers for everyone else. Later you can publish as linked open data and let other projects cite your people by stable id — without ever inventing a false link to a famous namesake.
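
A rough sketch of how the optional columns might be used at publication time follows. Every person gets a local URI, and a Wikidata URI is produced only where the column is filled. The base URL https://example.org/person/ is hypothetical, and the value Q-PLACEHOLDER stands in for a real identifier that you would look up and verify yourself. It is run through Python's built-in sqlite3 module.

```python
import sqlite3

LOCAL_BASE = "https://example.org/person/"          # hypothetical project URL
WIKIDATA_BASE = "http://www.wikidata.org/entity/"   # Wikidata entity URI prefix

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id TEXT PRIMARY KEY, display_name TEXT NOT NULL,
                     viaf TEXT, wikidata TEXT);
-- An obscure curate: local id only, both authority columns stay NULL.
INSERT INTO person VALUES ('p0001', 'Thomas Wode', NULL, NULL);
-- A well-known figure: 'Q-PLACEHOLDER' is NOT a real identifier.
INSERT INTO person VALUES ('p0002', 'Richard Hooker', NULL, 'Q-PLACEHOLDER');
""")

rows = conn.execute(
    "SELECT person_id, wikidata FROM person ORDER BY person_id").fetchall()

links = {}
for person_id, qid in rows:
    links[person_id] = {
        "local": LOCAL_BASE + person_id,
        # No authority record means no link at all, never a guessed one.
        "wikidata": (WIKIDATA_BASE + qid) if qid else None,
    }

for person_id, uris in links.items():
    print(person_id, uris)
```

Keeping the local identifier as the primary key means an authority link can be added, corrected, or removed later without touching any factoid.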

Key Takeaways

Prosopography studies a defined group of people collectively, not as separate biographies.
Use three tables: persons, sources, and evidence-linked factoids.
The factoid model handles many offices, multiple sources, and conflict — a spreadsheet cannot.
Store conflicting claims as separate sourced factoids rather than picking one.
Keep uncertain identities as separate records with a "possibly same" note.
SQLite is plenty to start; scale up to PostgreSQL, a Django-based factoid framework, or Neo4j later.
Add optional VIAF/Wikidata columns and mint local ids for everyone else.

Frequently Asked Questions

What is a prosopographical database?

It is a structured record of a defined group of people, capturing not just who they were but their attributes and relationships — offices held, kin, places, and events — so you can study the group collectively rather than as isolated biographies.

What is the core data model for prosopography?

The factoid model is a widely used approach: separate tables for persons, sources, and factoids, where each factoid is a single assertion ("source S says person P held office O in year Y"). This keeps claims tied to evidence.

Why not just use one big spreadsheet?

A flat spreadsheet forces one row per person and cannot hold many offices, multiple sources, or conflicting claims cleanly. A relational model with a factoid table handles uncertainty and one-to-many facts properly.

How do I handle conflicting evidence about a person?

Store both claims as separate factoids, each linked to its source. The database records that the sources disagree rather than forcing you to pick one and hide the conflict.

What tools can a beginner use?

SQLite with a few tables is enough to start, and entirely free. For larger projects, the factoid-based frameworks behind major prosopographies (such as those built in Django) or a graph database like Neo4j are common steps up.

How do I link my people to the wider world?

Add optional identifier columns for VIAF or Wikidata where a person genuinely has one, and mint your own stable local identifiers for everyone else. This lets you connect to linked open data later without forcing false links.
