Appearance
Most problems storing historical data in SQLite trace back to one fact: SQLite uses dynamic typing, so columns do not enforce the types you assume. The fixes are to store dates as ISO-8601 strings, force TEXT affinity on reference codes, keep transactions short, and validate on insert. Get those four right and the majority of "corrupted" history databases never happen.
This troubleshooting guide walks the errors archivists actually hit, with the diagnosis and the fix for each.
Why does SQLite mangle my dates and reference numbers?
SQLite's type affinity means a column declared INTEGER will convert numeric-looking text. A reference 0042 loses its leading zeros; a date 1066 may be fine but 1066-10-14 stored in an integer column becomes 1066 after arithmetic coercion in some queries.
Force the type and validate:
sql
CREATE TABLE records (
id INTEGER PRIMARY KEY,
ref TEXT NOT NULL, -- keeps '0042'
event_date TEXT, -- ISO-8601 'YYYY-MM-DD'
date_raw TEXT, -- original 'c. 1640'
CHECK (event_date IS NULL OR event_date GLOB '[0-9][0-9][0-9][0-9]-*')
);The CHECK constraint rejects malformed dates at insert time, which is far cheaper than discovering them months later.
How do I fix 'database is locked'?
This is the single most reported SQLite error. The cause is concurrent writes against the default rollback journal. Diagnose by checking for an open transaction or an idle connection in a notebook.
python
import sqlite3
con = sqlite3.connect("history.db", timeout=10)
con.execute("PRAGMA journal_mode=WAL") # readers no longer block the writer
con.execute("PRAGMA busy_timeout=5000") # wait 5s instead of failing instantlyWAL mode lets readers continue while one writer works — ideal for a single-curator workflow where a script writes and a notebook reads.
Why is my UTF-8 text turning into mojibake?
Garbled accents (é becoming é) mean bytes were inserted under the wrong encoding. SQLite stores text as-is; it does not transcode. Fix the source, not the database:
python
with open("source.csv", encoding="utf-8") as fh:
rows = list(csv.reader(fh))
# insert decoded strings; do NOT pass raw bytesRe-import the affected rows. Patching individual cells almost always leaves some corruption behind.
How should I model uncertain historical dates?
Historians' dates are frequently imprecise, and forcing them into one column destroys information. Split the concern:
| Column | Type | Example | Purpose |
|---|---|---|---|
date_sort | TEXT (ISO) | 1640-01-01 | machine sorting / range queries |
date_raw | TEXT | c. 1640 | the historian's exact wording |
date_precision | TEXT | decade | how much to trust date_sort |
Queries sort on date_sort; display always shows date_raw. You never silently invent precision.
Why are my queries slow on a small database?
Size is rarely the culprit — missing indexes are. SQLite scans the whole table for any unindexed filter. Add indexes on the columns you join and filter:
sql
CREATE INDEX idx_records_ref ON records(ref);
CREATE INDEX idx_records_date ON records(date_sort);Run EXPLAIN QUERY PLAN SELECT ... to confirm SQLite uses the index rather than a SCAN TABLE.
A quick diagnostic checklist
When something looks wrong, check in this order:
- Did an insert coerce a type? Run
SELECT typeof(col) FROM t LIMIT 20. - Is a stray connection holding a lock? Restart the kernel; enable WAL.
- Is the encoding wrong at the source, not the column?
- Does the slow query show
SCAN TABLEin its plan? - Did a
CHECKconstraint or precision split get skipped on import?
Key Takeaways
- SQLite's dynamic typing is the root of most "corruption"; force TEXT affinity on references and dates.
- Store dates as ISO-8601 strings plus a raw-wording column; never overwrite a historian's imprecision.
- Fix
database is lockedwith WAL mode and a busy timeout, and by closing stray connections. - Mojibake comes from the source encoding — re-import decoded UTF-8 rather than patching cells.
- Slow queries mean missing indexes, not large data; check with
EXPLAIN QUERY PLAN. - Validate on insert with
CHECKconstraints so bad data never lands in the table.
Frequently Asked Questions
Why does SQLite turn my years into the wrong values?
SQLite has no native date type and uses dynamic typing, so a column can silently mix integers and text. Store dates as ISO-8601 strings (YYYY-MM-DD) or four-digit year integers, and validate on insert rather than trusting the column.
How do I fix 'database is locked' errors?
This usually means a second connection holds a write lock, often an open transaction or a notebook cell that never committed. Close stray connections, keep transactions short, and enable WAL mode with PRAGMA journal_mode=WAL for concurrent reads.
Why are leading zeros disappearing from my reference numbers?
A catalogue reference like 0042 stored in an INTEGER column becomes 42. Define those columns as TEXT explicitly, because SQLite's type affinity converts numeric-looking strings unless you force it.
Should I store uncertain or approximate dates in SQLite?
Yes, but split them: keep a machine-sortable date plus a separate text column for the original expression like 'c. 1640' or 'before Michaelmas'. Never overwrite the historian's wording with a guessed precise date.
How big can a SQLite history database get before it struggles?
SQLite handles databases into the hundreds of gigabytes comfortably for read-heavy archival work. Performance problems are almost always missing indexes, not size — add an index on every column you filter or join on.
Why does my UTF-8 text come back as mojibake?
The data was likely inserted as bytes in another encoding. Open the connection with text factory defaults, ensure source files are decoded as UTF-8 before insert, and re-import the affected rows rather than patching them in place.