Troubleshooting: Store historical data in SQLite

Q: Why are leading zeros disappearing from my reference numbers?

A catalogue reference like `0042` stored in an INTEGER column becomes 42. Define those columns as TEXT explicitly, because SQLite's type affinity converts numeric-looking strings unless you force it.

Most problems storing historical data in SQLite trace back to one fact: SQLite uses dynamic typing, so columns do not enforce the types you assume. The fixes are to store dates as ISO-8601 strings, force TEXT affinity on reference codes, keep transactions short, and validate on insert. Get those four right and the majority of "corrupted" history databases never happen.

This troubleshooting guide walks the errors archivists actually hit, with the diagnosis and the fix for each.

Why does SQLite mangle my dates and reference numbers?

SQLite's type affinity means a column declared INTEGER will convert numeric-looking text. A reference 0042 loses its leading zeros; a date 1066 may be fine but 1066-10-14 stored in an integer column becomes 1066 after arithmetic coercion in some queries.

Force the type and validate:

sql

CREATE TABLE records (
  id        INTEGER PRIMARY KEY,
  ref       TEXT NOT NULL,            -- keeps '0042'
  event_date TEXT,                    -- ISO-8601 'YYYY-MM-DD'
  date_raw  TEXT,                     -- original 'c. 1640'
  CHECK (event_date IS NULL OR event_date GLOB '[0-9][0-9][0-9][0-9]-*')
);

The CHECK constraint rejects malformed dates at insert time, which is far cheaper than discovering them months later.

How do I fix 'database is locked'?

This is the single most reported SQLite error. The cause is concurrent writes against the default rollback journal. Diagnose by checking for an open transaction or an idle connection in a notebook.

python

import sqlite3
con = sqlite3.connect("history.db", timeout=10)
con.execute("PRAGMA journal_mode=WAL")   # readers no longer block the writer
con.execute("PRAGMA busy_timeout=5000")  # wait 5s instead of failing instantly

WAL mode lets readers continue while one writer works — ideal for a single-curator workflow where a script writes and a notebook reads.

Why is my UTF-8 text turning into mojibake?

Garbled accents (é becoming Ã©) mean bytes were inserted under the wrong encoding. SQLite stores text as-is; it does not transcode. Fix the source, not the database:

python

with open("source.csv", encoding="utf-8") as fh:
    rows = list(csv.reader(fh))
# insert decoded strings; do NOT pass raw bytes

Re-import the affected rows. Patching individual cells almost always leaves some corruption behind.

How should I model uncertain historical dates?

Historians' dates are frequently imprecise, and forcing them into one column destroys information. Split the concern:

Column	Type	Example	Purpose
`date_sort`	TEXT (ISO)	`1640-01-01`	machine sorting / range queries
`date_raw`	TEXT	`c. 1640`	the historian's exact wording
`date_precision`	TEXT	`decade`	how much to trust `date_sort`

Queries sort on date_sort; display always shows date_raw. You never silently invent precision.

Why are my queries slow on a small database?

Size is rarely the culprit — missing indexes are. SQLite scans the whole table for any unindexed filter. Add indexes on the columns you join and filter:

sql

CREATE INDEX idx_records_ref  ON records(ref);
CREATE INDEX idx_records_date ON records(date_sort);

Run EXPLAIN QUERY PLAN SELECT ... to confirm SQLite uses the index rather than a SCAN TABLE.

A quick diagnostic checklist

When something looks wrong, check in this order:

Did an insert coerce a type? Run SELECT typeof(col) FROM t LIMIT 20.
Is a stray connection holding a lock? Restart the kernel; enable WAL.
Is the encoding wrong at the source, not the column?
Does the slow query show SCAN TABLE in its plan?
Did a CHECK constraint or precision split get skipped on import?

Key Takeaways

SQLite's dynamic typing is the root of most "corruption"; force TEXT affinity on references and dates.
Store dates as ISO-8601 strings plus a raw-wording column; never overwrite a historian's imprecision.
Fix database is locked with WAL mode and a busy timeout, and by closing stray connections.
Mojibake comes from the source encoding — re-import decoded UTF-8 rather than patching cells.
Slow queries mean missing indexes, not large data; check with EXPLAIN QUERY PLAN.
Validate on insert with CHECK constraints so bad data never lands in the table.

Frequently Asked Questions

Why does SQLite turn my years into the wrong values?

SQLite has no native date type and uses dynamic typing, so a column can silently mix integers and text. Store dates as ISO-8601 strings (YYYY-MM-DD) or four-digit year integers, and validate on insert rather than trusting the column.

How do I fix 'database is locked' errors?

This usually means a second connection holds a write lock, often an open transaction or a notebook cell that never committed. Close stray connections, keep transactions short, and enable WAL mode with PRAGMA journal_mode=WAL for concurrent reads.

Why are leading zeros disappearing from my reference numbers?

A catalogue reference like 0042 stored in an INTEGER column becomes 42. Define those columns as TEXT explicitly, because SQLite's type affinity converts numeric-looking strings unless you force it.

Should I store uncertain or approximate dates in SQLite?

Yes, but split them: keep a machine-sortable date plus a separate text column for the original expression like 'c. 1640' or 'before Michaelmas'. Never overwrite the historian's wording with a guessed precise date.

How big can a SQLite history database get before it struggles?

SQLite handles databases into the hundreds of gigabytes comfortably for read-heavy archival work. Performance problems are almost always missing indexes, not size — add an index on every column you filter or join on.

Why does my UTF-8 text come back as mojibake?

The data was likely inserted as bytes in another encoding. Open the connection with text factory defaults, ensure source files are decoded as UTF-8 before insert, and re-import the affected rows rather than patching them in place.

Why does SQLite mangle my dates and reference numbers? ​

How do I fix 'database is locked'? ​

Why is my UTF-8 text turning into mojibake? ​

How should I model uncertain historical dates? ​

Why are my queries slow on a small database? ​

A quick diagnostic checklist ​

Key Takeaways ​

Frequently Asked Questions ​

Why does SQLite turn my years into the wrong values? ​

How do I fix 'database is locked' errors? ​

Why are leading zeros disappearing from my reference numbers? ​

Should I store uncertain or approximate dates in SQLite? ​

How big can a SQLite history database get before it struggles? ​

Why does my UTF-8 text come back as mojibake? ​

Related reading ​