Best Practices for Planning Retrospective Catalogue Conversion

Plan retrospective catalogue conversion by writing a conversion specification first, deciding explicitly whether you are converting faithfully or enhancing as you go, mapping legacy fields to a standard such as ISAD(G) or DACS, and building QA sampling into every batch. Retroconversion turns legacy finding aids (card indexes, typescript handlists, old database dumps) into structured electronic records that can go online. Programmes that fail usually do so because they start keying before agreeing the rules; the ones that succeed treat the work as a documented, auditable data-migration project.

What exactly are you converting, and to what?

First, inventory your source formats, because each behaves differently:

Card indexes — slow, often inconsistent, frequently the richest in subject detail.
Typescript / printed lists — convert fast, may OCR well.
Legacy database exports — fastest if the schema maps cleanly, but watch for non-standard fields and free-text dumping grounds.
Manuscript handlists — slowest, may need transcription before structuring.

Then fix the target: a controlled CSV mapped to ISAD(G)/DACS elements, or EAD XML for systems that ingest it. Decide this before keying a single record.
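
As an illustration of what a controlled CSV target can look like, the sketch below maps legacy columns to ISAD(G) elements and writes them out with a fixed header. The legacy column names (Ref, Heading, Dates, Level, Size, Notes) and the extra "unmapped" column are assumptions made for the example, not part of any standard; a real mapping comes from your own conversion specification.

```python
import csv
import io

# Hypothetical legacy column names (left) mapped to ISAD(G) elements (right).
FIELD_MAP = {
    "Ref": "reference_code",          # ISAD(G) 3.1.1
    "Heading": "title",               # ISAD(G) 3.1.2
    "Dates": "date",                  # ISAD(G) 3.1.3
    "Level": "level_of_description",  # ISAD(G) 3.1.4
    "Size": "extent_and_medium",      # ISAD(G) 3.1.5
    "Notes": "scope_and_content",     # ISAD(G) 3.3.1
}


def map_record(legacy):
    """Return one target row; legacy fields with no mapping are kept, not dropped."""
    row = {target: legacy.get(source, "").strip()
           for source, target in FIELD_MAP.items()}
    leftovers = {k: v for k, v in legacy.items() if k not in FIELD_MAP and v}
    # Assumption: unmapped content is parked in one column for later review.
    row["unmapped"] = "; ".join(f"{k}={v}" for k, v in sorted(leftovers.items()))
    return row


def write_csv(records, out):
    """Write mapped records to a file-like object with a fixed column order."""
    fieldnames = list(FIELD_MAP.values()) + ["unmapped"]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for record in records:
        writer.writerow(map_record(record))


buffer = io.StringIO()
write_csv([{"Ref": "A/1", "Heading": "Minute book", "Dates": "1923-24"}], buffer)
```

Keeping unmapped fields visible rather than discarding them makes the free-text dumping grounds in a legacy export easy to spot during review.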

How do you decide between faithful conversion and enhancement?

This is the decision that governs cost and consistency. Make it a written policy.

Policy	What it means	Cost	Risk
Faithful	Convert as-is, light normalisation only	Low	Carries forward old errors
Enhanced	Improve dates, add access points, fix arrangement	High	Scope creep, inconsistency
Hybrid	Faithful baseline + named enhancement passes	Medium	Needs clear scope per pass

The defensible default is faithful conversion with light normalisation (standardising date formats, expanding obvious abbreviations), treating enhancement as a separate, costed activity. Silently enhancing some records and not others is how you end up with a catalogue nobody trusts.

How do you write a conversion specification?

The specification is the single source of truth every cataloguer follows. It should pin down:

Field mapping — which legacy field goes to which standard element.
Normalisation rules — date conventions, capitalisation, abbreviations.
Reference-code policy — preserve, or assign-with-concordance.
What to record verbatim vs interpret — transcription vs supplied content.
What to skip or flag — ambiguous entries route to a senior reviewer.

text

Date normalisation rule (from the spec):
  Source "c1850s"          -> "[185-]"  (decade uncertain)
  Source "1923-24"         -> "1923-1924"
  Source "n.d."            -> "[n.d.]" + flag for review
  Source "circa 1900"      -> "[c.1900]"
Reference codes: preserve original; if absent, assign GB-0042/<series>/<seq>
  and add row to concordance.csv
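
The date rules above are mechanical enough to automate, which keeps cataloguers from applying them differently. The following is a minimal sketch that implements only the four example rules as written; the function name normalise_date, the acceptance of "c." and "c" as variants of "circa", and the choice to flag anything unrecognised for review are assumptions.

```python
import re


def normalise_date(source):
    """Return (normalised_text, needs_review) for one legacy date string."""
    text = source.strip()
    lowered = text.lower()

    # "n.d." -> "[n.d.]" and flag for review
    if lowered == "n.d.":
        return "[n.d.]", True

    # "c1850s" -> "[185-]"
    match = re.fullmatch(r"c\.?\s*(\d{3})0s", lowered)
    if match:
        return f"[{match.group(1)}-]", False

    # "circa 1900" -> "[c.1900]" (also accepts "c.1900" and "c1900": an assumption)
    match = re.fullmatch(r"(?:circa|c\.?)\s*(\d{4})", lowered)
    if match:
        return f"[c.{match.group(1)}]", False

    # "1923-24" -> "1923-1924"
    match = re.fullmatch(r"(\d{4})\s*-\s*(\d{2})", text)
    if match:
        start, short_end = match.groups()
        end = start[:2] + short_end
        if int(end) < int(start):
            # e.g. "1899-01" crosses a century; leave it for a person to decide
            return text, True
        return f"{start}-{end}", False

    # A plain year or a full range passes through unchanged
    if re.fullmatch(r"\d{4}(-\d{4})?", text):
        return text, False

    # Anything else is left untouched and routed to review
    return text, True
```

Returning the original text with a review flag, rather than guessing, matches the specification's principle that ambiguous entries go to a senior reviewer.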

How do you keep quality consistent at scale?

Consistency is a process, not a hope. Three mechanisms:

Templates — a locked spreadsheet, one column per element, with data validation so cataloguers cannot invent fields.
QA sampling — check a fixed percentage (commonly 5–10%) of every batch against the source. Track error rates by cataloguer and by source type; if a batch fails, return the whole batch.
A decision log — every judgement call ("treated undated postcards as series-level") is recorded so the next person is consistent with the last.
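
The QA sampling mechanism can be made reproducible so that a reviewer can later see exactly which rows were checked. This is a sketch only: the function names, the fixed seed, and the 2% maximum error rate are assumptions, and the real pass/fail threshold should come from your agreed QA rules.

```python
import math
import random


def draw_sample(row_ids, percent=5, seed=0):
    """Pick a reproducible random sample of row ids, at least one per batch."""
    ids = list(row_ids)
    if not ids:
        return []
    size = max(1, math.ceil(len(ids) * percent / 100))
    # A seeded generator means the same batch always yields the same sample.
    return sorted(random.Random(seed).sample(ids, size))


def batch_passes(errors_found, sample_size, max_error_rate=0.02):
    """Apply the pass/fail rule: a failing sample sends the whole batch back."""
    if sample_size == 0:
        return False
    return errors_found / sample_size <= max_error_rate


sample = draw_sample(range(1, 201), percent=5)  # 10 of 200 rows
```

Recording the seed alongside the batch number in the decision log lets anyone regenerate the sample later.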

bash

# Sanity-check a converted batch before import: flag rows missing key fields
python qa_check.py batch_07.csv --require identifier,title,date,level \
  --report batch_07_errors.csv
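
The source does not show qa_check.py itself, so the following is one hypothetical way such a script could work, assuming the batch is a UTF-8 CSV with a header row and that the flags behave as the command above suggests.

```python
import argparse
import csv


def find_missing(rows, required):
    """Yield (row_number, missing_fields) for rows lacking a required value."""
    for number, row in enumerate(rows, start=2):  # row 1 is the header
        missing = [f for f in required if not (row.get(f) or "").strip()]
        if missing:
            yield number, missing


def main(argv=None):
    """Check one batch file and write a report; return 1 if any row fails."""
    parser = argparse.ArgumentParser(description="Flag rows missing key fields")
    parser.add_argument("batch")
    parser.add_argument("--require", required=True,
                        help="comma-separated field names")
    parser.add_argument("--report", required=True)
    args = parser.parse_args(argv)

    required = [f.strip() for f in args.require.split(",") if f.strip()]
    with open(args.batch, newline="", encoding="utf-8") as src:
        problems = list(find_missing(csv.DictReader(src), required))

    with open(args.report, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["row", "missing_fields"])
        for number, missing in problems:
            writer.writerow([number, ";".join(missing)])
    return 1 if problems else 0
```

To run it from the command line, call main() under the usual if __name__ == "__main__" guard and pass its return value to sys.exit, so that a failing batch produces a non-zero exit code.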

Why pilot before you estimate the whole programme?

Never quote a timeline for the full conversion until you have run a representative pilot. Source quality drives throughput more than anything else — a clean typescript list might convert at several times the rate of a handwritten card index. Pilot a representative sample of each source type, measure records per hour including QA, and extrapolate from real numbers. Estimates built on optimism rather than a pilot are the most common reason retroconversion programmes blow their budgets.
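
The extrapolation itself is simple arithmetic: divide the records converted in the pilot by the hours spent (including QA) to get a rate per source type, then divide the full volume by that rate. The sketch below shows the calculation; the function name and all figures are invented for illustration and are not benchmarks.

```python
def estimate_hours(pilot, volumes):
    """Extrapolate total hours per source type from pilot measurements.

    pilot:   {source_type: (records_converted, hours_spent_including_qa)}
    volumes: {source_type: total_records_to_convert}
    """
    estimate = {}
    for source_type, total in volumes.items():
        if source_type not in pilot:
            # No pilot, no estimate: refuse to guess a rate.
            raise KeyError(f"no pilot data for {source_type!r}")
        records, hours = pilot[source_type]
        rate = records / hours  # records per hour, QA included
        estimate[source_type] = total / rate
    return estimate


# Illustrative figures only.
pilot = {"typescript": (300, 10), "card_index": (120, 15)}
volumes = {"typescript": 6000, "card_index": 4000}
hours = estimate_hours(pilot, volumes)
# typescript: 30 records/hour -> 200 hours; card_index: 8 records/hour -> 500 hours
```

Raising an error for an unpiloted source type is deliberate: it forces a pilot before that source appears in the estimate.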

How do you protect reference codes and citations?

Existing reference codes are how published research cites your holdings. Preserve them. Where you must assign new codes — because a legacy list never had any — maintain a concordance table mapping any old identifier to the new one, so older citations and internal cross-references can still be resolved. Treat the concordance as a permanent deliverable, not a working file.
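
A concordance can be as simple as a two-column table that is only ever appended to. The sketch below assumes new codes follow the GB-0042/<series>/<seq> pattern from the example specification; the class name, column headings, and sample identifiers are hypothetical.

```python
import csv
import io


class Concordance:
    """Maps old identifiers to newly assigned reference codes."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._old_to_new = {}
        self._next_seq = {}

    def assign(self, old_id, series):
        """Return the new code for old_id, assigning one if none exists yet."""
        if old_id in self._old_to_new:
            return self._old_to_new[old_id]  # never reassign an existing entry
        seq = self._next_seq.get(series, 1)
        self._next_seq[series] = seq + 1
        new_code = f"{self.prefix}/{series}/{seq}"
        self._old_to_new[old_id] = new_code
        return new_code

    def resolve(self, old_id):
        """Look up an old citation; None means it is not in the concordance."""
        return self._old_to_new.get(old_id)

    def write(self, out):
        """Write the table in assignment order to a file-like object."""
        writer = csv.writer(out)
        writer.writerow(["old_identifier", "new_reference_code"])
        writer.writerows(self._old_to_new.items())


concordance = Concordance("GB-0042")
concordance.assign("Card 17", "MIN")
concordance.assign("Card 18", "MIN")
```

Making assign idempotent means re-running a batch cannot give the same old identifier two different codes.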

A pre-launch checklist

Before the programme scales up:

Source formats inventoried; pilot run for each.
Conversion specification written and approved.
Faithful-vs-enhanced policy decided and documented.
Field mapping to ISAD(G)/DACS locked in templates.
Reference-code policy and concordance process set.
QA sampling rate and pass/fail rules agreed.
Decision log started.
Throughput measured from the pilot, not guessed.

Key Takeaways

Write the conversion specification before keying anything — it is the rulebook for the whole programme.
Decide explicitly between faithful conversion and enhancement, and never mix them silently.
Map legacy fields to a standard (ISAD(G)/DACS) using locked, validated templates.
Preserve original reference codes; where you assign new ones, keep a permanent concordance.
Build QA sampling (5–10% of every batch) and a decision log into the process for consistency.
Pilot each source type and estimate throughput from real records-per-hour, not optimism.

Frequently Asked Questions

What is retrospective catalogue conversion?

Retrospective conversion, or retroconversion, is the process of turning legacy finding aids — card indexes, typescript lists, handlists, or old database exports — into structured, standards-compliant electronic catalogue records. The aim is online access without re-surveying the physical material from scratch.

Should I convert exactly what the old catalogue says, or improve it?

Decide and document a policy up front. The cheapest, most defensible approach is faithful conversion with light normalisation; enhancement is a separate, costed activity. Mixing the two silently produces an inconsistent catalogue nobody can trust.

How do I keep quality consistent across a big conversion?

Write a conversion specification, use controlled templates, sample-check a percentage of every batch, and log decisions. Consistency comes from a written rulebook plus QA sampling, not from individual cataloguers' judgement.

What is the best format to convert legacy lists into?

Convert into a structured format your catalogue system imports cleanly — typically a controlled CSV mapped to ISAD(G) or DACS elements, or EAD XML. Spreadsheets with one column per element scale well and are easy to QA.

How do I handle reference codes during conversion?

Preserve original reference codes wherever they exist, because researchers and citations rely on them. Where you must assign new codes, keep a concordance table mapping old to new so nothing is orphaned.

How long does retrospective conversion take?

It varies hugely with source quality, so measure records per hour for your specific sources in a pilot and plan from those figures. A typed list converts far faster than a handwritten card index, so always pilot a representative sample of each source type before estimating the whole programme.