Appearance
To preserve a relational database with SIARD, export it with the Database Preservation Toolkit (DBPTK) into a SIARD 2.1 container, validate that container against the specification, and confirm row counts and checksums match the source before you delete or decommission the original system. SIARD turns a live, software-dependent database into a single self-describing archive of XML, XSD and metadata that any future researcher can open without the original DBMS licence. The discipline is not the conversion itself — that takes minutes — but the documentation and verification around it that make the result defensible years later.
Why SIARD instead of a SQL dump?
A plain pg_dump or mysqldump ties you to a specific dialect and DBMS version. SIARD is software-independent: it stores the schema in standard XSD, the data in UTF-8 XML, and the structural metadata (keys, types, cardinalities, user-supplied descriptions) in a single metadata.xml. The container is a Zip64 archive, so it survives the 4 GB Zip limit and is openable with any unzip tool.
| Aspect | SQL dump | SIARD 2.1 |
|---|---|---|
| Software independence | Tied to DBMS dialect | Vendor-neutral XML/XSD |
| Self-describing metadata | Comments only | Structured metadata.xml |
| Documentation fields | None | Per-table/column descriptions |
| BLOB handling | Inline binary | External files or Base64 |
| Long-term openability | Needs a compatible DBMS | Any unzip + text reader |
How do I create a SIARD file with DBPTK?
Use the CLI for repeatable, scriptable runs. A typical export from PostgreSQL looks like this:
bash
dbptk migrate \
--import postgresql \
--import-hostname db.example.org \
--import-database parish_registers \
--import-username archivist \
--import-password "$PGPASS" \
--export siard-2 \
--export-file parish_registers.siard \
--export-external-lobsThe --export-external-lobs flag pushes BLOBs into a side folder rather than bloating the XML. After the run, open the .siard with unzip -l to confirm header/, content/ and metadata.xml are present.
What metadata should I add before archiving?
DBPTK exports structure but not context. Before sealing the container, edit metadata.xml (or use DBPTK Desktop's editor) to populate:
- A
dbname,descriptionanddataOwnerfor the whole database. - A
descriptionper schema and per table explaining what it held. - Column-level notes for any coded or cryptic field (e.g. status flags).
- The source system name and version, and the extraction date.
These free-text fields are the difference between a future archivist understanding the data and staring at twelve tables of integers.
How do I validate a SIARD file?
Validation is non-negotiable. Run:
bash
dbptk validate --file parish_registers.siard \
--report validation_report.txtThen cross-check independently: compare SELECT count(*) from each source table against the row counts DBPTK reports, and record a SHA-256 of the container. A passing validation that nonetheless drops 3% of rows is a silent failure — count-checking catches it.
What does SIARD not capture?
SIARD preserves declarative structure, not behaviour. It will not carry stored procedures, triggers, functions, scheduled jobs, access-control grants or application logic. Export those as plain-text source files and store them alongside the .siard. Note also that very wide tables with many large CLOBs export slowly; budget time and disk for the external-LOB folder.
A working preservation checklist
- Freeze or snapshot the source database so it cannot change mid-export.
- Run DBPTK with external LOBs; capture the full command in a log.
- Enrich
metadata.xmlwith descriptions and provenance. - Validate against SIARD 2.1 and resolve every error.
- Verify row counts table-by-table against the source.
- Export procedural code separately as documentation.
- Compute and store a SHA-256 checksum of the container.
- Wrap the
.siardplus its sidecar docs in a BagIt bag for transfer.
Key Takeaways
- SIARD packages an entire relational database into one vendor-neutral, self-describing container.
- Prefer SIARD 2.1 (eCH-0165) unless your national archive mandates SIARD DK.
- DBPTK is the standard open-source tool; use the CLI for repeatable runs.
- Always export BLOBs externally to keep the XML manageable.
- Enrich
metadata.xmlwith human descriptions before sealing. - Validate, then independently confirm row counts and a checksum.
- SIARD does not preserve procedural logic — document it separately.
Frequently Asked Questions
What is SIARD and why use it for database preservation?
SIARD (Software Independent Archiving of Relational Databases) is an open, vendor-neutral format that packages a relational database — schema, data, primary and foreign keys, views and metadata — into a single Zip64 container of XML and XSD files. It lets you keep a database readable without the original DBMS.
What is the difference between SIARD 2.1 and SIARD DK?
SIARD 2.1 is the eCH-0165 standard maintained by the Swiss Federal Archives and supports BLOBs/CLOBs stored externally and richer metadata. SIARD DK is the Danish National Archives variant with stricter submission rules; pick the one your repository or national archive mandates.
Which tool should I use to create a SIARD file?
The Database Preservation Toolkit (DBPTK) by the KEEP SOLUTIONS / E-ARK project is the de facto open-source converter, with a CLI and the DBPTK Desktop GUI. SIARD Suite from the Swiss Federal Archives is the reference implementation.
Does SIARD preserve stored procedures and triggers?
No. SIARD captures schema, tables, keys, views and metadata but does not execute or preserve procedural logic such as stored procedures, triggers or functions. Export their source code separately as documentation.
How do I validate a SIARD file after export?
Run the DBPTK validation command (dbptk validate) against the SIARD 2.1 specification, then independently verify row counts and a checksum of the container. Treat a passing validation plus matching row counts as your minimum acceptance gate.
How are large binary objects (BLOBs) handled in SIARD?
SIARD 2.1 can store BLOBs and CLOBs either inline as Base64 or externally as separate files referenced from the XML, which keeps the main content files smaller and avoids ballooning a single XML document.