Record provenance metadata by capturing, for every object, the chain of custody (who owned, created, transferred or modified it) and the processing events applied to it, each with an agent, a date and an outcome. Use PREMIS for digital preservation events, a structured custodial-history note for ownership, and PROV-O or CIDOC CRM where you need linked-data or event-rich modelling. The discipline is consistency: the same fields, the same vocabulary, for every record, so the history is defensible across the whole collection.
What are the two kinds of provenance?
Conflating these causes most provenance confusion:
- Custodial provenance — the human history of ownership, donation, sale and transfer. This is what restitution and due-diligence research depends on.
- Digital provenance — the automated processing history of a digital file: ingest, format migration, fixity check, normalisation. This is what authenticity of the bitstream depends on.
A digitised painting has both: a custodial chain back to the artist and a digital chain of processing events on its TIFF.
Which standard fits which job?
| Need | Standard | Granularity |
|---|---|---|
| Digital preservation events | PREMIS | Event-level (per action) |
| Linked-data provenance | PROV-O (W3C) | Entity / activity / agent |
| Rich cultural-object history | CIDOC CRM | Event-based, highly expressive |
| Quick narrative custody | Free-text note | Human-readable summary |
A pragmatic default for most collections: PREMIS events for the digital side, plus a structured custodialHistory note for ownership, upgrading to PROV-O when you publish linked data.
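As a rough illustration of the PROV-O upgrade path, the sketch below renders one format migration as PROV-O Turtle using plain string formatting. The `ex:` namespace, every identifier, and the function name `migration_to_turtle` are hypothetical. The PROV terms themselves (`prov:Entity`, `prov:Activity`, `prov:SoftwareAgent`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`, `prov:wasDerivedFrom`, `prov:endedAtTime`) come from the W3C vocabulary.

```python
# Prefix declarations; the ex: namespace is a hypothetical placeholder.
PREFIXES = """\
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <https://example.org/id/> .
"""


def migration_to_turtle(source_id, derived_id, activity_id, agent_id, ended_at):
    """Render one format migration as PROV-O triples in Turtle.

    All identifiers are local names in the hypothetical ex: namespace;
    ended_at is an ISO 8601 UTC timestamp string.
    """
    return PREFIXES + f"""
ex:{source_id} a prov:Entity .

ex:{derived_id} a prov:Entity ;
    prov:wasDerivedFrom ex:{source_id} ;
    prov:wasGeneratedBy ex:{activity_id} .

ex:{activity_id} a prov:Activity ;
    prov:used ex:{source_id} ;
    prov:wasAssociatedWith ex:{agent_id} ;
    prov:endedAtTime "{ended_at}"^^xsd:dateTime .

ex:{agent_id} a prov:SoftwareAgent .
"""


turtle = migration_to_turtle(
    source_id="obj-001-tiff",
    derived_id="obj-001-jp2",
    activity_id="event-migration-001",
    agent_id="agent-openjpeg-2-5",
    ended_at="2025-03-11T09:42:00Z",
)
print(turtle)
```

The mapping is deliberately thin: the event becomes the activity, the tool becomes the agent, and the source and derived files become entities. A production export would more likely use an RDF library than string templates.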
How do I structure a single provenance event?
Every event answers four questions: what happened, when, who did it, and with what result. In PREMIS:
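```xml
<premis:event>
  <!-- PREMIS 3 layout. eventIdentifier is mandatory; the UUID and the "local" type are illustrative placeholders. -->
  <premis:eventIdentifier>
    <premis:eventIdentifierType>UUID</premis:eventIdentifierType>
    <premis:eventIdentifierValue>00000000-0000-0000-0000-000000000000</premis:eventIdentifierValue>
  </premis:eventIdentifier>
  <premis:eventType>migration</premis:eventType>
  <premis:eventDateTime>2025-03-11T09:42:00Z</premis:eventDateTime>
  <premis:eventDetailInformation>
    <premis:eventDetail>TIFF 6.0 to JP2 via OpenJPEG 2.5</premis:eventDetail>
  </premis:eventDetailInformation>
  <premis:eventOutcomeInformation>
    <premis:eventOutcome>success</premis:eventOutcome>
  </premis:eventOutcomeInformation>
  <premis:linkingAgentIdentifier>
    <premis:linkingAgentIdentifierType>local</premis:linkingAgentIdentifierType>
    <premis:linkingAgentIdentifierValue>OpenJPEG 2.5</premis:linkingAgentIdentifierValue>
  </premis:linkingAgentIdentifier>
</premis:event>
```
The same four-part shape applies to a custodial event ("acquired by gift from the Smith estate, 1987, recorded by J. Doe"); only the vocabulary changes.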
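The four-part event can also be generated programmatically. The following is a minimal sketch using only the Python standard library, assuming the PREMIS 3 namespace and element layout; the helper names `_child` and `build_event`, the `"local"` identifier type and the random UUID are illustrative choices, not part of any tool's API.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "http://www.loc.gov/premis/v3"  # assumption: PREMIS 3
ET.register_namespace("premis", PREMIS_NS)


def _child(parent, name, text=None):
    """Append a PREMIS-namespaced child element and return it."""
    el = ET.SubElement(parent, f"{{{PREMIS_NS}}}{name}")
    if text is not None:
        el.text = text
    return el


def build_event(event_type, when, detail, outcome, agent_id, agent_id_type="local"):
    """Build one event: what happened, when, who did it, with what result."""
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    event = ET.Element(f"{{{PREMIS_NS}}}event")
    ident = _child(event, "eventIdentifier")
    _child(ident, "eventIdentifierType", "UUID")
    _child(ident, "eventIdentifierValue", str(uuid.uuid4()))
    _child(event, "eventType", event_type)
    # Normalise to UTC so every event in the collection is comparable.
    utc = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    _child(event, "eventDateTime", utc)
    _child(_child(event, "eventDetailInformation"), "eventDetail", detail)
    _child(_child(event, "eventOutcomeInformation"), "eventOutcome", outcome)
    agent = _child(event, "linkingAgentIdentifier")
    _child(agent, "linkingAgentIdentifierType", agent_id_type)
    _child(agent, "linkingAgentIdentifierValue", agent_id)
    return event


event = build_event(
    event_type="migration",
    when=datetime(2025, 3, 11, 9, 42, tzinfo=timezone.utc),
    detail="TIFF 6.0 to JP2 via OpenJPEG 2.5",
    outcome="success",
    agent_id="OpenJPEG 2.5",
)
ET.indent(event)  # pretty-printing; requires Python 3.9+
print(ET.tostring(event, encoding="unicode"))
```

Rejecting naive timestamps at construction time is one way to enforce the "when" question before a record is ever written.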
A working checklist for the whole collection
Run this before you scale beyond a pilot:
- Decide your standards split (e.g. PREMIS + custodial note) and write it into the application profile.
- Define a closed list of event types (ingest, migration, fixity check, acquisition, transfer, deaccession).
- Require an agent, a UTC timestamp and an outcome on every event, with no exceptions.
- Use persistent identifiers for agents so "J. Smith" is unambiguous.
- Distinguish certainty: mark inferred custody steps as such rather than asserting them.
- Automate digital events at ingest; template custodial events for consistency.
- Validate: every object should have at least an acquisition event and an ingest event.
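Several of these rules can be checked mechanically. The sketch below assumes events are held as plain dictionaries with the hypothetical keys `event_type`, `agent_id`, `timestamp` and `outcome`; the function names are likewise illustrative, and the closed list is the one given in the checklist.

```python
from datetime import datetime, timedelta, timezone

ALLOWED_EVENT_TYPES = {
    "ingest", "migration", "fixity check",
    "acquisition", "transfer", "deaccession",
}
REQUIRED_FIELDS = ("event_type", "agent_id", "timestamp", "outcome")


def validate_event(event):
    """Return a list of problems; an empty list means the event passes."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not event.get(field)]
    event_type = event.get("event_type")
    if event_type and event_type not in ALLOWED_EVENT_TYPES:
        problems.append(f"event type not in closed list: {event_type}")
    ts = event.get("timestamp")
    # Naive datetimes have utcoffset() None, so they are flagged as well.
    if isinstance(ts, datetime) and ts.utcoffset() != timedelta(0):
        problems.append("timestamp is not UTC")
    return problems


def missing_baseline(events_by_object, baseline=("acquisition", "ingest")):
    """Map each object id to the baseline event types it lacks."""
    gaps = {}
    for object_id, events in events_by_object.items():
        seen = {e.get("event_type") for e in events}
        lacking = [t for t in baseline if t not in seen]
        if lacking:
            gaps[object_id] = lacking
    return gaps


good = {
    "event_type": "ingest",
    "agent_id": "agent:archivematica",
    "timestamp": datetime(2025, 3, 11, 9, 42, tzinfo=timezone.utc),
    "outcome": "success",
}
bad = {
    "event_type": "reformat",
    "agent_id": "",
    "timestamp": datetime(2025, 3, 11, 9, 42),  # naive: no timezone
    "outcome": "success",
}
print(validate_event(good))
print(validate_event(bad))
print(missing_baseline({"obj-1": [good], "obj-2": [good, {"event_type": "acquisition"}]}))
```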
```bash
# objects missing a baseline provenance event
comm -23 <(sort all_objects.txt) <(sort objects_with_ingest_event.txt)
```
How do I keep custodial history honest?
Provenance is where uncertainty must be visible, not hidden. Distinguish documented facts from inference:
```text
1923       Painted by the artist (documented: signature, dated)
1923–1961  [gap, no records]
1961       Sold at auction, Lot 44 (documented: catalogue)
1961–1987  Private collection (inferred from later donor statement)
1987       Gift to the museum (documented: deed of gift)
```
Recording the gap explicitly is itself good provenance. A chain that hides its uncertainty is worse than one that admits it, because future researchers cannot tell assertion from evidence.
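Gaps like the one in this chain can be surfaced automatically rather than spotted by eye. A minimal sketch, assuming each custody step is stored with a start year, an end year, a certainty label and its evidence; the `CustodyStep` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustodyStep:
    start: int        # first year of the step
    end: int          # last year (equal to start for a point event)
    description: str
    certainty: str    # "documented" or "inferred"
    evidence: str


def find_gaps(steps):
    """Return (from_year, to_year) spans that no step accounts for."""
    ordered = sorted(steps, key=lambda s: (s.start, s.end))
    gaps = []
    covered_until = ordered[0].end if ordered else None
    for step in ordered[1:]:
        if step.start > covered_until:
            gaps.append((covered_until, step.start))
        covered_until = max(covered_until, step.end)
    return gaps


def render_chain(steps):
    """Render the chain as text with every gap stated explicitly."""
    rows = [(s.start, s.end, f"{s.description} ({s.certainty}: {s.evidence})") for s in steps]
    rows += [(a, b, "[gap, no records]") for a, b in find_gaps(steps)]
    lines = []
    for start, end, label in sorted(rows):
        years = str(start) if start == end else f"{start}–{end}"
        lines.append(f"{years:<10} {label}")
    return "\n".join(lines)


steps = [
    CustodyStep(1923, 1923, "Painted by the artist", "documented", "signature, dated"),
    CustodyStep(1961, 1961, "Sold at auction, Lot 44", "documented", "catalogue"),
    CustodyStep(1961, 1987, "Private collection", "inferred", "later donor statement"),
    CustodyStep(1987, 1987, "Gift to the museum", "documented", "deed of gift"),
]
print(find_gaps(steps))
print(render_chain(steps))
```

Because the gap is computed from what is recorded, it cannot be silently omitted when someone rewrites the narrative note.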
Can I automate it?
Digital provenance: yes. Archivematica writes PREMIS events automatically for every ingest, migration and fixity check, producing a defensible machine record with no manual effort. Custodial provenance still needs human research, but you can template the event structure so each entry has the same agent/date/outcome/certainty fields, keeping a hand-built history as consistent as an automated one.
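One way to template custodial entries is a small record type that refuses incomplete input. This is a sketch under assumptions: the `CustodialEvent` class, its field names, the two certainty levels and the rule that documented events must cite evidence are all illustrative, and the agent identifier is a made-up placeholder.

```python
from dataclasses import asdict, dataclass

CERTAINTY_LEVELS = ("documented", "inferred")


@dataclass(frozen=True)
class CustodialEvent:
    event_type: str
    agent_id: str      # persistent identifier, not a display name
    event_date: str    # ISO 8601 date, or a year when that is all that is known
    outcome: str
    certainty: str
    evidence: str = ""

    def __post_init__(self):
        # Every entry must carry the same core fields.
        for name in ("event_type", "agent_id", "event_date", "outcome"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")
        if self.certainty not in CERTAINTY_LEVELS:
            raise ValueError(f"certainty must be one of {CERTAINTY_LEVELS}")
        if self.certainty == "documented" and not self.evidence:
            raise ValueError("documented events must cite evidence")


gift = CustodialEvent(
    event_type="acquisition",
    agent_id="agent:smith-estate",  # hypothetical identifier
    event_date="1987",
    outcome="acquired by gift",
    certainty="documented",
    evidence="deed of gift",
)
print(asdict(gift))
```

Validating at construction keeps a hand-built history in the same shape as machine-written events, so both can be exported or audited with the same code.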
Key Takeaways
- Capture both custodial provenance (ownership) and digital provenance (processing events).
- Every event records what happened, when, who did it and with what result: the four-question shape.
- Use PREMIS for digital events, a structured custodial note for ownership, PROV-O/CIDOC CRM for linked or rich modelling.
- Constrain event types to a closed list and use persistent identifiers for agents.
- Make uncertainty explicit: record gaps and mark inferred steps rather than asserting them.
- Automate digital provenance with tools like Archivematica; template custodial entries for consistency.
Frequently Asked Questions
What is provenance metadata?
Provenance metadata records the chain of custody and the history of who created, owned, modified or transferred an object over time. For digital objects it also captures the processing events that produced the current file.
What standard should I use for provenance?
Use PREMIS for digital preservation provenance, PROV-O when you need linked-data provenance, and CIDOC CRM for rich event-based cultural-object histories. Many collections use PREMIS events plus a structured custodial-history note.
What is the difference between custodial and digital provenance?
Custodial provenance is the human ownership and transfer history of an object. Digital provenance is the record of automated processing events, such as format migrations and fixity checks, applied to a digital file.
How detailed should provenance records be?
Record every event that changes ownership, location or the bitstream, with an agent, a date and an outcome. You do not need to log routine reads, but any transformation, transfer or rights change must be captured.
Why is provenance metadata important?
It underpins authenticity, supports due-diligence and restitution research, and lets future curators trust or question a record. Without it, a digital object's integrity and a cultural object's title cannot be defended.
Can I automate provenance capture?
Digital provenance, yes — tools like Archivematica write PREMIS events automatically during ingest. Custodial provenance still requires human research and structured recording, though it can be templated.