Crosswalk metadata when you genuinely have to move records into a system that speaks a different schema — and not before. The right question is not "how do I crosswalk?" but "is the loss worth it?" Crosswalking from a rich schema (MARC, MODS) to a simple one (Dublin Core) is nearly always lossy and one-way. Do it for harvesting, aggregation and migration; avoid it when both ends share a schema or when the distinctions you would lose are the whole point of your data.
What exactly is a crosswalk?
A crosswalk is a documented element-to-element mapping. MARC 100$a (main entry, personal name) maps to dc:creator; MARC 245$a maps to dc:title. It is a translation table plus rules for what happens when no clean equivalent exists. The mapping is the deliverable — the conversion script merely executes it.
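One way to keep the mapping as the deliverable is to store it as data and let a generic function execute it. The sketch below assumes a hypothetical three-column CSV layout and a record held as a plain dictionary of lists; neither comes from a particular tool.

```python
import csv
import io

# Hypothetical mapping document: reviewable by a cataloguer without reading code.
MAPPING_CSV = """source,target,note
100$a,dc:creator,Main entry personal name
245$a,dc:title,Title proper
"""


def load_mapping(text):
    """Parse the mapping table into a {source_element: target_element} dict."""
    return {row["source"]: row["target"] for row in csv.DictReader(io.StringIO(text))}


def apply_mapping(record, mapping):
    """Copy every mapped source value into a list under its target element."""
    out = {}
    for source, values in record.items():
        target = mapping.get(source)
        if target is None:
            # This sketch skips elements with no mapping; what to do with
            # them is a policy decision, not something the script decides.
            continue
        out.setdefault(target, []).extend(values)
    return out


record = {"100$a": ["Woolf, Virginia"], "245$a": ["To the lighthouse"]}
dc = apply_mapping(record, load_mapping(MAPPING_CSV))
print(dc)
```

Changing the crosswalk then means editing the table, not the function.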
When is crosswalking the right call?
Three situations justify it:
- Migration. You are moving a catalogue from a MARC-based ILS into a MODS-based digital repository.
- Aggregation. You harvest from many institutions whose records arrive in different schemas and you need one searchable index.
- Exposure. You hold rich MODS but must publish simple oai_dc for harvesters.
If both systems already use the same schema, or the consuming system can ingest your native format, do not crosswalk — you only add a lossy transformation step.
When should you NOT crosswalk?
Skip it when the cost outweighs the gain:
| Signal | Implication |
|---|---|
| Distinctions you would lose are research-critical | Loss is unacceptable; keep the rich schema |
| Target system can read your native schema | Crosswalk is redundant |
| One-off, tiny collection | Manual re-entry may be cheaper than building a mapping |
| Round-trip required (A to B and back) | Lossy crosswalks cannot round-trip faithfully |
The round-trip trap is the worst: crosswalk MODS to DC and back, and you do not recover the original — roles, multiple titles and hierarchy are gone for good.
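The trap can be shown with a toy model. In this sketch the MODS-like record is a simplified dictionary (an assumption, not real MODS), and every name is flattened into dc:creator for brevity; a real mapping might route some roles to dc:contributor.

```python
def mods_to_dc(mods):
    """Flatten a simplified MODS-like record into DC-like lists."""
    return {
        # Title types (alternative, translated, ...) have no slot in simple DC.
        "dc:title": [t["text"] for t in mods["titles"]],
        # Roles are dropped: each name becomes a bare creator string.
        "dc:creator": [n["name"] for n in mods["names"]],
    }


def dc_to_mods(dc):
    """Best-effort reverse mapping: the lost attributes can only be left empty."""
    return {
        "titles": [{"type": None, "text": t} for t in dc["dc:title"]],
        "names": [{"name": n, "role": None} for n in dc["dc:creator"]],
    }


original = {
    "titles": [
        {"type": None, "text": "Beowulf"},
        {"type": "alternative", "text": "Beowulf: a new verse translation"},
    ],
    "names": [{"name": "Heaney, Seamus", "role": "translator"}],
}

round_tripped = dc_to_mods(mods_to_dc(original))
print(round_tripped == original)  # False: the title type and the role are gone
```

No reverse function can do better, because the information it would need was never written to the DC record.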
How do you handle the inevitable loss?
Loss when mapping rich to simple is a feature of the schema gap, not a bug in your work. Manage it:
- Map to a note, never to nothing. Source elements with no target home go into a general note so the information survives, even if unstructured.
- Document unmapped elements. Keep a list of what was dropped or coarsened.
- Keep the source record. The rich original remains the source of truth; the crosswalked record is a derived view.
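The three habits above can be sketched together. The mapping here is deliberately tiny and hypothetical, and dc:description stands in for the "general note", which is an assumption about the target since simple Dublin Core has no dedicated note element.

```python
# Hypothetical, deliberately incomplete mapping.
MAPPING = {"245$a": "dc:title", "100$a": "dc:creator"}
NOTE_ELEMENT = "dc:description"  # assumed home for values with no better target


def crosswalk(record, mapping=MAPPING):
    """Return (derived_record, unmapped_elements); the source is left untouched."""
    derived = {}
    unmapped = []
    for source, values in record.items():
        target = mapping.get(source)
        if target is None:
            # Map to a note, never to nothing: keep the value, labelled with its origin.
            for value in values:
                derived.setdefault(NOTE_ELEMENT, []).append(f"[{source}] {value}")
            unmapped.append(source)
        else:
            derived.setdefault(target, []).extend(values)
    # The sorted list is the raw material for the "what was dropped" document.
    return derived, sorted(unmapped)


source_record = {
    "245$a": ["Middlemarch"],
    "100$a": ["Eliot, George"],
    "100$e": ["author"],
    "300$a": ["4 v."],
}
derived, unmapped = crosswalk(source_record)
print(derived)
print(unmapped)
```

Because the function returns a new dictionary and never edits source_record, the rich original stays available as the source of truth.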
Should you use a hub schema?
For many-to-many problems, yes. Mapping five schemas to each other directly needs up to twenty crosswalks; mapping each to and from one hub needs ten. Pick a hub rich enough to absorb the inputs — MODS is a common choice because it sits between MARC and Dublin Core:
```text
MARC ─┐
EAD  ─┼──► MODS (hub) ──► Dublin Core (for harvesting)
CSV  ─┘
```
You maintain "X to hub" and "hub to Y" mappings, not every pair.
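The arithmetic behind the hub argument is short enough to check directly. This sketch counts directed crosswalks (A to B and B to A are separate mappings) and assumes the hub is an extra schema, not one of the n being connected.

```python
def direct_crosswalks(n):
    """Directed mappings needed to connect n schemas pairwise: n * (n - 1)."""
    return n * (n - 1)


def hub_crosswalks(n):
    """Mappings needed when each of n schemas maps to and from one hub: 2 * n."""
    return 2 * n


for n in (3, 5, 10):
    print(n, direct_crosswalks(n), hub_crosswalks(n))
```

For five schemas this gives 20 direct mappings against 10 via the hub. At three schemas the two approaches cost the same (6 each), so the hub only pays off from four schemas upward, and the gap widens quickly after that.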
What tools do the work?
- XSLT for XML-to-XML. The reference stylesheets are worth starting from:
```bash
xsltproc MARC21slim2MODS3-7.xsl marc-record.xml > record-mods.xml
xsltproc MODS3-7_DC_XSLT1-0.xsl record-mods.xml > record-dc.xml
```
- MarcEdit for MARC binary/MARCXML conversions and bulk edits.
- OpenRefine for reshaping and reconciling tabular metadata before mapping.
Always validate the output against the target schema (for example, xmllint --noout --schema target.xsd record-dc.xml) so a mapping bug surfaces immediately rather than at ingest.
What does a small mapping look like in practice?
A fragment of a MARC-to-DC crosswalk, written as documentation before any code:
| Source (MARC) | Target (DC) | Note |
|---|---|---|
| 245 $a $b | dc:title | Concatenate subfields with a space |
| 100 $a | dc:creator | Drop relator; loss noted |
| 260 $c / 264 $c | dc:date | Prefer 264 (RDA) when present |
| 650 $a | dc:subject | One dc:subject per 650 |
| 856 $u | dc:identifier | URL of resource |
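Once the table is agreed, the code that executes it can stay small. The following is a minimal sketch of those five rows, assuming a parsed record shaped as {tag: [{subfield_code: value}, ...]}; that shape, the function name and the sample values are all illustrative and do not reflect any MARC library's API.

```python
def marc_to_dc(fields):
    """Apply the five documented rules to a simplified, already-parsed MARC record."""
    dc = {}

    def add(element, value):
        if value:  # skip missing or empty subfields
            dc.setdefault(element, []).append(value)

    for f in fields.get("245", []):
        # Concatenate $a and $b with a space (ISBD punctuation is not stripped here).
        add("dc:title", " ".join(f[c] for c in ("a", "b") if c in f))
    for f in fields.get("100", []):
        add("dc:creator", f.get("a"))  # relator ($e) deliberately dropped; loss noted
    # Prefer 264 $c (RDA) and fall back to 260 $c only when no 264 carries a date.
    dated = [f for f in fields.get("264", []) if "c" in f] or fields.get("260", [])
    for f in dated:
        add("dc:date", f.get("c"))
    for f in fields.get("650", []):
        add("dc:subject", f.get("a"))  # one dc:subject per 650
    for f in fields.get("856", []):
        add("dc:identifier", f.get("u"))
    return dc


# Made-up sample record for illustration only.
sample = {
    "245": [{"a": "Walden", "b": "or, Life in the woods"}],
    "100": [{"a": "Thoreau, Henry David", "e": "author"}],
    "260": [{"c": "1854"}],
    "264": [{"c": "2004"}],
    "650": [{"a": "Wilderness"}, {"a": "Solitude"}],
    "856": [{"u": "https://example.org/walden"}],
}
result = marc_to_dc(sample)
print(result)
```

Each branch corresponds to one table row, so a reviewer can check the code against the documentation line by line.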
Key Takeaways
- Crosswalk only to move between different schemas — migration, aggregation, harvesting.
- Rich-to-simple crosswalks are lossy and one-way; never expect a faithful round-trip.
- Do not crosswalk when both systems share a schema or the loss is research-critical.
- Route unmappable elements into a note; document everything you drop.
- Always keep the original rich record as the source of truth.
- Use a hub schema (often MODS) to cut roughly N×(N−1) pairwise mappings down to about 2N.
- XSLT, MarcEdit and OpenRefine are the standard tools; validate output against the target XSD.
Frequently Asked Questions
What is a metadata crosswalk?
A crosswalk is a documented mapping of elements in one metadata schema to the equivalent elements in another, for example MARC 245 to Dublin Core dc:title. It lets records move between systems while keeping as much meaning as possible.
When should I crosswalk metadata?
Crosswalk when you must move records into a system that uses a different schema, aggregate heterogeneous collections, or expose richer records as simpler ones for harvesting. Avoid it when both systems already share a schema or when the loss would be unacceptable.
Is crosswalking lossy?
Almost always, when mapping from a richer schema to a simpler one. MARC and MODS carry distinctions (roles, indicators, subfields) that flatten or vanish in Dublin Core, so plan for one-way loss and keep the source record.
Should I crosswalk in one step or via an intermediate schema?
For many-to-many situations, map every schema to and from one hub schema (often MODS or Dublin Core) instead of writing N-times-N direct mappings. It cuts the number of crosswalks you maintain dramatically.
What tools help with crosswalking?
XSLT is the workhorse for XML-to-XML crosswalks; MarcEdit handles MARC conversions; OpenRefine helps reconcile and reshape tabular metadata. The Library of Congress publishes reference XSLT stylesheets for common pairs.
How do I avoid silent data loss in a crosswalk?
Document every unmapped source element, route anything you cannot place into a note rather than dropping it, validate the output against the target schema, and always retain the original record as the source of truth.