When to crosswalk metadata between schemas

Crosswalk metadata when you genuinely have to move records into a system that speaks a different schema — and not before. The right question is not "how do I crosswalk?" but "is the loss worth it?" Crosswalking from a rich schema (MARC, MODS) to a simple one (Dublin Core) is nearly always lossy and one-way. Do it for harvesting, aggregation and migration; avoid it when both ends share a schema or when the distinctions you would lose are the whole point of your data.

What exactly is a crosswalk?

A crosswalk is a documented element-to-element mapping. MARC 100$a (main entry, personal name) maps to dc:creator; MARC 245$a maps to dc:title. It is a translation table plus rules for what happens when no clean equivalent exists. The mapping is the deliverable — the conversion script merely executes it.

When is crosswalking the right call?

Three situations justify it:

Migration. You are moving a catalogue from a MARC-based ILS into a MODS-based digital repository.
Aggregation. You harvest from many institutions whose records arrive in different schemas and you need one searchable index.
Exposure. You hold rich MODS but must publish simple oai_dc for harvesters.

If both systems already use the same schema, or the consuming system can ingest your native format, do not crosswalk — you only add a lossy transformation step.

When should you NOT crosswalk?

Skip it when the cost outweighs the gain:

Signal	Implication
Distinctions you would lose are research-critical	Loss is unacceptable; keep the rich schema
Target system can read your native schema	Crosswalk is redundant
One-off, tiny collection	Manual re-entry may be cheaper than building a mapping
Round-trip required (A to B and back)	Lossy crosswalks cannot round-trip faithfully

The round-trip trap is the worst: crosswalk MODS to DC and back, and you do not recover the original — roles, multiple titles and hierarchy are gone for good.
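A toy illustration of why the round trip fails, using plain Python dictionaries as stand-ins for MODS and Dublin Core records. The field names and both functions are hypothetical, and flattening every name to a creator is a deliberate simplification rather than the behaviour of any particular stylesheet:

```python
# Hypothetical, simplified stand-in for a MODS record with name roles.
mods_record = {
    "names": [
        {"name": "Woolf, Virginia", "role": "author"},
        {"name": "Bell, Vanessa", "role": "illustrator"},
    ],
}


def mods_to_dc(mods):
    """Flatten every name to a creator value; the role has nowhere to go."""
    return {"creator": [n["name"] for n in mods["names"]]}


def dc_to_mods(dc):
    """Rebuild MODS-like names; the role cannot be recovered, so it is None."""
    return {"names": [{"name": c, "role": None} for c in dc["creator"]]}


round_tripped = dc_to_mods(mods_to_dc(mods_record))
print(round_tripped == mods_record)  # False: the roles are gone
```

The names survive, but nothing in the DC record says who was the author and who was the illustrator, so no reverse mapping can restore it.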

How do you handle the inevitable loss?

Loss when mapping rich to simple is a feature of the schema gap, not a bug in your work. Manage it:

Map to a note, never to nothing. Source elements with no target home go into a general note so the information survives, even if unstructured.
Document unmapped elements. Keep a list of what was dropped or coarsened.
Keep the source record. The rich original remains the source of truth; the crosswalked record is a derived view.
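The three practices above can be sketched in a few lines. This is a minimal sketch under stated assumptions: records are plain dictionaries, the `MAPPING` table and field names are invented for illustration, and the "general note" is assumed to be the DC description element:

```python
# Hypothetical mapping: source field -> target field. Anything absent is unmapped.
MAPPING = {"title": "title", "author": "creator", "issued": "date"}


def crosswalk(source):
    """Return (derived_record, unmapped_fields) without modifying the source."""
    derived = {}
    notes = []
    unmapped = []
    for field, value in source.items():
        if field in MAPPING:
            derived[MAPPING[field]] = value
        else:
            # Map to a note, never to nothing: keep the value, unstructured.
            notes.append(f"{field}: {value}")
            unmapped.append(field)
    if notes:
        derived["description"] = "; ".join(notes)
    return derived, sorted(unmapped)


# Invented sample record, for illustration only.
source = {
    "title": "Orlando",
    "author": "Woolf, Virginia",
    "issued": "1928",
    "binding": "cloth",
    "provenance": "donated 1950",
}
derived, unmapped = crosswalk(source)
print(derived["description"])  # binding: cloth; provenance: donated 1950
print(unmapped)                # ['binding', 'provenance']
```

The `unmapped` list is the raw material for the documentation of what was coarsened, and because `source` is never modified it can be stored alongside `derived` as the source of truth.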

Should you use a hub schema?

For many-to-many problems, yes. Mapping five schemas to each other directly needs up to twenty one-way crosswalks (5 × 4); mapping each to and from one separate hub needs ten (2 × 5). Pick a hub rich enough to absorb the inputs; MODS is a common choice because it sits between MARC and Dublin Core in richness:

```text
MARC  ─┐
EAD   ─┼──►  MODS (hub)  ──►  Dublin Core (for harvesting)
CSV   ─┘
```

You maintain "X to hub" and "hub to Y" mappings, not every pair.
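The arithmetic is easy to check. The sketch below counts one-way mappings and assumes the hub is a separate schema rather than one of the N being connected; both function names are made up for this example:

```python
def direct_crosswalks(n):
    """One-way mappings needed to connect n schemas pairwise: n * (n - 1)."""
    return n * (n - 1)


def hub_crosswalks(n):
    """One-way mappings when each of n schemas maps to and from a separate hub."""
    return 2 * n


for n in (3, 5, 10):
    print(n, direct_crosswalks(n), hub_crosswalks(n))
# 3 6 6
# 5 20 10
# 10 90 20
```

With three schemas the hub saves nothing; the advantage grows quickly from there, because direct mappings grow quadratically while hub mappings grow linearly.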

What tools do the work?

XSLT for XML-to-XML. The reference stylesheets are worth starting from:

```bash
xsltproc MARC21slim2MODS3-7.xsl marc-record.xml > record-mods.xml
xsltproc MODS3-7_DC_XSLT1-0.xsl record-mods.xml > record-dc.xml
```

MarcEdit for MARC binary/MARCXML conversions and bulk edits.
OpenRefine for reshaping and reconciling tabular metadata before mapping.

Always validate the output against the target schema (for example, xmllint --noout --schema target.xsd record-dc.xml) so a mapping bug surfaces immediately rather than at ingest.
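Before full XSD validation, a cheap sanity check can catch the most common mapping bug, an element name that does not exist in the target. The sketch below is not a substitute for schema validation: it only assumes the target is simple Dublin Core and checks that each child of the root is one of the fifteen elements in the `http://purl.org/dc/elements/1.1/` namespace. The function name is hypothetical:

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def unexpected_elements(xml_text):
    """Return the tags of root children that are not simple DC elements."""
    root = ET.fromstring(xml_text)
    bad = []
    for child in root:
        # ElementTree spells qualified names as "{namespace}localname".
        if not child.tag.startswith("{" + DC_NS + "}"):
            bad.append(child.tag)
            continue
        local = child.tag.split("}", 1)[1]
        if local not in DC_ELEMENTS:
            bad.append(child.tag)
    return bad


sample = (
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Orlando</dc:title><dc:author>Woolf</dc:author></oai_dc:dc>"
)
print(unexpected_elements(sample))
# ['{http://purl.org/dc/elements/1.1/}author']
```

Here the mapping wrote `dc:author`, which is not a Dublin Core element; the check flags it at once, while the XSD run remains the authoritative test.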

What does a small mapping look like in practice?

A fragment of a MARC-to-DC crosswalk, written as documentation before any code:

Source (MARC)	Target (DC)	Note
`245 $a $b`	`dc:title`	Concatenate subfields with a space
`100 $a`	`dc:creator`	Drop relator; loss noted
`260 $c` / `264 $c`	`dc:date`	Prefer 264 (RDA) when present
`650 $a`	`dc:subject`	One `dc:subject` per 650
`856 $u`	`dc:identifier`	URL of resource
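Once the table is agreed, the script that executes it can stay small. The following is a sketch of the five rows above applied to a MARCXML record with Python's standard library; the helper names and the sample record are written for this example, and trailing ISBD punctuation is deliberately left untouched:

```python
import xml.etree.ElementTree as ET

MARC_NS = {"m": "http://www.loc.gov/MARC21/slim"}


def subfields(record, tag, codes):
    """Yield one list of subfield values per datafield with the given tag."""
    for field in record.findall(f"m:datafield[@tag='{tag}']", MARC_NS):
        values = [
            sf.text.strip()
            for sf in field.findall("m:subfield", MARC_NS)
            if sf.get("code") in codes and sf.text
        ]
        if values:
            yield values


def marc_to_dc(record):
    """Apply the five table rows; returns DC element name -> list of values."""
    dc = {"title": [], "creator": [], "date": [], "subject": [], "identifier": []}
    for values in subfields(record, "245", ("a", "b")):
        dc["title"].append(" ".join(values))  # concatenate $a $b with a space
    for values in subfields(record, "100", ("a",)):
        dc["creator"].extend(values)  # relator ($e, $4) is not carried over
    # Prefer 264 $c; fall back to 260 $c only when no 264 date exists.
    dates = list(subfields(record, "264", ("c",))) or list(subfields(record, "260", ("c",)))
    for values in dates:
        dc["date"].extend(values)
    for values in subfields(record, "650", ("a",)):
        dc["subject"].extend(values)  # one dc:subject per 650
    for values in subfields(record, "856", ("u",)):
        dc["identifier"].extend(values)
    return dc


# Hand-written sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Woolf, Virginia,</subfield>
    <subfield code="e">author.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Orlando :</subfield>
    <subfield code="b">a biography</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">1928.</subfield>
  </datafield>
  <datafield tag="264" ind1=" " ind2="1">
    <subfield code="c">[1928]</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Biography</subfield>
  </datafield>
</record>"""

dc = marc_to_dc(ET.fromstring(SAMPLE))
print(dc["title"])  # ['Orlando : a biography']
print(dc["date"])   # ['[1928]']
```

The relator in `100 $e` never reaches the output, which is exactly the loss the Note column records; a production mapping would also need decisions on punctuation and on which 264 indicator values count as the publication date.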

Key Takeaways

Crosswalk only to move between different schemas — migration, aggregation, harvesting.
Rich-to-simple crosswalks are lossy and one-way; never expect a faithful round-trip.
Do not crosswalk when both systems share a schema or the loss is research-critical.
Route unmappable elements into a note; document everything you drop.
Always keep the original rich record as the source of truth.
Use a hub schema (often MODS) to cut up to N × (N − 1) direct mappings down to 2N.
XSLT, MarcEdit and OpenRefine are the standard tools; validate output against the target XSD.

Frequently Asked Questions

What is a metadata crosswalk?

A crosswalk is a documented mapping of elements in one metadata schema to the equivalent elements in another, for example MARC 245 to Dublin Core dc:title. It lets records move between systems while keeping as much meaning as possible.

When should I crosswalk metadata?

Crosswalk when you must move records into a system that uses a different schema, aggregate heterogeneous collections, or expose richer records as simpler ones for harvesting. Avoid it when both systems already share a schema or when the loss would be unacceptable.

Is crosswalking lossy?

Almost always, when mapping from a richer schema to a simpler one. MARC and MODS carry distinctions (roles, indicators, subfields) that flatten or vanish in Dublin Core, so plan for one-way loss and keep the source record.

Should I crosswalk in one step or via an intermediate schema?

For many-to-many situations, map every schema to and from one hub schema (often MODS, or Dublin Core when the simplest common denominator is enough) instead of writing up to N × (N − 1) direct mappings. With five schemas that means maintaining ten crosswalks instead of twenty.

What tools help with crosswalking?

XSLT is the workhorse for XML-to-XML crosswalks; MarcEdit handles MARC conversions; OpenRefine helps reconcile and reshape tabular metadata. The Library of Congress publishes reference XSLT stylesheets for common pairs.

How do I avoid silent data loss in a crosswalk?

Document every unmapped source element, route anything you cannot place into a note rather than dropping it, validate the output against the target schema, and always retain the original record as the source of truth.
