To map MARC to Dublin Core, apply the Library of Congress crosswalk: 245 becomes dc:title, 100/110/111 become dc:creator, 650 becomes dc:subject, 260/264 and the 008 date become dc:date, and 856 becomes dc:identifier. The mapping is deliberately lossy — MARC's hundreds of fields collapse into Dublin Core's fifteen flat elements — so treat it as a one-way export for discovery and aggregation, not a round trip.
This guide walks the workflow end to end with reusable examples.
What is the core field mapping?
Start from the canonical crosswalk. These fields cover the large majority of everyday mappings:
| MARC | Subfields | Dublin Core |
|---|---|---|
| 245 | a, b | dc:title |
| 100 / 110 / 111 | a | dc:creator |
| 700 / 710 | a | dc:contributor |
| 260 / 264 | c (or 008) | dc:date |
| 650 / 651 | a | dc:subject |
| 520 | a | dc:description |
| 300 | a | dc:format |
| 041 / 008 | — | dc:language |
| 856 | u | dc:identifier |
| 506 / 540 | a | dc:rights |
Note the 100 vs 700 distinction: a main entry is the dc:creator; added entries are dc:contributor. Conflating them is the most common mapping error.
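One way to keep the creator/contributor split from drifting is to hold the table above as data rather than scattering it through code. The sketch below is only an illustration: the names `CROSSWALK` and `dc_element_for` are hypothetical, and the language row (041 / 008) is left out because it reads fixed-field positions rather than subfields.

```python
# Illustrative crosswalk table mirroring the mapping above.
# CROSSWALK and dc_element_for are hypothetical names, not part of any library.
CROSSWALK = {
    "245": ("title", ("a", "b")),
    "100": ("creator", ("a",)),
    "110": ("creator", ("a",)),
    "111": ("creator", ("a",)),
    "700": ("contributor", ("a",)),
    "710": ("contributor", ("a",)),
    "260": ("date", ("c",)),
    "264": ("date", ("c",)),
    "650": ("subject", ("a",)),
    "651": ("subject", ("a",)),
    "520": ("description", ("a",)),
    "300": ("format", ("a",)),
    "856": ("identifier", ("u",)),
    "506": ("rights", ("a",)),
    "540": ("rights", ("a",)),
}


def dc_element_for(tag):
    """Return the Dublin Core element name for a MARC tag, or None if unmapped."""
    entry = CROSSWALK.get(tag)
    return entry[0] if entry else None


print(dc_element_for("100"))  # creator
print(dc_element_for("700"))  # contributor
print(dc_element_for("999"))  # None
```

Keeping main and added entries as separate keys means a change to one cannot silently affect the other.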
How do I handle subfields — concatenate or split?
Decide per field. Two rules cover most cases:
- Concatenate where subfields form one logical value. 245 $a Title : $b subtitle becomes a single dc:title of "Title : subtitle".
- Split where each occurrence is independent. Two 650 fields become two separate dc:subject elements, never one merged string.
Getting this wrong produces either run-together titles or subjects that cannot be faceted.
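The two rules can be illustrated without any MARC library. In this minimal sketch each field is modelled as a plain tuple, which is an assumption made purely for illustration; the function names are hypothetical.

```python
# Each field is modelled as (tag, [(subfield_code, value), ...]).
# This plain-tuple shape is an assumption for illustration only.
fields = [
    ("245", [("a", "Title :"), ("b", "subtitle")]),
    ("650", [("a", "Photography.")]),
    ("650", [("a", "Portraits.")]),
]


def title_values(fields):
    """Concatenate 245 $a and $b into one dc:title string per 245 field."""
    out = []
    for tag, subfields in fields:
        if tag == "245":
            parts = [value for code, value in subfields if code in ("a", "b")]
            out.append(" ".join(parts).strip())
    return out


def subject_values(fields):
    """Emit one dc:subject per 650 occurrence; never merge them."""
    out = []
    for tag, subfields in fields:
        if tag == "650":
            for code, value in subfields:
                if code == "a":
                    out.append(value.rstrip(". "))
    return out


print(title_values(fields))    # ['Title : subtitle']
print(subject_values(fields))  # ['Photography', 'Portraits']
```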
Which tool should I use?
For interactive work, MarcEdit is the standard: its MARC21-to-Dublin-Core task applies the LC crosswalk out of the box and lets you tweak it. For repeatable pipelines, pymarc in Python gives full control:
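```python
from pymarc import MARCReader


def subfield_values(record, tags, code):
    """Return the first $code value from each field carrying one of the tags.

    get_fields() and get_subfields() return empty lists when nothing matches,
    so missing fields or subfields need no special handling.
    """
    out = []
    for field in record.get_fields(*tags):
        found = field.get_subfields(code)
        if found:
            out.append(found[0])
    return out


def marc_to_dc(record):
    dc = {
        "title": [],
        # main entries become creators; added entries become contributors
        "creator": subfield_values(record, ("100", "110", "111"), "a"),
        "contributor": subfield_values(record, ("700", "710"), "a"),
        "subject": [s.rstrip(". ") for s in subfield_values(record, ("650", "651"), "a")],
        "date": [],
        "identifier": subfield_values(record, ("856",), "u"),
        "rights": subfield_values(record, ("506", "540"), "a"),
    }
    # title: join 245 $a and $b, dropping any trailing " /" punctuation
    for field in record.get_fields("245")[:1]:
        dc["title"].append(" ".join(field.get_subfields("a", "b")).rstrip(" /"))
    # date: prefer 008 (chars 7-10) when it holds four digits (not blank or uuuu),
    # otherwise fall back to the first 264/260 $c
    f008 = record.get_fields("008")
    year = (f008[0].data or "")[7:11] if f008 else ""
    if len(year) == 4 and year.isdigit():
        dc["date"].append(year)
    else:
        dc["date"].extend(subfield_values(record, ("264", "260"), "c")[:1])
    return dc


with open("catalogue.mrc", "rb") as fh:
    for rec in MARCReader(fh):
        if rec is None:  # MARCReader can yield None for a record it cannot parse
            continue
        print(marc_to_dc(rec))
```
This follows the crosswalk above and leaves room to encode local rules.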
Where should dates come from — 008 or 264?
Prefer the 008 fixed field (characters 7–10) for a clean four-digit machine date, and fall back to 264 subfield c for the transcribed form when 008 is blank or coded uuuu. Normalise whatever you get to EDTF so dc:date stays consistent:
```text
008 date "1923"        -> dc:date 1923
264 $c "[ca. 1923]"    -> dc:date 1923~   (EDTF "approximate")
```
Why is this conversion lossy, and how do I manage it?
MARC encodes relator terms and codes, indicators, linking fields and dozens of note types that Dublin Core has no slot for. When you flatten 700 $a Smith, John $e photographer, the relator term "photographer" usually disappears and Smith becomes a plain dc:contributor. Manage the loss by:
- Mapping only what Dublin Core can faithfully hold.
- Keeping the original MARC as the master of record.
- Logging dropped fields so you know what discovery users will not see.
Because of this, treat MARC-to-Dublin-Core as an export for aggregators such as Europeana and DPLA, typically harvested over OAI-PMH, not as your authoritative catalogue.
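Logging dropped fields, as suggested in the list above, can be as simple as tallying every tag that has no Dublin Core target. The sketch below assumes each record has already been reduced to a list of tag strings; `MAPPED_TAGS` and `dropped_tags` are hypothetical names, and the sample tag lists are invented for illustration.

```python
from collections import Counter

# Tags the crosswalk in this guide maps; everything else is dropped on export.
MAPPED_TAGS = {"245", "100", "110", "111", "700", "710", "260", "264",
               "650", "651", "520", "300", "041", "008", "856", "506", "540"}


def dropped_tags(record_tags):
    """Count tags in one record that have no Dublin Core target."""
    return Counter(tag for tag in record_tags if tag not in MAPPED_TAGS)


# Hypothetical tag lists standing in for two parsed records.
records = [
    ["008", "100", "245", "264", "500", "650", "650", "776"],
    ["008", "245", "246", "500", "700"],
]

totals = Counter()
for tags in records:
    totals.update(dropped_tags(tags))

for tag, count in sorted(totals.items()):
    print(f"{tag}: dropped {count} time(s)")
```

With pymarc, the tag list for a record could come from something like `[f.tag for f in record.fields]`. This tally only catches whole fields; subfield-level loss such as a dropped 700 $e would need a separate check.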
A reusable end-to-end workflow
- Export records from the ILS as .mrc (MARC21 binary) or MARCXML.
- Run MarcEdit's MARC-to-DC task, or your pymarc script, with the LC crosswalk.
- Normalise dates to EDTF and split multi-valued subjects.
- Add a dc:rights URI where 506/540 is missing.
- Validate the output against your Dublin Core application profile.
- Expose via OAI-PMH for harvesting; keep MARC as the master.
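For the final step, harvesters generally expect unqualified Dublin Core wrapped in an `oai_dc:dc` element. The following is a rough sketch of that serialisation using only the standard library; `to_oai_dc` is a hypothetical helper, the sample record and rights URI are placeholders, and a real OAI-PMH response also needs the surrounding record envelope and schema location, which are omitted here.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes instead of auto-generated ns0/ns1.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def to_oai_dc(dc):
    """Serialise a dict of element name -> list of values as an oai_dc:dc element."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in dc.items():
        for value in values:
            # One child element per value keeps repeated subjects separate.
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record; the rights URI is a placeholder, not a recommendation.
sample = {
    "title": ["Title : subtitle"],
    "creator": ["Smith, John"],
    "subject": ["Photography", "Portraits"],
    "date": ["1923"],
    "rights": ["https://example.org/rights/placeholder"],
}
print(to_oai_dc(sample))
```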
Key Takeaways
- Use the Library of Congress MARC-to-Dublin-Core crosswalk as the authoritative mapping.
- 100/110/111 map to dc:creator; 700/710 map to dc:contributor. Keep them distinct.
- Concatenate subfields that form one value (title); split repeating fields (subjects).
- MarcEdit suits interactive work; pymarc suits scripted, repeatable pipelines.
- Prefer the 008 date, fall back to 264 $c, and normalise everything to EDTF.
- The mapping is lossy and one-way: keep MARC as the master and use Dublin Core for discovery and harvesting.
Frequently Asked Questions
Which MARC fields map to Dublin Core title and creator?
MARC 245 (subfields a and b) maps to dc:title, and MARC 100, 110 or 111 maps to dc:creator. Added entries in 700 and 710 typically map to dc:contributor rather than dc:creator.
Is there an official MARC to Dublin Core crosswalk?
Yes. The Library of Congress publishes the MARC to Dublin Core crosswalk, which is the standard reference. Most tools, including MarcEdit, base their default mappings on it.
What is the best tool to convert MARC to Dublin Core?
MarcEdit is the de facto standard for librarians and works on MARC21 directly. For scripted pipelines, pymarc in Python gives you full control over how each field maps.
Why is mapping MARC to Dublin Core lossy?
MARC has hundreds of fields and indicators encoding fine distinctions that Dublin Core's fifteen flat elements cannot hold. Roles, relationships and many notes collapse or are dropped, so the conversion is one-way in practice.
How do I handle MARC subfields when mapping?
Decide per field whether to concatenate subfields into one Dublin Core value or split them. Title usually concatenates 245 a and b; each 650 occurrence becomes its own dc:subject.
Should I map MARC dates from 008 or 260/264?
Use the 008 fixed field for a clean machine-readable date when present, and fall back to 264 subfield c for the transcribed date. Normalising both to EDTF keeps dc:date consistent.
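The EDTF normalisation mentioned in this answer might look something like the sketch below. It handles only a few common transcription patterns and is not a full EDTF implementation; `to_edtf_year` is a hypothetical helper name.

```python
import re


def to_edtf_year(raw):
    """Reduce a MARC date string to an EDTF year, or None if no year is found.

    Covers plain years, "ca."/"circa" (approximate) and "?" (uncertain) only.
    """
    match = re.search(r"\d{4}", raw)
    if not match:
        return None
    year = match.group(0)
    lowered = raw.lower()
    if "ca." in lowered or "circa" in lowered:
        return year + "~"  # EDTF approximate
    if "?" in raw:
        return year + "?"  # EDTF uncertain
    return year


print(to_edtf_year("1923"))        # 1923
print(to_edtf_year("[ca. 1923]"))  # 1923~
print(to_edtf_year("[1923?]"))     # 1923?
print(to_edtf_year("uuuu"))        # None
```

Returning None for unparseable input lets the caller decide whether to omit dc:date or log the record for review.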