Create MODS records for digitised items: A Practical Guide

To create a MODS record for a digitised item, capture eight things at minimum: the title, the responsible name with its role, the resource type, a structured date, the language, the physical description, the URL of the digital object, and record-source notes. Author them in MODS 3.7 XML, validate against the official schema, then ingest. Below is the end-to-end workflow I use for map, photograph and manuscript digitisation, with a complete worked record.

What goes into a complete MODS record?

Start from a checklist rather than a blank file. For a single digitised photograph the skeleton is:

xml

<mods xmlns="http://www.loc.gov/mods/v3" version="3.7"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <titleInfo>
    <title>High Street, Ipswich, looking west</title>
  </titleInfo>
  <name type="personal">
    <namePart>Cobbold, Felix</namePart>
    <role><roleTerm type="text" authority="marcrelator">Photographer</roleTerm></role>
  </name>
  <typeOfResource>still image</typeOfResource>
  <genre authority="aat">photographs</genre>
  <originInfo>
    <dateCreated encoding="w3cdtf" keyDate="yes">1903</dateCreated>
    <place><placeTerm type="text">Ipswich, Suffolk</placeTerm></place>
  </originInfo>
  <language><languageTerm authority="iso639-2b">eng</languageTerm></language>
  <physicalDescription>
    <form authority="marcform">print</form>
    <extent>1 photograph : gelatin silver ; 12 x 18 cm</extent>
  </physicalDescription>
  <location>
    <url access="object in context">https://digitalrelics.uk/items/ips-0903</url>
  </location>
  <recordInfo>
    <recordContentSource>Aether Forge Archive</recordContentSource>
    <recordCreationDate encoding="w3cdtf">2024-12-03</recordCreationDate>
  </recordInfo>
</mods>

Every element here earns its place: keyDate="yes" tells aggregators which date to sort on; access="object in context" distinguishes a landing page from a raw file.
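The checklist can also be applied mechanically. The following is a minimal sketch, assuming the eight top-level elements named in this guide are the ones you want to enforce; the function name `missing_minimum_elements` and the deliberately thin sample record are illustrative, not part of any MODS tooling.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# The eight top-level elements this guide treats as the minimum.
REQUIRED = [
    "titleInfo", "name", "typeOfResource", "originInfo",
    "language", "physicalDescription", "location", "recordInfo",
]


def missing_minimum_elements(xml_text: str) -> list[str]:
    """Return the required top-level MODS elements absent from a record."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}  # tags are namespace-qualified
    return [name for name in REQUIRED if f"{{{MODS_NS}}}{name}" not in present]


# A deliberately incomplete record, for illustration only.
record = """<mods xmlns="http://www.loc.gov/mods/v3" version="3.7">
  <titleInfo><title>High Street, Ipswich, looking west</title></titleInfo>
  <typeOfResource>still image</typeOfResource>
</mods>"""

print(missing_minimum_elements(record))
```

This only checks presence, not content, so it complements schema validation rather than replacing it.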

How do you handle names and roles correctly?

The single biggest quality win in MODS is typed names. Give every name a type (personal, corporate, conference) and a role/roleTerm drawn from the MARC relator list. Where an authority record exists, add valueURI and authority:

xml

<name type="personal" valueURI="http://id.loc.gov/authorities/names/n79021164"
      authority="naf">
  <namePart>Constable, John</namePart>
  <role><roleTerm type="code" authority="marcrelator">art</roleTerm></role>
</name>

Linking to a Library of Congress NAF or VIAF identifier is what later lets you crosswalk to linked data without re-disambiguating people.
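A quick audit can flag names that lack a type or a relator role before they reach ingest. This is a sketch under the assumption that names sit at the top level of the record and that roles use authority="marcrelator"; `audit_names` and the second, untyped name in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/mods/v3"}


def audit_names(xml_text: str) -> list[str]:
    """Describe each top-level name lacking a type or a marcrelator role."""
    problems = []
    root = ET.fromstring(xml_text)
    for name in root.findall("m:name", NS):
        label = name.findtext("m:namePart", default="(no namePart)", namespaces=NS)
        if not name.get("type"):
            problems.append(f"{label}: missing type attribute")
        roles = name.findall("m:role/m:roleTerm[@authority='marcrelator']", NS)
        if not roles:
            problems.append(f"{label}: no marcrelator roleTerm")
    return problems


# The second name is an invented example of an untyped, role-less entry.
sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <name type="personal">
    <namePart>Cobbold, Felix</namePart>
    <role><roleTerm type="text" authority="marcrelator">Photographer</roleTerm></role>
  </name>
  <name>
    <namePart>Smith and Sons</namePart>
  </name>
</mods>"""

for problem in audit_names(sample):
    print(problem)
```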

How do you record the digital object itself?

A descriptive record that does not point at the file is half a record. Use location/url for access copies and relatedItem type="original" for the analogue source if you describe both. Add physicalLocation for the shelfmark of the original:

xml

<location>
  <physicalLocation>SRO Ipswich, HD2418/4</physicalLocation>
  <url access="raw object" usage="primary display">https://digitalrelics.uk/iiif/ips-0903/full/full/0/default.jpg</url>
</location>
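On the consuming side, the access attribute is what lets a script pick the right link. A minimal sketch, assuming every url sits directly under a top-level location; `urls_by_access` is an illustrative helper name, and the sample combines the two URLs used in this guide.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/mods/v3"}


def urls_by_access(xml_text: str) -> dict[str, list[str]]:
    """Group location/url values by their access attribute."""
    root = ET.fromstring(xml_text)
    grouped: dict[str, list[str]] = {}
    for url in root.findall("m:location/m:url", NS):
        access = url.get("access", "unspecified")  # label for urls with no attribute
        grouped.setdefault(access, []).append((url.text or "").strip())
    return grouped


sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <location>
    <physicalLocation>SRO Ipswich, HD2418/4</physicalLocation>
    <url access="object in context">https://digitalrelics.uk/items/ips-0903</url>
    <url access="raw object" usage="primary display">https://digitalrelics.uk/iiif/ips-0903/full/full/0/default.jpg</url>
  </location>
</mods>"""

links = urls_by_access(sample)
print(links["object in context"])  # landing page
print(links["raw object"])         # direct image link
```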

Can you build MODS records in bulk?

Yes, and you should for anything over a few dozen items. Keep cataloguers in a spreadsheet (one row per item) and template the XML. A compact Python approach:

python

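import csv
from pathlib import Path
from jinja2 import Template

# autoescape=True escapes &, < and > in cell values so the output stays well-formed XML
tmpl = Template(Path("mods.xml.j2").read_text(encoding="utf-8"), autoescape=True)
out_dir = Path("mods")
out_dir.mkdir(exist_ok=True)  # create the output folder on first run

with open("catalogue.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):  # one row per item, keyed by column header
        xml = tmpl.render(**row)
        (out_dir / f"{row['id']}.xml").write_text(xml, encoding="utf-8")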

This separates cataloguing (the spreadsheet) from serialisation (the template), so a non-XML person can do the description.
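An alternative to string templating is to build the XML tree in code, which handles escaping for you. The sketch below uses the standard library's xml.etree.ElementTree rather than a template engine; the column names `title`, `date` and `url` are assumptions about the spreadsheet, and only three of the eight minimum elements are shown to keep it short.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("", MODS_NS)  # serialise MODS as the default namespace


def q(tag: str) -> str:
    """Qualify a local element name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def row_to_mods(row: dict[str, str]) -> str:
    """Turn one spreadsheet row (hypothetical columns) into a partial MODS record."""
    mods = ET.Element(q("mods"), {"version": "3.7"})
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = row["title"]
    origin = ET.SubElement(mods, q("originInfo"))
    date = ET.SubElement(origin, q("dateCreated"),
                         {"encoding": "w3cdtf", "keyDate": "yes"})
    date.text = row["date"]
    location = ET.SubElement(mods, q("location"))
    url = ET.SubElement(location, q("url"), {"access": "object in context"})
    url.text = row["url"]
    return ET.tostring(mods, encoding="unicode")


# Invented row; the ampersand shows that special characters are escaped.
out = row_to_mods({
    "title": "Quay & Dock",
    "date": "1903",
    "url": "https://example.org/items/1",
})
print(out)
```

The trade-off is that the record structure now lives in code instead of a template file, so changes need a programmer rather than an edit to the template.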

How do you validate before ingest?

Never ingest unvalidated records. Run the official schema over every file:

bash

xmllint --noout --schema mods-3-7.xsd mods/*.xml

xmllint prints each error with a line number and exits with a non-zero code if any file fails. Common failures: wrong element order (MODS is order-sensitive within some sequences), a missing namespace declaration, or an encoding value that is not a permitted token.
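If the ingest itself is scripted, the same check can act as a gate. A minimal sketch, assuming xmllint is on the PATH and the records live in a mods folder; `xmllint_command` and `validate_all` are illustrative names, and the flags are the same ones used in the command above.

```python
import subprocess
from pathlib import Path


def xmllint_command(schema: str, files: list[str]) -> list[str]:
    """Build the xmllint invocation for a schema and a list of record files."""
    return ["xmllint", "--noout", "--schema", schema, *files]


def validate_all(schema: str = "mods-3-7.xsd", folder: str = "mods") -> bool:
    """Return True only if every .xml file in the folder is schema-valid."""
    files = sorted(str(p) for p in Path(folder).glob("*.xml"))
    if not files:
        raise FileNotFoundError(f"no .xml files found in {folder}")
    result = subprocess.run(
        xmllint_command(schema, files), capture_output=True, text=True
    )
    if result.returncode != 0:
        print(result.stderr)  # xmllint's diagnostics, one line per problem
    return result.returncode == 0


# Show the command that would run, without needing xmllint installed.
print(xmllint_command("mods-3-7.xsd", ["mods/ips-0903.xml"]))
```

An ingest script could call `validate_all()` first and stop when it returns False.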

What are the most common mistakes?

Plain-text dates with no encoding — unsortable and unqueryable.
Untyped names — you lose the surveyor/engraver/photographer distinction.
Stuffing everything into note instead of the right element.
Forgetting keyDate="yes" so aggregators guess your sort date.
No recordInfo, so provenance of the metadata itself is lost.
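Two of these mistakes, unencoded dates and a missing keyDate, are easy to catch automatically. The sketch below assumes dates sit in a top-level originInfo and that exactly one date per record should carry keyDate="yes"; `lint_dates` is a hypothetical helper and the sample record is invented.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"m": MODS_NS}

# Date elements that MODS allows inside originInfo.
DATE_TAGS = {
    "dateIssued", "dateCreated", "dateCaptured", "dateValid",
    "dateModified", "copyrightDate", "dateOther",
}


def local_name(element: ET.Element) -> str:
    """Strip the namespace from a qualified tag."""
    return element.tag.split("}", 1)[-1]


def lint_dates(xml_text: str) -> list[str]:
    """Report dates with no encoding and records without exactly one key date."""
    root = ET.fromstring(xml_text)
    dates = [
        child
        for origin in root.findall("m:originInfo", NS)
        for child in origin
        if local_name(child) in DATE_TAGS
    ]
    problems = [
        f"{local_name(d)} {d.text!r} has no encoding"
        for d in dates if not d.get("encoding")
    ]
    key_count = sum(1 for d in dates if d.get("keyDate") == "yes")
    if key_count != 1:
        problems.append(f'expected exactly one keyDate="yes", found {key_count}')
    return problems


# Invented record with a plain-text date and no key date.
sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <originInfo>
    <dateCreated>circa 1903</dateCreated>
  </originInfo>
</mods>"""

for problem in lint_dates(sample):
    print(problem)
```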

Key Takeaways

Author MODS 3.7 and validate against mods-3-7.xsd before every ingest.
Minimum viable record = title, typed name+role, resource type, structured date, language, physical description, URL, record info.
Always type names and link to NAF/VIAF identifiers for future linked-data work.
Use encoding="w3cdtf" or edtf on every date and mark one keyDate="yes".
Distinguish access copies from landing pages with location/url access attributes.
For volume work, catalogue in a spreadsheet and template the XML.
Record metadata provenance in recordInfo, not just object provenance.

Frequently Asked Questions

What are the minimum MODS elements for a digitised item?

A defensible minimum is titleInfo, name with a role, typeOfResource, originInfo (with a structured date), language, physicalDescription, location (the digital object URL) and recordInfo. Anything thinner is hard to disambiguate later.

Which MODS version should I use?

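This guide uses MODS 3.7, which the Library of Congress published in 2018. Version 3.8 has been released since, so check which version your repository platform accepts. For 3.7, declare version="3.7" on the root element and validate against mods-3-7.xsd.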

How do I record the digitised file location in MODS?

Use the location element with a url child. Add access="object in context" or access="raw object" attributes so a viewer can tell a landing page from a direct image link.

Should dates go in a date element or as plain text?

Always use a typed date element such as dateIssued or dateCreated with an encoding attribute (encoding="w3cdtf" or encoding="edtf"). Plain-text dates cannot be sorted or range-queried reliably.

How do I validate a MODS record?

Run xmllint with the schema, e.g. xmllint --noout --schema mods-3-7.xsd record.xml. An exit code of zero means the record is schema-valid; fix every reported error before ingest.

Can I generate MODS in bulk from a spreadsheet?

Yes. Keep one row per item with columns mapped to MODS paths, then template the XML with a script (Python plus lxml or a Jinja2 template). This keeps cataloguers in a spreadsheet while producing valid XML.
