Beginner's Guide to Metadata Quality

Metadata quality means your records are complete, accurate, consistent, conformant and current: five dimensions commonly used in heritage metadata audits. Auditing it means checking each dimension and writing down where records fall short. You can start with nothing more than a CSV export and a spreadsheet, and a single counting pass over your mandatory fields will surface the most visible gaps.

This guide walks a beginner through the ideas in plain language, with a small worked example you can repeat on your own data today.

What are the five dimensions of metadata quality?

Think of quality as five separate questions about each field:

Dimension	The question it asks	Example failure
Completeness	Is the field filled at all?	800 records with no date
Accuracy	Is the value correct?	Date recorded as 1290 instead of 1920
Consistency	Are values formatted the same way?	"London" vs "london, UK" vs "Lon."
Conformance	Does it match the schema/vocabulary?	Free-text type where DCMI Type is required
Currency	Is the value up to date?	Old rights statement after a copyright change

A record can pass one dimension and fail another, which is why beginners who only check "is it filled in?" miss most real problems.
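The conformance question is the easiest of the five to check mechanically. Here is a minimal Python sketch, assuming the type values have already been read into a list. The function name `nonconformant_types` and the sample values are illustrative; the term list is the DCMI Type Vocabulary.

```python
# The twelve terms of the DCMI Type Vocabulary.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}


def nonconformant_types(values):
    """Return the non-blank values that are not exact DCMI Type terms.

    Blank values are skipped because they are a completeness problem,
    not a conformance problem.
    """
    return [v for v in values if v.strip() and v.strip() not in DCMI_TYPES]


# Made-up sample: every filled value would pass a completeness check,
# but two of them are free text rather than vocabulary terms.
sample = ["StillImage", "postcard", "Text", "", "still image"]
print(nonconformant_types(sample))  # ['postcard', 'still image']
```

The match is deliberately exact and case-sensitive, so "still image" is flagged even though a human would recognise it. That is the gap between "filled in" and "conformant".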

How do I run my first audit with no special tools?

Export to CSV and count. A quick, free option is csvstat from csvkit:

bash

# Summary statistics for every column.
# csvkit treats blank cells as null by default, so no extra flag is needed.
csvstat collection.csv

# Only report whether each column contains nulls
csvstat --nulls collection.csv

Check the null report for identifier, title, date and rights. Any mandatory column that contains nulls is a completeness gap to log; count the blank cells in that column to size the gap. That one command is a legitimate first audit.
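If you would rather have exact blank counts per mandatory field, the same pass can be sketched with Python's standard library. This is a minimal sketch, not a replacement for csvstat. It assumes the export has a header row with columns named identifier, title, date and rights; the function name `count_blanks` and the inline sample rows are made up for illustration.

```python
import csv
import io

MANDATORY = ["identifier", "title", "date", "rights"]  # fields named in this guide


def count_blanks(csv_file, fields=MANDATORY):
    """Count empty or whitespace-only cells per field in an open CSV file."""
    blanks = {field: 0 for field in fields}
    for row in csv.DictReader(csv_file):
        for field in fields:
            # A missing column or a short row yields None; treat it as blank.
            if not (row.get(field) or "").strip():
                blanks[field] += 1
    return blanks


# Tiny inline sample standing in for collection.csv (made-up rows).
sample = io.StringIO(
    "identifier,title,date,rights\n"
    "pc-001,Pier at dusk,1910,\n"
    "pc-002,,1912,CC BY 4.0\n"
    "pc-003,High Street,,\n"
)
print(count_blanks(sample))
# {'identifier': 0, 'title': 1, 'date': 1, 'rights': 2}
```

To run it on a real export, open the file with `open("collection.csv", newline="", encoding="utf-8")` and pass the file object to `count_blanks`.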

How do I check consistency, not just completeness?

Completeness only tells you a field is filled — consistency tells you it is filled the same way. Open the same CSV in OpenRefine and use a text facet on the place or creator column:

Import the CSV into a new OpenRefine project.
On the column, choose Facet → Text facet.
Scan the list: "London", "london", "London (England)" appearing separately are the same place recorded three ways.
Use Cluster to group near-duplicates and merge them.

Seeing 2,000 records collapse into 40 real place names is the moment metadata quality clicks for most beginners.
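To see roughly what clustering does under the hood, here is a simplified Python sketch in the spirit of OpenRefine's fingerprint keying: normalise each value to a key, then group values that share a key. It approximates the idea and is not OpenRefine's actual implementation. The function names and sample values are illustrative.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a simplified normalisation key for a text value."""
    # Strip accents, then lowercase and drop punctuation.
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", "", text.strip().lower())
    # Sort and de-duplicate tokens so word order does not matter.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group distinct values that share a fingerprint; keep groups of two or more."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


places = ["London", "london", "London.", "London (England)", "England, London", "Lon."]
print(cluster(places))
# [['London', 'London.', 'london'], ['England, London', 'London (England)']]
```

Note what the sketch does not do: it leaves "Lon." on its own and keeps "London (England)" apart from "London". Abbreviations and qualifiers like these still need a human decision before merging.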

A small worked example

Suppose a 500-row export of postcards. A quick pass finds:

text

identifier   0 blanks     -> complete, good
title        12 blanks    -> completeness gap
date         0 blanks     -> but 30 values like "c.1910?" (inconsistent format)
place        0 blanks     -> but 96 distinct spellings for ~25 towns
rights       210 blanks   -> serious gap, legal risk

The audit's output is this short list of defects, which you then rank by impact. Fix rights first (legal exposure), then normalise place in OpenRefine, then fill the 12 titles, then standardise dates to EDTF (for example, 1910~ for "about 1910").
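The date clean-up step can be sketched in Python. This is a minimal example that handles only single four-digit years with an optional "c.", "ca." or "circa" prefix and an optional trailing question mark. Anything else returns None so it can be reviewed by hand rather than guessed. The function name `to_edtf_year` is illustrative. The qualifiers follow EDTF: `~` for approximate, `?` for uncertain, `%` for both.

```python
import re

# Optional "c.", "ca." or "circa", a four-digit year, and an optional "?".
PATTERN = re.compile(
    r"^(?P<circa>c\.|ca\.|circa)?\s*(?P<year>\d{4})\s*(?P<unsure>\?)?$",
    re.IGNORECASE,
)


def to_edtf_year(raw):
    """Convert a simple year string to EDTF; return None if not recognised."""
    match = PATTERN.match(raw.strip())
    if match is None:
        return None  # leave for manual review rather than guess
    year = match.group("year")
    approximate = match.group("circa") is not None
    uncertain = match.group("unsure") is not None
    if approximate and uncertain:
        return year + "%"  # both uncertain and approximate
    if approximate:
        return year + "~"  # approximate
    if uncertain:
        return year + "?"  # uncertain
    return year


for raw in ["1910", "c.1910", "c.1910?", "1910?", "early 1900s"]:
    print(repr(raw), "->", to_edtf_year(raw))
# '1910' -> 1910
# 'c.1910' -> 1910~
# 'c.1910?' -> 1910%
# '1910?' -> 1910?
# 'early 1900s' -> None
```

Under this mapping the "c.1910?" values from the example become 1910%, because they carry both a "circa" and a question mark.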

Which fields should a beginner audit first?

Prioritise by impact, not by column order. Mandatory fields (identifier, title, date and rights) carry the highest retrieval and legal weight, so errors there cost the most. Descriptive extras such as subject do matter, but a missing rights statement can make an entire collection legally unusable, so rights outranks them.

How often, and can I automate it?

Run a quick automated pass at every ingest and a fuller manual review quarterly or after any bulk import or migration — those are the moments quality slips. For ongoing work, the Metadata Quality Assessment Framework (MQAF) and similar tools can score completeness and flag inconsistency automatically. Automation handles completeness and consistency well; accuracy still needs a human eye on a sample.
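Drawing the sample for that human accuracy check can itself be scripted. Here is a minimal Python sketch, assuming you have a list of record identifiers. The function name `review_sample`, the default sample size of 25 and the seed value are arbitrary choices for illustration, not recommendations from any standard.

```python
import random


def review_sample(identifiers, size=25, seed=2024):
    """Pick a reproducible random sample of record identifiers to check by hand."""
    rng = random.Random(seed)            # fixed seed so the same sample can be re-drawn
    size = min(size, len(identifiers))   # never ask for more records than exist
    return sorted(rng.sample(list(identifiers), size))


ids = [f"pc-{n:03d}" for n in range(1, 501)]  # stand-in for 500 postcard records
print(review_sample(ids, size=5))
```

Fixing the seed means a colleague can re-draw the same sample later and confirm which records were reviewed. Change the seed for each audit so you do not keep checking the same records.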

Key Takeaways

Metadata quality has five dimensions: completeness, accuracy, consistency, conformance and currency.
You can start an audit with just a CSV export and csvstat — count blanks in mandatory fields first.
Completeness ("is it filled?") is not accuracy ("is it correct?"); check both.
Use OpenRefine text facets and clustering to catch inconsistent values that completeness checks miss.
Audit identifier, title, date and rights first — they carry the most retrieval and legal weight.
Audit at every ingest and fully after bulk imports or migrations; automate completeness, but spot-check accuracy by hand.

Frequently Asked Questions

What does metadata quality actually mean?

It means records are complete, accurate, consistent, conformant to their schema, and current. These five dimensions are a common lens for cultural-heritage metadata audits.

How do I start a metadata quality audit with no tools?

Export your records to CSV and open them in a spreadsheet or run csvstat. Count blanks in mandatory columns and scan for inconsistent values; that single pass surfaces the most visible quality problems.

What is the difference between completeness and accuracy?

Completeness asks whether a field is filled at all; accuracy asks whether the value is correct. A date field can be 100% complete and still riddled with wrong dates, so you must check both.

Which metadata fields should I audit first?

Start with your mandatory fields: identifier, title, date and rights. They have the highest retrieval and legal impact, so errors there cost the most and are the most worthwhile to fix.

How often should I audit metadata quality?

Run a quick automated pass at every ingest and a fuller manual review quarterly or after any bulk import or migration, which are the moments quality most often slips.

Can I automate a metadata quality audit?

Yes. Tools like csvstat, OpenRefine and the Metadata Quality Assessment Framework can measure completeness and flag inconsistencies, though human judgement is still needed for accuracy.
