Skip to content
Data Cleaning with OpenRefine

A facet in OpenRefine is an interactive summary of a column: it lists every distinct value alongside how many rows contain it, so inconsistencies jump out and you can click any value to filter to just those rows. Facets are the single fastest way for a beginner to find dirty data — misspellings, stray whitespace, blanks, and outliers — before fixing anything. They never change your data; they only change what you see.

What exactly is a facet?

Imagine a country column with 5,000 rows. A text facet collapses it into a sidebar showing each unique value and its count:

text
United Kingdom   3120
UK                890
U.K.              210
United Kingdom    140   <- trailing space
england            95   <- lowercase

Those four lines are the same country recorded five ways. You did not have to scroll 5,000 rows — the facet revealed the mess in seconds. Create one via the column dropdown ▸ Facet ▸ Text facet.

Which facet should a beginner reach for first?

Start with a text facet on any categorical column (country, status, document type, archive name). It groups identical strings, so near-duplicates line up next to each other. From the facet panel you can click a value to inspect its rows, or use the include/exclude links to build a working subset.

Sort the facet by count to find rare values (often typos) or by name to spot alphabetical neighbours like Berlin and Berlín.

How do I find blank or empty cells?

Blanks hide easily. Use a custom text facet with a GREL expression instead of the raw value:

grel
isBlank(value)

The facet now shows just true and false. Click true to isolate every empty cell. A richer version flags whitespace-only cells too:

grel
if(isBlank(value), "empty", if(value != value.trim(), "has whitespace", "ok"))

This single facet sorts your column into three buckets you can clean one at a time.

What is the difference between a facet and a filter?

Beginners mix these up. Here is the distinction:

ToolWhat it doesBest for
Text facetSummarises a column, click values to narrowSeeing the distribution, spotting variants
Text filterFree-text search box, keeps rows containing your termFinding a known substring fast
Numeric facetHistogram + range sliderOutliers in numbers
Timeline facetPlots dates on a timelineImpossible or out-of-range dates

Use a facet when you do not yet know what is wrong; use a text filter when you are hunting a specific string.

Can facets catch bad numbers and dates?

Yes, and they are excellent at it. A numeric facet draws a histogram with a slider — drag it and any bars far from the main cluster reveal outliers, like a year of 9999 or 0. A timeline facet plots date values so an entry from 1066 next to a column of 19th-century records is obvious at a glance.

grel
value.toNumber()   // force a numeric facet on a column stored as text

If the histogram looks empty, your numbers are stored as strings — wrap them with toNumber() first.

A tiny worked example, start to finish

  1. Load a 200-row spreadsheet of parish records.
  2. Text-facet place → see St. Mary's, St Marys, st. mary's.
  3. Click each variant to confirm they are the same place.
  4. Use Cluster from inside the facet to merge them.
  5. Custom-facet isBlank(value) on date → find 12 blanks.
  6. Numeric-facet year → spot one 1066 typo for 1866.

In five minutes, facets surfaced three classes of dirty data you would never have found by scrolling.

Key Takeaways

  • A facet is a clickable summary of a column showing every distinct value and its count.
  • Text facets are the beginner's first tool for spotting spelling and casing variants.
  • A custom facet on isBlank(value) finds empty cells instantly.
  • Facets differ from filters: facets show the distribution, filters search for a known string.
  • Numeric and timeline facets catch outliers and impossible dates.
  • Facets only change the view — your data is untouched until you apply a transformation.

Frequently Asked Questions

What is a facet in OpenRefine?

A facet is an interactive summary of a column that lists every distinct value with its row count, letting you see at a glance which values are inconsistent, mistyped, or unexpected, and click to filter to them.

Which facet should a beginner use first to spot dirty data?

Start with a text facet on a categorical column. It groups identical values and surfaces near-duplicates such as 'London', 'london', and 'Londn ' as separate entries you can immediately see and fix.

How do I find blank or empty cells with facets?

Use a customised facet, or a text facet on an expression like isBlank(value), which returns true and false. Filtering to true isolates every empty cell in that column instantly.

What is the difference between a facet and a filter in OpenRefine?

A facet summarises a column and lets you select values to narrow the view; a text filter is a free-text search box that keeps only rows whose cells contain your search term. Both restrict the visible rows but facets show you the distribution first.

Can I use facets on numbers and dates, not just text?

Yes. A numeric facet draws a histogram with a range slider to catch outliers, and a timeline facet plots dates so you can spot impossible or out-of-range values such as a death before a birth.

Do facets change my data?

No. Facets only filter what you see; they never alter cells. Editing only happens when you apply a transformation, mass edit, or clustering operation to the rows a facet has selected.