Appearance
A facet in OpenRefine is an interactive summary of a column: it lists every distinct value alongside how many rows contain it, so inconsistencies jump out and you can click any value to filter to just those rows. Facets are the single fastest way for a beginner to find dirty data — misspellings, stray whitespace, blanks, and outliers — before fixing anything. They never change your data; they only change what you see.
What exactly is a facet?
Imagine a country column with 5,000 rows. A text facet collapses it into a sidebar showing each unique value and its count:
text
United Kingdom 3120
UK 890
U.K. 210
United Kingdom 140 <- trailing space
england 95 <- lowercaseThose four lines are the same country recorded five ways. You did not have to scroll 5,000 rows — the facet revealed the mess in seconds. Create one via the column dropdown ▸ Facet ▸ Text facet.
Which facet should a beginner reach for first?
Start with a text facet on any categorical column (country, status, document type, archive name). It groups identical strings, so near-duplicates line up next to each other. From the facet panel you can click a value to inspect its rows, or use the include/exclude links to build a working subset.
Sort the facet by count to find rare values (often typos) or by name to spot alphabetical neighbours like Berlin and Berlín.
How do I find blank or empty cells?
Blanks hide easily. Use a custom text facet with a GREL expression instead of the raw value:
grel
isBlank(value)The facet now shows just true and false. Click true to isolate every empty cell. A richer version flags whitespace-only cells too:
grel
if(isBlank(value), "empty", if(value != value.trim(), "has whitespace", "ok"))This single facet sorts your column into three buckets you can clean one at a time.
What is the difference between a facet and a filter?
Beginners mix these up. Here is the distinction:
| Tool | What it does | Best for |
|---|---|---|
| Text facet | Summarises a column, click values to narrow | Seeing the distribution, spotting variants |
| Text filter | Free-text search box, keeps rows containing your term | Finding a known substring fast |
| Numeric facet | Histogram + range slider | Outliers in numbers |
| Timeline facet | Plots dates on a timeline | Impossible or out-of-range dates |
Use a facet when you do not yet know what is wrong; use a text filter when you are hunting a specific string.
Can facets catch bad numbers and dates?
Yes, and they are excellent at it. A numeric facet draws a histogram with a slider — drag it and any bars far from the main cluster reveal outliers, like a year of 9999 or 0. A timeline facet plots date values so an entry from 1066 next to a column of 19th-century records is obvious at a glance.
grel
value.toNumber() // force a numeric facet on a column stored as textIf the histogram looks empty, your numbers are stored as strings — wrap them with toNumber() first.
A tiny worked example, start to finish
- Load a 200-row spreadsheet of parish records.
- Text-facet
place→ seeSt. Mary's,St Marys,st. mary's. - Click each variant to confirm they are the same place.
- Use Cluster from inside the facet to merge them.
- Custom-facet
isBlank(value)ondate→ find 12 blanks. - Numeric-facet
year→ spot one1066typo for1866.
In five minutes, facets surfaced three classes of dirty data you would never have found by scrolling.
Key Takeaways
- A facet is a clickable summary of a column showing every distinct value and its count.
- Text facets are the beginner's first tool for spotting spelling and casing variants.
- A custom facet on
isBlank(value)finds empty cells instantly. - Facets differ from filters: facets show the distribution, filters search for a known string.
- Numeric and timeline facets catch outliers and impossible dates.
- Facets only change the view — your data is untouched until you apply a transformation.
Frequently Asked Questions
What is a facet in OpenRefine?
A facet is an interactive summary of a column that lists every distinct value with its row count, letting you see at a glance which values are inconsistent, mistyped, or unexpected, and click to filter to them.
Which facet should a beginner use first to spot dirty data?
Start with a text facet on a categorical column. It groups identical values and surfaces near-duplicates such as 'London', 'london', and 'Londn ' as separate entries you can immediately see and fix.
How do I find blank or empty cells with facets?
Use a customised facet, or a text facet on an expression like isBlank(value), which returns true and false. Filtering to true isolates every empty cell in that column instantly.
What is the difference between a facet and a filter in OpenRefine?
A facet summarises a column and lets you select values to narrow the view; a text filter is a free-text search box that keeps only rows whose cells contain your search term. Both restrict the visible rows but facets show you the distribution first.
Can I use facets on numbers and dates, not just text?
Yes. A numeric facet draws a histogram with a range slider to catch outliers, and a timeline facet plots dates so you can spot impossible or out-of-range values such as a death before a birth.
Do facets change my data?
No. Facets only filter what you see; they never alter cells. Editing only happens when you apply a transformation, mass edit, or clustering operation to the rows a facet has selected.