Appearance
Split multi-valued cells when you need to count, facet, reconcile or analyse each value on its own — for example to match every author in a creator column to Wikidata. Keep them joined when the values only ever function as a single display label. The decision hinges on whether downstream work treats the values as individuals or as one unit, and splitting is reversible, so it is a low-risk experiment.
What problem does splitting actually solve?
A cell like Austen, Jane; Brontë, Charlotte; Eliot, George is invisible to faceting, counting and reconciliation — OpenRefine sees one string, not three people. Splitting it into three rows lets you facet authors, count how often each appears, and reconcile each to an identifier. The cost is structural complexity: one bibliographic record now spans three rows.
When should you split, and when should you not?
| Situation | Split? | Reason |
|---|---|---|
| Reconciling each value to Wikidata/VIAF | Yes | Each value needs its own match |
| Counting frequency of each value | Yes | Faceting needs separate rows |
| Values only shown as a combined label | No | Splitting adds complexity for nothing |
| Fixed positions (e.g. first/second author matters) | Split into columns | Position is meaningful |
| Variable-length lists | Split into rows | Row count adapts to list length |
| Separator also appears inside a value | Caution | Naive split corrupts data |
If none of your planned analyses touch individual values, leave the cell intact. Splitting that you never exploit is pure overhead.
Rows or columns — which split do you need?
OpenRefine offers both under Edit cells > Split multi-valued cells:
- Split into rows keeps one column and multiplies rows. Best for lists whose length varies (authors, subjects, keywords).
- Split into several columns (via
Edit column > Split into several columns) keeps one row and adds columns. Best when there is a small, fixed number of positions and the position carries meaning.
A list of subject headings should become rows; a coordinates field of lat, lon should become two columns.
How does splitting interact with records?
This is the part newcomers miss. After splitting into rows, OpenRefine creates a record: the first row retains the other columns; the additional rows leave them blank but stay grouped under the same record. Switch the view from Rows to Records (top-left toggle) or the data looks broken.
text
Rows mode: Records mode:
title author title author
The X | Austen, Jane The X | Austen, Jane
(blank) | Brontë, Charlotte | Brontë, Charlotte
(blank) | Eliot, George | Eliot, GeorgeFaceting in Records mode counts one per record; in Rows mode it counts every value. Choose the mode that matches your question.
What separator do you split on, and what can go wrong?
Inspect first with a text facet. Common delimiters are ;, | and ,. The danger is a separator that also occurs inside a value:
- Splitting
Austen, Jane; Eliot, Georgeon a comma produces four fragments, not two names. - Splitting on
;here works correctly.
When the delimiter is ambiguous, normalise it first with a GREL transform, then split:
text
value.replace("; ", ";").replace(", and ", ";")Is splitting reversible?
Yes — and this is why it is safe to try. Edit cells > Join multi-valued cells concatenates the rows of a record back into one cell with a separator you pick. So you can split to reconcile, capture the QIDs, then rejoin for export if a downstream system expects a single field. Combined with the undo history, nothing about splitting is a one-way door.
Key Takeaways
- Split when individual values must be counted, faceted or reconciled; keep joined when they act as one label.
- Use split-into-rows for variable-length lists and split-into-columns for fixed, meaningful positions.
- Always switch to Records mode after splitting into rows, or the grouped data looks broken.
- Inspect the data first; never split on a separator that also appears inside a value.
- Normalise inconsistent delimiters with GREL before splitting to avoid corrupt fragments.
- Splitting is fully reversible via Join multi-valued cells, so treat it as a low-risk step.
Frequently Asked Questions
What is a multi-valued cell?
A multi-valued cell holds several distinct values in one field, usually joined by a separator such as a semicolon or pipe, for example three authors listed in a single creator column. Splitting turns each value into its own row or column.
When should I split multi-valued cells into rows?
Split into rows when you need to count, facet, reconcile or analyse each value independently, such as reconciling every author to Wikidata. If the values are only ever displayed together as a label, keep them in one cell.
What is the difference between split into rows and split into columns?
Split into rows creates one new record per value, keeping a single column. Split into columns spreads values across several new columns in the same row. Use rows for variable-length lists and columns for a fixed, small number of positions.
How do records and rows work after splitting?
After Split multi-valued cells, OpenRefine groups the new rows under one record, shown in Records mode. The first row keeps the other columns and the extra rows leave them blank, so always switch to Records mode to see the grouping.
Can I rejoin split cells later?
Yes. Join multi-valued cells reverses the operation, concatenating the rows back into one cell with a separator you choose. This makes splitting a safe, reversible exploratory step rather than a one-way commitment.
What separator should I split on?
Split on the actual delimiter in your data, commonly a semicolon, pipe or comma. Inspect a text facet first, because inconsistent or nested separators such as a comma inside a name will produce wrong splits if you guess.