Skip to content
Data Cleaning with OpenRefine

Export OpenRefine output with the templating exporter when your downstream target is structured — JSON, XML, RDF, JSON-LD, or any custom markup where one row must become a nested record. For flat tabular results, skip templating entirely and use the built-in CSV, TSV, or Excel exporters, which are simpler and far harder to break. The deciding question is shape: tabular out means plain export; structured-or-nested out means templating.

When does templating actually pay off?

Templating earns its complexity in a few clear cases:

  • You need one JSON or XML record per row with named keys, not columns.
  • You are emitting RDF/Turtle or JSON-LD for a linked-data pipeline.
  • Your consumer expects a bespoke text format (a fixed-field flat file, a SQL INSERT batch, a Markdown table fragment).

If none of those apply — your endpoint is a spreadsheet, a database bulk-load of a CSV, or a pandas script — templating adds escaping bugs for no benefit. Choose plain export.

How is an OpenRefine template structured?

The Export ▸ Templating dialog gives you four editable parts:

PartRunsPurpose
PrefixOnce, at startOpening bracket, @context, XML header
Row TemplateOnce per rowThe repeating record, with embedded GREL
Row SeparatorBetween rowsComma, newline
SuffixOnce, at endClosing bracket

Inside the row template you embed GREL in double curly braces. To keep this prose safe I show it only in a code block:

text
{
  "id": {{escape(cells["id"].value, "json")}},
  "name": {{escape(cells["name"].value, "json")}},
  "viaf": {{escape(cells["viaf_id"].value, "json")}}
}

Prefix [, separator ,\n, suffix ] yields a JSON array.

How do you stop the export producing broken JSON or XML?

Three rules prevent almost every malformed-output bug:

  1. Always escape. Wrap every value in escape(value, "json") or escape(value, "xml"). A single unescaped quote breaks the whole file.
  2. Handle nulls. Use if(isBlank(cells["x"].value), "null", escape(...)) so empty cells do not emit a bare comma or "null" string where you wanted a JSON null.
  3. Validate. Pipe the file through jq . for JSON or xmllint --noout for XML before anyone ingests it.
bash
openrefine_export.json | jq empty   # exits non-zero on malformed JSON
xmllint --noout export.xml          # silent if well-formed

Should you template in OpenRefine or post-process with a script?

This is the central trade-off. Templating is row-local: each row renders independently, with no grouping, no cross-row aggregation, and only the conditional logic GREL can express. The moment you need to:

  • group several rows into one nested object,
  • build arrays of children under a parent,
  • or enforce a schema,

stop templating and export plain CSV, then transform with Python or jq. A short script is more maintainable than a sprawling, brittle template.

python
import csv, json
rows = list(csv.DictReader(open("cleaned.csv", encoding="utf-8")))
records = [{"id": r["id"], "name": r["name"], "aliases": r["aliases"].split(";")} for r in rows]
json.dump(records, open("out.json", "w"), ensure_ascii=False, indent=2)

Does the export respect facets and filters?

Yes — and this bites people. The templating exporter (and every exporter) only writes the rows currently visible under your active facets. If you exported a subset by accident, the cause is almost always a leftover facet. Remove all facets to export the full dataset, or keep them deliberately to export a curated slice.

A quick decision checklist

  • Tabular target? Use CSV/TSV/Excel, not templating.
  • Nested/structured target? Templating, with full escaping.
  • Cross-row grouping needed? Export CSV, post-process with a script.
  • Always validate JSON/XML before downstream use.
  • Confirm your facet state matches the rows you intend to export.

Key Takeaways

  • Reach for the templating exporter only when output is structured (JSON, XML, RDF, JSON-LD); use plain CSV for flat data.
  • A template has four parts: prefix, row template, row separator, suffix, with GREL in double braces.
  • Escape every embedded value and handle nulls, or you will produce malformed files.
  • Templating is row-local — for grouping or schema validation, export CSV and post-process.
  • Exports respect active facets, so check your filter state before exporting.
  • Validate with jq or xmllint as a non-negotiable last step.

Frequently Asked Questions

When should I use OpenRefine's templating export instead of CSV?

Use templating export when your target is a nested or non-tabular format such as JSON, XML, RDF, or a custom markup, where one row must become a structured record. For flat spreadsheets, plain CSV or TSV export is simpler and less error-prone.

What languages does the OpenRefine templating exporter use?

The template uses GREL expressions inside double curly-brace placeholders, surrounded by a prefix, row template, row separator, and suffix. Each part is plain text into which you embed GREL to pull cell values.

Can OpenRefine export valid JSON-LD for linked data?

Yes. The templating exporter can produce JSON-LD by writing the @context in the prefix and emitting one JSON object per row, but you must escape strings carefully and validate the output, since OpenRefine does not guarantee well-formed JSON on its own.

How do I avoid broken JSON or XML from the templating exporter?

Wrap every embedded value in an escaping function such as escape(value, 'json') or escape(value, 'xml'), handle nulls explicitly, and validate the file with a linter before ingesting it downstream.

Should I template inside OpenRefine or post-process with a script?

Template inside OpenRefine for one-off or modest exports where the structure is regular. Switch to a Python or jq post-processing step when you need conditional nesting, grouping across rows, or schema validation that the templating exporter cannot express.

Does the templating export include only filtered rows?

Yes, the templating exporter respects your current facets and filters, so it exports only the rows visible in the current view. Remove all facets first if you intend to export the whole dataset.