When to upload data to Wikidata via OpenRefine

Upload to Wikidata via OpenRefine when your source data needs reconciliation, cleaning or multi-column mapping before it is fit to publish — that is, almost any real heritage catalogue. Choose QuickStatements alone only for small, already-clean batches you have prepared by hand. OpenRefine's killer feature is reconciliation: it matches your messy values against existing Wikidata items so you enrich rather than duplicate. If your data is already clean Q-ids in a tab file, OpenRefine adds overhead you do not need.

When does OpenRefine clearly beat the alternatives?

Reach for OpenRefine when any of these are true:

Your values are names, places or terms that must be matched to existing items, not created blind.
The catalogue has dozens of columns to map onto properties and qualifiers.
The data is dirty — inconsistent dates, trailing whitespace, variant spellings — and needs clustering first.
You want a reviewable schema you can re-run as the source updates.

If none of these apply and you have fifty clean rows, a hand-written QuickStatements file is faster.
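The checklist reduces to a rule of thumb: any one signal points to OpenRefine, and only a batch with none of them is a QuickStatements job. Below is a minimal Python sketch of that rule. The `Batch` fields and the 24-column threshold standing in for "dozens of columns" are illustrative assumptions and are not part of either tool.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    """Hypothetical summary of a planned upload; field names are illustrative."""
    needs_matching: bool   # names/places/terms must be matched to existing items
    column_count: int      # catalogue columns to map onto properties and qualifiers
    is_dirty: bool         # inconsistent dates, stray whitespace, variant spellings
    will_rerun: bool       # schema should be re-run as the source updates


def choose_tool(batch: Batch, many_columns: int = 24) -> str:
    """Return "OpenRefine" if any checklist signal is present, else "QuickStatements"."""
    # Assumption: 24 columns is a stand-in for "dozens"; tune to taste.
    if (
        batch.needs_matching
        or batch.column_count >= many_columns
        or batch.is_dirty
        or batch.will_rerun
    ):
        return "OpenRefine"
    # None of the signals apply: a hand-written QuickStatements file is faster.
    return "QuickStatements"


print(choose_tool(Batch(needs_matching=False, column_count=4, is_dirty=False, will_rerun=False)))
# prints "QuickStatements"
```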

What does the end-to-end workflow look like?

Import the CSV/TSV into a new OpenRefine project.
Clean: trim whitespace, cluster-and-merge variant spellings, normalise dates with GREL.
Reconcile key columns against the Wikidata service, constrained by type.
Review matches: accept high-confidence ones, judge medium ones, and create new items only where justified.
Build the schema mapping columns to P-codes, qualifiers and references.
Preview, then either upload directly or export a QuickStatements file.
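OpenRefine's default clustering method is key collision with the fingerprint keying function: values whose normalised keys are identical land in the same cluster. The Python sketch below approximates that keying (trim, lowercase, strip accents and punctuation, sort unique tokens) and then merges each cluster onto its most frequent spelling. It illustrates the idea and is not OpenRefine's own code, so edge cases may differ.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Approximate OpenRefine-style fingerprint key for a cell value."""
    s = value.strip().lower()
    # Decompose accented letters and drop the combining marks ("ö" -> "o").
    s = unicodedata.normalize("NFKD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    # Remove punctuation, keeping letters, digits, underscores and whitespace.
    s = re.sub(r"[^\w\s]", "", s)
    # Sort unique tokens so word order does not matter ("Smith, John" == "John Smith").
    return " ".join(sorted(set(s.split())))


def clusters(values: list[str]) -> list[list[str]]:
    """Group distinct spellings that share a fingerprint; singletons are dropped."""
    groups: dict[str, set[str]] = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def merge(values: list[str]) -> list[str]:
    """Replace every value with the most frequent spelling in its cluster."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for v in values:
        counts[fingerprint(v)][v] += 1
    canonical = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    return [canonical[fingerprint(v)] for v in values]
```

As in OpenRefine's Cluster and edit dialog, inspect what `clusters` proposes before calling `merge`: key collision is deliberately aggressive and can group values that are genuinely different.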

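GREL to coerce a plain year into a QuickStatements-style ISO timestamp with year precision (/9). The literal parts of the format are single-quoted so the date formatter does not read T and Z as pattern letters:

```
value.toDate("yyyy").toString("+yyyy'-01-01T00:00:00Z'") + "/9"
```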
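If you prepare values outside OpenRefine, the same conversion is easy to script. The Python sketch below assumes the column holds plain Common Era years. Anything else (circa dates, ranges, BCE years) is rejected for manual handling rather than guessed at. Precision 9 is Wikidata's code for year precision, and the output follows the QuickStatements time syntax.

```python
import re

_PLAIN_YEAR = re.compile(r"\d{1,4}")


def year_to_qs_time(raw: str) -> str:
    """Turn "1923" into "+1923-01-01T00:00:00Z/9" (year precision)."""
    text = raw.strip()
    # Assumption: only plain CE years are converted; BCE, circa and ranges need review.
    if not _PLAIN_YEAR.fullmatch(text) or int(text) == 0:
        raise ValueError(f"not a plain year: {raw!r}")
    # Zero-pad to four digits so early years keep the ISO shape.
    return f"+{int(text):04d}-01-01T00:00:00Z/9"
```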

How does reconciliation actually prevent duplicates?

Reconciliation scores each of your values against Wikidata candidates and returns a ranked list. Constrain it to a type (for example, human or archival creator) and add property hints (date of birth, occupation) so it disambiguates the right "John Smith". The trade-off:

Confidence	OpenRefine default	Your action
High (auto-matched)	accepted	spot-check a sample
Medium (candidates shown)	left for review	judge each manually
No match	unmatched	decide: create new or leave out

Manual review of the medium band is where most duplicates are caught. Skipping it is one of the commonest causes of polluted uploads.
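Under the hood, OpenRefine sends batches of queries to the reconciliation service and gets back scored candidates. The Python sketch below covers both halves and assumes the query and candidate shapes of the Reconciliation Service API: queries carry `query`, `type`, `limit` and `properties` with `pid`/`v`, and candidates carry a `score` and a boolean `match`. Q5 is Wikidata's "human" item and P569 is date of birth. The banding rule (the service's `match` flag means high, any other candidate means medium) simplifies what OpenRefine actually does. No network request is made here.

```python
import json


def build_queries(names, type_qid="Q5", hints=None, limit=5):
    """Build a reconciliation batch: one query per name, constrained by type."""
    hints = hints or {}
    queries = {}
    for i, name in enumerate(names):
        q = {"query": name, "type": type_qid, "limit": limit}
        # Property hints (e.g. P569 date of birth) help separate namesakes.
        if name in hints:
            q["properties"] = [{"pid": pid, "v": v} for pid, v in hints[name].items()]
        queries[f"q{i}"] = q
    return queries


def form_body(queries):
    """The service takes the batch as a JSON string in a `queries` form field."""
    return {"queries": json.dumps(queries)}


def triage(candidates):
    """Place one cell's candidate list into the bands of the table above."""
    if not candidates:
        return "no match"        # decide: create new or leave out
    best = max(candidates, key=lambda c: c["score"])
    if best.get("match"):
        return "high"            # auto-matched: spot-check a sample
    return "medium"              # candidates shown: judge each manually
```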

When should I NOT use OpenRefine?

Avoid it when the data is so small that setup outweighs benefit, when the upload is a handful of new items with no reconciliation need, or when your values cannot legally or ethically go onto Wikidata at all (personal data about living people, rights-restricted descriptions). OpenRefine cannot fix a data-suitability problem — it will just upload the wrong thing faster.
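OpenRefine will not make this judgement for you, but a crude pre-filter can hold back rows that need a human decision before they reach the schema. The Python sketch below assumes hypothetical `birth_year`/`death_year` fields. The 110-year cut-off is an arbitrary illustration and not a Wikidata policy figure, so substitute your institution's own rule.

```python
from datetime import date
from typing import Optional


def possibly_living(birth_year: Optional[int], death_year: Optional[int],
                    max_age: int = 110, today: Optional[date] = None) -> bool:
    """Flag people who may still be alive, erring on the side of caution."""
    if death_year is not None:
        return False
    if birth_year is None:
        return True  # no dates at all: cannot rule out a living person
    current_year = (today or date.today()).year
    # Assumption: anyone born within max_age years with no death date may be living.
    return current_year - birth_year < max_age


def hold_back(rows, **kwargs):
    """Split rows into (safe_to_upload, needs_review)."""
    safe, review = [], []
    for row in rows:
        flagged = possibly_living(row.get("birth_year"), row.get("death_year"), **kwargs)
        (review if flagged else safe).append(row)
    return safe, review
```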

Direct upload or export QuickStatements — which is safer?

Both routes end in the same place, but the audit trail differs. Direct upload commits edits immediately from within OpenRefine: convenient, but unforgiving. Exporting a QuickStatements file lets you inspect every line, diff it against expectations, run a pilot batch, and keep the file as documentation. For institutional work where you must be able to explain every edit, export-and-review wins. Reserve direct upload for low-stakes enrichment you trust.
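Export-and-review works because QuickStatements V1 is plain tab-separated text that is easy to generate and inspect. The Python sketch below writes one line per statement and adds a "stated in" (S248) reference pointing at the source catalogue. It then produces a per-property tally you can check against expectations before a pilot. The row structure is a hypothetical assumption. OpenRefine's own exporter builds the file from your schema instead.

```python
from collections import Counter


def to_quickstatements(rows, source_qid):
    """Render rows as QuickStatements V1 lines (tab-separated)."""
    lines = []
    for row in rows:
        subject = row["item"]
        if subject is None:
            # New item: CREATE, then refer to it as LAST on following lines.
            lines.append("CREATE")
            subject = "LAST"
            lines.append(f'{subject}\tLen\t"{row["label"]}"')
        for pid, value in row["statements"]:
            # S248 = "stated in": cite the catalogue as the reference.
            lines.append(f"{subject}\t{pid}\t{value}\tS248\t{source_qid}")
    return lines


def review_tally(lines):
    """Count CREATE commands and edits per property for a quick sanity check."""
    tally = Counter()
    for line in lines:
        tally["CREATE" if line == "CREATE" else line.split("\t")[1]] += 1
    return tally
```

Saving `"\n".join(lines)` to a file gives you the reviewable artefact, and diffing two exports shows exactly what a schema change did.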

What are the real costs to budget for?

The tool is free, but the time is not. Reconciling and reviewing a few thousand creator names can take a day or two of skilled attention. Memory matters: large projects want 4 GB or more allocated to OpenRefine. And there is a maintenance cost — schemas need updating as Wikidata's property landscape shifts. Treat the first upload as 70% preparation, 30% upload.

Key Takeaways

Use OpenRefine when data needs reconciliation, cleaning or multi-column mapping.
Use QuickStatements alone for small, already-clean, hand-prepared batches.
Reconciliation with type and property constraints is what prevents duplicates.
Always review the medium-confidence match band by hand before committing.
Prefer export-and-review over direct upload when an audit trail matters.
Budget time as roughly 70% preparation, 30% upload, plus schema maintenance.

Frequently Asked Questions

When is OpenRefine better than QuickStatements for uploading?

OpenRefine wins when your data needs reconciliation against existing items, has many columns to map, or benefits from clustering and cleaning first. QuickStatements is simpler for small, already-clean batches you have hand-prepared.

Does OpenRefine reconcile against Wikidata out of the box?

Yes. OpenRefine comes with a Wikidata reconciliation service preconfigured, so you can match your values to existing Q-ids and use type and property constraints to improve precision.

Do I need a Wikidata account to upload from OpenRefine?

Yes, an autoconfirmed account. OpenRefine authenticates you and can create or update items directly, or export a QuickStatements file you upload separately.

What is the biggest risk of bulk uploading?

Creating duplicates of items that already exist. Reconciliation reduces this, but you must review medium-confidence matches by hand before committing the upload.

How large a dataset can OpenRefine handle?

Comfortably tens of thousands of rows on a normal machine, given enough memory. Beyond that, reconciliation slows down, so batch the work or script it against the Wikidata API.
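Batching can be as simple as splitting the source file into several smaller projects. Below is a Python sketch that uses only the standard library. The 5,000-row chunk size is an arbitrary assumption, so pick whatever your machine reconciles comfortably.

```python
import csv
import io
from itertools import islice


def split_csv(text, rows_per_chunk=5000):
    """Yield CSV strings of at most rows_per_chunk data rows, each with the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    while chunk := list(islice(reader, rows_per_chunk)):
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(header)   # repeat the header so each chunk imports cleanly
        writer.writerows(chunk)
        yield out.getvalue()
```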

Should I upload directly or export QuickStatements?

For a reviewable, auditable workflow, exporting QuickStatements and inspecting it before submission is safer. Direct upload is faster but commits immediately.
