Reconciling names to VIAF means matching each name string in your dataset to a Virtual International Authority File cluster, giving you a stable identifier instead of a free-text label. Because VIAF has no official reconciliation API, the standard approach is to point OpenRefine at a community W3C-conformant endpoint (commonly https://refine.codefork.com/reconcile/viaf) or to reconcile against Wikidata and harvest the VIAF property P214. The best practice that separates a defensible result from a guess is simple: add disambiguating properties, set a review threshold, and record the cluster ID you matched to.
Which VIAF endpoint should you point OpenRefine at?
VIAF aggregates national authority files (LC, BnF, DNB, and others), but it does not publish a native reconciliation service. Two reliable routes exist:
- Direct VIAF endpoint: add the codefork service in the reconciliation dialog (Reconcile → Start reconciling → Add Standard Service). It supports a generic `Person` type and a few sub-types.
- Via Wikidata: reconcile against `https://wikidata.reconci.link/en/api`, then fetch `P214` to get the VIAF ID. This is often cleaner because Wikidata gives you dates and occupations to disambiguate against in the same pass.
Pick one route per project and write it down. Mixing endpoints mid-collection produces inconsistent IDs that are painful to audit later.
How do you stop matching the wrong person?
The biggest risk is two people sharing a name. Mitigate it by reconciling with additional properties, not on the bare name:
```text
Column: author_name
Reconcile against: Person
Property: date of birth -> column "birth_year"
Property: occupation -> column "role"
```

In the reconciliation dialog, attach your `birth_year` column as a property. If your row says 1801, a candidate "John Smith" born 1801 will then outscore one born 1920, and the wider score gap makes auto-matching safer.
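If you script reconciliation outside OpenRefine, the same idea can be expressed as a query batch. The sketch below assumes the Wikidata route and the form-encoded `queries` parameter used by older versions of the reconciliation protocol; `Q5` (human), `P569` (date of birth) and `P106` (occupation) are real Wikidata identifiers, while the helper names and the row keys are hypothetical and simply mirror the columns above. Check the service manifest of your chosen endpoint before relying on it.

```python
import json
from urllib.parse import urlencode

# Wikidata identifiers: Q5 = human, P569 = date of birth, P106 = occupation.
# Helper names and row keys are illustrative, not part of any library.


def build_query(name, birth_year=None, occupation=None, limit=5):
    """Build one reconciliation query for a personal name."""
    properties = []
    if birth_year:
        properties.append({"pid": "P569", "v": str(birth_year)})
    if occupation:
        properties.append({"pid": "P106", "v": occupation})

    query = {"query": name, "type": "Q5", "limit": limit}
    if properties:
        # Only send the key when there is something to disambiguate on.
        query["properties"] = properties
    return query


def build_batch_body(rows):
    """Encode rows as the form body of a reconciliation POST request."""
    queries = {
        f"q{i}": build_query(row["author_name"], row.get("birth_year"), row.get("role"))
        for i, row in enumerate(rows)
    }
    return urlencode({"queries": json.dumps(queries)})


rows = [
    {"author_name": "John Smith", "birth_year": 1801, "role": "engraver"},
    {"author_name": "Jane Doe"},  # bare name: no properties are attached
]
body = build_batch_body(rows)
```

The resulting `body` would be sent as an `application/x-www-form-urlencoded` POST to the endpoint. A row without a birth year or role still reconciles, but only on the name string, which is exactly the case the score bands should treat with suspicion.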
What confidence threshold is defensible?
OpenRefine returns a candidate score per row. Adopt a banded policy and document it:
| Score band | Action | Rationale |
|---|---|---|
| 90–100 | Auto-match | Strong string + property agreement |
| 70 to < 90 | Manual review | Likely but verify dates |
| < 70 | Reject / leave unreconciled | High false-positive risk |
If you see false positives, untick Auto-match candidates with high confidence in the reconciliation dialog and apply your own bands instead. There is no magic number; the point is that your threshold is explicit and repeatable.
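One way to keep the policy repeatable is to encode it once and apply it to an exported score column. This is a minimal sketch: the function name and the default cut-offs (90 and 70, taken from the table above) are illustrative, and scores are assumed to be on a 0 to 100 scale, which depends on the service.

```python
def classify(score, auto_min=90.0, review_min=70.0):
    """Map a candidate score to an action under a banded policy."""
    if score is None:
        # No candidate was returned for this row.
        return "reject"
    if score >= auto_min:
        return "auto-match"
    if score >= review_min:
        return "manual review"
    return "reject"


scores = [97.5, 90, 82, 69.9, None]
actions = [classify(s) for s in scores]
```

Using half-open bands (a score of exactly 90 auto-matches, anything from 70 up to but not including 90 is reviewed) avoids a row falling into two bands at once.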
How do you extract and keep the VIAF cluster ID?
Once cells are reconciled, materialise the identifier so it survives export. Add a column based on the reconciled column with GREL:
```grel
cell.recon.match.id
```

For a full URI:

```grel
if(cell.recon.matched, "https://viaf.org/viaf/" + cell.recon.match.id, "")
```

Add a second column capturing the score for your audit trail:

```grel
cell.recon.best.score
```

How do you reconcile thousands of names without timeouts?
Public endpoints throttle. Do not fire 40,000 rows at once. Instead:
- Facet the column and reconcile in batches of roughly 2,000–5,000 cells.
- Use Facet by reconciliation judgment to isolate `none` and retry only the unmatched.
- Back up the project: OpenRefine saves automatically, so `Ctrl+S` is not needed, but export the project file before long runs.
- Expect HTTP 429 responses; pause and resume rather than hammering.
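If you drive the reconciliation from a script rather than from the OpenRefine interface, the batching and pause-and-resume advice above can be sketched as follows. Everything here is hypothetical scaffolding: `send` stands for whatever function posts one batch to your endpoint, and it is assumed to raise `RateLimited` when the service answers HTTP 429. The batch size and delays are starting points, not values recommended by any service.

```python
import time


class RateLimited(Exception):
    """Raised by the sender when the service answers HTTP 429."""


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def reconcile_all(names, send, batch_size=2000, max_retries=5,
                  base_delay=2.0, sleep=time.sleep):
    """Send names in batches, backing off exponentially when rate-limited."""
    results = []
    for batch in batches(names, batch_size):
        for attempt in range(max_retries + 1):
            try:
                results.extend(send(batch))
                break  # batch succeeded, move on to the next one
            except RateLimited:
                if attempt == max_retries:
                    raise  # give up so the caller can resume later
                # Wait 2s, 4s, 8s, ... before retrying the same batch.
                sleep(base_delay * 2 ** attempt)
    return results
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. Because results are accumulated per batch, a run that finally gives up loses at most the batch in flight, provided you persist `results` as you go.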
A working VIAF reconciliation checklist
- [ ] Endpoint chosen and recorded (codefork VIAF vs Wikidata → `P214`).
- [ ] At least one disambiguating property attached (dates or occupation).
- [ ] Score-band policy documented.
- [ ] `cell.recon.match.id` and `cell.recon.best.score` extracted to columns.
- [ ] Unmatched names exported for human follow-up.
- [ ] `operations.json` saved so the run can be replayed.
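The "recorded" and "documented" items in the checklist can be as simple as a small JSON file stored next to the exported project. The sketch below is one possible shape, not an OpenRefine feature: the function name, the field names, and the example date are all placeholders.

```python
import json
from datetime import date


def provenance_record(endpoint, auto_min, review_min, properties, run_date=None):
    """Describe one reconciliation run so it can be audited later."""
    return {
        "endpoint": endpoint,
        "score_bands": {"auto_match_min": auto_min, "review_min": review_min},
        "properties": list(properties),
        "run_date": (run_date or date.today()).isoformat(),
    }


record = provenance_record(
    "https://refine.codefork.com/reconcile/viaf",
    auto_min=90,
    review_min=70,
    properties=["birth_year", "role"],
    run_date=date(2024, 1, 15),  # placeholder date for the example
)
print(json.dumps(record, indent=2))
```

Keeping this file alongside `operations.json` means anyone auditing the collection can see which service, thresholds, and disambiguating columns produced the stored cluster IDs.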
Key Takeaways
- VIAF has no native reconciliation API: use the codefork endpoint or go via Wikidata `P214`.
- Always reconcile with a disambiguating property; bare names collide constantly.
- Adopt an explicit score-band policy (e.g. auto-match ≥ 90, review 70 to < 90).
- Materialise `cell.recon.match.id` into a real column so the ID survives export.
- Batch large jobs and tolerate rate-limiting rather than overloading public services.
- Document your endpoint, threshold, and date so the whole collection is consistent and auditable.
Frequently Asked Questions
Does VIAF have an official OpenRefine reconciliation endpoint?
VIAF itself does not ship a native reconciliation service, so practitioners use a community W3C-conformant endpoint such as the one at refine.codefork.com, or reconcile against Wikidata and pull the VIAF property P214 afterwards.
Should I reconcile against VIAF or Wikidata for personal names?
Reconcile against Wikidata when you want a rich, queryable graph and stable QIDs, and pull VIAF IDs via property P214. Reconcile directly against a VIAF endpoint when VIAF clustering of library authority files is your primary target.
How do I stop VIAF reconciliation from matching the wrong person?
Add a date or occupation column as a reconciliation property, turn off automatic matching if it produces false positives, and never accept a match scoring below roughly 90 without a human check of birth and death dates. Candidates below roughly 70 are better left unreconciled.
What confidence score is safe to auto-match in VIAF reconciliation?
There is no universal threshold, but treating candidates scoring 90 or above as likely correct and manually reviewing those from 70 up to 90 is a defensible policy that you should document per project.
How do I record which VIAF cluster I matched a name to?
After reconciliation, add a column from the reconciled column extracting cell.recon.match.id, which gives you the VIAF cluster number you can store and cite permanently.
Can I reconcile thousands of names to VIAF in one batch?
Yes, but throttle it: reconcile in facet-filtered batches of a few thousand rows, expect the public endpoint to rate-limit, and save your project frequently so a timeout does not lose work.