A federated SPARQL query pulls data from two or more endpoints in a single request: the SERVICE keyword delegates a sub-pattern to a remote store, and the local engine joins the results. For historians this means you can, in one query, take people from your own collection and enrich them with birth dates, occupations or images held in Wikidata, without copying anything. The skill that separates a query that returns in two seconds from one that times out is controlling where the join happens: bind your local values first, then send only those specific values to the remote service.
What is federation and when is it worth it?
Federation is worth it when the authoritative data lives elsewhere and you want it live, not snapshotted. Typical heritage cases:
- Enrich local person records with Wikidata occupations and dates.
- Cross-reference your places against a gazetteer endpoint.
- Pull VIAF or GeoNames attributes alongside your catalogue.
If you only need the data once, a dump and a local load may be simpler. Federate when freshness matters or the remote dataset is too large to mirror.
How does the SERVICE keyword work?
SERVICE <endpoint> { ... } runs the enclosed graph pattern on the remote endpoint and merges the bindings into the surrounding query.
```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?object ?creator ?birth WHERE {
  ?object dcterms:creator ?creator .        # local graph
  SERVICE <https://query.wikidata.org/sparql> {
    ?creator wdt:P569 ?birth .              # remote: date of birth
  }
}
```
When the local engine resolves ?creator first, it can ask Wikidata only about those specific creators. Evaluation order depends on the engine, so check your engine's query plan rather than assuming it.
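To make "merges the bindings" concrete, here is a minimal Python sketch of the join a SPARQL engine performs between local and remote solutions. It is an illustration of the semantics, not any engine's actual implementation; the function name `join_bindings` and all `ex:` identifiers and dates are invented for the example.

```python
def join_bindings(local_rows, remote_rows):
    """Join two lists of solution mappings (dicts of variable -> value).

    Two rows are compatible when every variable they share has the same
    value; each compatible pair is merged into one output row.
    """
    joined = []
    for left in local_rows:
        for right in remote_rows:
            shared = left.keys() & right.keys()
            if all(left[var] == right[var] for var in shared):
                joined.append({**left, **right})
    return joined


# Hypothetical data: three local objects, and remote rows for two creators.
local_rows = [
    {"object": "ex:painting1", "creator": "ex:creatorA"},
    {"object": "ex:print7", "creator": "ex:creatorB"},
    {"object": "ex:sketch3", "creator": "ex:creatorC"},
]
remote_rows = [
    {"creator": "ex:creatorA", "birth": "1900-01-01"},
    {"creator": "ex:creatorB", "birth": "1910-02-02"},
]

result = join_bindings(local_rows, remote_rows)
for row in result:
    print(row)
```

Note that `ex:sketch3` drops out because the remote side returned nothing for its creator. In SPARQL the same thing happens unless the SERVICE block is wrapped in OPTIONAL.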
Why does my federated query time out?
The number one cause is an unbound remote pattern. If ?creator is not yet bound when the SERVICE block runs, the engine may pull every person in Wikidata before joining. Fixes, in order of impact:
- Bind locally first so the remote side receives concrete values.
- Add LIMIT 50 while developing to keep test runs cheap.
- Use VALUES to inject a small explicit set into the remote call.
```sparql
SERVICE <https://query.wikidata.org/sparql> {
  VALUES ?creator { wd:Q41264 wd:Q5598 }
  ?creator wdt:P569 ?birth .
}
```
This pattern turns a query that scans millions of rows into one that touches two.
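When the list of local entities is longer than a handful, one option is to generate several VALUES-bound SERVICE blocks in batches from a script. The following is a sketch under assumptions: the helper name `values_service_blocks` and the default batch size of 50 are illustrative choices, not recommendations from any endpoint, and the terms are assumed to be already written as prefixed names or `<IRI>` strings.

```python
def values_service_blocks(endpoint, variable, terms, pattern, batch_size=50):
    """Yield SERVICE blocks, each restricted by a VALUES list of at most
    batch_size terms, so every remote call receives concrete values."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(terms), batch_size):
        batch = terms[start:start + batch_size]
        values = " ".join(batch)
        yield (
            f"SERVICE <{endpoint}> {{\n"
            f"  VALUES ?{variable} {{ {values} }}\n"
            f"  {pattern}\n"
            f"}}"
        )


# Usage with the two entities from the example above, one per batch.
blocks = list(
    values_service_blocks(
        "https://query.wikidata.org/sparql",
        "creator",
        ["wd:Q41264", "wd:Q5598"],
        "?creator wdt:P569 ?birth .",
        batch_size=1,
    )
)
print(blocks[0])
```

Each generated block still has to be placed inside a full query with the matching PREFIX declarations before it can run.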
How do I bridge different vocabularies?
The two endpoints rarely share property names. Map each side explicitly, and join on a shared identifier or a sameAs link.
| Local concept | Local property | Remote (Wikidata) |
|---|---|---|
| creator | dcterms:creator | the entity URI itself |
| birth date | n/a | wdt:P569 |
| place | dcterms:spatial | wdt:P625 (coordinates) |
If your local URIs are not Wikidata URIs, add the bridge:
```sparql
?localPerson owl:sameAs ?wdPerson .
SERVICE <https://query.wikidata.org/sparql> {
  ?wdPerson wdt:P106 ?occupation .
}
```
How do I make federation resilient?
Remote endpoints fail, throttle or go down. SERVICE SILENT lets the rest of the query succeed when a remote call fails, returning partial results rather than nothing.
```sparql
SERVICE SILENT <https://query.wikidata.org/sparql> {
  ?creator wdt:P18 ?image .
}
```
Use it deliberately: it hides failures, so log when a SILENT block returns empty rather than treating gaps as "no data".
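One way to log such gaps is to inspect the response on the client side. The sketch below assumes the response uses the standard SPARQL JSON results layout, where an unbound variable is simply absent from a row. The function names, the logger name and the sample data are invented for illustration. It relies on the fact that a failed SILENT block behaves like a single empty solution: if the remote variable is unbound in every row, a failed call is a plausible cause, though genuinely missing data cannot be ruled out from the response alone.

```python
import logging

logger = logging.getLogger("federation")


def count_unbound(results, variable):
    """Count rows of a SPARQL JSON result in which `variable` has no binding."""
    bindings = results["results"]["bindings"]
    # The JSON results format omits a variable that is unbound in a row.
    return sum(1 for row in bindings if variable not in row)


def log_silent_gaps(results, variable):
    """Log how often a variable expected from a SERVICE SILENT block is
    missing, and return the number of rows without it."""
    total = len(results["results"]["bindings"])
    missing = count_unbound(results, variable)
    if total and missing == total:
        logger.warning(
            "?%s unbound in all %d rows; the remote call may have failed",
            variable, total,
        )
    elif missing:
        logger.info("?%s unbound in %d of %d rows", variable, missing, total)
    return missing


# Hypothetical response: two local creators, no image bound for either.
sample = {
    "head": {"vars": ["creator", "image"]},
    "results": {
        "bindings": [
            {"creator": {"type": "uri", "value": "http://example.org/person/1"}},
            {"creator": {"type": "uri", "value": "http://example.org/person/2"}},
        ]
    },
}
gaps = log_silent_gaps(sample, "image")
```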
How do I keep results reproducible?
Federated results change because remote data changes. To let a reviewer reconstruct your numbers:
- Pin exact endpoint URLs and save the full query text in version control.
- Record the run date; remote facts on that date may differ later.
- Note dataset versions or use a dated dump endpoint where one exists.
- Add a polite, identifying user-agent and respect rate limits.
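The checklist above can be captured in a small provenance record saved next to each result set. This is a minimal sketch: the field names, the `provenance_record` helper, the example query and the user-agent string (including its contact address) are all assumptions for illustration, not a standard.

```python
import hashlib
import json
from datetime import date


def provenance_record(query_text, endpoints, run_date, user_agent):
    """Build a dict describing one run of a federated query."""
    return {
        # A hash of the exact query text lets a reviewer confirm that the
        # query in version control is the one that produced the numbers.
        "query_sha256": hashlib.sha256(query_text.encode("utf-8")).hexdigest(),
        "endpoints": sorted(endpoints),
        "run_date": run_date.isoformat(),
        "user_agent": user_agent,
    }


# Hypothetical run: the local endpoint URL and contact details are made up.
query = "SELECT ?creator ?birth WHERE { ?object ?p ?creator }"
record = provenance_record(
    query,
    ["https://query.wikidata.org/sparql", "https://example.org/sparql"],
    date(2024, 1, 15),
    "ExampleHeritageBot/0.1 (mailto:curator@example.org)",
)
print(json.dumps(record, indent=2))
```

Committing this record alongside the query gives a reviewer the endpoints, the date and a fingerprint of the exact text that was run.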
A complete reusable workflow
- Write and test the local half until it returns the right entities.
- Add a tiny VALUES-bound SERVICE block; confirm it returns fast.
- Generalise from VALUES to the live local binding.
- Wrap fragile remote calls in SERVICE SILENT and log gaps.
- Commit the query, record the date, and document endpoints.
Key Takeaways
- SERVICE delegates a sub-pattern to a remote endpoint and joins results locally.
- Bind local variables first so the remote call receives specific values, not an open scan.
- Use VALUES and a LIMIT to keep development queries fast and cheap.
- Bridge differing vocabularies with explicit property maps and owl:sameAs joins.
- SERVICE SILENT adds resilience but hides failures, so log when it triggers.
- Pin endpoints, save the query, and record the run date for reproducibility.
- Federate when freshness matters; dump and load locally when you only need a snapshot.
Frequently Asked Questions
What does the SERVICE keyword do in SPARQL?
SERVICE tells the local query engine to delegate a sub-pattern to a remote SPARQL endpoint and fold the results back into the main query. It is the mechanism that makes a query federated across two or more datasets.
Why is my federated query timing out?
Most often because the remote pattern is unbound and pulls millions of rows before the join. Bind variables locally first, send only specific values to the remote SERVICE, and add a LIMIT while testing so the remote endpoint returns quickly.
Can I federate against Wikidata from my own endpoint?
Yes, by adding SERVICE <https://query.wikidata.org/sparql> inside your query, provided your engine allows outbound SERVICE calls and you respect Wikidata's rate limits and user-agent policy.
Do both endpoints need to use the same vocabulary?
No, but you must bridge them. Either the datasets already share URIs, or you join through an owl:sameAs or matching identifier, mapping each side's properties explicitly in your query.
Is SERVICE SILENT safe to use?
SERVICE SILENT stops a failed remote call from aborting the whole query, returning partial results instead. It is useful for resilience against a flaky endpoint, but it hides failures, so log when it triggers rather than relying on silent gaps.
How do I keep federated queries reproducible?
Pin endpoint URLs, record the query date because remote data changes, save the exact query text in version control, and where possible note dataset versions or dumps so a reviewer can reconstruct the same result later.