To query Wikidata with SPARQL reliably, prefix every query with a comment stating its purpose and run date, use wdt: for direct values and p:/pq: only when you need qualifiers, filter early to stay under the 60-second timeout, and add the label service last. The biggest quality risk is not syntax but reproducibility: Wikidata changes daily, so a query that returns 412 items today may return 430 next month. Document and archive your results.
How do I structure a query so it stays readable?
Lead with intent. A maintainable heritage query looks like this:
```sparql
# Purpose: manuscripts in Collection Q12345 dated before 1500
# Run: 2025-01-20 by E. Reed
SELECT ?item ?itemLabel ?inception WHERE {
  ?item wdt:P195 wd:Q12345 ;   # in the collection
        wdt:P31 wd:Q87167 ;    # instance of manuscript
        wdt:P571 ?inception .
  FILTER( ?inception < "1500-01-01T00:00:00Z"^^xsd:dateTime )
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,mul". }
}
ORDER BY ?inception
```
The comment block is not optional for serious work: it is the difference between a result you can defend in a footnote and one you cannot reconstruct.
When should I use p: and pq: instead of wdt:?
wdt: jumps straight to the best-ranked ("truthy") value and hides qualifiers, references, and any lower-ranked statements. Use the verbose path when you need qualifiers or references:
```sparql
# Reach the qualifiers on a date range
SELECT ?item ?earliest ?latest WHERE {
  ?item wdt:P195 wd:Q12345 ;
        p:P571 ?st .
  ?st pq:P1319 ?earliest ;
      pq:P1326 ?latest .
}
```

| Prefix | Reaches | Use when |
|---|---|---|
| wdt: | direct/truthy value | most attribute lookups |
| p: | statement node | you need qualifiers or rank |
| pq: | qualifier value | date ranges, attribution circumstances |
| pr: | reference value | auditing sources |
| wd: | an entity | naming a specific item/property value |
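As an illustration of the p: and pr: rows, the sketch below walks from an item through the statement node to its reference node and reads one reference value. It assumes the placeholder collection wd:Q12345 used above and that the references of interest use P248 (stated in); both are assumptions to adjust for your own data. It also uses ps:, the statement-value prefix, which the table does not list.

```sparql
# Purpose: list rank and recorded source for each inception statement
# Assumption: references use P248 (stated in); Q12345 is a placeholder
SELECT ?item ?inception ?rank ?statedIn WHERE {
  ?item wdt:P195 wd:Q12345 ;       # in the collection
        p:P571 ?st .               # statement node for inception
  ?st ps:P571 ?inception ;         # the statement's own value
      wikibase:rank ?rank ;        # preferred, normal, or deprecated
      prov:wasDerivedFrom ?ref .   # reference node
  ?ref pr:P248 ?statedIn .         # reference value: stated in
}
LIMIT 100
```

Because p: reaches every statement regardless of rank, this can return more rows per item than the wdt: form, including deprecated values; filter on ?rank if that matters.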
Why does my SPARQL query keep timing out?
The public endpoint cuts queries at 60 seconds. Common heritage culprits and fixes:
- A label service inside a huge join: move it to the outermost block.
- Unbounded OPTIONAL chains: restrict with a leading mandatory triple.
- Property paths like wdt:P361* over deep hierarchies: cap the depth or precompute.
- No LIMIT while developing: always add one until the shape is right.
If a legitimate analytical query genuinely needs more, run it on Quarry, export a dump subset, or stand up your own QLever endpoint, which answers many queries in milliseconds.
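A minimal sketch of two of the fixes above working together, assuming the same placeholder collection: the inner subquery holds the mandatory triples and the development LIMIT, and the label service sits in the outermost block so it only labels the rows that survive.

```sparql
# Purpose: manuscripts in the collection, labels resolved last
SELECT ?item ?itemLabel ?inception WHERE {
  {
    # Inner block: mandatory triples plus the development LIMIT
    SELECT ?item ?inception WHERE {
      ?item wdt:P195 wd:Q12345 ;
            wdt:P31 wd:Q87167 ;
            wdt:P571 ?inception .
    }
    LIMIT 100
  }
  # Outer block: labels are fetched only for the rows kept above
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,mul". }
}
```

Once the shape of the result looks right, raise or remove the inner LIMIT and watch the run time.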
How do I make results consistent across a collection?
Standardise your patterns. Keep a small library of validated query templates — "all items missing a reference", "items with imprecise dates", "duplicate-candidate labels" — and run them across collections rather than writing one-off queries each time. A consistency check might be:
```sparql
# Items in the collection with no reference on their date statement
SELECT ?item WHERE {
  ?item wdt:P195 wd:Q12345 ;
        p:P571 ?st .
  FILTER NOT EXISTS { ?st prov:wasDerivedFrom ?ref . }
}
```
How do I keep a query reproducible for publication?
Three habits make a query citable: shorten and save the URL at query.wikidata.org, record the run date in a comment, and archive the CSV export with your dataset. Because all statements are mutable, state explicitly in your method note that figures reflect Wikidata as of a given date. For frozen reproducibility, cite a specific dump.
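Alongside the run date, it can help to record how fresh the endpoint's data was when you ran the query. The public query service is documented as publishing its last update time on the dataset node, so a short companion query along these lines can be run next to the main one and its result pasted into the comment header. Treat the node and predicate as an assumption to check against the current query service documentation, particularly on a private endpoint.

```sparql
# Purpose: record the data timestamp of the endpoint at run time
# Assumption: the endpoint publishes schema:dateModified on this node
SELECT ?dataAsOf WHERE {
  <http://www.wikidata.org> schema:dateModified ?dataAsOf .
}
```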
What is a sensible pre-flight checklist?
Before trusting a result, verify: the query has a purpose comment and date; wdt:/p: are chosen deliberately; there is a LIMIT during development; labels resolve; date filters account for precision; and the result count is sanity-checked against what you expected. A query that returns suspiciously round or suspiciously zero counts almost always has a triple in the wrong direction.
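One way to do the count check, sketched with the placeholder collection from earlier: run the pattern as a bare count before asking for any columns, and compare the number with what the catalogue leads you to expect.

```sparql
# Purpose: count check before trusting the full result
SELECT (COUNT(DISTINCT ?item) AS ?n) WHERE {
  ?item wdt:P195 wd:Q12345 ;   # subject = the object, value = the collection
        wdt:P31 wd:Q87167 .
  # Written backwards by mistake, the first triple would read
  #   wd:Q12345 wdt:P195 ?item .
  # which asks what the collection item itself belongs to.
}
```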
Key Takeaways
- Open every query with a purpose-and-date comment for defensibility.
- Use wdt: for values; switch to p:/pq:/pr: only for qualifiers and references.
- Beat the 60-second timeout by filtering early and moving labels to the outermost block.

- Build a reusable template library so results stay consistent across collections.
- Archive a CSV plus the shortened URL to make published figures reproducible.
- Wikidata is mutable; always state the run date in your methods.
Frequently Asked Questions
What is the difference between wdt: and p: prefixes?
wdt: gives the direct, truthy value of a statement and is what you usually want. p: leads into the full statement node so you can reach qualifiers and references via pq: and prov:/pr:.
Why does my query time out after 60 seconds?
The public Wikidata Query Service enforces a 60-second limit. Add a LIMIT, filter early, avoid SERVICE wikibase:label inside large joins, and consider splitting the query, or moving to Quarry or a dump, for very large results.
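If the slow part is a property path over a deep part of (P361) hierarchy, one way to cap the depth is to chain single hops instead of using the unbounded * operator. The sketch below assumes three levels are enough for the collection in question; that number is illustrative, not a recommendation.

```sparql
# Purpose: items that are part of something in the collection,
#          at most three "part of" hops away (depth is an assumption)
SELECT DISTINCT ?item WHERE {
  ?parent wdt:P195 wd:Q12345 .                   # mandatory anchor first
  ?item wdt:P361/wdt:P361?/wdt:P361? ?parent .   # one to three hops
}
LIMIT 100
```

The first hop is mandatory and the next two are optional (the ? path operator means zero or one), so the path never degenerates into an unbounded walk.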
How do I get human-readable labels?
Add the label service: SERVICE wikibase:label { bd:serviceParam wikibase:language "en,mul". } and then reference ?itemLabel. Put it at the end so it does not bloat the join.
Can I query across a date range?
Yes. Bind the date to a variable with wdt:P571 and use FILTER with xsd:dateTime comparisons, but be aware that mixed date precisions can produce surprising boundary results.
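To see where precision might distort a boundary, a sketch like the following lists each date with its stored precision. It assumes the placeholder collection from earlier and uses psv:, the full-value prefix, which is not otherwise covered here. A date known only to the century is still stored as a single timestamp, so it can land on either side of a cut-off such as 1500.

```sparql
# Purpose: inception dates in the collection that are coarser than a day
SELECT ?item ?date ?precision WHERE {
  ?item wdt:P195 wd:Q12345 ;
        p:P571/psv:P571 ?dateNode .     # full value node, not just the date
  ?dateNode wikibase:timeValue ?date ;
            wikibase:timePrecision ?precision .
  # Precision codes: 11 = day, 10 = month, 9 = year, 8 = decade, 7 = century
  FILTER( ?precision < 11 )
}
ORDER BY ?precision
LIMIT 100
```

Rows with a precision of 8 or lower are the ones most likely to need a manual decision about which side of a date filter they belong on.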
How do I make a query reproducible for a publication?
Save it as a shortened query.wikidata.org URL, record the run date in a comment, and note that Wikidata is mutable so results may change. Archive a CSV export alongside your paper.
Should I use the public endpoint or my own copy?
Use the public endpoint for interactive work and small results. For repeated heavy analytics or guaranteed stability, load a dump into your own Blazegraph or QLever instance.