Wikidata for Heritage

To query Wikidata with SPARQL reliably, prefix every query with a comment stating its purpose and run date, use wdt: for direct values and p:/pq: only when you need qualifiers, filter early to stay under the 60-second timeout, and add the label service last. The biggest quality risk is not syntax but reproducibility: Wikidata changes daily, so a query that returns 412 items today may return 430 next month. Document and archive your results.

How do I structure a query so it stays readable?

Lead with intent. A maintainable heritage query looks like this:

```sparql
# Purpose: manuscripts in Collection Q12345 dated before 1500
# Run: 2025-01-20 by E. Reed
SELECT ?item ?itemLabel ?inception WHERE {
  ?item wdt:P195 wd:Q12345 ;       # in the collection
        wdt:P31  wd:Q87167 ;       # instance of manuscript
        wdt:P571 ?inception .
  FILTER( ?inception < "1500-01-01"^^xsd:dateTime )
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,mul". }
}
ORDER BY ?inception
```

The comment block is not optional for serious work — it is the difference between a result you can defend in a footnote and one you cannot reconstruct.
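If you generate queries from a script, the header can be stamped in automatically so it is never skipped. Below is a minimal Python sketch. The helper name with_header is hypothetical, and the two comment lines simply mirror the Purpose/Run convention in the query above.

```python
from datetime import date
from typing import Optional


def with_header(purpose: str, author: str, body: str,
                run_date: Optional[date] = None) -> str:
    """Prefix a SPARQL query body with '# Purpose:' and '# Run:' comments.

    Hypothetical helper; the run date defaults to today if none is given.
    """
    run_date = run_date or date.today()
    header = [
        f"# Purpose: {purpose}",
        f"# Run: {run_date.isoformat()} by {author}",
    ]
    return "\n".join(header) + "\n" + body.strip() + "\n"


query = with_header(
    purpose="manuscripts in Collection Q12345 dated before 1500",
    author="E. Reed",
    body="SELECT ?item WHERE { ?item wdt:P195 wd:Q12345 . }",
    run_date=date(2025, 1, 20),
)
print(query)
```

Passing run_date explicitly, as here, keeps the stamp deterministic when you re-render an archived query. Leave it out to stamp today's date.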

When should I use p: and pq: instead of wdt:?

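wdt: jumps straight to the "truthy" value: the preferred-rank statement(s) if any exist, otherwise the normal-rank ones. It hides qualifiers, references, rank, and deprecated statements. Use the verbose path when you need qualifiers or references: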

```sparql
# Reach the qualifier on a date range
SELECT ?item ?earliest ?latest WHERE {
  ?item wdt:P195 wd:Q12345 ;
        p:P571 ?st .
  ?st pq:P1319 ?earliest ;
      pq:P1326 ?latest .
}
```
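| Prefix | Reaches | Use when |
|---|---|---|
| wdt: | direct ("truthy") value | most attribute lookups |
| p: | statement node | you need qualifiers or rank |
| pq: | qualifier value | date ranges, attribution, circumstances |
| pr: | reference value | auditing sources |
| wd: | an entity | naming a specific item or property as a value |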
wd:an entitynaming a specific item/property value

Why does my SPARQL query keep timing out?

The public endpoint cuts queries at 60 seconds. Common heritage culprits and fixes:

  • A label service inside a huge join — move it to the outermost block.
  • Unbounded OPTIONAL chains — restrict with a leading mandatory triple.
  • Property paths like wdt:P361* over deep hierarchies — cap the depth or precompute.
  • No LIMIT while developing — always add one until the shape is right.
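The last point is easy to automate. Below is a minimal Python sketch that assumes queries are held as strings. The helper with_dev_limit (a hypothetical name) appends a LIMIT only when none follows the final closing brace, so a LIMIT inside a subquery does not count. It is a text heuristic, not a SPARQL parser: a comment containing a closing brace after the WHERE block would fool it.

```python
import re

_LIMIT = re.compile(r"\bLIMIT\b", re.IGNORECASE)


def with_dev_limit(query: str, limit: int = 100) -> str:
    """Append 'LIMIT <limit>' unless the outermost query already has one.

    Heuristic: solution modifiers (ORDER BY, LIMIT, OFFSET) of the outer
    query come after its last closing brace, so only that tail is checked.
    """
    stripped = query.rstrip()
    tail = stripped[stripped.rfind("}") + 1:]
    if _LIMIT.search(tail):
        return stripped + "\n"
    return f"{stripped}\nLIMIT {limit}\n"


print(with_dev_limit("SELECT ?item WHERE { ?item wdt:P195 wd:Q12345 . }"))
```

Appending at the end keeps any ORDER BY in front of the LIMIT, which is the order SPARQL requires.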

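If an analytical query genuinely needs more time, split it into smaller queries, export a dump subset, or stand up your own QLever endpoint, which answers many queries in milliseconds. Quarry is not a drop-in alternative. It runs SQL against the Wikimedia database replicas rather than SPARQL, so it suits only questions those tables can answer.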
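Keeping the endpoint URL as a parameter lets the same script run against the public service or a self-hosted copy. Below is a minimal Python sketch using only the standard library. It assumes the endpoint implements the standard SPARQL 1.1 protocol: a GET request with a query parameter, and JSON results requested via the Accept header. The function names and the User-Agent string are placeholders. Wikimedia asks clients to send a descriptive User-Agent with contact details, so substitute your own. Very long queries may exceed URL length limits and need a POST instead.

```python
import json
import urllib.parse
import urllib.request

PUBLIC_ENDPOINT = "https://query.wikidata.org/sparql"
# Placeholder identity; replace with your project name and contact address.
USER_AGENT = "HeritageQueries/0.1 (contact: you@example.org)"


def build_request(query: str, endpoint: str = PUBLIC_ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": USER_AGENT,
    })


def run_query(query: str, endpoint: str = PUBLIC_ENDPOINT,
              timeout: float = 70.0) -> dict:
    """Send the query and decode the JSON response (performs network I/O).

    The timeout is a client-side socket timeout; the public service
    enforces its own 60-second limit on the server side.
    """
    with urllib.request.urlopen(build_request(query, endpoint),
                                timeout=timeout) as resp:
        return json.load(resp)
```

To use it, call run_query(q) for the public service. For a local instance, pass its URL as endpoint. A value such as "http://localhost:7001" is only an assumed example; use whatever address your own QLever setup exposes.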

How do I make results consistent across a collection?

Standardise your patterns. Keep a small library of validated query templates — "all items missing a reference", "items with imprecise dates", "duplicate-candidate labels" — and run them across collections rather than writing one-off queries each time. A consistency check might be:

```sparql
# Items in the collection with no reference on their date statement
# DISTINCT: an item with several unreferenced P571 statements is listed once
SELECT DISTINCT ?item WHERE {
  ?item wdt:P195 wd:Q12345 ; p:P571 ?st .
  FILTER NOT EXISTS { ?st prov:wasDerivedFrom ?ref . }
}
```
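A template library can be as simple as a dictionary of parameterised query texts. Below is a minimal Python sketch using string.Template. The template names are hypothetical. The first template mirrors the reference check above, using DISTINCT so each item appears once. The second assumes the usual Wikidata encoding of date precision: psv:P571 leads to a value node whose wikibase:timePrecision is 9 for a year, with lower numbers for decades, centuries and coarser units.

```python
import re
from string import Template

_QID = re.compile(r"^Q[1-9][0-9]*$")

# Hypothetical library: names and query texts are illustrative, not a standard set.
TEMPLATES = {
    "date_without_reference": Template(
        "SELECT DISTINCT ?item WHERE {\n"
        "  ?item wdt:P195 wd:$collection ; p:P571 ?st .\n"
        "  FILTER NOT EXISTS { ?st prov:wasDerivedFrom ?ref . }\n"
        "}"
    ),
    "date_coarser_than_year": Template(
        "SELECT ?item ?precision WHERE {\n"
        "  ?item wdt:P195 wd:$collection ; p:P571/psv:P571 ?value .\n"
        "  ?value wikibase:timePrecision ?precision .\n"
        "  FILTER(?precision < 9)\n"
        "}"
    ),
}


def render(name: str, collection: str) -> str:
    """Fill one template with a collection's item ID, rejecting malformed IDs."""
    if not _QID.match(collection):
        raise ValueError(f"not a Wikidata item ID: {collection!r}")
    return TEMPLATES[name].substitute(collection=collection)


def render_all(collection: str) -> dict:
    """Render every template for one collection, keyed by template name."""
    return {name: render(name, collection) for name in TEMPLATES}
```

Running render_all for each collection in turn gives every collection the identical set of checks. This is the consistency the template approach is meant to buy.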

How do I keep a query reproducible for publication?

Three habits make a query citable: shorten and save the URL at query.wikidata.org, record the run date in a comment, and archive the CSV export with your dataset. Because all statements are mutable, state explicitly in your method note that figures reflect Wikidata as of a given date. For frozen reproducibility, cite a specific dump.
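The archiving habit can be scripted so that the CSV, the query text and the run date always travel together. Below is a minimal Python sketch. The function archive_run and its file-naming scheme are hypothetical. The short URL is whatever the query service gave you; the script stores it but does not generate it. Hashing the query text makes it easy to tell later whether two archived figures came from the identical query.

```python
import csv
import hashlib
import json
from datetime import date
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def archive_run(rows: List[Dict[str, str]], fieldnames: List[str], query: str,
                short_url: str, out_dir: str,
                run_date: Optional[date] = None) -> Tuple[Path, Path]:
    """Write results-<date>.csv plus a results-<date>.json metadata sidecar."""
    run_date = run_date or date.today()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = f"results-{run_date.isoformat()}"

    csv_path = out / f"{stem}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    meta = {
        "run_date": run_date.isoformat(),
        "short_url": short_url,  # as copied from the query service
        "row_count": len(rows),
        "query_sha256": hashlib.sha256(query.encode("utf-8")).hexdigest(),
        "query": query,
    }
    meta_path = out / f"{stem}.json"
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return csv_path, meta_path
```

With this naming scheme, a second run on the same day overwrites the first. Add a time component to the stem if you run a query more than once a day.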

What is a sensible pre-flight checklist?

Before trusting a result, verify: the query has a purpose comment and date; wdt:/p: are chosen deliberately; there is a LIMIT during development; labels resolve; date filters account for precision; and the result count is sanity-checked against what you expected. A suspiciously round count often means a LIMIT is silently truncating the result. A zero count usually points to a triple in the wrong direction or a mistyped ID.
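Several of these checks are mechanical and can run on every result. Below is a minimal Python sketch. The name preflight is hypothetical. It covers only the text-level items (header comments, LIMIT, counts); the label, wdt:/p: and precision checks still need a human eye. It assumes the "# Purpose:" / "# Run:" comment convention from the first query in this piece.

```python
import re
from typing import List, Optional


def preflight(query: str, result_count: int, expected: Optional[range] = None,
              development: bool = False) -> List[str]:
    """Return warnings for the mechanical items on the pre-flight checklist."""
    warnings = []
    if not re.search(r"^#\s*Purpose:", query, re.MULTILINE):
        warnings.append("missing '# Purpose:' comment")
    if not re.search(r"^#\s*Run:\s*\d{4}-\d{2}-\d{2}", query, re.MULTILINE):
        warnings.append("missing '# Run: YYYY-MM-DD' comment")
    limits = re.findall(r"\bLIMIT\s+(\d+)", query, re.IGNORECASE)
    if development and not limits:
        warnings.append("no LIMIT while developing")
    # The last LIMIT in the text is usually the outer query's.
    if limits and result_count == int(limits[-1]):
        warnings.append("count equals LIMIT; results are probably truncated")
    if result_count == 0:
        warnings.append("zero results; check triple direction and item IDs")
    if expected is not None and result_count not in expected:
        warnings.append(f"count {result_count} outside expected range "
                        f"{expected.start}-{expected.stop - 1}")
    return warnings
```

Passing expected as a range, for example range(400, 451), forces you to write down your sanity expectation before you see the number.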

Key Takeaways

  • Open every query with a purpose-and-date comment for defensibility.
  • Use wdt: for values; switch to p:/pq:/pr: only for qualifiers and references.
  • Beat the 60-second timeout by filtering early and moving labels to the outermost block.
  • Build a reusable template library so results stay consistent across collections.
  • Archive a CSV plus the shortened URL to make published figures reproducible.
  • Wikidata is mutable; always state the run date in your methods.

Frequently Asked Questions

What is the difference between wdt: and p: prefixes?

wdt: gives the direct, truthy value(s) of a property (preferred rank if present, otherwise normal rank) and is what you usually want. p: leads into the full statement node so you can reach qualifiers and references via pq: and prov:wasDerivedFrom/pr:.

Why does my query time out after 60 seconds?

The public Wikidata Query Service enforces a 60-second limit. Add a LIMIT, filter early, and avoid SERVICE wikibase:label inside large joins. For very large results, consider splitting the query or working from a dump or a self-hosted endpoint.

How do I get human-readable labels?

Add the label service: SERVICE wikibase:label { bd:serviceParam wikibase:language "en,mul". } and then reference ?itemLabel. Put it at the end so it does not bloat the join.

Can I query across a date range?

Yes. Bind the date to a variable with wdt:P571 and use FILTER with xsd:dateTime comparisons, but be aware that mixed date precisions can produce surprising boundary results.
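Here is why precision matters. The query service stores a year-precision date such as 1500 as 1500-01-01T00:00:00Z. A filter like FILTER(?d < "1500-07-01"^^xsd:dateTime) therefore keeps it, even though the real date may fall later in the year. You can fetch the precision alongside the value with p:P571/psv:P571 ?v . ?v wikibase:timeValue ?d ; wikibase:timePrecision ?prec, and then classify each value by the whole interval it covers. Below is a minimal Python sketch. It is deliberately limited to year (9), month (10) and day (11) precision and to CE years. Coarser precisions such as decade or century are left unsupported because their boundary conventions need more care.

```python
from datetime import date, timedelta
from typing import Tuple

# Wikidata time-precision codes handled by this sketch.
YEAR, MONTH, DAY = 9, 10, 11


def covered_interval(value: str, precision: int) -> Tuple[date, date]:
    """Return the half-open interval [start, end) that a time value covers.

    Accepts query-service values ("1499-01-01T00:00:00Z") and JSON-style
    values with zeroed fields ("+1499-00-00T00:00:00Z"). CE years only.
    """
    if value.startswith("-"):
        raise ValueError("BCE dates are not handled in this sketch")
    y, m, d = value.lstrip("+").split("T")[0].split("-")
    year, month, day = int(y), int(m) or 1, int(d) or 1
    if precision == YEAR:
        return date(year, 1, 1), date(year + 1, 1, 1)
    if precision == MONTH:
        next_month = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        return date(year, month, 1), next_month
    if precision == DAY:
        start = date(year, month, day)
        return start, start + timedelta(days=1)
    raise ValueError(f"unsupported precision {precision}")


def relation_to_cutoff(value: str, precision: int, cutoff: date) -> str:
    """Classify a value as 'before', 'not before', or 'straddles' the cutoff."""
    start, end = covered_interval(value, precision)
    if end <= cutoff:
        return "before"
    if start >= cutoff:
        return "not before"
    return "straddles"


print(relation_to_cutoff("1500-01-01T00:00:00Z", YEAR, date(1500, 7, 1)))  # prints: straddles
```

Items that come back as "straddles" are exactly the ones a plain FILTER silently sorts to one side of the boundary. Whether to include or exclude them is an editorial decision worth recording in your method note.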

How do I make a query reproducible for a publication?

Save it as a shortened query.wikidata.org URL, record the run date in a comment, and note that Wikidata is mutable so results may change. Archive a CSV export alongside your paper.
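When a figure changes between runs, comparing two archived exports shows which items account for the difference. Below is a minimal Python sketch. It assumes each CSV has an item column holding either a bare ID or a full entity URI; the query service's CSV download typically gives full URIs. The function names are hypothetical.

```python
import csv
from typing import Dict, Set


def item_ids(csv_path: str, column: str = "item") -> Set[str]:
    """Read one column of an exported CSV as a set of bare item IDs."""
    # utf-8-sig tolerates a byte-order mark at the start of the file.
    with open(csv_path, newline="", encoding="utf-8-sig") as fh:
        # rsplit keeps "Q42" from either "http://www.wikidata.org/entity/Q42" or "Q42".
        return {row[column].rsplit("/", 1)[-1] for row in csv.DictReader(fh)}


def compare_runs(old_csv: str, new_csv: str, column: str = "item") -> Dict[str, Set[str]]:
    """Report which items appeared, disappeared, or persisted between two runs."""
    old, new = item_ids(old_csv, column), item_ids(new_csv, column)
    return {"added": new - old, "removed": old - new, "unchanged": old & new}
```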

Should I use the public endpoint or my own copy?

Use the public endpoint for interactive work and small results. For repeated heavy analytics or guaranteed stability, load a dump into your own Blazegraph or QLever instance.