Troubleshooting: Write SPARQL queries for heritage

When a SPARQL query against heritage data misbehaves, the cause is almost always one of three things: a namespace or case mismatch in your predicates, an unbounded query plan that scans too much, or silent data shape problems like missing language tags. Diagnose by probing one known subject for its real predicates before debugging the whole query. Below are the failures I hit most and the fixes that resolve them.

Why does a query return zero rows when the data is there?

This is the most common heritage SPARQL problem, and it is rarely the data's fault: usually the property URI you assumed does not match what was loaded. Confirm the subject exists first:

```sparql
SELECT ?p ?o WHERE {
  <https://data.myarchive.org/item/MS-0421> ?p ?o .
}
```

If that returns rows, the subject is fine and your problem is a predicate mismatch. Compare the real ?p values with your query: perhaps the data uses dcterms:created but you queried dc:created, or your dct: prefix expands to a different namespace than the one in the data. Predicate URIs are compared as exact, case-sensitive strings, so one wrong character returns nothing and raises no error.
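If that listing is long, you can narrow it to candidate predicates whatever their namespace or letter case. This sketch reuses the example subject above and assumes the property you are hunting for has "created" somewhere in its URI; the search fragment is purely illustrative, so swap in your own.

```sparql
# Find predicates on one known subject whose URI mentions "created",
# regardless of namespace (dc:, dcterms:, schema:, ...) or letter case.
SELECT DISTINCT ?p WHERE {
  <https://data.myarchive.org/item/MS-0421> ?p ?o .
  FILTER(CONTAINS(LCASE(STR(?p)), "created"))   # "created" is an assumed fragment
}
```

Whatever URI comes back is the one to use in your query, or to check your PREFIX declaration against. Note that schema:dateCreated would also match, since its lowercased URI contains the fragment.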

How do you discover a resource's actual properties?

Never guess the schema. Ask the data what it contains:

```sparql
SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 100
```

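On a large endpoint even this can be slow, because DISTINCT may have to scan every triple before LIMIT applies; if it times out, scope the pattern to one class or subject. For class coverage, count instances per type to learn the real shape: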

```sparql
SELECT ?type (COUNT(?s) AS ?n) WHERE {
  ?s a ?type .
} GROUP BY ?type ORDER BY DESC(?n)
```

This single query tells you whether your manuscripts are schema:Manuscript, crm:E22, or something else entirely. Heritage datasets often mix classes from several modelling rounds.
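Once the type count shows which classes exist, you can profile the predicates used by a single class, which is usually cheaper than a dataset-wide DISTINCT. A sketch, assuming schema:Manuscript is the class you care about and that your data uses the https://schema.org/ namespace (some datasets use http://schema.org/ instead, which is a different URI):

```sparql
PREFIX schema: <https://schema.org/>   # assumption: check http vs https in your data

# Which predicates do manuscripts actually carry, and how often?
SELECT ?p (COUNT(*) AS ?uses) WHERE {
  ?s a schema:Manuscript ;             # restrict the scan to one class
     ?p ?o .
}
GROUP BY ?p
ORDER BY DESC(?uses)
LIMIT 100
```

A predicate used by only a handful of manuscripts can point to records loaded in a different modelling round.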

Why is my query timing out?

Public endpoints such as the Wikidata Query Service, or a shared Fuseki server, enforce time limits, and a timeout usually means the query plan is unbounded. The fixes, in priority order:

| Cause | Fix |
|---|---|
| No `LIMIT` | Add `LIMIT 1000` while developing |
| Slow leading pattern | Put the most selective triple pattern first |
| `OPTIONAL` early | Move `OPTIONAL` blocks to the end |
| Unbounded `regex` filter | Replace with `CONTAINS` or a bound value |
| Huge cross product | Bind a class or specific value before joining |

Binding a concrete class early often makes a query dramatically faster, because the engine starts from a small candidate set instead of every subject in the store:

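```sparql
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX schema: <https://schema.org/>     # some datasets use http://schema.org/

SELECT ?item ?title ?date WHERE {
  ?item a schema:Manuscript ;            # selective: bind class first
        dct:title ?title .
  OPTIONAL { ?item dct:created ?date }   # optional last
} LIMIT 500
```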
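The regex row of the table can be sketched the same way. This assumes titles live in dct:title and that you want those mentioning "bristol" in any letter case; both are illustrative.

```sparql
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX schema: <https://schema.org/>

# Slow shape: a regex tested against every title in the store
#   SELECT ?item WHERE { ?item dct:title ?t . FILTER(regex(?t, "bristol", "i")) }

# Cheaper shape: bind the class first, then use a plain substring test
SELECT ?item ?title WHERE {
  ?item a schema:Manuscript ;
        dct:title ?title .
  FILTER(CONTAINS(LCASE(STR(?title)), "bristol"))
} LIMIT 500
```

CONTAINS is usually cheaper to evaluate than a regular expression, but it still checks every candidate row, so the class binding does most of the work. If your triple store offers a full-text index, that is typically faster than either.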

What is behind "malformed query" errors?

These are syntax faults, and most endpoints report a line and column in the error message. The usual culprits:

- A property used without its PREFIX declaration.
- A missing . between two triple patterns.
- A FILTER placed outside the { } group it belongs to.
- An unescaped character inside a <...> URI, such as a raw space.

When in doubt, comment out half the query, check whether the error persists, and keep halving until you have isolated the failing clause.
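For reference, here is a small query that avoids each fault in the list above, with comments marking where each one would bite. The prefixes and class are illustrative, matching the ones used elsewhere on this page.

```sparql
PREFIX dct: <http://purl.org/dc/terms/>    # omit this and dct:title is a parse error
PREFIX schema: <https://schema.org/>

SELECT ?item ?title WHERE {
  ?item a schema:Manuscript .              # the "." closes this triple before the next
  ?item dct:title ?title .
  FILTER(CONTAINS(STR(?title), "Bristol")) # FILTER stays inside the WHERE braces
}
LIMIT 100
# IRIs may not contain raw spaces or < > " { } | ^ ` \ ; they must be percent-encoded (e.g. %20)
```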

How do language tags silently break heritage queries?

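Multilingual heritage data is full of language-tagged literals. A filter such as FILTER(?label = "Bristol") matches the untagged literal "Bristol" but misses "Bristol"@en, because the two are different RDF terms. To match the text whatever its tag, compare str(?label) = "Bristol"; to choose which labels you want, filter on the tag explicitly: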

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?place ?label WHERE {
  ?place rdfs:label ?label .
  FILTER(lang(?label) = "en" || lang(?label) = "")   # English or untagged
}
```

Mixing tagged and untagged values without handling both is a classic source of "where did half my results go".
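A fallback pattern goes one step further and picks a preferred label per place. This is a sketch assuming labels live in rdfs:label. COALESCE returns its first bound argument, so an English label (including regional tags such as en-GB) wins and an untagged one is used otherwise; places that only have labels in other languages still appear, with ?label unbound, and a place with several English labels still yields several rows.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?place (COALESCE(?enLabel, ?plainLabel) AS ?label) WHERE {
  ?place rdfs:label ?anyLabel .                  # every place with some label
  OPTIONAL {
    ?place rdfs:label ?enLabel .
    FILTER(langMatches(lang(?enLabel), "en"))    # en, en-GB, en-US, ...
  }
  OPTIONAL {
    ?place rdfs:label ?plainLabel .
    FILTER(lang(?plainLabel) = "")               # untagged literals only
  }
}
LIMIT 1000
```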

Why do federated SERVICE queries fail at random?

When you reach into Wikidata or another endpoint with SERVICE, you depend on a remote system that rate-limits and times out. Start with a small, selective request:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>

SELECT ?item ?wdLabel WHERE {
  ?item schema:sameAs ?wd .
  SERVICE <https://query.wikidata.org/sparql> {
    ?wd rdfs:label ?wdLabel .
    FILTER(lang(?wdLabel) = "en")
  }
} LIMIT 50
```

Keep the local side selective and cap the results with a LIMIT. An outer LIMIT may not reduce what the remote endpoint is asked to evaluate, so local selectivity matters most. Never assume the remote endpoint uses your local property URIs.
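A more defensive variant caps the local side in a subquery, sends only Wikidata IRIs to Wikidata, and wraps SERVICE SILENT in OPTIONAL, so a remote failure or a missing label does not erase your local rows. It assumes your data stores the canonical http://www.wikidata.org/entity/ form of Wikidata IRIs. Whether the engine forwards the ?wd bindings to the remote or sends the pattern unbound depends on the triple store, so treat this as a sketch and check your store's federation documentation.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>

SELECT ?item ?wd ?wdLabel WHERE {
  {
    # Pick a small local batch first; only Wikidata entity IRIs go remote
    SELECT ?item ?wd WHERE {
      ?item schema:sameAs ?wd .
      FILTER(STRSTARTS(STR(?wd), "http://www.wikidata.org/entity/"))
    }
    LIMIT 50
  }
  OPTIONAL {
    # SILENT: if the remote call errors, the query continues without its bindings
    SERVICE SILENT <https://query.wikidata.org/sparql> {
      ?wd rdfs:label ?wdLabel .
      FILTER(lang(?wdLabel) = "en")
    }
  }
}
```

Rows with an unbound ?wdLabel then show which links to retry later, instead of the whole query failing.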

Key Takeaways

- Zero rows almost always means a predicate namespace or case mismatch; probe one known subject first.
- Ask the data for its real predicates and classes instead of assuming the schema.
- Beat timeouts by adding LIMIT, binding selective patterns first, and pushing OPTIONAL to the end.
- "Malformed query" is syntax: missing prefix, missing dot, or a misplaced FILTER; the error column points near it.
- Handle language tags explicitly or you will silently lose multilingual rows.
- Wrap federated SERVICE blocks defensively with small LIMITs and no assumptions about remote URIs.

Frequently Asked Questions

Why does my SPARQL query return zero rows when the data exists?

Almost always a namespace or case mismatch: the predicate in your query expands to a different URI than the one in the data. Run a probe with a variable predicate against one known subject to confirm it exists, then check the exact property URI.

How do I see what properties a resource actually has?

Query for all predicates of one known subject by binding that subject URI and selecting distinct ?p in a single triple pattern. This reveals the real property URIs instead of the ones you assumed.

Why is my query timing out on a public endpoint?

You are probably triggering an unbounded scan. Add LIMIT, push the most selective triple pattern first, avoid leading OPTIONAL and unbound regex FILTERs, and bind a specific class or value early.

What causes 'malformed query' errors?

Usually a missing prefix declaration, an unescaped angle bracket in a URI, a missing dot between triples, or a FILTER outside its graph pattern. Read the column number in the error; it points near the fault.

How do I handle language tags in results?

Filter explicitly, for example with FILTER(lang(?label) = "en" || lang(?label) = ""), or use a fallback pattern, because mixing tagged and untagged literals silently drops rows you expected to see.

Why do federated queries fail intermittently?

Remote endpoints rate-limit, time out, or change schema. Wrap SERVICE blocks defensively, keep the local side selective and capped with a small LIMIT, and never assume the remote endpoint mirrors your local property URIs.
