IIIF & Image Interoperability

Use the IIIF Change Discovery API when an external consumer needs to keep an index of your collection in sync incrementally — re-fetching only manifests that changed since their last visit, and learning about deletions — rather than re-crawling everything. If your collection is small, rarely updated, or has no machine consumers, you almost certainly do not need it; a nightly full crawl or a sitemap is simpler and cheaper. The API is an Activity Streams 2.0 feed of created, updated and deleted resources, ordered by time, designed for resumable harvesting at scale.

What does the IIIF Change Discovery API do, exactly?

It publishes a log of activities against your IIIF resources, chiefly manifests and collections. Each activity has a type (Create, Update and Delete at the core, with Move, Add and Remove available for fuller change lists), an object (the resource URI and its type), and an endTime timestamp. Activities are ordered oldest first and split across linked pages. Consumers start at the collection's last page, walk backwards through prev links, stop when they reach activities they have already processed, and then fetch only the referenced manifests.

```json
{
  "@context": "http://iiif.io/api/discovery/1/context.json",
  "type": "OrderedCollectionPage",
  "id": "https://example.org/activity/page-742",
  "orderedItems": [
    {
      "type": "Delete",
      "object": { "id": "https://example.org/iiif/ms-902/manifest", "type": "Manifest" },
      "endTime": "2025-08-13T17:04:00Z"
    },
    {
      "type": "Update",
      "object": { "id": "https://example.org/iiif/ms-318/manifest", "type": "Manifest" },
      "endTime": "2025-08-14T09:21:00Z"
    }
  ]
}
```

The Delete activity is what matters most here: a plain list of current manifests cannot tell a consumer that ms-902 was withdrawn. That single capability is the strongest reason to adopt the API.
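A consumer-side sketch of that backwards walk follows. It rests on a few assumptions: pages are retrieved through a caller-supplied `fetch` function (a dictionary lookup here, standing in for HTTP), and `harvest` and `parse_end_time` are illustrative names, not part of any IIIF library. The code follows `prev` links back from the collection's `last` page and keeps only the most recent activity per object. As a result, a manifest that was updated and later deleted lands in the delete set.

```python
from datetime import datetime
from typing import Callable, Optional

def parse_end_time(value: str) -> datetime:
    # Assumes UTC timestamps written with a trailing "Z"
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def harvest(collection: dict, fetch: Callable[[str], dict], since: Optional[datetime]):
    """Walk back from the last page; return (changed URIs, deleted URIs, newest endTime)."""
    latest = {}  # object id -> type of its most recent activity
    newest = since
    page_ref = collection.get("last")
    while page_ref:
        page = fetch(page_ref["id"])
        next_ref = page.get("prev")
        # orderedItems are oldest first, so read each page in reverse
        for activity in reversed(page.get("orderedItems", [])):
            when = parse_end_time(activity["endTime"])
            if since is not None and when <= since:
                next_ref = None  # reached history processed on a previous run
                break
            if newest is None or when > newest:
                newest = when
            # walking backwards, the first activity seen per object is its latest
            latest.setdefault(activity["object"]["id"], activity["type"])
        page_ref = next_ref
    deleted = {uri for uri, kind in latest.items() if kind == "Delete"}
    changed = set(latest) - deleted
    return changed, deleted, newest

# Demo: two in-memory pages standing in for HTTP responses (hypothetical URLs)
PAGES = {
    "https://example.org/activity/page-741": {
        "type": "OrderedCollectionPage",
        "next": {"id": "https://example.org/activity/page-742"},
        "orderedItems": [
            {"type": "Create", "endTime": "2025-08-12T10:00:00Z",
             "object": {"id": "https://example.org/iiif/ms-100/manifest", "type": "Manifest"}},
        ],
    },
    "https://example.org/activity/page-742": {
        "type": "OrderedCollectionPage",
        "prev": {"id": "https://example.org/activity/page-741"},
        "orderedItems": [
            {"type": "Delete", "endTime": "2025-08-13T17:04:00Z",
             "object": {"id": "https://example.org/iiif/ms-902/manifest", "type": "Manifest"}},
            {"type": "Update", "endTime": "2025-08-14T09:21:00Z",
             "object": {"id": "https://example.org/iiif/ms-318/manifest", "type": "Manifest"}},
        ],
    },
}
stream = {"type": "OrderedCollection", "last": {"id": "https://example.org/activity/page-742"}}
changed, deleted, newest = harvest(stream, PAGES.__getitem__, parse_end_time("2025-08-13T00:00:00Z"))
```

Store `newest` and pass it as `since` on the next run. Stopping at `<=` assumes the previous run processed every activity that shares that exact timestamp. A production harvester may prefer a small overlap window and idempotent re-fetches.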

When is Change Discovery the right tool?

Reach for it when several of these hold:

  • You have tens of thousands of manifests or more, so full re-crawls are slow or expensive.
  • Content changes often (corrections, new digitisations, takedowns) and consumers must stay current.
  • A named aggregator will harvest you — a national portal, a discovery layer, an AI/ML training pipeline, or a federated search.
  • You need to communicate deletions reliably for rights or privacy reasons.
  • Your manifests are expensive to generate (built on demand from a CMS), so avoiding redundant fetches matters.

If most of those are false, the cost of building and maintaining an activity log outweighs the benefit.

When should you NOT publish a stream?

Skip it — or defer it — when:

  • The collection is under a few thousand items and a wget-style mirror finishes in minutes.
  • Nobody is consuming machine feeds; you would be maintaining infrastructure no one reads.
  • Your manifests are static files on a CDN, where a generated sitemap.xml plus Last-Modified headers already gives consumers most of what they need.
  • You cannot guarantee correct timestamps. A stream with unreliable endTime ordering is worse than no stream, because it silently breaks resumable harvesting.
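If you do publish, an automated ordering check is cheap insurance against the failure mode in that last bullet. The sketch below assumes the stream's pages are already loaded as dicts in first-to-last order, and that endTime should never go backwards across the whole stream. `ordering_problems` is a hypothetical helper name.

```python
from datetime import datetime

def ordering_problems(pages: list) -> list:
    """Report activities whose endTime is missing, unparseable, naive, or earlier than its predecessor.

    `pages` holds the stream's OrderedCollectionPage dicts in first-to-last order.
    """
    problems = []
    previous = None
    for page in pages:
        for index, activity in enumerate(page.get("orderedItems", [])):
            where = f"{page.get('id', '?')}[{index}]"
            raw = activity.get("endTime")
            if raw is None:
                problems.append(f"{where}: missing endTime")
                continue
            try:
                when = datetime.fromisoformat(raw.replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"{where}: unparseable endTime {raw!r}")
                continue
            if when.tzinfo is None:
                problems.append(f"{where}: endTime {raw} has no timezone")
                continue
            if previous is not None and when < previous:
                problems.append(f"{where}: endTime {raw} is earlier than the previous activity")
            previous = when
    return problems

good = [{"id": "page-0", "orderedItems": [
    {"endTime": "2025-08-13T17:04:00Z"}, {"endTime": "2025-08-14T09:21:00Z"}]}]
bad = [{"id": "page-0", "orderedItems": [
    {"endTime": "2025-08-14T09:21:00Z"}, {"endTime": "2025-08-13T17:04:00Z"}, {}]}]
```

Run it in CI or before each publish, and refuse to deploy a feed that reports any problems.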

Change Discovery vs. the alternatives

| Approach | Tells you what changed | Expresses deletes | Resumable | Build/operate cost |
|---|---|---|---|---|
| Full re-crawl of Presentation API | No (diff yourself) | Implicitly | No | Low |
| sitemap.xml + Last-Modified | Partly | No | No | Low |
| Change Discovery Level 0 | Partly (one Update per resource) | No | Limited | Medium |
| Change Discovery Level 1 | Yes | Yes | Yes | Medium-high |
| Custom webhook/push feed | Yes | Yes | Depends | High, non-standard |

The standard wins over a custom webhook because consumers already know how to read it; you do not have to document a bespoke protocol.
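The "diff yourself" and "Implicitly" entries for a full re-crawl mean the following: a consumer without a stream has to compare two complete crawls, and a deletion only shows up as a URI missing from the newer one. The sketch below assumes each crawl is stored as a dict from manifest URI to a SHA-256 fingerprint of the manifest JSON. Short keys such as `ms-318` stand in for full manifest URIs, and `fingerprint` and `diff_crawls` are illustrative names.

```python
import hashlib
import json

def fingerprint(manifest: dict) -> str:
    # Stable hash of a manifest's JSON; sort_keys makes key order irrelevant
    return hashlib.sha256(json.dumps(manifest, sort_keys=True).encode("utf-8")).hexdigest()

def diff_crawls(previous: dict, current: dict):
    """Compare two crawls, each mapping manifest URI -> fingerprint."""
    created = current.keys() - previous.keys()
    deleted = previous.keys() - current.keys()  # inferred purely from absence
    updated = {uri for uri in current.keys() & previous.keys() if current[uri] != previous[uri]}
    return created, updated, deleted

monday = {"ms-318": fingerprint({"label": "v1"}), "ms-902": fingerprint({"label": "x"})}
tuesday = {"ms-318": fingerprint({"label": "v2"}), "ms-450": fingerprint({"label": "new"})}
created, updated, deleted = diff_crawls(monday, tuesday)
```

The weakness shows up directly in the code. A manifest that failed to download during one crawl looks exactly like a withdrawn one, and every run still has to fetch every manifest just to compute its fingerprint.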

Which compliance level should you pick?

There are three levels in the spec, and all of them publish the same shape: an OrderedCollection pointing to linked OrderedCollectionPage documents. Level 0 (Basic Resource List) lists each resource once, as an Update activity. A consumer learns what exists, but deletions are not expressed. Level 1 (Resource Change List) records an activity, with an endTime, for every Create, Update and Delete. A consumer can therefore read back from the last page and stop at the first activity it has already harvested, which makes Level 1 the practical target for large collections. Level 2 (Complete Change List) adds further activity types such as Move, Add and Remove, for publishers who need to describe every change. Start at Level 0 to prove the consumer relationship, then graduate to Level 1 once consumers need deletions and incremental reads.
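The paging structure itself is straightforward. The sketch below assumes the activities are already sorted oldest first and uses a hypothetical URL layout under `https://example.org/activity/`. `paginate` and `page_ref` are illustrative names, and page properties such as `partOf` are left out for brevity.

```python
BASE = "https://example.org/activity"  # hypothetical URL layout
CONTEXT = "http://iiif.io/api/discovery/1/context.json"

def page_ref(n: int) -> dict:
    # Reference to page n, as used in first/last/prev/next links
    return {"id": f"{BASE}/page-{n}", "type": "OrderedCollectionPage"}

def paginate(activities: list, page_size: int = 100):
    """Split activities (sorted oldest first) into an OrderedCollection plus linked pages."""
    chunks = [activities[i:i + page_size] for i in range(0, len(activities), page_size)] or [[]]
    pages = []
    for n, chunk in enumerate(chunks):
        page = {"@context": CONTEXT, **page_ref(n), "orderedItems": chunk}
        if n > 0:
            page["prev"] = page_ref(n - 1)
        if n < len(chunks) - 1:
            page["next"] = page_ref(n + 1)
        pages.append(page)
    collection = {
        "@context": CONTEXT,
        "id": f"{BASE}/all",
        "type": "OrderedCollection",
        "totalItems": len(activities),
        "first": page_ref(0),
        "last": page_ref(len(chunks) - 1),
    }
    return collection, pages

# Demo with 250 placeholder Update activities spread over 25 days
demo = [{"type": "Update", "endTime": f"2025-08-{1 + i // 10:02d}T00:00:00Z",
         "object": {"id": f"https://example.org/iiif/ms-{i}/manifest", "type": "Manifest"}}
        for i in range(250)]
collection, pages = paginate(demo, page_size=100)
```

Filling pages oldest first keeps published page URLs stable, as long as `page_size` stays fixed. New activities only ever land on the last page, or on a new page after it, so pages a consumer has already read never change underneath it.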

How do you generate the feed in practice?

A common approach is to emit it from the same database that drives the Presentation API. A minimal pipeline:

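```sql
-- Activities since the consumer's last harvest, oldest first (the orderedItems order)
SELECT manifest_uri, action, changed_at
FROM iiif_activity_log
WHERE changed_at >= :since
ORDER BY changed_at ASC;
```

```python
from datetime import datetime, timezone

def to_end_time(ts: datetime) -> str:
    # Format as UTC with a trailing "Z"; naive datetimes are assumed to be UTC already
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def build_page(n, rows):
    """Emit one OrderedCollectionPage from query rows (oldest first, changed_at a datetime)."""
    return {
        "@context": "http://iiif.io/api/discovery/1/context.json",
        "type": "OrderedCollectionPage",
        "id": f"https://example.org/activity/page-{n}",
        "partOf": [{"id": "https://example.org/activity/all", "type": "OrderedCollection"}],
        "orderedItems": [
            {"type": row.action,  # "Create", "Update" or "Delete"
             "endTime": to_end_time(row.changed_at),
             "object": {"id": row.manifest_uri, "type": "Manifest"}}
            for row in rows
        ],
    }
```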

The hard part is not the JSON — it is reliably recording every create/update/delete as it happens. If you cannot instrument your ingest and takedown workflows to write that log, do not start; a half-recorded log gives consumers false confidence.
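One way to make the log hard to skip is to write it in the same database transaction as the content change, so a manifest can never change without a matching activity row. Below is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical `manifests` table alongside the `iiif_activity_log` table from the query above. The log's `id` column and all helper names are illustrative.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE manifests (uri TEXT PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE iiif_activity_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- tie-breaker for equal timestamps
    manifest_uri TEXT NOT NULL,
    action TEXT NOT NULL CHECK (action IN ('Create', 'Update', 'Delete')),
    changed_at TEXT NOT NULL
);
"""

def utc_now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def log(db: sqlite3.Connection, uri: str, action: str) -> None:
    db.execute(
        "INSERT INTO iiif_activity_log (manifest_uri, action, changed_at) VALUES (?, ?, ?)",
        (uri, action, utc_now()),
    )

def save_manifest(db: sqlite3.Connection, uri: str, body: str) -> None:
    with db:  # content change and log row commit together or roll back together
        exists = db.execute("SELECT 1 FROM manifests WHERE uri = ?", (uri,)).fetchone()
        db.execute("INSERT OR REPLACE INTO manifests (uri, body) VALUES (?, ?)", (uri, body))
        log(db, uri, "Update" if exists else "Create")

def take_down(db: sqlite3.Connection, uri: str) -> None:
    with db:  # only log a Delete if a row was actually removed
        if db.execute("DELETE FROM manifests WHERE uri = ?", (uri,)).rowcount:
            log(db, uri, "Delete")

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
save_manifest(db, "https://example.org/iiif/ms-318/manifest", '{"label": "v1"}')
save_manifest(db, "https://example.org/iiif/ms-318/manifest", '{"label": "v2"}')
save_manifest(db, "https://example.org/iiif/ms-902/manifest", '{"label": "x"}')
take_down(db, "https://example.org/iiif/ms-902/manifest")
actions = [row[0] for row in db.execute("SELECT action FROM iiif_activity_log ORDER BY id")]
```

Several activities can share a second-resolution timestamp. Ordering the feed query by `changed_at, id` keeps their relative order deterministic.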

Key Takeaways

  • Change Discovery is an incremental sync mechanism: harvest only what changed, including deletions.
  • It pays off at scale (tens of thousands of manifests) and when a real consumer exists.
  • Small or static collections are usually better served by a full crawl or sitemap.
  • The unique capability is signalling deletes — something a manifest list cannot do.
  • Start at Level 0 if it helps prove demand, but plan for Level 1: it is the level that records Delete activities and per-change timestamps.
  • The feed is easy; the activity log that feeds it is the real engineering commitment.
  • An inaccurate or badly ordered stream is worse than none, because it breaks resumable harvesting silently.

Frequently Asked Questions

What problem does the IIIF Change Discovery API actually solve?

It lets a consumer learn which resources in a collection were created, updated or deleted since they last harvested, so they re-fetch only the changed manifests instead of crawling the whole collection. It is an incremental sync mechanism, not a search or query API.

Is the IIIF Change Discovery API the same as a sitemap?

No. A sitemap lists what exists now; an Activity Streams OrderedCollection lists what changed and when, including Delete activities. Change Discovery is time-ordered and supports resumable harvesting, which a static sitemap does not.

Do I need the Change Discovery API if my collection is small?

Usually not. Below roughly a few thousand manifests, a nightly full re-crawl of your Presentation API endpoints is simpler to operate and cheaper to reason about than maintaining an activity log.

What is the difference between Level 0 and Level 1 implementations?

Level 0 is a basic resource list: each resource appears once, as an Update activity, and deletions are not expressed. Level 1 records every Create, Update and Delete with an endTime, so consumers can stop once they reach activities they have already seen.

How do consumers handle deletions?

A Delete activity in the stream signals the consumer should remove the referenced resource from its index. This is the main reason Change Discovery beats a plain manifest list, which cannot express removal.

What kinds of institutions publish a Change Discovery stream?

Publishers that need machine harvesting at scale, such as national libraries and large aggregators feeding portals, are the typical adopters. If no aggregator will consume your stream, you may not need to publish one.