Best Practices for Using LOCKSS for Distributed Preservation

LOCKSS — "Lots Of Copies Keep Stuff Safe" — is a peer-to-peer preservation system in which independent nodes hold copies of the same content and continuously audit each other, voting to detect damage and repairing it from peers. The best practice for using it is to build a network of genuinely independent partners (a Private LOCKSS Network for your own content), run enough nodes for a meaningful majority vote, and treat the polling and repair logs as your evidence of integrity. It defends against correlated failures that ordinary backups cannot.

How is LOCKSS different from a backup?

A backup is a passive copy, usually under one organisation's control, in one or two locations. If a policy error, ransomware event or institutional collapse hits the controller, every copy can vanish together. LOCKSS inverts this: copies live with separate organisations, and the system actively and continuously compares them. The protection is not "we have a copy" but "many independent parties keep proving the copy is correct."

Property	Ordinary backup	LOCKSS
Control	Single administrator	Many independent peers
Integrity checking	You schedule it	Continuous, automatic polling
Repair	Manual restore	Automatic from a peer
Threat resisted	Hardware/site loss	Also correlated/admin failure
Trust model	Trust one party	Majority vote, no central authority

How does the polling and repair actually work?

Nodes periodically run polls on each Archival Unit (AU). Every participating node hashes its copy of the content, and the nodes vote on which hash is correct. A node that finds itself in the minority concludes its copy is damaged and fetches a clean copy from a peer that agreed with the majority. No node is trusted absolutely; integrity emerges from agreement.

```text
Poll on AU "Parish Records 1841":
  Node A  hash=3f9a...  agree
  Node B  hash=3f9a...  agree
  Node C  hash=3f9a...  agree
  Node D  hash=11c7...  DISAGREE  -> repairs from A/B/C
  Node E  hash=3f9a...  agree
Result: damage on D detected and repaired automatically.
```

This is why node independence is the whole game: if several nodes share storage, power or administration, a single fault can corrupt enough copies to win a bad vote.
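The poll above, and the way shared infrastructure can hand a corrupt copy the majority, can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the actual LOCKSS polling protocol, which is considerably more elaborate. The `run_poll` function, the node names and the content bytes are all hypothetical.

```python
import hashlib
from collections import Counter


def run_poll(copies):
    """Tally a simplified majority-vote poll over one AU.

    `copies` maps a node name to the bytes that node holds.
    Returns (winning_hash, nodes_needing_repair).
    Raises ValueError when no hash has a strict majority.
    """
    hashes = {node: hashlib.sha256(data).hexdigest() for node, data in copies.items()}
    winner, votes = Counter(hashes.values()).most_common(1)[0]
    if votes * 2 <= len(hashes):
        raise ValueError("no strict majority; poll is inconclusive")
    losers = sorted(node for node, h in hashes.items() if h != winner)
    return winner, losers


good = b"Parish Records 1841"
bad = b"Parish Records 1841 (corrupted)"

# Five independent nodes, one damaged: D is outvoted and must repair.
independent = {"A": good, "B": good, "C": good, "D": bad, "E": good}
_, to_repair = run_poll(independent)
print(to_repair)  # ['D']

# Same five nodes, but C, D and E share one storage array that fails:
# the corrupt copy now holds the majority and the good nodes "lose".
correlated = {"A": good, "B": good, "C": bad, "D": bad, "E": bad}
_, wrongly_flagged = run_poll(correlated)
print(wrongly_flagged)  # ['A', 'B']
```

The second call is the failure mode to design against: the vote itself works as intended, but three copies that fail together are enough to outvote two healthy ones.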

When should I use a Private LOCKSS Network?

The classic public use of LOCKSS preserves subscribed e-journals. For preserving your own institutional content — digitised collections, born-digital archives, datasets — you build a Private LOCKSS Network (PLN): a closed, trusted group of partner institutions each running a node for the shared collections. Good candidates are consortia of libraries, regional archive partnerships, and research-data alliances who can commit to running nodes for years.

What are the best-practice checklist items?

Use this as your governance and configuration checklist:

Genuine independence — partners in different organisations, locations, power grids and admin domains. Co-located nodes defeat the purpose.
Enough nodes for a real vote — aim for a minimum of about six to seven independent nodes per collection so a majority is meaningful.
A written PLN agreement — who runs what, for how long, exit terms, and what happens if a partner leaves.
Content harvest plans (AUs) under version control — the definitions of what each node collects.
Monitor poll results — treat repeated repairs on one node as a hardware warning, not noise.
Document everything — the network topology, agreements and poll history are your audit evidence.
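Parts of this checklist can be checked mechanically. The sketch below is one possible approach, assuming you keep a simple inventory of nodes and a list of repair events. The field names, the `audit_topology` and `repeated_repairs` helpers, and the repair threshold of three are illustrative assumptions rather than anything LOCKSS itself provides.

```python
from collections import Counter, defaultdict

MIN_NODES = 6  # lower bound of the "six to seven" guidance above

# Failure domains the checklist says partners should not share.
DOMAINS = ("organisation", "site", "power_grid", "admin_domain")


def audit_topology(nodes, min_nodes=MIN_NODES):
    """Return human-readable warnings for a PLN node inventory.

    `nodes` is a list of dicts with a "name" key plus one key per entry
    in DOMAINS. An empty result means no problems were found.
    """
    warnings = []
    if len(nodes) < min_nodes:
        warnings.append(f"only {len(nodes)} nodes; want at least {min_nodes}")
    for domain in DOMAINS:
        groups = defaultdict(list)
        for node in nodes:
            groups[node[domain]].append(node["name"])
        for value, names in sorted(groups.items()):
            if len(names) > 1:
                joined = ", ".join(sorted(names))
                warnings.append(f"{joined} share {domain} '{value}'")
    return warnings


def repeated_repairs(repair_log, threshold=3):
    """Return nodes that appear in `repair_log` at least `threshold` times."""
    counts = Counter(repair_log)
    return sorted(node for node, n in counts.items() if n >= threshold)


# Hypothetical three-node inventory: B and C sit in the same city and grid.
nodes = [
    {"name": "A", "organisation": "Univ 1", "site": "City X",
     "power_grid": "north", "admin_domain": "it-1"},
    {"name": "B", "organisation": "Univ 2", "site": "City Y",
     "power_grid": "south", "admin_domain": "it-2"},
    {"name": "C", "organisation": "Univ 3", "site": "City Y",
     "power_grid": "south", "admin_domain": "it-3"},
]
for warning in audit_topology(nodes):
    print(warning)

# One entry per repair event, named by the node that had to repair.
print(repeated_repairs(["D", "A", "D", "D", "B"]))
```

Keeping the inventory and the audit script under version control alongside the AU definitions also gives you a dated record of the topology for audit purposes.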

How do I prepare content for LOCKSS?

Define each collection as one or more Archival Units with a stable manifest the nodes can harvest. Keep the content addressable over HTTP(S) and supply a LOCKSS plugin or manifest page that tells nodes what belongs to the AU. In the modern LOCKSS 2.x (LAAWS, "LOCKSS Architected As Web Services") architecture, each node runs as a set of containerised services:

```bash
# LOCKSS 2.x deploys as containerised services
git clone https://github.com/lockss/lockss-installer
cd lockss-installer
./scripts/configure-lockss      # set network, AUs, peers
./scripts/start-lockss          # bring up the node services
```

After start-up, register your node with the PLN's configuration so it begins harvesting the agreed AUs and joining polls.
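On the content side, a manifest page can be generated rather than hand-written. The following is a minimal sketch, assuming a plain HTML page that carries a permission statement and links to the AU's content. The `build_manifest` helper and the example URL are hypothetical, and the permission wording should be confirmed against the LOCKSS documentation and your network's plugin before you publish it.

```python
from html import escape

# Permission wording LOCKSS crawlers conventionally look for; confirm the
# exact text your network requires before publishing.
PERMISSION = (
    "LOCKSS system has permission to collect, preserve, "
    "and serve this Archival Unit"
)


def build_manifest(au_title, urls):
    """Render a minimal HTML manifest page for one Archival Unit."""
    items = "\n".join(
        f'    <li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in urls
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html>\n<head><title>{escape(au_title)} manifest</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(au_title)}</h1>\n"
        f"  <p>{PERMISSION}</p>\n"
        f"  <ul>\n{items}\n  </ul>\n"
        "</body>\n</html>\n"
    )


# Hypothetical AU with a single start URL.
page = build_manifest(
    "Parish Records 1841",
    ["https://archive.example.org/parish/1841/index.html"],
)
print(page)
```

Generating the page from the same version-controlled AU definitions keeps the manifest and the harvest plan from drifting apart.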

What are the trade-offs to weigh?

LOCKSS buys exceptional resilience against correlated and administrative failure, but it asks for sustained multi-institution commitment, modest technical operations at each node, and patience: repair is gradual, by design. It is not a quick personal backup; it is infrastructure for content a community has agreed to keep together for the long term. For a single small archive with no partners, a disciplined 3-2-1 strategy is simpler; LOCKSS earns its keep when correlated failure within a single organisation is the threat you must defeat.

Key Takeaways

LOCKSS keeps many independent copies that continuously audit each other and self-repair.
It defends against correlated and administrative failures that ordinary backups cannot.
Integrity comes from majority-vote polling — no node is trusted absolutely.
Use a Private LOCKSS Network of trusted partner institutions for your own content.
Node independence (separate orgs, sites, power, admin) is essential; co-location defeats it.
Run enough nodes for a meaningful vote (around six to seven independent ones).
Modern LOCKSS 2.x deploys as containerised services and integrates via web APIs.

Frequently Asked Questions

What is LOCKSS?

LOCKSS ("Lots Of Copies Keep Stuff Safe") is an open-source, peer-to-peer preservation system developed at Stanford University Libraries. Independent nodes hold copies of the same content and continuously compare them, voting to detect and repair damage without any central authority.

How does LOCKSS repair damaged content?

Nodes run polls: each computes hashes over its copy and they vote. A node whose copy disagrees with the majority is identified as damaged and repairs itself by fetching good content from a peer that won the poll.

What is the difference between LOCKSS and a normal backup?

Backups are passive copies under one administrator. LOCKSS copies are held by independent organisations and are actively, continuously audited against each other, so it resists correlated failures and single points of control.

Do I need a Private LOCKSS Network?

If you are preserving your own institution's content rather than subscribed e-journals, yes. A Private LOCKSS Network (PLN) is a closed group of trusted partners running nodes for shared collections.

How many nodes does a LOCKSS network need?

Polling needs enough independent copies for a meaningful majority vote — typically a minimum of around six to seven nodes is recommended, held by genuinely separate organisations and locations.

Is LOCKSS still actively developed?

Yes. The LOCKSS Program modernised the software into the containerised LOCKSS 2.x (LAAWS) architecture, which exposes web services and integrates more easily with other preservation tools.
