To archive a single web page well, capture it with a browser-based tool that records what your browser actually renders — ArchiveWeb.page or Conifer — export it as WACZ, then replay it offline to confirm every image, font and link survived. For a permanent public copy, also push the URL to the Wayback Machine. Avoid relying on a screenshot alone: it looks like archiving but throws away the searchable text, the links and the verifiable provenance. Here is the full workflow a working archivist can reuse for any one-off page.
What does "well" mean for a single page?
A good single-page capture is complete, replayable, documented and verifiable. Concretely:
- Every resource the page needs (HTML, CSS, JS, images, fonts, media) is stored.
- It replays interactively offline, not just as a flat image.
- Its provenance — URL, UTC timestamp, tool, operator — is recorded.
- It has a fixity checksum so future tampering is detectable.
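A screenshot fails the first two tests outright and undermines the fourth: you can checksum the image file, but it holds no page source or response headers to check the picture against. That is why it is a supplement, not the archive.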
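The fixity test is the easiest of the four to make concrete. The following is a minimal sketch, assuming GNU coreutils `sha256sum` is available (macOS users would typically substitute `shasum -a 256`). The function names `record_fixity` and `verify_fixity` are illustrative and not part of any tool mentioned here.

```bash
# Hypothetical helpers (names are illustrative): record a SHA-256 value
# for a capture file and re-check it later.

# Write "<hash>  <filename>" to <file>.sha256, next to the file itself.
record_fixity() {
  local file="$1"
  ( cd "$(dirname "$file")" &&
    sha256sum "$(basename "$file")" > "$(basename "$file").sha256" )
}

# Exit 0 only if the file still matches the recorded hash.
verify_fixity() {
  local file="$1"
  ( cd "$(dirname "$file")" &&
    sha256sum -c "$(basename "$file").sha256" > /dev/null 2>&1 )
}

# Usage:
#   record_fixity captures/page.wacz
#   verify_fixity captures/page.wacz && echo "unchanged" || echo "CHANGED"
```

Running the check from inside the file's own directory keeps the recorded filename relative, so the pair can be moved to another disk and still verify.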
Step 1 — Capture what the browser renders
For one page, the ArchiveWeb.page extension is ideal because it captures live, JavaScript-rendered content from your own session:
- Install ArchiveWeb.page (Chrome/Brave/Edge).
- Start recording, navigate to the page, scroll the whole thing, expand any "read more" sections.
- Stop recording and download the WACZ.
If the page is highly dynamic or behind a login, Conifer (hosted session recording) is the more robust choice.
Step 2 — Should you use WARC or single-file HTML?
It depends on whether provenance matters.
| Approach | Fidelity | Provenance | Good for |
|---|---|---|---|
| WARC / WACZ | High | Full (headers, timestamps) | Evidence, collections |
| SingleFile HTML | Medium | None | Quick personal reference |
| PDF / screenshot | Visual only | None | Visual supplement |
For anything that might be cited or held as a record, choose WARC/WACZ. SingleFile is fine for a quick note-to-self but flattens resources and discards the request/response metadata.
Step 3 — How do I verify the capture is complete?
This is the step beginners skip and regret. Open the WACZ in ReplayWeb.page and check it against the live page:
- Do images, fonts and embedded media render?
- Do internal links resolve within the capture?
- Did scroll-loaded or click-revealed content get included?
Note anything missing. A capture with a documented gap is honest; a capture you never checked is a liability.
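A quick structural check can sit alongside the replay test, though it does not replace it. A WACZ file is a ZIP package, and the WACZ specification describes a `datapackage.json` manifest plus `archive/`, `indexes/` and `pages/` members. The sketch below assumes Info-ZIP `unzip` is installed; the function name `wacz_members_ok` is hypothetical.

```bash
# Hypothetical helper: report which expected WACZ members are present.
# Assumes Info-ZIP `unzip`; a WACZ file is an ordinary ZIP archive.
wacz_members_ok() {
  local wacz="$1" listing prefix missing=0
  [ -f "$wacz" ] || { echo "no such file: $wacz" >&2; return 2; }
  # -Z1 lists one member name per line
  listing="$(unzip -Z1 "$wacz")" || return 2
  for prefix in datapackage.json archive/ indexes/ pages/; do
    if printf '%s\n' "$listing" | grep -q "^$prefix"; then
      echo "ok       $prefix"
    else
      echo "MISSING  $prefix"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   wacz_members_ok article-flood-1953.wacz
```

A passing result only shows that the package is well-formed. Whether a particular image or font made it in is still something only the replay comparison will reveal.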
Step 4 — Record provenance and fixity
Store a sidecar record next to the WACZ so the capture is citable:
```json
{
  "url": "https://example.org/article/flood-1953",
  "captured_at_utc": "2026-01-15T11:04:22Z",
  "tool": "ArchiveWeb.page 0.12",
  "operator": "E. Reed",
  "wacz_sha256": "a41b…",
  "notes": "Complete; one third-party tweet embed did not load."
}
```
Compute the checksum so integrity is provable later:
```bash
sha256sum article-flood-1953.wacz > article-flood-1953.wacz.sha256
```
Step 5 — Make a permanent public copy
A local WACZ can be lost. For pages that should survive, also save the URL to the Internet Archive:
```bash
# Trigger a Wayback Machine capture from the command line
curl -s "https://web.archive.org/save/https://example.org/article/flood-1953"
```
Now you have two independent copies: your local high-fidelity WACZ and a public timestamped Wayback URL. Together they are far more resilient than either alone.
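A save request is not proof that a snapshot exists, so it is worth confirming. The sketch below queries the Internet Archive's availability endpoint at `https://archive.org/wayback/available` and pulls the 14-digit snapshot timestamp out of the JSON response. The function names are hypothetical, and the text-based extraction is a deliberate shortcut that assumes the response keeps a `"timestamp"` field; a JSON parser such as `jq` would be more robust.

```bash
# Hypothetical helpers (names are illustrative).

# Ask the Wayback Machine for the closest snapshot of a URL (needs network).
wayback_available() {
  curl -sS -G --data-urlencode "url=$1" "https://archive.org/wayback/available"
}

# Read that JSON on stdin and print the 14-digit snapshot timestamp.
# Exits non-zero when the response reports no snapshot.
snapshot_timestamp() {
  grep -o '"timestamp": *"[0-9]*"' | grep -o '[0-9]\{14\}'
}

# Usage:
#   wayback_available "https://example.org/article/flood-1953" | snapshot_timestamp
```

A new capture can take a while to show up, so an empty result straight after saving is a reason to retry later rather than evidence of failure.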
How do I cite the archived page?
Cite the original URL, the UTC capture time, and a stable link to the archived copy (a Wayback timestamped URL, or a reference to your local WACZ and its checksum). That triple — original address, when, and where the copy lives — is what makes the archive usable as evidence.
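As a rough illustration of assembling that triple, the sketch below builds a timestamped Wayback URL and a one-line citation. Both function names and the citation layout are assumptions made for this example, not a citation standard. The timestamp passed to `wayback_url` should be the one the Wayback Machine reports for its own capture, which may differ from your local capture time.

```bash
# Hypothetical helpers; names and citation layout are illustrative only.

# Build a timestamped Wayback URL from a 14-digit timestamp and the original URL.
wayback_url() {
  printf 'https://web.archive.org/web/%s/%s\n' "$1" "$2"
}

# Print the citation triple: original URL, UTC capture time, archived copy.
format_citation() {
  local url="$1" captured_utc="$2" archived="$3"
  printf '%s (captured %s). Archived copy: %s\n' "$url" "$captured_utc" "$archived"
}

# Usage:
#   format_citation "https://example.org/article/flood-1953" \
#     "2026-01-15T11:04:22Z" \
#     "$(wayback_url "<timestamp from Wayback>" "https://example.org/article/flood-1953")"
```

For a local-only copy, the third argument could instead name the WACZ file and its checksum file.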
Key Takeaways
- "Well" means complete, replayable, documented and verifiable — not just a screenshot.
- Capture with a browser-based tool (ArchiveWeb.page, Conifer) to get rendered content.
- Prefer WARC/WACZ over SingleFile/PDF when provenance matters.
- Replay and compare against the live page before declaring success.
- Record URL, UTC timestamp, tool and a SHA-256 checksum.
- Keep two copies: a local WACZ plus a Wayback Machine public copy.
- Cite original URL + UTC time + stable archived link.
Frequently Asked Questions
What is the best free tool to archive one web page?
The ArchiveWeb.page browser extension is the best free option for a single page: it captures exactly what your browser renders, including JavaScript content, and exports a WACZ you can replay offline. For a permanent public copy, also save the URL to the Internet Archive's Wayback Machine.
Is a screenshot enough to archive a web page?
No. A screenshot captures appearance but loses the live links, text you can search, and the underlying HTML and resources. A proper capture stores the page and its assets as WARC/WACZ so it can be replayed interactively and verified, while a screenshot is best kept only as a visual supplement.
How do I make sure my single-page capture is complete?
Replay the capture offline and check that images, fonts, embedded media and links render, then compare it against the live page. Confirm any content that loads on scroll or click was triggered during capture, and record what, if anything, is missing.
Should I use single-file HTML or WARC for one page?
Use WARC/WACZ when fidelity and provenance matter, because it preserves each resource with its original headers and timestamps. A single-file HTML (e.g. via SingleFile) is convenient for quick reference but flattens resources and loses the request/response metadata archivists rely on.
How do I cite an archived single page?
Cite the original URL, the capture date and time in UTC, and a stable link to the archived copy (such as a Wayback Machine timestamped URL). Include the archiving tool if the copy is local, so the provenance of your capture is clear.
Can I archive a page that requires a login?
Yes, by capturing the page from an authenticated browser session with a tool like ArchiveWeb.page or Conifer, but only with proper authorisation. Be careful that the capture does not expose other people's private data, and restrict access to the resulting file.