To capture a site with Conifer, create a collection, start a Capture session against your seed URL, then browse the page like a normal user — every request your browser makes is recorded into a WARC you can later export and replay. Conifer's strength is human-driven, high-fidelity capture of interactive pages that defeat automated crawlers, and this guide walks the full workflow from account to exported WARC.
What is Conifer best at?
Conifer, the Rhizome-maintained successor to the original Webrecorder hosted service, records exactly what a real browser fetches while a person operates it. That makes it the right tool when content only appears after interaction: a logged-in dashboard, an infinite-scroll timeline, a lightbox gallery, or an embedded media player that lazy-loads streams. An unattended crawler typically skips those; Conifer captures them because you trigger them.
How do I set up a collection and a capture session?
- Sign in and create a collection — the container for related captures, e.g. "regional-news-2024".
- Click New Capture and paste the seed URL.
- Pick a browser/recording environment and start the session.
- Browse deliberately: scroll to the bottom, open each tab, expand accordions, play one video for a few seconds, click "load more".
- Stop the session. Conifer writes the WARC and lists the recorded pages.
The cardinal rule: if you do not visit a resource during the session, it is not in the archive. Conifer records traffic, not the abstract "whole site".
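One way to check that rule after a session is to compare a list of must-have URLs against what the exported WARC actually contains. The sketch below is a minimal illustration, not part of Conifer: it assumes you have run `warcio index export.warc.gz -f warc-type,warc-target-uri`, which emits one JSON object per record, and the function names `captured_urls` and `missing_urls` are hypothetical.

```python
import json


def captured_urls(index_lines):
    """Collect target URIs of response/revisit records from `warcio index` JSON lines."""
    urls = set()
    for line in index_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Only records that carry a server response count as "captured".
        if record.get("warc-type") in ("response", "revisit"):
            uri = record.get("warc-target-uri")
            if uri:
                urls.add(uri)
    return urls


def missing_urls(index_lines, required):
    """Return the required URLs (in their original order) absent from the index."""
    captured = captured_urls(index_lines)
    return [url for url in required if url not in captured]
```

Matching here is by exact string, so a trailing slash or a different query string counts as a different URL. Anything the function returns is a page or resource to visit in another Capture session.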
How do I capture interactive and logged-in pages?
For a page behind authentication, log in inside the capture session so the login traffic and resulting cookies are part of the recording context. Then exercise the features you need preserved. For scroll-to-load feeds, scroll slowly to the end and pause so each batch of requests completes before you move on.
```text
Session checklist for a dynamic page:
[ ] Scrolled fully to the bottom (triggered all lazy loads)
[ ] Opened every navigation tab once
[ ] Played embedded media for ~3-5 seconds
[ ] Submitted/expanded any in-page widgets you must preserve
[ ] Confirmed page count rose in the Conifer sidebar
```
What is Patch mode and when do I use it?
Patch mode is Conifer's gap-filler. You replay a previously captured page, and Conifer records only the resources that are missing from the existing WARC. This is far cheaper than re-capturing from scratch and is the standard fix when a quality check reveals a broken image or an un-recorded API call. Workflow: open the page in Replay, switch to Patch, interact with the broken element, then stop.
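To confirm that a Patch session really filled the gap, you can diff the collection's index before and after patching. This is a rough sketch under stated assumptions: you exported the WARC both before and after the Patch session, indexed each with `warcio index <file> -f warc-type,warc-target-uri,http:status`, and the status appears under the key `http:status` in the JSON output. The helper names are hypothetical.

```python
import json


def response_status(index_lines):
    """Map each response URL to its HTTP status, from `warcio index` JSON lines."""
    statuses = {}
    for line in index_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        uri = record.get("warc-target-uri")
        if record.get("warc-type") == "response" and uri:
            statuses[uri] = record.get("http:status")
    return statuses


def added_by_patch(before_lines, after_lines):
    """Return {url: status} for responses present only in the post-patch index."""
    before = response_status(before_lines)
    after = response_status(after_lines)
    return {url: status for url, status in after.items() if url not in before}
```

If the broken image or API call shows up in the result with a `200` status, the patch recorded it. An empty result suggests the element was never triggered during the Patch session.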
How do I export and verify the WARC?
From the collection page, choose Download to get standard WARC files. Verify them outside Conifer so you know the archive is portable:
```bash
# Inspect record types and target URIs for the first 20 records of an exported WARC
# (the field name is the full WARC header name, warc-target-uri)
warcio index conifer-export.warc.gz -f warc-type,warc-target-uri | head -20
# Replay locally to confirm fidelity, independent of Conifer
wb-manager init regional-news
wb-manager add regional-news conifer-export.warc.gz
wayback # then open http://localhost:8080
```
If the page replays cleanly in pywb, your capture survives independently of the service.
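For a quick sanity check before replaying, it can help to tally record types in the export. The following is a minimal sketch that reads the JSON lines produced by `warcio index conifer-export.warc.gz -f warc-type`. The function names are hypothetical, and the "has at least one response" test is only a rough heuristic, not a guarantee of replay fidelity.

```python
import json
from collections import Counter


def count_record_types(index_lines):
    """Count WARC records per type from `warcio index` JSON lines."""
    counts = Counter()
    for line in index_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        counts[record.get("warc-type", "unknown")] += 1
    return counts


def has_responses(counts):
    """Heuristic: an export with no response records cannot replay any page."""
    return counts.get("response", 0) > 0
```

A healthy export usually shows a `warcinfo` record plus roughly paired `request` and `response` records. An export with zero responses points to a failed download or an empty session.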
Conifer vs. an automated crawler — which fits?
| Factor | Conifer | Browsertrix / automated |
|---|---|---|
| Interaction (logins, scroll) | Excellent | Limited without behaviors |
| Pages per hour | Low (human-paced) | High |
| Fidelity on hard pages | Very high | Variable |
| Scheduling / repeat crawls | Manual | Native |
| Best use | A few critical pages | Whole sites |
A common professional pattern is to crawl the bulk of a site automatically and use Conifer only for the dozen pages that need a human in the loop.
Key Takeaways
- Conifer records what you browse, so visit every resource you want preserved.
- It excels at interactive, logged-in and scroll-to-load pages that crawlers miss.
- Patch mode fills gaps cheaply by recording only missing resources during replay.
- Always export to standard WARC and verify replay in pywb to avoid lock-in.
- Capturing logged-in or paywalled content carries rights and ToS obligations.
- Pair Conifer with an automated crawler: bulk by crawler, hard pages by hand.
Frequently Asked Questions
What is Conifer and who maintains it?
Conifer is a hosted, interactive web-archiving service maintained by Rhizome, descended from the original Webrecorder.io. You browse pages in your account and it records them into WARCs you can export.
When should I use Conifer instead of an automated crawler?
Choose Conifer for pages that only render after human interaction — logins, scroll-to-load feeds, embedded video and complex forms — where an unattended crawler would miss content. For large static sites, an automated crawler is faster.
Can I export my captures out of Conifer?
Yes. Each collection can be downloaded as standard WARC files, so you are never locked in. You can replay those WARCs in pywb or ReplayWeb.page independently of the service.
Does Conifer capture logged-in or paywalled content?
It can, because you drive the browser yourself and log in during the session. Capturing such content raises rights and terms-of-service questions, so confirm you are permitted before archiving and before sharing the result.
What is the difference between Capture and Patch mode?
Capture mode records everything you visit fresh. Patch mode replays an existing session and only records resources that are missing, which is the standard way to fill gaps without re-capturing the whole page.