When to Preregister a Computational Study

Preregister a computational study when you are running a confirmatory test — a specific, falsifiable hypothesis analysed with a method you can fix in advance — on data whose outcomes you have not yet seen. Do not preregister exploratory corpus work, descriptive cataloguing, or tool development, where pretending you had a fixed plan would be a fiction. The honest signal is simple: if you could write down, today, the exact statistic and the threshold that would prove you wrong, preregistration protects you and your reader. If you cannot, you are exploring, and you should say so.

What problem does preregistration actually solve?

It solves the problem of researcher degrees of freedom. In a computational humanities study you choose a tokeniser, a stopword list, a similarity threshold, a date range, and which authors count as "canonical". Each choice is defensible on its own, but if you try alternatives after seeing results, it is easy to land by accident on the combination that yields a significant finding. Preregistration timestamps those decisions before the data can influence them, so a later reader can tell a predicted result from one you fished out. That is the core of separating confirmation from exploration.
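A back-of-envelope sketch shows how quickly such choices add up. It assumes, purely for illustration, that every analytic variant behaves like an independent test at the same alpha on data with no true effect. Real variants (two stopword lists, say) are correlated, so the actual inflation is usually smaller than this figure. The function name `chance_of_false_positive` is hypothetical.

```python
def chance_of_false_positive(alpha: float, n_variants: int) -> float:
    """Probability that at least one of n independent null tests falls below alpha.

    Assumption (illustrative only): the variants are independent tests.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if n_variants < 1:
        raise ValueError("n_variants must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_variants


if __name__ == "__main__":
    # Hypothetical study: 2 tokenisers x 2 stopword lists x 3 thresholds = 12 variants.
    for k in (1, 12):
        print(k, round(chance_of_false_positive(0.05, k), 3))
```

Under these assumptions, twelve undisclosed variants at alpha = 0.05 give roughly a 46% chance of at least one "significant" result when nothing is there, which is the gap a time-stamped plan closes.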

When should I preregister a DH study?

Preregister when most of these hold:

You have a directional hypothesis (e.g. "epistolary networks densify after 1660").
You will use inferential statistics, not just description.
The dataset exists but you have not yet run the planned analysis on it.
The result will be used to make a claim, not to build a catalogue or interface.

Confirmatory  -> preregister: hypotheses, variables, model, exclusion rules, stop rule
Exploratory   -> document instead: data, code, decisions, but label findings exploratory
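The checklist and the two-way split above can be sketched as a small helper. This is only an illustration: the names `StudyProfile` and `recommend` are hypothetical, and reading "most of these hold" as "at least three of four" is an assumption, not a rule from any registry.

```python
from dataclasses import dataclass


@dataclass
class StudyProfile:
    directional_hypothesis: bool  # e.g. "epistolary networks densify after 1660"
    inferential_statistics: bool  # a test, not just description
    analysis_not_yet_run: bool    # the data exists, the planned analysis is untouched
    result_supports_claim: bool   # an argument, not a catalogue or interface


def recommend(profile: StudyProfile, minimum: int = 3) -> str:
    """Return 'preregister' when at least `minimum` of the four criteria hold.

    The default threshold of 3 is an assumed reading of "most of these".
    """
    met = sum([
        profile.directional_hypothesis,
        profile.inferential_statistics,
        profile.analysis_not_yet_run,
        profile.result_supports_claim,
    ])
    return "preregister" if met >= minimum else "document and label exploratory"


if __name__ == "__main__":
    print(recommend(StudyProfile(True, True, True, False)))
```

The value of writing the criteria down is less the function itself than the fact that the answer is fixed before anyone looks at results.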

When is preregistration the wrong tool?

It is a poor fit, even counterproductive, for:

| Situation | Better practice than preregistration |
| --- | --- |
| Building an OCR or NER pipeline | Versioned code, tests, benchmark report |
| Cataloguing or describing a collection | Documented workflow and metadata standard |
| Open-ended distant reading | Exploratory analysis, clearly labelled |
| Data you have already deeply analysed | Honest secondary-data disclosure, not a fake plan |

Forcing a rigid hypothesis onto genuinely exploratory humanities work produces hollow preregistrations that satisfy a checkbox while obscuring how the insight really arose.

How do I preregister, in practice?

On the Open Science Framework (OSF) you create a project, then "Register" it using a template such as the general preregistration template or the secondary-data template. Registering freezes a time-stamped, read-only, DOI-bearing snapshot of the plan. A workable plan states:

Hypothesis: H1 — mean betweenness centrality of merchant nodes
             rises in correspondence after 1660.
Data: 4,210 letters, Early Modern Letters Online, accessed 2024-10.
Analysis: directed network per decade; compare pre/post-1660 with a
          permutation test, alpha = 0.01, two-sided.
Exclusions: letters with undated or unidentified senders.
Stop rule: full corpus; no interim peeking.
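The analysis line of such a plan can be pinned down further by registering the test as code. The following is a minimal sketch, not the method of any particular study: it assumes the per-decade network step has already produced one betweenness score per merchant node for each period, it permutes those scores between periods, and all names, the seed, and the sample values are hypothetical.

```python
import random
from statistics import mean


def permutation_test(pre, post, n_permutations=10_000, seed=1660):
    """Two-sided permutation test on the difference in means (post minus pre)."""
    rng = random.Random(seed)  # fixed seed so the registered analysis is repeatable
    observed = mean(post) - mean(pre)
    pooled = list(pre) + list(post)
    n_pre = len(pre)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign scores to periods at random
        diff = mean(pooled[n_pre:]) - mean(pooled[:n_pre])
        if abs(diff) >= abs(observed):  # two-sided: either direction counts
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    ALPHA = 0.01  # threshold fixed in the plan
    # Made-up centrality scores, for illustration only.
    pre_1660 = [0.02, 0.03, 0.01, 0.04, 0.02]
    post_1660 = [0.05, 0.06, 0.04, 0.07, 0.05]
    observed, p = permutation_test(pre_1660, post_1660)
    print(f"difference={observed:.3f}  p={p:.4f}  reject={p < ALPHA}")
```

Fixing the seed, the number of permutations, and the two-sided comparison in the registered script leaves nothing to decide once the scores are in hand.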

For higher-stakes work, a Registered Report sends this plan through peer review before results exist, so acceptance does not hinge on the outcome — which directly attacks publication bias.

What does it cost, and is it worth it?

The cost is front-loaded thinking: a few days specifying analysis you might otherwise decide on the fly, plus the discipline to honour the plan. The payoff is credibility — a reviewer can trust your confirmatory claim — and clarity, because you stop conflating "I found" with "I predicted". For a one-off exploratory essay the overhead rarely pays back. For a quantitative argument you expect to be challenged, it is cheap insurance.

Key Takeaways

Preregister confirmatory hypothesis tests on not-yet-analysed data; skip it for exploration and tool-building.
The honest test: can you state today the statistic and threshold that would falsify you?
Preregistration's job is to fence in researcher degrees of freedom and separate prediction from discovery.
Secondary-data templates and Registered Reports let you preregister even with existing archival data.
It never forbids exploration — it just requires you to label exploratory findings as such.
Use OSF or AsPredicted; reach for a Registered Report when the stakes justify peer-reviewed pre-acceptance.

Frequently Asked Questions

What is preregistration in a computational study?

Preregistration is publishing your hypotheses, data sources and analysis plan to a time-stamped, read-only record before you look at the outcomes. It separates what you predicted from what you discovered.

When should I preregister and when is it pointless?

Preregister when you are testing a specific hypothesis with confirmatory statistics on data you have not yet analysed. Skip it for exploratory work, descriptive cataloguing, or tool-building, where claiming to have had a fixed plan would be dishonest.

Can I preregister if my data already exists in an archive?

Yes, with a Registered Report or a secondary-data preregistration that honestly states what you have and have not already seen. The key is that the specific analysis is planned before you run it.

What is the difference between preregistration and a Registered Report?

A preregistration is a time-stamped plan you file yourself. A Registered Report is peer-reviewed and accepted in principle by a journal before results exist, so publication does not depend on the outcome.

Where do humanities scholars preregister?

The Open Science Framework (OSF) and AsPredicted are common, and OSF lets you embargo the plan. Some DH venues now accept Registered Reports for quantitative work.

Does preregistration forbid exploration?

No. It only requires you to label it. You report confirmatory tests against your plan, then clearly mark any additional findings as exploratory, so readers can weight them correctly.
