Best Practices for Evaluating a Digital Humanities Project

To evaluate a digital humanities project, assess it across five dimensions — scholarly contribution, technical soundness, sustainability, usability and impact — against success criteria defined before launch, and back every judgement with evidence rather than a single headline metric. The best evaluations are consistent and defensible because they are pre-registered, mix quantitative and qualitative measures, and are owned by a named evaluator. Vanity metrics like raw pageviews are the trap to avoid.

What does a good DH evaluation actually measure?

Five dimensions, each answering a different question:

| Dimension | The question it answers | Example evidence |
|---|---|---|
| Scholarly contribution | Does it advance or reframe a research question? | peer review, citations, documented argument |
| Technical soundness | Is the method correct and reproducible? | code review, reproducible pipeline, test data |
| Sustainability | Will it survive without the original team? | preservation plan, open formats, hosting model |
| Usability | Can the intended audience use it? | user testing, accessibility audit (WCAG) |
| Impact | Did it change practice, teaching or understanding? | reuse, adoption, qualitative testimony |

A project strong on one axis and silent on the others is not a success — it is unbalanced. Scoring all five forces an honest picture.
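A small check can make the "score all five" rule hard to skip. The sketch below assumes scores are kept in a plain dictionary keyed by dimension name; the `unscored_dimensions` helper and that dictionary format are hypothetical, and only the five dimension names come from the table above.

```python
# Hypothetical sketch: dimension names are from the table above; the
# dict-of-scores format and the helper name are illustrative assumptions.
DIMENSIONS = (
    "scholarly contribution",
    "technical soundness",
    "sustainability",
    "usability",
    "impact",
)


def unscored_dimensions(scores: dict[str, str]) -> list[str]:
    """Return the dimensions that have no score yet, in table order."""
    return [d for d in DIMENSIONS if not scores.get(d)]


# A project assessed only on the convenient dimensions.
partial = {"scholarly contribution": "met", "usability": "met"}
print(unscored_dimensions(partial))
# ['technical soundness', 'sustainability', 'impact']
```

An empty result means every dimension has at least been scored; whether the scores are honest is still a human judgement.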

When should you plan the evaluation?

At the proposal stage, before any work begins. This is the single most important best practice, because criteria invented at the end tend to flatter whatever was easy to count. Write the success criteria into the project plan and treat them as a contract with your future self:

```markdown
# Success criteria (set 2026-04, before build)
- Scholarly: dataset underpins ≥1 peer-reviewed output within 18 months
- Technical: pipeline reruns from README on a clean machine; 0 failing checks
- Sustainability: data deposited with a DOI in an open repository
- Usability: passes WCAG 2.1 AA; 5 target users complete core task unaided
- Impact: ≥3 external reuses (citation, teaching use, derivative dataset)
```

Pre-registered criteria are what make the final verdict defensible rather than retrofitted.
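If the numeric criteria are also kept in machine-readable form, the end-of-project verdict becomes a comparison against thresholds fixed in advance. This is a minimal sketch: the `Criterion` class, the `is_preregistered` helper, the build start date and the observed values are all hypothetical; only the thresholds (≥1 peer-reviewed output, ≥3 external reuses) come from the example above.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Criterion:
    dimension: str
    description: str
    minimum: int                    # threshold fixed before the build starts
    observed: Optional[int] = None  # recorded only at evaluation time

    def met(self) -> bool:
        """A criterion with no recorded observation counts as not met."""
        return self.observed is not None and self.observed >= self.minimum


def is_preregistered(criteria_set: date, build_start: date) -> bool:
    """True only if the criteria were fixed before any work began."""
    return criteria_set < build_start


# Two of the numeric criteria from the example above.
criteria = [
    Criterion("scholarly", "peer-reviewed outputs within 18 months", minimum=1),
    Criterion("impact", "external reuses", minimum=3),
]

# Hypothetical observations recorded at the end of the project.
results = [replace(c, observed=n) for c, n in zip(criteria, [1, 2])]
print([(c.dimension, c.met()) for c in results])
# [('scholarly', True), ('impact', False)]
print(is_preregistered(date(2026, 4, 1), build_start=date(2026, 5, 1)))
# True
```

Freezing the dataclass means an observation is recorded by creating a new record, so the original thresholds cannot be quietly edited in place.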

Why are pageviews and downloads misleading?

Because they are vanity metrics — easy to inflate, weakly tied to value, and silent on whether anyone used the thing. A dataset downloaded a thousand times and never opened is worth less than one cited in three monographs. The fix is to pair usage data with evidence of genuine reuse: citations, derivative datasets, inclusion in syllabi, or documented decisions that drew on the work. Always ask "so what did this download lead to?" before counting it.
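One way to apply "so what did this download lead to?" mechanically is to keep reuse evidence next to each usage count and rank by the evidence. The sketch assumes a hypothetical `Output` record and helper names; the download figures for the two datasets are invented to mirror the comparison above, not real data.

```python
from dataclasses import dataclass, field


@dataclass
class Output:
    name: str
    downloads: int
    # Documented reuse: citations, syllabi, derivative datasets, decisions.
    reuses: list[str] = field(default_factory=list)


def vanity_flags(outputs: list[Output]) -> list[str]:
    """Names of outputs whose usage counts have no documented reuse behind them."""
    return [o.name for o in outputs if o.downloads > 0 and not o.reuses]


def rank_by_reuse(outputs: list[Output]) -> list[str]:
    """Order outputs by documented reuse, using downloads only to break ties."""
    ranked = sorted(outputs, key=lambda o: (len(o.reuses), o.downloads), reverse=True)
    return [o.name for o in ranked]


outputs = [
    Output("dataset-a", downloads=1000),
    Output("dataset-b", downloads=40,
           reuses=["monograph 1", "monograph 2", "monograph 3"]),
]
print(vanity_flags(outputs))   # ['dataset-a']
print(rank_by_reuse(outputs))  # ['dataset-b', 'dataset-a']
```

Downloads still appear, but only as a tie-breaker, which matches the advice to pair usage data with reuse rather than discard it.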

How do you evaluate something as fuzzy as scholarly contribution?

Through peer review and a transparent, reproducible method. Contribution is not a number; it is a judgement about whether the project answers or reframes a research question and whether its reasoning can be interrogated. A method documented well enough for another scholar to critique — or rerun — is itself strong evidence of contribution, because opacity is the enemy of scholarship. Lean on community frameworks: the MLA and AHA statements on evaluating digital scholarship, the TaDiRAH taxonomy for situating the activity, and DARIAH guidance all give structured lenses for this judgement.
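Contribution itself stays a judgement, but the "can another scholar rerun it?" part can be backed by evidence. A minimal sketch, assuming a pipeline that writes its outputs to a directory: hash every output file from the original run and from a clean rerun, then list the files that differ. The function names are hypothetical, and byte-identical output is a strict standard; outputs that embed timestamps would need normalising first.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_dir(root: Path) -> dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_outputs(original: Path, rerun: Path) -> list[str]:
    """Files that are missing from either run or whose bytes differ."""
    a, b = digest_dir(original), digest_dir(rerun)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


# Demo with two throwaway "runs"; the rerun produced one unexpected extra file.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    for run in (first, second):
        Path(run, "counts.csv").write_text("term,count\nletter,42\n")
    Path(second, "debug.log").write_text("unexpected extra output\n")
    mismatches = differing_outputs(Path(first), Path(second))

print(mismatches)  # ['debug.log']
```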

How do you keep results consistent across a whole project?

Use a fixed rubric applied the same way to every component and every iteration. Inconsistency creeps in when one module is judged on usage and another on peer praise. A simple banded rubric, for example rating each dimension as not met, partially met, met or exceeded with the evidence cited, keeps the assessment comparable across time and across reviewers. Store completed rubrics in the repository so the evaluation is itself documented and auditable.
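The banded rubric can be sketched as a small data structure that refuses judgements without evidence and serialises to JSON for committing. The four bands come from the paragraph above; the `Band` and `RubricEntry` names, the evidence file paths and the JSON layout are hypothetical choices, not a standard format.

```python
import json
from dataclasses import asdict, dataclass
from enum import IntEnum


class Band(IntEnum):
    # Ordered so that bands can be compared across iterations.
    NOT_MET = 0
    PARTIALLY_MET = 1
    MET = 2
    EXCEEDED = 3


@dataclass
class RubricEntry:
    dimension: str
    band: Band
    evidence: list[str]  # paths or links to the material behind the judgement

    def __post_init__(self) -> None:
        # A band without cited evidence is an assertion, so reject it.
        if not self.evidence:
            raise ValueError(f"{self.dimension}: a band needs cited evidence")


def to_json(entries: list[RubricEntry]) -> str:
    """Serialise a completed rubric as JSON for committing to the repository."""
    return json.dumps([{**asdict(e), "band": e.band.name} for e in entries], indent=2)


rubric = [
    RubricEntry("usability", Band.MET, ["evaluation/wcag-audit.pdf"]),
    RubricEntry("impact", Band.PARTIALLY_MET, ["evaluation/reuse-log.md"]),
]
print(to_json(rubric))
```

Using `IntEnum` keeps the bands ordered, so a later iteration can be compared directly (for example `Band.MET > Band.PARTIALLY_MET`), while the JSON stores readable band names for reviewers.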

A working evaluation checklist

Success criteria written at the proposal stage and stored in the repo.
All five dimensions scored, not just the convenient ones.
Each judgement backed by attached evidence, not assertion.
Quantitative metrics paired with qualitative reuse evidence.
Vanity metrics flagged and discounted.
A named evaluator owns the assessment.
Completed rubric committed to version control.
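Most items on the checklist above can be checked against a simple evaluation record before sign-off. This sketch assumes a hypothetical record layout (the keys `criteria_file`, `scores`, `downloads`, `reuses`, `evaluator` and `rubric_commit` are illustrative, not a standard schema). "Vanity metrics flagged and discounted" is left out because it remains a human judgement.

```python
from typing import Any, Callable

# Hypothetical record layout; keys are illustrative, not a standard schema.
Record = dict[str, Any]


def _all_evidenced(r: Record) -> bool:
    scores = r.get("scores", {})
    return bool(scores) and all(entry.get("evidence") for entry in scores.values())


CHECKLIST: list[tuple[str, Callable[[Record], bool]]] = [
    ("Success criteria stored in the repo", lambda r: bool(r.get("criteria_file"))),
    # Assumes the keys of "scores" are the five dimension names.
    ("All five dimensions scored", lambda r: len(r.get("scores", {})) == 5),
    ("Each judgement backed by evidence", _all_evidenced),
    ("Usage paired with reuse evidence",
     lambda r: r.get("downloads", 0) == 0 or bool(r.get("reuses"))),
    ("Named evaluator", lambda r: bool(r.get("evaluator"))),
    ("Rubric committed", lambda r: bool(r.get("rubric_commit"))),
]


def open_items(record: Record) -> list[str]:
    """Checklist items the record does not yet satisfy, in checklist order."""
    return [item for item, check in CHECKLIST if not check(record)]


draft = {
    "criteria_file": "evaluation/success-criteria.md",
    "scores": {"usability": {"band": "met", "evidence": ["wcag-audit.pdf"]}},
    "downloads": 1000,
    "evaluator": "",
}
print(open_items(draft))
# ['All five dimensions scored', 'Usage paired with reuse evidence',
#  'Named evaluator', 'Rubric committed']
```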

Key Takeaways

Evaluate across five dimensions: scholarly contribution, technical soundness, sustainability, usability and impact.
Define success criteria at the proposal stage; retrofitted metrics are not defensible.
Reject vanity metrics — pair usage data with evidence of real reuse and citation.
Judge scholarly contribution through peer review and a reproducible, critique-able method.
Use a fixed banded rubric with cited evidence to keep judgements consistent and auditable.
Defensibility comes from pre-registered criteria and shown reasoning, not a single number.

Frequently Asked Questions

How do you evaluate a digital humanities project?

Assess it across five dimensions — scholarly contribution, technical soundness, sustainability, usability and impact — against criteria you set before launch. Mixing quantitative metrics with qualitative judgement gives a defensible, rounded verdict.

Why are download and pageview counts a poor measure of DH impact?

They are vanity metrics: easy to inflate and weakly correlated with scholarly value. A dataset cited in three monographs matters more than one downloaded a thousand times and never used. Pair usage data with evidence of actual reuse.

When should evaluation be planned?

At the proposal stage. Defining success criteria before the work starts prevents the common failure of measuring whatever happened to be easy to count at the end and calling it success.

What frameworks exist for evaluating digital scholarship?

Guidance such as the MLA and AHA statements on evaluating digital scholarship for tenure, the TaDiRAH activity taxonomy for situating the work, and DARIAH guidance alongside wider digital humanities community criteria all offer structured lenses.

How do you evaluate something as fuzzy as scholarly contribution?

Through peer review and documented argument: does the project answer or reframe a research question, and is its method transparent enough to critique? A reproducible method that others can interrogate is itself evidence of contribution.

What makes a DH evaluation defensible?

Pre-registered criteria, evidence attached to each judgement, a mix of quantitative and qualitative measures, and a named evaluator. Defensibility comes from showing your reasoning, not from a single headline number.
