How to Choose Tools for a DH Project

To choose tools for a digital humanities (DH) project, start from your data and your required output, not from a list of popular software. The shape and volume of your sources, the format you must deliver, and the skills on your team narrow the field before preference enters the picture. Then favour tools that export to open standards, have an active community, and have held up in a pilot on a real sample. The best tool is the one that fits your data and that your team can sustain.

Should I choose the tool or define the data first?

Always the data first. A tool is a means to transform specific sources into a specific output, so let those endpoints constrain the choice. Ask three questions before naming any software:

What is the input? Handwritten manuscripts, printed text, tabular records, images, spreadsheets?
What is the output? A searchable edition, a map, a dataset, a visualisation?
What are the constraints? Team skills, budget, hardware, the need to preserve results.

Only after answering these does a shortlist make sense. Choosing a platform first and then bending your data to fit it is among the most expensive mistakes in DH tooling.
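One way to make the three questions concrete is to write the answers down as data before looking at any software. The sketch below is illustrative only: the `Requirements` and `Tool` classes, their field names, and the sample tools are all hypothetical, not drawn from any real catalogue.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Requirements:
    """Answers to the three questions, recorded before naming any software."""
    input_kind: str          # e.g. "handwritten manuscripts"
    output_kind: str         # e.g. "searchable edition"
    team_skills: frozenset   # skills the team actually has today


@dataclass(frozen=True)
class Tool:
    """A hypothetical catalogue entry for a candidate tool."""
    name: str
    inputs: frozenset
    outputs: frozenset
    required_skills: frozenset = field(default_factory=frozenset)


def shortlist(req, tools):
    """Keep tools that accept the input, produce the output,
    and need no skill the team lacks."""
    return [
        t.name
        for t in tools
        if req.input_kind in t.inputs
        and req.output_kind in t.outputs
        and t.required_skills <= req.team_skills
    ]


# Made-up example data, purely for illustration.
req = Requirements(
    input_kind="handwritten manuscripts",
    output_kind="searchable edition",
    team_skills=frozenset({"spreadsheets", "xml"}),
)
tools = [
    Tool("transcriber-a", frozenset({"handwritten manuscripts"}),
         frozenset({"searchable edition"}), frozenset({"xml"})),
    Tool("mapper-b", frozenset({"tabular records"}), frozenset({"map"})),
    Tool("transcriber-c", frozenset({"handwritten manuscripts"}),
         frozenset({"searchable edition"}), frozenset({"python"})),
]
print(shortlist(req, tools))  # ['transcriber-a']
```

Here `transcriber-c` fits the data but drops out because it needs a skill the team does not have, which is exactly the kind of constraint that should narrow the field before preference does.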

How do I avoid lock-in?

Prioritise tools that export cleanly to open, standard formats. If you can always get your data out as CSV, TEI-XML, GeoJSON or IIIF, the tool is replaceable and your work is preservable. Test the export before you commit:

bash

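# Verify that a tool's export is clean, standard and re-importable.
# `tool` is a placeholder: substitute the candidate's real export command.
tool export --format csv --out test.csv
head -3 test.csv                 # readable headers, no proprietary cruft?
python3 -c "import csv; list(csv.reader(open('test.csv', newline='', encoding='utf-8')))"  # decodes and parses cleanly?
xmllint --noout edition.xml      # if TEI/XML: is it well-formed?
xmllint --noout --relaxng tei_all.rng edition.xml  # does it validate? (assumes a local copy of the TEI schema)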

A tool whose only export is a proprietary blob is a trap, however slick its interface.
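The same checks can be scripted so they run identically against every candidate's export. What follows is a minimal sketch using only the Python standard library. The function names and sample strings are assumptions for illustration, and parsing cleanly is a weaker test than validating against a TEI or GeoJSON schema.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

GEOJSON_TYPES = {
    "FeatureCollection", "Feature", "GeometryCollection",
    "Point", "LineString", "Polygon",
    "MultiPoint", "MultiLineString", "MultiPolygon",
}


def csv_is_rectangular(text):
    """True if there is a header plus at least one row, all of equal width."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    return len(rows) > 1 and all(len(r) == len(rows[0]) for r in rows)


def xml_is_well_formed(text):
    """True if the text parses as XML (well-formed, not schema-valid)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def looks_like_geojson(text):
    """True if the text is a JSON object with a known GeoJSON 'type'."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(doc, dict) and doc.get("type") in GEOJSON_TYPES


print(csv_is_rectangular("id,title\n1,Letter\n2,Diary\n"))   # True
print(csv_is_rectangular("id,title\n1,Letter,extra\n"))      # False
print(xml_is_well_formed("<TEI><text>ok</text></TEI>"))      # True
print(xml_is_well_formed("<TEI><text>ok</TEI>"))             # False
print(looks_like_geojson('{"type": "FeatureCollection", "features": []}'))  # True
```

A ragged CSV or an unclosed XML element in the export of a small sample is a strong hint of what a full export will look like.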

What factors actually matter in the decision?

Weigh these explicitly rather than going on reputation:

Factor	Why it matters	How to check
Data fit	Wrong fit means constant workarounds	Pilot on a real sample
Open export	Avoids lock-in, aids preservation	Test the export
Community & docs	Determines how much time you lose	Browse forums, issue tracker
Maintenance	Abandoned tools rot	Check recent commits / releases
Skills fit	Team must actually use it	Honest skills audit
Cost	Licences, hosting, training	Total cost, not sticker price

The most underrated of these is community and documentation: a technically modest tool with active maintainers and good tutorials will cost you far less than a powerful one nobody supports.
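To weigh the factors explicitly, a simple weighted score is often enough. The sketch below assumes each factor is rated from 0 (poor) to 5 (excellent) after a pilot. The weights, the factor keys, and both sets of ratings are invented for illustration and should be adjusted to the project.

```python
# Illustrative weights: an assumption, not a recommendation.
WEIGHTS = {
    "data_fit": 3,
    "open_export": 3,
    "community_docs": 3,
    "maintenance": 2,
    "skills_fit": 2,
    "cost": 1,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Return the weighted average of per-factor ratings (each 0-5)."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"unrated factors: {sorted(missing)}")
    total = sum(weights[f] * ratings[f] for f in weights)
    return total / sum(weights.values())


# Hypothetical ratings for two candidates.
modest_but_supported = {
    "data_fit": 4, "open_export": 5, "community_docs": 5,
    "maintenance": 4, "skills_fit": 4, "cost": 4,
}
powerful_but_orphaned = {
    "data_fit": 5, "open_export": 3, "community_docs": 1,
    "maintenance": 1, "skills_fit": 3, "cost": 3,
}

print(round(weighted_score(modest_but_supported), 2))   # 4.43
print(round(weighted_score(powerful_but_orphaned), 2))  # 2.71
```

The number itself matters less than the discipline: rating every factor forces the team to look at maintenance and documentation, not only at features.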

Is open source always the right answer?

Usually preferable, but not automatically. Open source guards against lock-in and aligns with long-term preservation, which matters enormously in DH. But a well-supported proprietary tool with a living community beats an abandoned open-source project with no maintainers. Judge sustainability, not licence alone: an open tool last updated five years ago is riskier than a healthy proprietary one you can export away from.
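A rough way to put a number on "abandoned" is the age of the latest release or commit, read off the project's repository or download page. A minimal sketch follows. The function name and the two-year default threshold are assumptions, not an established rule.

```python
from datetime import date


def maintenance_risk(last_release, today, stale_after_days=730):
    """Classify a project as 'active' or 'stale' by the age of its latest release.

    The 730-day default is an illustrative assumption.
    """
    age_days = (today - last_release).days
    if age_days < 0:
        raise ValueError("last_release is in the future")
    return "stale" if age_days > stale_after_days else "active"


# Hypothetical dates for two candidates, checked on the same day.
print(maintenance_risk(date(2019, 3, 1), date(2024, 3, 1)))   # stale
print(maintenance_risk(date(2024, 1, 15), date(2024, 3, 1)))  # active
```

Release age is only a proxy. A small, finished tool may be stable and not abandoned, so read the issue tracker before ruling a project out.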

Should I build a custom tool?

Default to no. Custom tools are expensive to build and far more expensive to maintain — and DH projects rarely fund maintenance. Build only when:

No existing tool can produce your required output, and
You have explicitly budgeted long-term maintenance, and
You design for export so the data survives even if the tool does not.

For most projects, composing existing tools (a transcription platform, a spreadsheet, a static site generator) is cheaper and more durable than a bespoke application.

How do I run a proper tool pilot?

Never commit on a demo. Take a representative slice of your real data and run it through the candidate end to end:

Import a real sample, including your messiest cases.
Do the core task you actually need.
Export the result and confirm it is clean and standard.
Note where you got stuck and how easily you found help.
Confirm it runs on your team's hardware and skill level.

Decide on the evidence the pilot produces, not on the marketing page.
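The pilot steps above can be recorded in a small structure so the decision rests on written evidence, not on memory. This is a sketch under stated assumptions: the `PilotReport` class, the step names, and the sample findings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PilotReport:
    """Evidence gathered while piloting one candidate tool."""
    tool: str
    results: dict = field(default_factory=dict)  # step name -> passed?
    notes: list = field(default_factory=list)    # where you got stuck

    def record(self, step, passed, note=""):
        """Store the outcome of one step and any note about friction."""
        self.results[step] = passed
        if note:
            self.notes.append(f"{step}: {note}")

    def verdict(self, required_steps):
        """Pass only if every required step was run and succeeded."""
        return all(self.results.get(s, False) for s in required_steps)


# One name per pilot step, in the order listed above.
STEPS = ["import", "core_task", "export", "help_found", "runs_locally"]

report = PilotReport(tool="candidate-x")  # hypothetical tool name
report.record("import", True, "two damaged pages needed manual rotation")
report.record("core_task", True)
report.record("export", False, "CSV export dropped diacritics")
report.record("help_found", True)
report.record("runs_locally", True)

print(report.verdict(STEPS))  # False
print(report.notes)
```

A step that was never run counts as a failure, so a candidate cannot pass on the strength of a partial pilot.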

Key Takeaways

Define data, output and constraints before naming any tool.
Favour tools that export to open standards to avoid lock-in.
Weigh community, documentation and maintenance as heavily as features.
Open source is usually better, but sustainability matters more than licence.
Default to existing tools; build custom only with funded maintenance and clean export.
Pilot every candidate on a real, messy sample before committing.

Frequently Asked Questions

Should I pick the tool or define the data first?

Define the data and the question first. The shape and volume of your sources, and the output you need, constrain the tool choice far more than personal preference does.

Are open-source tools always the right choice for DH?

Often, but not automatically. Open source protects against lock-in and aligns with preservation, but a well-supported proprietary tool with a real community beats an abandoned open one. Weigh sustainability, not just licence.

How do I avoid vendor and format lock-in?

Prioritise tools that export to open, standard formats — CSV, TEI-XML, GeoJSON, IIIF. If you can get your data out cleanly at any time, the tool itself matters less.

What is the most underrated factor in tool choice?

The size and health of the community and documentation. A tool with active maintainers, tutorials and forum answers will cost you far less time than a technically superior but unsupported one.

Should I build a custom tool or use an existing one?

Default to existing tools. Custom builds are expensive to make and far more expensive to maintain. Only build when no existing tool fits and you have funded long-term maintenance.

How do I evaluate a tool before committing?

Run a small pilot on a representative sample, test the export, check the documentation and community, and confirm it runs on your hardware and skills. Decide on evidence, not marketing.
