How to Extract text from PDFs in Python

A step-by-step guide on how to extract text from PDFs in Python, with practical defaults, settings and the pitfalls to avoid so you reach a usable result on

When to Plan DH project sustainability

When to plan DH project sustainability and when not to: the trade-offs, costs and signals that tell you whether this approach fits your sources and project

How to Evaluate NLP on historical text

A step-by-step guide on how to evaluate NLP on historical text, with practical defaults, settings and the pitfalls to avoid so you reach a usable result.

How to Clean messy data in R

A step-by-step guide on how to clean messy data in R, with practical defaults, settings and the pitfalls to avoid so you reach a usable result on your own

Beginner's Guide to ImageJ for spectral analysis

A gentle beginner's guide to using ImageJ for spectral analysis, explaining the core ideas in plain language with a small worked example you can follow from a stack.

Best Practices to Choose an edition publishing framework

Best practices and a working checklist for choosing an edition publishing framework, so your results stay consistent, documented and defensible across a whole

Troubleshooting: Encode finding aids in EAD

Troubleshooting common problems when you encode finding aids in EAD: diagnose the usual errors, find the root cause fast and apply fixes that actually hold on

Troubleshooting: Use spatial joins for historical data

Troubleshooting common problems when you use spatial joins for historical data: diagnose the usual errors, find the root cause fast and apply fixes that

Troubleshooting: Run a rights clearance workflow

Troubleshooting common problems when you run a rights clearance workflow: diagnose the usual errors, find the root cause fast and apply fixes that actually

How to Choose Omeka vs CollectiveAccess

A step-by-step guide on how to choose Omeka vs CollectiveAccess, with practical defaults, settings and the pitfalls to avoid so you reach a usable result on