To manage a digital humanities project on GitHub, create one repository per project, structure it with a clear README and licence, use branches and pull requests for substantive changes, track tasks as Issues, and tag versioned Releases connected to Zenodo for citable archiving. GitHub turns a scattered set of files and email threads into a single, transparent, collaborative workspace with a full audit trail.
This guide walks through the lifecycle of a typical edition or dataset project, from first commit to citable release, and highlights the choices that matter for archivists and historians.
How should I structure the repository itself?
Start every repository with a small set of orientation files so a newcomer can understand it in two minutes:
```
my-project/
├── README.md # what this is, how to use it
├── LICENSE # CC-BY for data, MIT for code, etc.
├── CITATION.cff # how to cite the project
├── data/ # cleaned, derived datasets
├── transcriptions/ # TEI or plain text
├── scripts/ # analysis and processing code
└── docs/ # methodology, decisions, changelog
```

The README is the front door. It should state the research question, the data sources, the licence, and one command to reproduce the headline result. A reader who skims only the README should grasp the whole project.
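The layout above can be scaffolded in one go. This is a minimal sketch: `my-project` is the placeholder name from the tree, and the `.gitkeep` files are a common convention (not a Git feature) for getting otherwise empty directories into the first commit.

```shell
# Create the four working directories from the layout above.
mkdir -p my-project/data my-project/transcriptions my-project/scripts my-project/docs

# Create empty orientation files to fill in before the first commit.
touch my-project/README.md my-project/LICENSE my-project/CITATION.cff

# Git does not track empty directories, so add a placeholder file to each.
for dir in data transcriptions scripts docs; do
  touch "my-project/$dir/.gitkeep"
done

# List the result, including the hidden placeholder files.
ls -A my-project
```

Once real files land in a directory, its `.gitkeep` can be deleted.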
When should I use branches and pull requests?
Use the main branch as the always-working version of the project. Do substantive work on a short-lived branch, then open a pull request to merge it:
```bash
git switch -c clean-1841-census
# ... edit, commit ...
git push -u origin clean-1841-census
```

On GitHub, open a pull request from that branch. The pull request shows a line-by-line diff, lets a collaborator comment, and records the decision. Even solo, this gives you a clean checkpoint and a place to write why a change was made. For trivial typo fixes, committing straight to main is fine; reserve branches for anything a reviewer might question.
How do I coordinate a team without endless email?
GitHub Issues replace status emails. Open one issue per task, such as "Verify place-names in chapter 4" or "OCR is dropping marginalia", then assign it, label it, and reference it from commits. A simple labelling scheme keeps the board readable:
| Label | Use for |
|---|---|
| `transcription` | folio-level editing tasks |
| `data` | cleaning, normalisation, schema |
| `bug` | something is broken |
| `discussion` | a decision that needs agreement |
| `good-first-issue` | suited to a new contributor or student |
A Projects board groups these issues into To Do / In Progress / Done, which is enough planning for most teams without heavyweight project software.
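Referencing an issue from a commit means putting its number, prefixed with `#`, in the commit message; GitHub then links the commit on the issue's page. The sketch below builds a scratch repository so it runs on its own. The issue number `#12`, the file `places.csv` and the author identity are all hypothetical; in a real project only the `git commit` line is needed.

```shell
# Scratch repository so the example is self-contained.
demo=$(mktemp -d)
git -C "$demo" init -q

# A stand-in change: one normalised place-name pair (invented data).
echo "Bristol,Brystowe" > "$demo/places.csv"
git -C "$demo" add places.csv

# Mention the (hypothetical) issue number in the commit subject.
git -C "$demo" -c user.name="Example Editor" -c user.email="editor@example.org" \
  commit -q -m "Normalise place-name spellings in chapter 4 (refs #12)"

# Show the subject line that GitHub would link to issue 12.
git -C "$demo" log -1 --format=%s
```

Writing `Fixes #12` instead of `refs #12` also closes the issue automatically once the commit reaches the default branch, so reserve it for commits that complete the task.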
How do I make a project release citable?
When you reach a stable point, cut a Release. Tag it semantically and write release notes:
```bash
git tag -a v1.0.0 -m "First public edition: chapters 1-6 verified"
git push origin v1.0.0
```

Pushing the tag does not by itself create a Release: on GitHub, publish a Release from the v1.0.0 tag and add the release notes there. Connect the repository to Zenodo once, before publishing; thereafter each GitHub Release is automatically deposited and assigned a DOI. Add a CITATION.cff so GitHub renders a "Cite this repository" button. Together these turn an informal codebase into a formally archived, attributable scholarly output.
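A starter CITATION.cff can be written from the shell as sketched below. The field names follow the Citation File Format 1.2.0 schema, but every value (title, author, version, date, licence) is a placeholder to replace with your project's details.

```shell
# Write a starter CITATION.cff in the current directory (the repository root).
# Note: this overwrites any existing CITATION.cff.
# All values are placeholders, not details of a real project.
cat > CITATION.cff <<'EOF'
cff-version: 1.2.0
message: "If you use this edition or dataset, please cite it as below."
type: dataset
title: "My Project: A Digital Edition"
version: 1.0.0
date-released: 2024-01-01
license: CC-BY-4.0
authors:
  - family-names: "Surname"
    given-names: "Given"
EOF
```

Keeping `version` and `date-released` in step with each tagged Release means the citation GitHub displays matches what Zenodo archives.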
How do I license the work correctly?
DH projects usually mix things with different rights. A common, defensible split is: code under a permissive software licence such as MIT, your authored data and transcriptions under CC-BY 4.0, and a clear note that the underlying primary sources may carry separate rights you do not control. State this explicitly in the README so reusers know exactly what they may do.
What keeps a GitHub project sustainable?
The repository should survive your involvement. Document the build and data pipeline in docs/, pin dependency versions, and keep the README's "how to reproduce" section honest by running it on a clean machine occasionally. Transfer ownership to an institutional organisation account rather than a personal one so the project does not vanish when you change jobs.
Key Takeaways
- One repository per project, fronted by a README that explains purpose, data, licence and reproduction.
- Use `main` as the working version; branch and open pull requests for substantive changes.
- Replace status emails with Issues, labels and a Projects board.
- Cut semantic Releases and connect Zenodo so each one mints a citable DOI.
- License code and data separately and flag third-party source rights explicitly.
- Move the repository to an institutional organisation account for long-term survival.
Frequently Asked Questions
Are private repositories free on GitHub?
Yes. GitHub gives unlimited free private and public repositories with unlimited collaborators on the standard free plan, which is enough for most DH teams.
Should my project repository be public from day one?
Not necessarily. Many teams work in a private repository while transcriptions are unverified, then flip it to public on first release. Make it public early if openness is a grant condition.
What is a pull request and do small teams need them?
A pull request proposes merging one branch into another and gives a space to review changes. Even two-person teams benefit because it creates a checkpoint and a discussion record.
How do I give my project a citable identity?
Add a CITATION.cff file and connect the repository to Zenodo so each release mints a DOI. That turns a release into a formally citable archived object.
Can non-coders contribute on GitHub?
Yes. Issues, the web editor and pull requests let editors, translators and reviewers contribute text and feedback without ever using the command line.