When you choose a data repository, the right answer is usually a domain-specific or institutional repository first (a trusted disciplinary archive such as the UK Data Service, or ADS for archaeology, when the content fits), with Zenodo or Figshare as a general fallback. The common failures are choosing on convenience alone, hitting size limits, depositing thin metadata that nobody can find, and confusing a repository with a backup. This guide diagnoses each symptom and gives the fix.
How do I decide between the main options?
Match the repository to the data, not the other way round. Here is a fast triage.
| If your data is... | Best fit | Why |
|---|---|---|
| Archaeology / heritage | ADS, tDAR | Domain curation, mandated by many funders |
| Social-science / survey | UK Data Service, ICPSR | Access controls, disclosure review |
| General humanities, small | Zenodo | Free, fast DOI, versioning |
| Tied to a GitHub release | Zenodo | Auto-deposit on release |
| Institutionally mandated | Your institutional repo | Compliance, local support |
| Very large or 3D | Domain repo or cloud + DOI | Size limits elsewhere |
The biggest mistake is defaulting to whichever tool a colleague mentioned. A domain repository usually offers curation and audience that a general one cannot.
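The order of preference in the table can be written down as a small decision function. This is a minimal sketch: the function name and parameters are hypothetical, and the repository names passed in are whatever you have already identified as a fit.

```python
from typing import Optional


def choose_repository(domain_repo: Optional[str],
                      institutional_repo: Optional[str],
                      institution_mandates_deposit: bool = False) -> str:
    """Suggest a home for a dataset using the guide's order of preference."""
    # A mandate is a compliance matter, so it is checked before content fit.
    if institution_mandates_deposit and institutional_repo:
        return institutional_repo
    # Otherwise prefer domain curation, then local support.
    if domain_repo:
        return domain_repo
    if institutional_repo:
        return institutional_repo
    # General-purpose fallback when nothing more specific fits.
    return "Zenodo"


print(choose_repository("ADS", "University repository"))
print(choose_repository(None, None))
```

The point of writing it this way is that the fallback is the last branch, not the first thing you reach for.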
Why isn't my dataset showing up in searches?
Symptom: you have a DOI but nobody finds the dataset. The root cause is almost always thin metadata, not the platform.
Diagnosis checklist:
- [ ] Subject keywords filled in? -> empty keywords = invisible
- [ ] Temporal coverage set? -> historians search by date
- [ ] Spatial coverage set? -> historians search by place
- [ ] Description > one sentence? -> harvesters need text
- [ ] Related identifiers linked? -> ties it to your paper
Fix: enrich the metadata record, then wait. Aggregators like OpenAIRE and Google Dataset Search re-harvest on a cycle, so indexing can lag by days.
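The checklist above can be run against a record before you deposit. The sketch below assumes the metadata is held as a plain dictionary. The field names (`keywords`, `temporal_coverage`, and so on) are illustrative and do not match any particular repository's schema, and the sentence count is deliberately crude.

```python
import re
from typing import Dict, List


# Field names are illustrative assumptions, not a real repository schema.
def metadata_gaps(record: Dict[str, object]) -> List[str]:
    """Return the checklist items that a metadata record fails."""
    gaps = []
    for field in ("keywords", "temporal_coverage", "spatial_coverage"):
        if not record.get(field):
            gaps.append(field)
    # Rough sentence count: split after ., ! or ? followed by whitespace.
    description = str(record.get("description") or "").strip()
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", description) if s]
    if len(sentences) <= 1:
        gaps.append("description")
    if not record.get("related_identifiers"):
        gaps.append("related_identifiers")
    return gaps


record = {
    "keywords": ["parish registers", "Kent"],
    "temporal_coverage": "1850/1900",
    "description": "Transcribed parish registers.",
}
print(metadata_gaps(record))
# → ['spatial_coverage', 'description', 'related_identifiers']
```

An empty list means the record passes this checklist. It does not guarantee discovery, since harvesting still takes time.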
What do I do when my dataset is too large?
Symptom: the repository rejects your upload. Zenodo's default per-record cap is 50 GB (extendable on request); Figshare and Dryad have their own thresholds.
Fixes, in order of preference:
- Request a quota increase from the repository (Zenodo grants these for legitimate research).
- Split logically into multiple linked records with a parent DOI, not arbitrary chunks.
- Move bulk binaries (raw TIFFs, 3D scans) to a domain or institutional store and deposit a metadata-only record with a DOI plus access instructions.
- Compress with lossless, open formats; never ship a proprietary archive as the only copy.
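To see whether a logical split would fit before you start uploading, you can total file sizes per group and compare each total with the cap. This is a sketch under stated assumptions: the 50 GB figure is the one quoted above and is treated as decimal gigabytes, the group labels are invented, and you should check the repository's current policy before relying on it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

GB = 10**9  # assumption: decimal gigabytes
DEFAULT_CAP_BYTES = 50 * GB  # figure quoted in this guide; verify before use


def plan_records(files: Iterable[Tuple[str, int]],
                 cap_bytes: int = DEFAULT_CAP_BYTES) -> Dict[str, dict]:
    """Total sizes per logical group and flag groups that exceed the cap.

    files: (logical_group, size_in_bytes) pairs, one per file.
    """
    totals = defaultdict(int)
    for group, size in files:
        totals[group] += size
    return {g: {"bytes": t, "fits": t <= cap_bytes} for g, t in totals.items()}


files = [
    ("site-A-photos", 30 * GB),
    ("site-A-photos", 15 * GB),
    ("site-B-3d-scans", 60 * GB),
]
for group, info in plan_records(files).items():
    print(group, info["bytes"] // GB, "GB", "fits" if info["fits"] else "over cap")
```

A group flagged as over the cap is a candidate for a quota request or a metadata-only record, rather than for arbitrary chunking.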
"Can I update it after publishing?" — versioning problems
Symptom: you spot an error after the DOI is minted. You cannot edit a published version, and that is by design — citations must be stable.
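Fix: use a repository that supports DOI versioning, and publish the correction as a new version rather than trying to edit the old one.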
- Concept DOI 10.5281/zenodo.1234567 -> always resolves to latest
- Version 1 10.5281/zenodo.1234568 -> frozen
- Version 2 10.5281/zenodo.1234569 -> your correction
Cite the concept DOI in your README when you want readers to follow updates; cite the specific version DOI in a paper for exact reproducibility.
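The relationship between the concept DOI and the version DOIs can be modelled in a few lines. The class below is a hypothetical illustration that reuses the placeholder DOIs from the example above. It does not call any repository API.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class VersionedRecord:
    concept_doi: str
    version_dois: Tuple[str, ...]  # oldest first; each one is frozen

    def with_new_version(self, new_doi: str) -> "VersionedRecord":
        """A correction adds a version; existing versions are never edited."""
        return replace(self, version_dois=self.version_dois + (new_doi,))

    def doi_to_cite(self, version: Optional[int] = None) -> str:
        """No version: concept DOI (follows updates). Version n (1-based): that exact DOI."""
        if version is None:
            return self.concept_doi
        if not 1 <= version <= len(self.version_dois):
            raise ValueError(f"no such version: {version}")
        return self.version_dois[version - 1]


v1 = VersionedRecord("10.5281/zenodo.1234567", ("10.5281/zenodo.1234568",))
v2 = v1.with_new_version("10.5281/zenodo.1234569")
print(v2.doi_to_cite())    # README: follow updates
print(v2.doi_to_cite(1))   # paper that used version 1
```

The record is immutable on purpose: publishing a correction produces a new object and leaves the earlier one untouched, which mirrors why citations stay stable.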
My repository minted a DOI — am I backed up now?
No, and treating a repository as a backup is a real cause of data loss. A repository preserves a finished, published copy. During the project you still need your own 3-2-1 backups (three copies, two media, one off-site). Deposit is the last step, not your safety net.
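The 3-2-1 rule is simple enough to check mechanically. The sketch below assumes you list your copies by hand. The class name, the field names and the example media labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BackupCopy:
    label: str
    medium: str    # e.g. "laptop-ssd", "external-hdd", "cloud"
    offsite: bool


def check_321(copies: List[BackupCopy]) -> Dict[str, bool]:
    """Report which parts of the 3-2-1 rule a set of copies satisfies."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media": len({c.medium for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
    }


copies = [
    BackupCopy("working copy", "laptop-ssd", offsite=False),
    BackupCopy("desk drive", "external-hdd", offsite=False),
]
print(check_321(copies))
# → {'three_copies': False, 'two_media': True, 'one_offsite': False}
```

Note that a published deposit is not counted here: it holds the finished dataset, not the working files you are trying to protect.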
How do I verify a repository is actually trustworthy?
Symptom: you are unsure the data will still be there in ten years. Check four signals:
- CoreTrustSeal certification or equivalent.
- A published preservation/retention policy with a stated commitment.
- Persistent identifiers (DOIs/handles), not just URLs.
- Listing in re3data.org, the registry of research data repositories.
A platform that mints DOIs but makes no retention promise is a warning sign, not a home for your archival master.
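The four signals can be recorded as a short checklist per candidate repository. This sketch assumes you fill in the answers yourself after reading the repository's documentation and its re3data entry. The keys are invented for illustration and nothing here queries CoreTrustSeal or re3data.

```python
from typing import Dict, List

# Illustrative keys, answered manually; not fields from any registry API.
TRUST_SIGNALS = (
    "coretrustseal",
    "retention_policy",
    "persistent_identifiers",
    "listed_in_re3data",
)


def missing_trust_signals(repo: Dict[str, bool]) -> List[str]:
    """Return the signals a repository does not show."""
    return [s for s in TRUST_SIGNALS if not repo.get(s, False)]


def is_warning_sign(repo: Dict[str, bool]) -> bool:
    """True when identifiers are minted but no retention commitment is stated."""
    return bool(repo.get("persistent_identifiers")) and not repo.get("retention_policy")


candidate = {"persistent_identifiers": True, "listed_in_re3data": True}
print(missing_trust_signals(candidate))
# → ['coretrustseal', 'retention_policy']
print(is_warning_sign(candidate))
# → True
```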
Quick fixes summary
- Not found in search -> enrich metadata, add temporal/spatial coverage, wait for harvest.
- Upload too big -> request quota, split logically, or metadata-only record.
- Found an error -> publish a new version under the concept DOI.
- Worried about longevity -> check CoreTrustSeal, re3data, retention policy.
- Lost a working file -> that is a backup failure, not a repository's job.
Key Takeaways
- Choose by content fit: domain repo first, institutional next, Zenodo/Figshare as fallback.
- Invisibility in search is a metadata problem, not a DOI problem.
- Size limits are solved by quota requests, logical splits, or metadata-only records.
- Use versioning and the concept DOI to correct published datasets.
- A repository is a preservation copy, not your working backup — keep 3-2-1.
- Verify trustworthiness via CoreTrustSeal, re3data and a stated retention policy.
Frequently Asked Questions
Zenodo or my institutional repository — which should I pick?
Use your institutional or a domain repository first if one fits, because it offers curation, longevity guarantees and disciplinary visibility. Choose Zenodo when you need a quick DOI, have no suitable institutional option, or want versioned releases tied to GitHub.
My dataset is 80 GB — which repositories accept it?
Zenodo's default per-record limit is 50 GB and can be raised on request, while Figshare and Dryad have their own tiers. For very large data, an institutional or domain repository, or a cloud bucket paired with a metadata-only record that carries the DOI, is usually the better fix.
I got a DOI but the dataset is not appearing in searches — why?
Discovery depends on metadata, not the DOI itself. Thin metadata, missing subject keywords, or no temporal and spatial coverage stops your record surfacing. Enrich the metadata and allow time for harvesters to index it.
Can I update a dataset after it has a DOI?
You cannot change a published version, but DOI-versioning repositories like Zenodo let you publish a new version with its own DOI under a stable concept DOI. Cite the concept DOI when you want to always point to the latest.
Is a repository the same as a backup?
No. A trustworthy repository preserves a published, citable copy for the long term, but it is not your working backup. Keep your own 3-2-1 backups during the project and deposit the finished dataset separately.
How do I know a repository is trustworthy?
Look for the CoreTrustSeal certification, a clear preservation policy, persistent identifiers, and listing in re3data. A repository that mints DOIs but states no retention commitment is a red flag.