Appearance
To preserve a spreadsheet for the long term, save the underlying data as UTF-8 CSV (one file per sheet) for guaranteed future readability, and keep the original workbook — ideally also exported to open OpenDocument .ods — so formulas, formatting and structure survive too. The core idea is simple: separate the data (which must stay readable for decades) from the presentation and logic (which depend on software). You preserve both, in formats that do not lock you into one vendor.
Why are spreadsheets hard to preserve?
A spreadsheet is not just a table — it is values, formulas, formatting, charts, comments, multiple sheets and hidden data types all bundled in a proprietary container like .xlsx. When that software or format becomes hard to open, everything bundled inside is at risk together. Preservation works by pulling the durable part (the data) into an open, plain-text format, while keeping a richer copy for the parts plain text cannot hold.
How do I export spreadsheet data to CSV correctly?
CSV (comma-separated values) is plain text that almost anything can read in fifty years. Export with explicit UTF-8 encoding. From LibreOffice on the command line:
bash
# convert every sheet of a workbook to UTF-8 CSV
libreoffice --headless --convert-to csv:"Text - txt - csv (StarCalc)":44,34,76 \
--outdir ./preserved data.xlsxThe trailing 76 requests UTF-8. If you use Python instead, be explicit about encoding so accented names and symbols survive:
python
import pandas as pd
xls = pd.ExcelFile("data.xlsx")
for sheet in xls.sheet_names:
df = pd.read_excel(xls, sheet)
df.to_csv(f"preserved/{sheet}.csv", index=False, encoding="utf-8")What do I lose in CSV, and how do I keep it?
CSV stores only values, so several things disappear. Know what they are and preserve them deliberately.
| Lost in CSV | How to preserve it |
|---|---|
| Formulas | Keep the original .xlsx/.ods |
| Multiple sheets | One CSV per sheet + a README mapping |
| Formatting / charts | Keep the structural copy; export charts as images |
| Data types (text vs number) | Document in a data dictionary |
| Comments / notes | Extract to a companion text file |
A leading-zero ID like 007 is the classic trap: CSV cannot tell whether it is text or a number, so record that in your documentation.
What should accompany the files?
Files alone are not preservation. Add a short plain-text README beside them describing what the data is, how the CSVs relate to each other and to the original, the encoding, and any data-type quirks. A minimal example:
text
README.txt
- source: parish_baptisms.xlsx (original retained)
- sheets exported: baptisms.csv, witnesses.csv (one CSV per sheet)
- encoding: UTF-8
- note: "ref" column is TEXT (preserves leading zeros)
- dates stored as ISO 8601 (YYYY-MM-DD)This is the difference between a folder of files and an understandable dataset.
What is a sensible beginner workflow end to end?
Follow five plain steps:
- Tidy the spreadsheet: one table per sheet, headers in the first row, no merged cells, no colour-only meaning.
- Export each sheet to UTF-8 CSV.
- Keep the original, and export an
.odsstructural copy for formulas and formatting. - Document everything in a README and a small data dictionary.
- Store the bundle with a checksum and at least one backup copy.
Do these and the data will outlive whatever spreadsheet program is fashionable in 2050.
Key Takeaways
- Separate durable data (CSV) from software-dependent logic and presentation (the original/ODS).
- Always export CSV as UTF-8 and say so in your documentation.
- Keep the original
.xlsx, and prefer an open.odsstructural copy over a proprietary one. - Export one CSV per sheet and explain the relationships in a README.
- Record data-type quirks (leading zeros, dates) in a small data dictionary.
- Store the bundle with a checksum and a backup; files without documentation are not preserved.
Frequently Asked Questions
Why is CSV recommended for preserving spreadsheet data?
CSV is plain text, open, and readable by almost any tool, so the underlying data stays accessible for decades without proprietary software. Its weakness is that it stores only values, not formulas, formatting, multiple sheets or data types, which you preserve separately.
What do I lose when I export a spreadsheet to CSV?
You lose formulas (only their last computed values remain), cell formatting, charts, multiple worksheets, comments, and data-type information such as whether 007 is text or a number. Document these or keep a companion copy to retain them.
Should I keep the original .xlsx file too?
Yes. Keep the original spreadsheet alongside the CSV. The original preserves formulas, structure and formatting, while the CSV guarantees the raw values stay readable even if spreadsheet software changes.
How do I preserve a workbook with several sheets?
Export one CSV per sheet with a clear naming convention, and keep the original workbook as the structural copy. Record in a README how the CSV files relate to each other and to the original sheets.
What character encoding should spreadsheet CSVs use?
Use UTF-8 and state it explicitly in your documentation. UTF-8 preserves accented names, currency symbols and non-Latin scripts; mismatched encodings are the most common cause of corrupted text in archived CSVs.
Is OpenDocument (.ods) a good preservation format?
ODS is an open, standardised format that retains formulas, formatting and multiple sheets, making it a better structural preservation copy than .xlsx. A common strategy pairs an ODS structural copy with per-sheet CSV value copies.