What Is Format Sustainability?
Format sustainability refers to the ability of a digital file format to remain usable, readable, and interpretable over long periods of time, often decades or centuries, without requiring costly or complex migration. A sustainable format is open, well-documented, widely supported, and not dependent on a single vendor or proprietary technology.
Why It Matters
Information is a core asset for individuals, businesses, governments, and cultural institutions. When a format becomes obsolete, the data it contains can become inaccessible, leading to:
- Loss of historical records and cultural heritage
- Increased costs for data conversion and re-entry
- Legal and compliance risks (e.g., inability to produce archived documents)
- Reduced trust in digital systems
Choosing sustainable formats at the outset reduces these risks and protects the value of the data throughout its lifecycle.
Key Principles of Sustainable Formats
- Open Standards: Formats that are publicly documented and free of licensing restrictions (e.g., PDF/A, CSV, TIFF).
- Self-Describing: The file contains metadata that explains its structure, encoding, and semantics.
- Platform Independence: No reliance on a specific operating system, hardware, or software version.
- Stability and Longevity: Minimal changes to the specification over time, and a history of backward compatibility.
- Interoperability: Ability to be opened by multiple applications, ensuring that future tools can still read the data.
- Human Readability: When possible, the format can be inspected or partially interpreted with simple tools (e.g., plain-text XML).
Practical Strategies for Achieving Format Sustainability
1. Conduct a Format Audit
Identify all file types in use, assess their sustainability score, and prioritize those that pose the greatest risk. Tools such as the PRONOM database or the Library of Congress's format registry can help.
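As a rough illustration of the first pass of such an audit, the Python sketch below counts files by extension and sorts them so the riskiest formats come first. The RISK_BY_EXTENSION table and the function name audit_formats are hypothetical and exist only for this example; a real audit would take its ratings from a registry such as PRONOM and would identify files by more than their extension.

```python
from collections import Counter
from pathlib import Path

# Hypothetical risk ratings, for illustration only. A real audit would draw
# on a format registry rather than a hand-written table.
RISK_BY_EXTENSION = {
    ".txt": "low", ".csv": "low", ".json": "low",
    ".tif": "low", ".tiff": "low", ".png": "low",
    ".wav": "low", ".flac": "low",
    ".doc": "medium", ".xls": "medium",
    ".wpd": "high", ".dwg": "high",
}

# Sort order for the report: highest risk first, unrated formats next.
RISK_ORDER = {"high": 0, "unknown": 1, "medium": 2, "low": 3}


def audit_formats(root):
    """Count files under root by extension and attach a risk rating."""
    counts = Counter(
        path.suffix.lower() or "(none)"
        for path in Path(root).rglob("*")
        if path.is_file()
    )
    report = [
        (ext, count, RISK_BY_EXTENSION.get(ext, "unknown"))
        for ext, count in counts.items()
    ]
    # Riskiest formats first, then the most numerous within each rating.
    report.sort(key=lambda row: (RISK_ORDER[row[2]], -row[1], row[0]))
    return report
```

Calling audit_formats on a holdings directory returns (extension, count, rating) tuples. Anything rated "unknown" is a prompt to look the format up, not a verdict on it.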
2. Adopt Trusted Formats for New Content
When creating or acquiring new data, choose from proven sustainable formats:
- Text: UTF-8 plain text, CSV, JSON
- Images: TIFF (uncompressed), PNG
- Audio: WAV, FLAC
- Video: MKV with H.264 or AV1 codecs
- Documents: PDF/A-2, OpenDocument (ODF)
- Geospatial: GeoTIFF, GML
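One way to enforce such a list at ingest is to check that a file's leading bytes match the format it claims to be. The Python sketch below covers a few of the formats named above using their well-known signatures; the function name sniff_format and the choice of formats are assumptions made for this example, and a production workflow would more likely rely on a dedicated identification tool.

```python
# Leading byte signatures for a few of the formats listed above.
SIGNATURES = {
    "png": [b"\x89PNG\r\n\x1a\n"],
    "tiff": [b"II*\x00", b"MM\x00*"],  # little-endian and big-endian TIFF
    "pdf": [b"%PDF-"],
    "flac": [b"fLaC"],
    "mkv": [b"\x1a\x45\xdf\xa3"],  # EBML header, shared by Matroska and WebM
}


def sniff_format(path):
    """Return the name of a format whose signature matches, or None."""
    with open(path, "rb") as handle:
        header = handle.read(12)
    for name, prefixes in SIGNATURES.items():
        if any(header.startswith(prefix) for prefix in prefixes):
            return name
    # WAV is a RIFF container: "RIFF", a 4-byte size field, then "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    return None
```

A signature match only shows that the file belongs to the right family. It does not confirm PDF/A conformance, that a TIFF is uncompressed, or which codec an MKV carries; those checks need a format-specific validator.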
3. Preserve Original Files
Whenever possible, retain the original source files alongside any derived or converted versions. This provides a fallback if a later conversion proves lossy.
4. Document the Context
Include descriptive metadata (who created the file, when, purpose, software used) either in the file's native metadata sections or in an accompanying METS or Dublin Core record.
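A minimal sketch of an accompanying Dublin Core record, written in Python with the standard library, might look like the following. The namespace URI is that of the Dublin Core element set; the wrapper element name "metadata", the function name dublin_core_record, and all sample values are assumptions chosen for illustration, and a repository would normally prescribe its own record profile.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML string from (element, value) pairs of Dublin Core elements."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for the example.
record = dublin_core_record([
    ("title", "Site survey report"),
    ("creator", "J. Example"),
    ("date", "2015-03-02"),
    ("format", "application/pdf"),
    ("description", "Exported from a hypothetical word processor, version 4.2"),
])
```

Writing the resulting string to a sidecar file stored next to the object keeps the context available even when the file's own embedded metadata cannot be read.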
5. Plan for Periodic Review
Technology evolves. Schedule regular (e.g., every 5 to 10 years) reviews of the formats in your holdings, and migrate only when a clear, vetted alternative is available.
6. Use Trusted Digital Repositories (TDRs)
Store files in systems that implement OAIS (Open Archival Information System) standards, provide fixity checks, and maintain audit trails.
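A fixity check amounts to recording a checksum for each file and later confirming that it has not changed. The Python sketch below shows the idea with SHA-256 from the standard library; the function names and the in-memory manifest are assumptions for this example, not a description of how any particular repository implements fixity.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 digest, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Return relative paths that are missing or whose checksum has changed."""
    root = Path(root)
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

In practice the manifest would be stored separately from the files it describes, and each verification run would be logged so that the results form part of the audit trail.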
7. Educate Stakeholders
Train creators, contributors, and IT staff about the importance of format choice and the organization's policies.
Case Studies Highlighting Successes and Pitfalls
National Archives: Migration from Proprietary Word Processors
In the early 2000s, the U.S. National Archives discovered a backlog of documents stored in a now-defunct word-processing format. Rather than attempting direct conversion, they first created high-resolution PDF/A images of each page, preserving visual fidelity, before extracting text into plain-text files. The dual-file approach allowed immediate access while safeguarding the content for future processing.
University Library: Choosing Open Image Formats
A university art library migrated its digitized collections from JPEG 2000 to uncompressed TIFF with embedded IPTC metadata. The switch eliminated concerns about JPEG 2000's licensing and ensured that future imaging software could render the files without loss.
Corporation: The Cost of Ignoring Sustainability
A mid-size engineering firm stored 15 years of project data in a proprietary CAD format that ceased to be supported after a vendor acquisition. The resulting migration effort required hiring external consultants and caused delays, costing the company over $200k. The case underscored the financial risk of proprietary lock-in.
Further Reading & Tools
- Library of Congress Format Definitions
- PRONOM Technical Registry
- OCLC Digital Preservation Resources
- PREMIS (Preservation Metadata: Implementation Strategies)
- US National Digital Stewardship Alliance
Choosing a sustainable format is not a one-time decision; it is a continuous stewardship practice that protects the future relevance of today's digital assets.
