X. Web Archives
This format specification covers the Library’s preferred format for archived web content, also called web archives. The Library recognizes that websites, including the blogs, social media, and other web content that make them up, are created and presented in formats intended for viewing in a web browser, and that these formats often differ from the standard format recommended for preservation and long-term access. Because this document focuses on preservation and long-term access, the format preferences below favor those outcomes. For best practices that make web content easier to preserve, see the Library of Congress Web Archiving Team’s recommendations on creating preservable websites.
i. Websites | Preferred | Acceptable
---|---|---
A. Formats | The Library, and other organizations involved in web archiving, are preserving web content in the Web Archive (WARC) format using record-at-a-time GZIP compression, as described in Appendix A of the WARC Standard. |
B. Delivery Method | Capture using tools that produce non-proprietary output, to conform with standard formats and requirements | Transmission of WARC or ARC_IA files created by web content producers or other archiving organizations
C. Metadata | | The ARC_IA should be named in a manner that easily identifies the archiving institution (see WARC standard for recommended naming conventions)
D. Technological Measures | Tools currently available cannot capture all web content, so certain types of web content may not be preservable through web capture at this time. These include: |
E. Referencing | Web materials in any web archive can be referred to persistently using the URN Namespace Registration for Persistent Web IDentifiers (PWID). |
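
The record-at-a-time GZIP compression named under A. Formats can be illustrated with a short sketch. This is only an illustration of the general idea, not the Library's tooling or a WARC writer: each record is compressed as its own GZIP member, and the members are concatenated, so a reader can jump to a known byte offset and decompress a single record. The function names (`pack_records`, `read_record_at`, `iter_records`) are hypothetical, and the record contents are placeholder byte strings rather than valid WARC records.

```python
import gzip
import zlib


def pack_records(records):
    """Compress each record as its own gzip member and concatenate them.

    Returns the combined bytes and the byte offset where each member starts.
    """
    blob = bytearray()
    offsets = []
    for record in records:
        offsets.append(len(blob))       # where this record's member begins
        blob += gzip.compress(record)   # one complete gzip member per record
    return bytes(blob), offsets


def read_record_at(blob, offset):
    """Decompress only the gzip member that starts at the given offset."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return decomp.decompress(blob[offset:]) + decomp.flush()


def iter_records(blob):
    """Yield every record in order, one gzip member at a time."""
    while blob:
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        record = decomp.decompress(blob) + decomp.flush()
        yield record
        blob = decomp.unused_data       # bytes after the member just read


# Placeholder payloads standing in for WARC records (assumption).
records = [b"record one", b"record two", b"record three"]
blob, offsets = pack_records(records)
second = read_record_at(blob, offsets[1])
```

Because every member is a complete GZIP stream, the concatenated file is still readable by ordinary GZIP tools as a whole, while an external index of offsets allows individual records to be retrieved without decompressing everything before them.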