The WARC (Web ARChive) file format is the successor to the ARC format. It specifies a method for combining multiple digital resources into an aggregate archival file together with related information.
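To make the idea of an aggregate file concrete, here is a minimal sketch in Python, assuming the WARC/1.0 record layout from ISO 28500: a version line, named header fields, a blank line, the content block, and two trailing CRLFs. The helper names `build_record` and `read_records` and the example.org URIs are hypothetical and exist only for illustration. The sketch handles uncompressed records only; real crawl files are usually gzip-compressed and are better handled with one of the libraries listed below.

```python
import io
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_record(record_type, target_uri, payload):
    """Serialize one record: version line, header fields, blank line, content block."""
    headers = [
        ("WARC-Type", record_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.0" + CRLF
    for name, value in headers:
        head += f"{name}: {value}".encode("utf-8") + CRLF
    # A blank line ends the header; two CRLFs end the record.
    return head + CRLF + payload + CRLF + CRLF


def read_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if line.strip() == b"":
            continue  # blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (CRLF, b"\n", b""):
                break  # blank line: end of header fields
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length says exactly how many bytes belong to this record.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


# Two resources combined into one archive, then read back.
archive = (
    build_record("resource", "http://example.org/a.txt", b"first")
    + build_record("resource", "http://example.org/b.txt", b"second resource")
)
records = list(read_records(io.BytesIO(archive)))
for headers, payload in records:
    print(headers["WARC-Target-URI"], len(payload))
```

Because every record carries its own Content-Length, a reader can step from record to record without interpreting the payload, which is what allows unrelated resources and their metadata to share one file.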
Sites [ Submit ]
Web Data Commons - The project extracts structured data from the Common Crawl and provides it for public download.
Common Crawl data set - Description of the data set.
GitHub: example-warc-java - Java and Clojure examples for processing Common Crawl WARC files.
GitHub: webarchive-commons - Common web archive utility code.
WARC, Web ARChive file format - Format description, ISO 28500:2009. Used by archival institutions to store content harvested by web crawls, for example with the Heritrix harvesting tool.
Wget with WARC output - About the development version of Wget, which is capable of saving WARC files.
The WARC File Format (ISO 28500) - Information, maintenance, and drafts, hosted by the Bibliothèque nationale de France.
Internetarchive/warc - Python library for reading and writing WARC files and WARC headers.
WARC File Format Specifications - Collection of drafts prepared as the WARC format developed.
Example ARC and WARC files - Short examples of the ARC and WARC files generated by the Internet Archive's crawlers.
Web Archive Transformation (WAT) Specification, Utilities, and Usage Overview - Utilities to extract metadata from WARC files and create data analysis reports. Covers terminology and the use of WAT and Pig for data analysis.
The WARC Ecosystem - Wiki with resources about the WARC format and the tools that support it.
International Internet Preservation Consortium: Tools and Software - Perspectives on setting up a web archiving chain; contains tools recommended and used by members of the IIPC.
WSDK - A lightweight Erlang library for writing web archiving software. Overview, requirements, quick start, tutorial, support services, bug reports, license, and third-party libraries.
WARC Implementation Guidelines v.1 - Gathers advice and best practice to help institutions design and create WARC files for collection management, access, preservation, and interoperability with collections from other institutions.
GitHub: pylibwarc - A Python library for dealing with Web ARChive (WARC) files.
Digital Preservation Coalition: Web-Archiving - Report intended for those with an interest in, or responsibility for, setting up a web archive, particularly new practitioners or senior managers wishing to develop a holistic understanding of the issues and options available.
This directory is made available through a Creative Commons Attribution license from the DMOZ Organization.

© 2025 - Midnight Design Productions, LLC