DMOZ Internet Directory, presented by DMOZLive.com
WARC
42 Sites
The WARC (Web ARChive) file format is a successor to the ARC format. It specifies a method for combining multiple digital resources into an aggregate archival file together with related information.
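The way WARC combines resources can be sketched in Python. This is a minimal, hand-rolled illustration of one uncompressed WARC/1.0 record, assuming the framing defined in ISO 28500: a version line, named header fields, a blank line, the content block, and two CRLFs ending the record. The function name `make_warc_record` and the example URI are hypothetical, and real tools also handle per-record gzip compression, payload digests, and `warcinfo` records.

```python
import uuid
from datetime import datetime, timezone


def make_warc_record(warc_type: str, target_uri: str, block: bytes,
                     content_type: str) -> bytes:
    """Serialize one WARC/1.0 record (hypothetical helper, minimal sketch).

    Layout: version line, named header fields, blank line,
    content block, then CRLF CRLF as the record terminator.
    """
    headers = [
        ("WARC-Type", warc_type),
        # Each record gets a globally unique identifier, here a UUID URN.
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        # UTC timestamp in the W3C ISO 8601 form used by WARC-Date.
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        # Length in bytes of the content block only (not the headers).
        ("Content-Length", str(len(block))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    return head.encode("utf-8") + block + b"\r\n\r\n"


# The content block of a "response" record is the raw HTTP response.
http_response = (b"HTTP/1.1 200 OK\r\n"
                 b"Content-Type: text/html\r\n\r\n"
                 b"<html>hello</html>")
record = make_warc_record("response", "http://example.com/", http_response,
                          "application/http;msgtype=response")

# An aggregate archive is simply a sequence of such records.
archive = b"".join([record, record])
print(record.decode("utf-8"))
```

Because each record carries its own `Content-Length`, records can be appended one after another without an index, which is what lets crawlers write WARC files as a stream.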
Categories
Software
25 Sites
Sites
Web Data Commons
- The project extracts structured data from the Common Crawl and provides it for public download.
Common Crawl data set
- Description of the data set.
Github: example-warc-java
- Java and Clojure examples for processing Common Crawl WARC files.
Github: webarchive-commons
- Common web archive utility code.
WARC, Web ARChive file format
- Format description of ISO 28500:2009. The format is used by archival institutions to store content harvested by web crawls, for example with the Heritrix harvesting tool.
Wget with WARC output
- About the development version of Wget, which is capable of saving WARC files.
The WARC File Format (ISO 28500)
- Information about the standard, its maintenance, and drafts; hosted by the Bibliothèque nationale de France.
Internetarchive/warc
- Python library for reading and writing WARC files and WARC headers.
WARC File Format Specifications
- Collection of drafts prepared as the WARC format was developed.
Example ARC and WARC files
- Short examples of the ARC and WARC files that are generated by the Internet Archive's crawlers.
Web Archive Transformation (WAT) Specification, Utilities, and Usage Overview
- Utilities to extract metadata from WARC files and create data analysis reports. Covers terminology and the use of WAT with Pig for data analysis.
The WARC Ecosystem
- Wiki with resources about the WARC format and the tools that support it.
International Internet Preservation Consortium: Tools and Software
- Perspectives on setting up a web archiving chain; lists tools recommended and used by members of the IIPC.
WSDK
- A lightweight Erlang library for writing web archiving software. Overview, requirements, quick start, tutorial, support services, bug reports, license, and third-party libraries.
WARC Implementation Guidelines v.1
- Gathers advice and best practices to help institutions design and create WARC files for collection management, access, preservation, and interoperability with collections from other institutions.
Github: pylibwarc
- A Python library for dealing with Web ARChive (WARC) files.
Digital Preservation Coalition: Web-Archiving
- Report intended for those with an interest in, or responsibility for, setting up a web archive, particularly new practitioners or senior managers wishing to develop a holistic understanding of the issues and options available.
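Several of the libraries listed above read WARC files. To illustrate the record framing they parse (this is not the API of internetarchive/warc, pylibwarc, or any other listed tool), the following minimal Python sketch reads uncompressed records by hand: it reads the version line, collects header fields up to the blank line, and then reads exactly `Content-Length` bytes as the content block. The function name `iter_warc_records` and the sample record are hypothetical, and the sketch ignores per-record gzip compression, folded header lines, and case-insensitive matching of field names.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, block) pairs from an uncompressed WARC byte stream.

    Hypothetical minimal sketch: repeated fields such as
    WARC-Concurrent-To keep only their last value.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line in (b"\r\n", b"\n"):
            continue  # skip the CRLF CRLF terminator left by the previous record
        version = line.strip().decode("utf-8")
        if not version.startswith("WARC/"):
            raise ValueError(f"expected WARC version line, got {version!r}")
        headers = {}
        for raw in iter(stream.readline, b""):
            if raw in (b"\r\n", b"\n"):
                break  # a blank line ends the header block
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length tells us exactly how many bytes the block occupies.
        block = stream.read(int(headers["Content-Length"]))
        yield headers, block


sample = (b"WARC/1.0\r\n"
          b"WARC-Type: resource\r\n"
          b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>\r\n"
          b"WARC-Date: 2009-01-01T00:00:00Z\r\n"
          b"WARC-Target-URI: http://example.com/note.txt\r\n"
          b"Content-Type: text/plain\r\n"
          b"Content-Length: 5\r\n"
          b"\r\n"
          b"hello\r\n\r\n")

for headers, block in iter_warc_records(io.BytesIO(sample + sample)):
    print(headers["WARC-Type"], headers["WARC-Target-URI"], block)
```

Reading the block by length rather than scanning for a delimiter is what allows binary payloads (images, compressed HTTP bodies) to be stored without escaping.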
This directory is made available through a Creative Commons Attribution license from the DMOZ Organization.