internetarchive / Warc
Licence: gpl-2.0
Python library for reading and writing warc files
warc: Python library to work with WARC files
============================================

.. image:: https://secure.travis-ci.org/anandology/warc.png?branch=master
   :alt: build status
   :target: http://travis-ci.org/anandology/warc

WARC (Web ARChive) is a file format for storing web crawls.

The ``warc`` library makes it easy to work with WARC files::

    import warc

    # Open a WARC file and iterate over its records.
    f = warc.open("test.warc")
    for record in f:
        # Print each record's target URI and content length.
        print("%s %s" % (record['WARC-Target-URI'], record['Content-Length']))
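
To give a feel for what iterating over records involves, here is a minimal sketch that uses only the standard library. It is not how the ``warc`` library is implemented; the function name ``iter_warc_records`` and the sample record are made up for illustration. It assumes an uncompressed stream in which each record is a ``WARC/`` version line, named header lines, a blank line, and then ``Content-Length`` bytes of content.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream.

    Hypothetical helper for illustration only; header names are kept
    exactly as written in the file.
    """
    while True:
        version = stream.readline()
        # Skip the blank lines that separate consecutive records.
        while version in (b"\r\n", b"\n"):
            version = stream.readline()
        if not version:
            return  # end of stream
        if not version.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % version)

        # Read "Name: value" header lines up to the blank line.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # The content block is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


# A made-up record, repeated twice to form a two-record stream.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

records = list(iter_warc_records(io.BytesIO(sample * 2)))
for headers, payload in records:
    print("%s %s" % (headers["WARC-Target-URI"], headers["Content-Length"]))
```

WARC files are often gzip-compressed, and real records carry more headers than this sample. The sketch covers only the basic record layout.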

Documentation
-------------

The documentation of the warc library is available at http://warc.readthedocs.org/.

License
-------

This software is licensed under GPL v2. See the LICENSE_ file for details.

.. _LICENSE: http://github.com/internetarchive/warc/blob/master/LICENSE