
Mixnode / mixnode-warcreader-php

Licence: other
Read Web ARChive (WARC) files in PHP.



Mixnode WARC Reader for PHP

This library allows developers to read Web ARChive (WARC) files in PHP.

Installation Guide

We recommend Composer for installing this package:

curl -sS https://getcomposer.org/installer | php

Once done, run the Composer command to install Mixnode WARC Reader for PHP:

php composer.phar require mixnode/mixnode-warcreader-php

After installing, you need to require Composer's autoloader in your code:

require 'vendor/autoload.php';

You can later update Mixnode WARC Reader using Composer:

php composer.phar update

A Simple Example

<?php
require 'vendor/autoload.php';

// Initialize a WarcReader object.
// The WarcReader constructor accepts paths to both raw WARC files and gzipped WARC files.
$warc_reader = new Mixnode\WarcReader("test.warc.gz");

// Using nextRecord, iterate through the WARC file and output each record.
while(($record = $warc_reader->nextRecord()) != FALSE){
	// A WARC record is broken into two parts: header and content.
	// header contains metadata about content, while content is the actual resource captured.
	print_r($record['header']);
	print_r($record['content']);
	echo "------------------------------------\n";
}
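
Building on the loop above, the following is a minimal sketch of how the records might be summarized by type. It assumes that `$record['header']` is an associative array keyed by WARC field names such as `WARC-Type`; the README does not confirm this layout. The helpers `countRecordsByType` and `readRecords` are hypothetical names introduced for illustration and are not part of the library.

```php
<?php
// Hypothetical helper (not part of the library): tally records by WARC-Type.
// Assumption: each record is an array whose 'header' entry is an associative
// array of WARC header fields, e.g. ['WARC-Type' => 'response', ...].
function countRecordsByType(iterable $records): array
{
    $counts = [];
    foreach ($records as $record) {
        // Records without a WARC-Type field are grouped under 'unknown'.
        $type = $record['header']['WARC-Type'] ?? 'unknown';
        $counts[$type] = ($counts[$type] ?? 0) + 1;
    }
    ksort($counts); // Sort by type name for stable output.
    return $counts;
}

// Hypothetical adapter: wraps the nextRecord() loop from the example above
// in a generator so the records can be passed to any function taking an iterable.
function readRecords($warc_reader): Generator
{
    while (($record = $warc_reader->nextRecord()) != FALSE) {
        yield $record;
    }
}
```

With the library installed, the two helpers could be combined as `print_r(countRecordsByType(readRecords(new Mixnode\WarcReader("test.warc.gz"))));`. Because the generator yields one record at a time, the file is processed as a stream and only the counts are kept in memory.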