2 open source projects by helgeho

1. Archivespark
An Apache Spark framework for easy data processing, extraction, and derivation on web archives and archival collections, developed at the Internet Archive.
2. Web2warc
An easy-to-use, highly customizable crawler that lets you create your own small web archives (WARC/CDX).
Language: Scala · ✭ 18