GitPlanet
Two open-source projects by helgeho
1. ArchiveSpark
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
✭ 111 · scala · spark · web-archiving
2. Web2Warc
An easy-to-use, highly customizable crawler that lets you create your own small web archives (WARC/CDX).
✭ 18 · scala