warc: 📇 Tools to Work with the Web Archive Ecosystem in R
Stars: ✭ 21 (-69.57%)
wget-lua: Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression, and URL-agnostic deduplication.
Stars: ✭ 52 (-24.64%)
wail: 🐋 One-Click User Instigated Preservation
Stars: ✭ 107 (+55.07%)
Archivebox: 🗃 Open-source, self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc. and saves HTML, JS, PDFs, media, and more...
Stars: ✭ 12,383 (+17846.38%)
awesome-memento: A list of things related to software, literature, and other content for 🕣 Memento
Stars: ✭ 62 (-10.14%)
munin-indexer: A web archiving tool for open social media posts
Stars: ✭ 16 (-76.81%)
Collect: A server to collect & archive websites that also supports video downloads
Stars: ✭ 62 (-10.14%)
MemGator: A Memento aggregator CLI and server in Go
Stars: ✭ 42 (-39.13%)
CommonCrawlDocumentDownload: A small tool that uses the CommonCrawl URL Index to download documents with certain file types or MIME types. It is used for mass-testing frameworks such as Apache POI and Apache Tika.
Stars: ✭ 43 (-37.68%)
domcurl: cURL-like utility for fetching a resource (it runs the page's JS and returns once the network is idle); great for JS-heavy apps
Stars: ✭ 84 (+21.74%)
warrick: Recover lost websites from the Web Infrastructure
Stars: ✭ 76 (+10.14%)
warcworker: A dockerized, queued, high-fidelity web archiver based on Squidwarc
Stars: ✭ 48 (-30.43%)
vandal: Navigator for Web Archive
Stars: ✭ 146 (+111.59%)
MementoEmbed: A service that provides archive-aware, oEmbed-compatible embeddable surrogates (social cards, thumbnails, etc.) for archived web pages (mementos).
Stars: ✭ 13 (-81.16%)
Heritrix3: Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Stars: ✭ 2,104 (+2949.28%)
warc: ⚙️ A Rust library for reading and writing WARC files
Stars: ✭ 26 (-62.32%)
Archivenow: A Tool To Push Web Resources Into Web Archives
Stars: ✭ 253 (+266.67%)
Archiveror: Archiveror will help you preserve the webpages you love. 💾
Stars: ✭ 246 (+256.52%)
Wail: 🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation
Stars: ✭ 232 (+236.23%)
Warcio: Streaming WARC/ARC library for fast web archive IO
Stars: ✭ 195 (+182.61%)
Warcreate: Chrome extension to "Create WARC files from any webpage"
Stars: ✭ 143 (+107.25%)
Sfm Ui: Social Feed Manager user interface application.
Stars: ✭ 129 (+86.96%)
Archivespark: An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
Stars: ✭ 111 (+60.87%)
Replayweb.page: Serverless web archive replay directly in the browser
Stars: ✭ 84 (+21.74%)
Conifer: Collect and revisit web pages.
Stars: ✭ 1,259 (+1724.64%)
Archiveweb.page: A high-fidelity web archiving extension for Chrome and Chromium-based browsers!
Stars: ✭ 69 (+0%)
Pywb: Core Python web archiving toolkit for replay and recording of web archives
Stars: ✭ 798 (+1056.52%)
Webrecorder Player: Webrecorder Player for desktop (OSX/Windows/Linux), built with Electron + Webrecorder.
Stars: ✭ 368 (+433.33%)
Ipwb: InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
Stars: ✭ 350 (+407.25%)
Perma: Indelible links
Stars: ✭ 272 (+294.2%)
dappeteer: 🏌🏼 E2E testing for dApps using Puppeteer + MetaMask
Stars: ✭ 138 (+100%)
jseval: Evaluate JavaScript on a URL through a headless Chrome browser.
Stars: ✭ 19 (-72.46%)
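Several entries above (warc for R, the Rust warc library, Warcio) read and write WARC files. To illustrate the record layout those libraries deal with, here is a minimal, stdlib-only Python sketch that splits uncompressed WARC/1.0 records out of a byte string. It is not the API of any listed library: the helper name `iter_warc_records` is hypothetical, and it assumes well-formed records with no gzip/zstd compression and no folded header lines.

```python
from typing import Dict, Iterator, Tuple


def iter_warc_records(data: bytes) -> Iterator[Tuple[str, Dict[str, str], bytes]]:
    """Yield (version, headers, block) for each uncompressed WARC record.

    Hypothetical sketch: assumes well-formed records, exact-case header
    names, no header line folding, and no per-record compression.
    """
    pos = 0
    while pos < len(data):
        # The header section ends at the first empty line (CRLF CRLF).
        end = data.index(b"\r\n\r\n", pos)
        lines = data[pos:end].decode("utf-8").split("\r\n")
        version = lines[0]  # e.g. "WARC/1.0"
        headers: Dict[str, str] = {}
        for line in lines[1:]:
            # Split on the first colon only, so URIs in values stay intact.
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
        start = end + 4
        length = int(headers["Content-Length"])
        block = data[start:start + length]
        yield version, headers, block
        # Every record block is followed by two CRLFs before the next record.
        pos = start + length + 4


# Usage: one hand-written "resource" record.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello\r\n\r\n"
)

for version, headers, block in iter_warc_records(sample):
    print(version, headers["WARC-Type"], headers["WARC-Target-URI"], block)
```

Real WARC files are usually compressed per record and use case-insensitive header names, which is a large part of what the dedicated libraries handle for you.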
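The wget-lua (Wget-AT) entry above mentions URL-agnostic deduplication. One common way to get that effect in WARC archiving is to key stored payloads by a content digest instead of by URL, so an identical payload fetched from a different URL becomes a `revisit` record pointing back to the first capture. The Python sketch below illustrates that general idea using the `sha1:` + Base32 digest form commonly seen in `WARC-Payload-Digest` headers. It is an assumption-based illustration, not Wget-AT's implementation, and the `Deduplicator` class and `record_type` method are hypothetical.

```python
import base64
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def payload_digest(payload: bytes) -> str:
    """SHA-1 of the payload, Base32-encoded, in "sha1:<digest>" form."""
    return "sha1:" + base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


@dataclass
class Deduplicator:
    # Maps payload digest -> (target URI, WARC-Date) of the first capture.
    seen: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def record_type(
        self, uri: str, date: str, payload: bytes
    ) -> Tuple[str, Optional[Tuple[str, str]]]:
        """Return ("response", None) for a new payload, or
        ("revisit", (original_uri, original_date)) when an identical payload
        was already stored, no matter which URL it came from."""
        digest = payload_digest(payload)
        if digest in self.seen:
            return "revisit", self.seen[digest]
        self.seen[digest] = (uri, date)
        return "response", None


# Usage: the same bytes served from two different URLs.
dedup = Deduplicator()
first = dedup.record_type("http://a.example/logo.png", "2021-01-01T00:00:00Z", b"same-bytes")
second = dedup.record_type("http://b.example/copy.png", "2021-01-02T00:00:00Z", b"same-bytes")
print(first)
print(second)
```

Keying on the digest alone is what makes the check URL-agnostic; a URL-keyed scheme would only catch repeated fetches of the same address.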
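MemGator and awesome-memento above revolve around the Memento protocol, where an archive publishes a TimeMap in `application/link-format` listing its captures (mementos) of an original URL, each with a `datetime` parameter. An aggregator queries several archives and merges their TimeMaps. The Python sketch below shows that merge step under simplifying assumptions: every parameter value is double-quoted, URIs contain no `<` or `>`, and the function names `parse_timemap` and `aggregate` are hypothetical, not MemGator's API. The archive hostnames in the usage example are placeholders.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List, Tuple

# One link: <URI> followed by its ;-separated parameters.
LINK_RE = re.compile(r"<([^>]*)>([^<]*)")
# Quoted parameters such as rel="memento" or datetime="Fri, 01 Jan 2010 ...".
PARAM_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def parse_timemap(text: str) -> List[Tuple[datetime, str]]:
    """Return (datetime, URI-M) pairs for every link whose rel includes "memento"."""
    mementos = []
    for uri, params in LINK_RE.findall(text):
        attrs = dict(PARAM_RE.findall(params))
        # rel can hold several values, e.g. "first memento".
        if "memento" in attrs.get("rel", "").split() and "datetime" in attrs:
            mementos.append((parsedate_to_datetime(attrs["datetime"]), uri))
    return mementos


def aggregate(*timemaps: str) -> List[Tuple[datetime, str]]:
    """Merge mementos from several archives into one chronological list."""
    return sorted(m for tm in timemaps for m in parse_timemap(tm))


# Usage with two small hand-written TimeMaps from placeholder archives.
archive_a = """<http://example.com/>; rel="original",
<http://a.example/web/20100101000000/http://example.com/>; rel="first memento"; datetime="Fri, 01 Jan 2010 00:00:00 GMT",
<http://a.example/web/20150101000000/http://example.com/>; rel="last memento"; datetime="Thu, 01 Jan 2015 00:00:00 GMT"
"""
archive_b = """<http://b.example/20120615120000/http://example.com/>; rel="memento"; datetime="Fri, 15 Jun 2012 12:00:00 GMT"
"""

for when, uri_m in aggregate(archive_a, archive_b):
    print(when.isoformat(), uri_m)
```

A production link-format parser has to cope with unquoted values and other edge cases; the regex approach here only covers the common, well-formed case.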
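CommonCrawlDocumentDownload above uses the CommonCrawl URL Index to find documents by MIME type. The index can return one JSON object per capture, including the WARC `filename`, the record's byte `offset`, and its `length`, which is enough to fetch a single record with an HTTP Range request instead of downloading a whole WARC file. The sketch below only builds those requests from index lines; it makes no network calls. The base URL `https://data.commoncrawl.org/` is an assumption about where the data is served, the preference for a `mime-detected` field over `mime` is an assumption about the index schema, and the function name `ranged_requests` and the sample filenames are hypothetical.

```python
import json
from typing import Iterable, List, Set, Tuple

# Assumption: public Common Crawl WARC files are served under this base URL.
DATA_BASE = "https://data.commoncrawl.org/"


def ranged_requests(index_lines: Iterable[str], wanted_mimes: Set[str]) -> List[Tuple[str, str]]:
    """For each index line whose MIME type is wanted, return the WARC file URL
    and the HTTP Range header value that selects just that record."""
    requests = []
    for line in index_lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        # Assumption: prefer a detected MIME type when the index provides one.
        mime = rec.get("mime-detected") or rec.get("mime")
        if mime not in wanted_mimes:
            continue
        offset, length = int(rec["offset"]), int(rec["length"])
        # HTTP byte ranges are inclusive, so the last byte is offset + length - 1.
        requests.append((DATA_BASE + rec["filename"], f"bytes={offset}-{offset + length - 1}"))
    return requests


# Usage with two hypothetical index lines.
index_lines = [
    '{"url": "http://example.com/a.pdf", "mime": "application/pdf", '
    '"filename": "crawl-data/EXAMPLE/a.warc.gz", "offset": "100", "length": "50"}',
    '{"url": "http://example.com/", "mime": "text/html", '
    '"filename": "crawl-data/EXAMPLE/b.warc.gz", "offset": "0", "length": "10"}',
]
for url, range_header in ranged_requests(index_lines, {"application/pdf"}):
    print(url, range_header)
```

Because Common Crawl WARC files are compressed one gzip member per record, the bytes returned by such a range request can be decompressed on their own.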