node-warc: Parse And Create Web ARChive (WARC) files with Node.js
Stars: ✭ 69 (+228.57%)
greynoise: Query the 'GreyNoise Intelligence' 'API' in R
Stars: ✭ 15 (-28.57%)
xattrs: 🗃 Work With Filesystem Object Extended Attributes (https://hrbrmstr.github.io/xattrs/index.html)
Stars: ✭ 17 (-19.05%)
urlscan: 👀 Analyze Websites and Resources They Request
Stars: ✭ 21 (+0%)
wget-lua: Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+147.62%)
wail: 🐋 One-Click User Instigated Preservation
Stars: ✭ 107 (+409.52%)
htmlunit: 🕸🧰☕️ Tools to Scrape Dynamic Web Content via the 'HtmlUnit' Java Library
Stars: ✭ 39 (+85.71%)
CommonCrawlDocumentDownload: A small tool that uses the CommonCrawl URL Index to download documents with certain file types or MIME types. It is used for mass-testing frameworks such as Apache POI and Apache Tika.
Stars: ✭ 43 (+104.76%)
mhn: 🍯 Analyze and Visualize Data from Modern Honey Network Servers with R
Stars: ✭ 16 (-23.81%)
reapr: 🕸→ℹ️ Reap Information from Websites
Stars: ✭ 14 (-33.33%)
gdns: Tools to work with the Google DNS over HTTPS API in R
Stars: ✭ 23 (+9.52%)
curlconverter: ➰ ➡️ ➖ Translate cURL command lines into parameters for use with httr or actual httr calls (R)
Stars: ✭ 86 (+309.52%)
pdfbox: 📄◻️ Create, Manipulate and Extract Data from PDF Files (R Apache PDFBox wrapper)
Stars: ✭ 46 (+119.05%)
wayback: ⏪ Tools to Work with the Various Internet Archive Wayback Machine APIs
Stars: ✭ 52 (+147.62%)
Heritrix3: Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Stars: ✭ 2,104 (+9919.05%)
Archivebox: 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Stars: ✭ 12,383 (+58866.67%)
warc: ⚙️ A Rust library for reading and writing WARC files
Stars: ✭ 26 (+23.81%)
shodan: 🌑 R package to work with the Shodan API
Stars: ✭ 16 (-23.81%)
webhose: 🔨 Tools to Work with the 'webhose.io' 'API' in R
Stars: ✭ 12 (-42.86%)
jericho: 📔 Extract plain or structured text from HTML content in R
Stars: ✭ 14 (-33.33%)
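Several of the projects above (node-warc, wget-lua, Heritrix3, warc) read or write WARC files. To give a rough picture of what one record in such a file looks like, here is a minimal sketch in Python using only the standard library. It assumes the WARC/1.1 record layout: a version line, CRLF-terminated named fields, a blank line, the content block, and two trailing CRLFs. The helpers `make_warc_record` and `parse_warc_headers` are hypothetical illustrations, not part of any listed tool, and the sketch does not handle compression, multiple records, or request/response record types.

```python
import uuid
from datetime import datetime, timezone


def make_warc_record(target_uri: str, payload: bytes,
                     content_type: str = "text/plain") -> bytes:
    """Build one WARC/1.1 'resource' record (hypothetical helper)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        # WARC dates are UTC timestamps in W3C ISO 8601 form.
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        # Content-Length counts the bytes of the content block only.
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.1\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Two CRLFs after the content block terminate the record.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


def parse_warc_headers(record: bytes) -> dict:
    """Return the named fields of a single record, plus its version line."""
    head, _, _ = record.partition(b"\r\n\r\n")  # header section ends at the first blank line
    lines = head.decode("utf-8").split("\r\n")
    fields = dict(line.split(": ", 1) for line in lines[1:])
    fields["version"] = lines[0]
    return fields


if __name__ == "__main__":
    rec = make_warc_record("http://example.com/", b"hello")
    print(parse_warc_headers(rec)["Content-Length"])  # prints 5
```

Real archives usually gzip each record individually and pair `request`/`response` records for crawled pages; the dedicated libraries listed above are the better choice for anything beyond experimentation.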