Ferret: Declarative web scraping
Text-Analysis: Explains textual analysis tools in Python, including preprocessing, skip-gram (word2vec), and topic modelling.
thal: Translation: Puppeteer and Chrome Headless, from getting started to web scraping
metafetch: Node.js package that fetches a given URL's title, description, images, links, etc.
scrapism: A work-in-progress guide to web scraping as an artistic and critical practice
document-dl: Command-line program to download documents from web portals.
OLX Scraper: 📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them to MongoDB (NoSQL).
torchestrator: Spin up Tor containers and then proxy HTTP requests via these Tor instances
scavenger: Scrape and take screenshots of dynamic and static webpages
ebayMarketAnalyzer: Scrape all eBay sold listings to determine average/median pricing, plot listings over time with trend lines, and export to Excel
Instagram-to-discord: Monitor an Instagram user account and automatically post new images to a Discord channel via a webhook. Working 2022!
youtube-audio: Extract videos from YouTube in audio format using web scraping techniques 🎶
LeetCode: Currently contains scraped data from around 1500 problems on the site. More to follow.
scrapman: Retrieve real (with JavaScript executed) HTML code from a URL, ultra fast, with support for loading multiple pages in parallel
big-data-upf: RECSM-UPF Summer School: Social Media and Big Data Research
gochanges: **[ARCHIVED]** Website changes tracker 🔍
readability-cli: A CLI for Mozilla Readability. Get clean, uncluttered, ready-to-read HTML from any webpage!
pupflare: A webpage proxy that sends requests through Chromium (Puppeteer). It can be used to bypass Cloudflare anti-bot / anti-DDoS protection from any application (like curl)
TradeTheEvent: Implementation of "Trade the Event: Corporate Events Detection for News-Based Event-Driven Trading." In Findings of ACL 2021
ryuanime: A free anime streaming site that uses jkanime content obtained by scraping the jkanime website.