Humanoid: Node.js package to bypass CloudFlare's anti-bot JavaScript challenges
Stars: ✭ 88 (+300%)
Mutual labels: scraping, web-scraping
PythonScrapyBasicSetup: Basic setup with random user agents and IP addresses for Python Scrapy Framework.
Stars: ✭ 57 (+159.09%)
Mutual labels: scraping, web-scraping
Sqrape: Simple Query Scraping with CSS and Go Reflection (MOVED to Gitlab)
Stars: ✭ 144 (+554.55%)
Mutual labels: scraping, web-scraping
Autoscraper: A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (+18431.82%)
Mutual labels: scraping, web-scraping
papercut: Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-31.82%)
Mutual labels: scraping, web-scraping
Scrapple: A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+2009.09%)
Mutual labels: scraping, web-scraping
Scrape Linkedin Selenium: `scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.
Stars: ✭ 239 (+986.36%)
Mutual labels: scraping, web-scraping
Phpscraper: PHP Scraper - a highly opinionated web-interface for PHP
Stars: ✭ 148 (+572.73%)
Mutual labels: scraping, web-scraping
selectorlib: A library to read a YML file with XPath or CSS selectors and extract data from HTML pages using them
Stars: ✭ 53 (+140.91%)
Mutual labels: scraping, web-scraping
ioweb: Web Scraping Framework
Stars: ✭ 31 (+40.91%)
Mutual labels: scraping, web-scraping
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+1159.09%)
Mutual labels: scraping, web-scraping
torchestrator: Spin up Tor containers and then proxy HTTP requests via these Tor instances
Stars: ✭ 32 (+45.45%)
Mutual labels: scraping, data-scraping
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+14236.36%)
Mutual labels: scraping, web-scraping
Detect Cms: PHP Library for detecting CMS
Stars: ✭ 78 (+254.55%)
Mutual labels: scraping, web-scraping
trafilatura: Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+3131.82%)
Mutual labels: scraping, web-scraping
browser-pool: A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (+222.73%)
Mutual labels: scraping, web-scraping
top-github-scraper: Scrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (+81.82%)
Mutual labels: scraping, web-scraping
whatsapp-tracking: Scraping the status of WhatsApp contacts
Stars: ✭ 49 (+122.73%)
Mutual labels: scraping
linkextractor: A Docker tutorial using a link extraction application example
Stars: ✭ 41 (+86.36%)
Mutual labels: web-scraping