Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+1159.09%)
Humanoid: Node.js package to bypass CloudFlare's anti-bot JavaScript challenges
Stars: ✭ 88 (+300%)
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+14236.36%)
trafilatura: Python & command-line tool to gather text on the Web (web crawling/scraping, extraction of text, metadata, comments)
Stars: ✭ 711 (+3131.82%)
papercut: Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-31.82%)
Scrapple: A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+2009.09%)
browser-pool: A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (+222.73%)
PythonScrapyBasicSetup: Basic setup with random user agents and IP addresses for the Python Scrapy framework.
Stars: ✭ 57 (+159.09%)
Sqrape: Simple Query Scraping with CSS and Go Reflection (MOVED to GitLab)
Stars: ✭ 144 (+554.55%)
Autoscraper: A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (+18431.82%)
top-github-scraper: Scrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (+81.82%)
ioweb: Web Scraping Framework
Stars: ✭ 31 (+40.91%)
Scrape Linkedin Selenium: `scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles & company pages, turning the data into structured JSON.
Stars: ✭ 239 (+986.36%)
Phpscraper: PHP Scraper, a highly opinionated web-interface for PHP
Stars: ✭ 148 (+572.73%)
Detect Cms: PHP Library for detecting CMS
Stars: ✭ 78 (+254.55%)
selectorlib: A library to read a YML file with XPath or CSS selectors and extract data from HTML pages using them
Stars: ✭ 53 (+140.91%)
torchestrator: Spin up Tor containers and then proxy HTTP requests via these Tor instances
Stars: ✭ 32 (+45.45%)
web-clipper: Easily download the main content of a web page in HTML, Markdown, and/or EPUB format from the command line.
Stars: ✭ 15 (-31.82%)
linkextractor: A Docker tutorial using a link extraction application example
Stars: ✭ 41 (+86.36%)
GSoC-Data-Analyser: Simple search for organisations that participate or have participated in GSoC
Stars: ✭ 29 (+31.82%)
actor-scraper: House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.
Stars: ✭ 83 (+277.27%)
Zeiver: A Scraper, Downloader, & Recorder for static open directories.
Stars: ✭ 14 (-36.36%)
humanparser: Parse a human name string into salutation, first name, middle name, last name, suffix.
Stars: ✭ 78 (+254.55%)
heroshi: Heroshi, an open source web crawler.
Stars: ✭ 51 (+131.82%)
subscene scraper: Library to download subtitles from subscene.com
Stars: ✭ 14 (-36.36%)
dust: Archive web pages with all relevant assets or save them as a single HTML file
Stars: ✭ 19 (-13.64%)
naos: 📉 Uptime and error monitoring CLI
Stars: ✭ 30 (+36.36%)
kuwala: Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We have set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+2054.55%)
AngleParse: HTML parsing and processing tool for PowerShell.
Stars: ✭ 35 (+59.09%)
sp-subway-scraper: 🚆 This web scraper builds a dataset of São Paulo subway operation status
Stars: ✭ 24 (+9.09%)
restaurant-finder-featureReviews: Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying a market-basket model, the Apriori algorithm, and NLP to user reviews).
Stars: ✭ 21 (-4.55%)
webdext: Intelligent Web Data Extractor
Stars: ✭ 75 (+240.91%)
scrapy-zyte-smartproxy: Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
Stars: ✭ 317 (+1340.91%)
tableau-scraping: Tableau scraper Python library. R and Python scripts to scrape data from Tableau viz
Stars: ✭ 91 (+313.64%)
TorScrapper: A scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)
Stars: ✭ 24 (+9.09%)
pomp: Screen scraping and web crawling framework
Stars: ✭ 61 (+177.27%)
Captcha-Tools: All-in-one Python (and now Go!) module to help solve captchas with the Capmonster, 2captcha and Anticaptcha APIs!
Stars: ✭ 23 (+4.55%)
jseval: Evaluate JavaScript on a URL through a headless Chrome browser.
Stars: ✭ 19 (-13.64%)
scrapy facebooker: Collection of Scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (+0%)
ferenda: Transform unstructured document collections to structured Linked Data
Stars: ✭ 22 (+0%)
proxi: Proxy pool. Finds and checks proxies, with a REST API for querying results. Can find over 25k proxies in under 5 minutes.
Stars: ✭ 32 (+45.45%)
dmi-instascraper: A GUI for Instaloader to scrape users and hashtags on Instagram
Stars: ✭ 21 (-4.55%)
internet-affordability: 🌍 Dataset that shows Internet affordability by country (a shocking reality!)
Stars: ✭ 13 (-40.91%)
scraping-ebay: Scraping eBay's products using the Scrapy web crawling framework
Stars: ✭ 79 (+259.09%)
halfstaff: 🇺🇸 Is the US flag at half-staff?
Stars: ✭ 22 (+0%)
scrapy-distributed: A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.
Stars: ✭ 38 (+72.73%)
codechef-rank-comparator: Web application hosted on the Heroku cloud platform, based on web scraping in Python using the lxml library (XML Path Language).
Stars: ✭ 23 (+4.55%)
IMDB-Scraper: Scrapy project for scraping data from IMDB, with a movie dataset including data on 58,623 movies.
Stars: ✭ 37 (+68.18%)
PyLex: Perform lexical analysis on words, one word at a time.
Stars: ✭ 60 (+172.73%)
gunaydin: Your good mornings ☀️
Stars: ✭ 16 (-27.27%)
shup: A POSIX shell script to parse HTML
Stars: ✭ 28 (+27.27%)