Scrapple: A framework for creating semi-automatic web content extractors
Selectolax: Python binding to the Modest engine (fast HTML5 parser with CSS selectors).
Autoscraper: A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Ache: ACHE is a web crawler for domain-specific search.
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs, including (but not only) with headless Chrome and Puppeteer.
Php Curl Class: PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
comic-scraper: [Python] Scrapes comics and manga from various websites and creates CBZ files from them
Text-Analysis: Explaining textual analysis tools in Python, including preprocessing, skip-gram (word2vec), and topic modelling.
raspagem-de-dados-fatec: 📓 Short course on web data scraping with Python, taught at the FATEC Jundiaí Technology Week
papercut: Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
PaperScraper: A web scraping tool to systematically extract the text of scientific papers and corresponding metadata from university-accessible journals.
linkextractor: A Docker tutorial using a link extraction application example
sp-subway-scraper: 🚆 This web scraper builds a dataset of São Paulo subway operation status
codechef-rank-comparator: Web application hosted on the Heroku cloud platform, based on web scraping in Python using the lxml library (XML Path Language).
investigation-amazon-brands: Materials to reproduce our findings in our stories, "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up"
actor-scraper: House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.
restaurant-finder-featureReviews: Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying market-basket model – Apriori algorithm and NLP on user reviews).
heroshi: Heroshi – open source web crawler.
tableau-scraping: Tableau scraper Python library. R and Python scripts to scrape data from Tableau viz
scraping-ebay: Scraping eBay's products using the Scrapy web crawling framework
IMDB-Scraper: Scrapy project for scraping data from IMDB, with a movie dataset including data on 58,623 movies.
OLX Scraper: 📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them to MongoDB (NoSQL).
Node-js-functionalities: This repository contains useful RESTful APIs and functionalities in Node.js, with tutorial code for mastering Node.js. All tutorials have been published on medium.com.
leetcode-compensation: Compensation analysis on the posts scraped from leetcode.com/discuss/compensation. At present, the reports have been generated only for Indian cities.
WaWebSessionHandler: (DISCONTINUED) Save WhatsApp Web Sessions as files and open them everywhere!
browser-pool: A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
extractnet: A Dragnet that also extracts author, headline, date, and keywords from context
iww: AI-based web wrapper for web content extraction
Linkedin-Client: Web scraper for grabbing data from LinkedIn profiles or company pages (personal project)
htmlunit: 🕸🧰☕️ Tools to Scrape Dynamic Web Content via the 'HtmlUnit' Java Library
grailer: Web scraping tool for grailed.com
cl-torrents: Search torrents on popular trackers via CLI, readline, GUI, or web client. Tutorial and binaries (issue tracker on https://gitlab.com/vindarel/cl-torrents/)
rymscraper: Python API to extract data from rateyourmusic.com.
selectorlib: A library to read a YML file with XPath or CSS selectors and extract data from HTML pages using them
faexport: The API for Furaffinity you wish existed
Python: Covers basic to advanced Python topics, practice questions, logical problems in Python, and web development using HTML, CSS, Bootstrap, jQuery, DOM, and Django 🚀🚀 💥 🌈
reapr: 🕸→ℹ️ Reap Information from Websites
ioweb: Web Scraping Framework
actor-content-checker: You can use this actor to monitor any page's content and get a notification when the content changes.