Top 135 web-scraping open source projects

Scrapple
A framework for creating semi-automatic web content extractors
Selectolax
Python binding to Modest engine (fast HTML5 parser with CSS selectors).
Ache
ACHE is a web crawler for domain-specific search.
Basketball reference web scraper
NBA Stats API via Basketball Reference
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Apify Js
Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
comic-scraper
[Python] Scrapes comics and manga from various websites and creates cbz files from them
Stock-Fundamental-data-scraping-and-analysis
Project on building a web crawler to collect the fundamentals of the stock and review their performance in one go
raspagem-de-dados-fatec
📓 Short course on web data scraping with Python, taught at the FATEC Jundiaí Technology Week
article-summary-deep-learning
📖 Using deep learning and scraping to analyze/summarize articles! Just drop in any URL!
papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
PaperScraper
A web scraping tool to systematically extract the text of scientific papers and corresponding metadata from university accessible journals.
sp-subway-scraper
🚆This web scraper builds a dataset for São Paulo subway operation status
codechef-rank-comparator
Web application hosted on the Heroku cloud platform, based on web scraping in Python using the lxml library with XPath (XML Path Language).
investigation-amazon-brands
Materials to reproduce our findings in our stories, "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up"
actor-scraper
House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.
restaurant-finder-featureReviews
Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying market-basket model – Apriori algorithm and NLP on user reviews).
heroshi
Heroshi – open source web crawler.
top-github-scraper
Scrape top GitHub repositories and users based on keywords
tableau-scraping
Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
scraping-ebay
Scraping Ebay's products using Scrapy Web Crawling Framework
India-WhatsAppFakeNews-Dataset
News articles on WhatsApp-related deaths, along with other articles from across India during that period
IMDB-Scraper
Scrapy project for scraping data from IMDB, with a movie dataset covering 58,623 movies.
automation-scripts
Simple scripts that I'm using to automate the boring things.
Node-js-functionalities
This repository contains useful RESTful APIs and functionality in Node.js, with tutorial code for learning Node.js. All tutorials have been published on medium.com.
leetcode-compensation
Compensation analysis on the posts scraped from leetcode.com/discuss/compensation. At present, the reports have been generated only for Indian cities.
WaWebSessionHandler
(DISCONTINUED) Save WhatsApp Web Sessions as files and open them everywhere!
browser-pool
A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Linkedin-Client
Web scraper for grabbing data from Linkedin profiles or company pages (personal project)
htmlunit
🕸🧰☕️Tools to Scrape Dynamic Web Content via the 'HtmlUnit' Java Library
cl-torrents
Searching torrents on popular trackers - CLI, readline, GUI, web client. Tutorial and binaries (issue tracker on https://gitlab.com/vindarel/cl-torrents/)
rymscraper
Python API to extract data from rateyourmusic.com.
selectorlib
A library to read a YML file with Xpath or CSS Selectors and extract data from HTML pages using them
Python
Covers Python topics from basic to advanced, practice questions, logic problems in Python, and web development using HTML, CSS, Bootstrap, jQuery, DOM, and Django 🚀🚀. 💥 🌈
TikTokDownloader PyWebIO
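🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous Douyin | TikTok data scraping tool that supports API calls as well as online batch parsing and downloading.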
reapr
🕸→ℹ️ Reap Information from Websites
actor-content-checker
You can use this act to monitor any page's content and get a notification when content changes.