Top 392 scraper open source projects

Heroku ebooks
A script to generate Markov chains and to post to an _ebooks account on Twitter using Heroku
Scrape Linkedin Selenium
`scrape_linkedin` is a Python package that scrapes personal LinkedIn profiles and company pages and turns the data into structured JSON.
Getsy
A simple browser/client-side web scraper.
Skrape.it
A Kotlin-based testing/scraping/parsing library for analyzing and extracting data from HTML (both server-side and client-side rendered). It emphasizes ease of use and readability through an intuitive DSL. It is primarily a testing library, but it can also be used to scrape websites conveniently.
Annie
👾 Fast and simple video download library and CLI tool written in Go
Scrapysharp
A revival of https://bitbucket.org/rflechner/scrapysharp
Ruiji.net
A crawler framework with a distributed crawler and extractor
Goose Parser
Universal scraping tool that lets you extract data using multiple environments
Media Scraper
Scrapes all photos and videos from a web page, Instagram, Twitter, Tumblr, Reddit, pixiv, or TikTok
Tianyancha
A pip-installable, batteries-included scraper API for Tianyancha, the best Chinese business data and investigation platform. It saves the business registration information of one or more specified companies to Excel/JSON in a single step.
Colly
Elegant Scraper and Crawler Framework for Golang
Weibo terminater
The ultimate Weibo crawler: scrape anything from Weibo, including comments, Weibo posts, and followers. The Terminator.
Querylist
🕷️ The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.
Jsonframe Cheerio
A simple multi-level scraper with JSON input/output for Cheerio
Jvppeteer
Headless Chrome for Java (Java crawler)
Unfurl
Scraper for oEmbed, Twitter Cards and Open Graph metadata - fast and Promise-based ⚡️
Thepiratebay
💀 The Pirate Bay node.js client
Anime Dl
Anime-dl is a command-line program to download anime from CrunchyRoll and Funimation.
Goribot
[Crawler/Scraper for Golang] 🕷 A lightweight, distribution-friendly Golang crawler framework.
Gmdb
GMDB is the ultra-simple, cross-platform Movie Library with Features (Search, Take Note, Watch Later, Like, Import, Learn, Instantly Torrent Magnet Watch)
Unhtml.rs
A magic html parser
Instagram Crawler
Crawl instagram photos, posts and videos for download.
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Readablewebproxy
Rewriting web proxy and archival tool. At this point, it just tries to download all the things.
Novel
A novel-reading website built on Laravel 5.2
Scrape Twitter
🐦 Access Twitter data without an API key. [DEPRECATED]
Scrapelib
⛏ a library for scraping things
Datmusic Api
Alternative for VK Audio API
Opensanctions
An open database of international sanctions data, persons of interest and politically exposed persons
Covid19 mobility
COVID-19 Mobility Data Aggregator. Scraper of Google, Apple, Waze and TomTom COVID-19 Mobility Reports🚶🚘🚉
Instagram Scraper
Scrapes media, likes, followers, tags, and all metadata. Inspired by instagram-php-scraper, bot
Demeter
Demeter is a tool for scraping the calibre web ui
Serpscrap
SEO Python scraper that extracts data from major search engine result pages. For given keywords, it extracts the URL, title, snippet, rich snippet, and result type from the search results. It can detect ads or take automated screenshots, and it can also fetch the text content of URLs found in the search results or supplied by you. It is useful for SEO and business-related research tasks.
Nooverviewavailable.com
A survey of Apple developer documentation.
Phpscraper
PHP Scraper: a highly opinionated web interface for PHP
Scraperwiki Python
ScraperWiki Python library for scraping and saving data
Google2csv
Google2Csv: a simple Google scraper that saves the results to a CSV/XLSX/JSONL file
Google Play Scraper
Google play scraper for Python inspired by <facundoolano/google-play-scraper>
Zillow
Zillow Scraper for Python using Selenium
Go Jd
Automatic JD.com login and automatic online ordering of products
Bandcamp Scraper
A scraper for https://bandcamp.com
Onegram
This repository is no longer maintained.
Udemycoursegrabber
The will to enroll in a Udemy course is here, but the money isn't? Search no more! This Python program searches for your desired course on more than [insert big number here] websites, compares the last-updated dates, and returns the download link of the latest one. You can also choose to see the other results.
Newspaper
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Proxyscrape
Python library for retrieving free proxies (HTTP, HTTPS, SOCKS4, SOCKS5).
Scraper
A scraper that switches between normal mode and gentleman mode, built on Electron and React
Mwoffliner
Scrape any online MediaWiki-powered wiki (like Wikipedia) to your local filesystem
Arxivscraper
A Python module to scrape arxiv.org for a specific date range and categories
Youtube Comment Suite
Download YouTube comments from numerous videos, playlists, and channels for archiving, general search, and showing activity.
Seleniumcrawler
An example that uses Selenium WebDriver for Python and the Scrapy framework to build a web scraper that crawls an ASP site
Cum
comic updater, mangafied
Ridereceipts
🚕 Simple automation desktop app to download and organize your Uber/Lyft receipts. Try out our new Ride Receipts PRO!
Instagram Python Scraper
An Instagram scraper written in Python, similar to instagram-php-scraper. Usage examples are in example.py. Enjoy!
Headlesschrome
A Go package for working with headless Chrome. Run interactive JavaScript commands on web pages with Go and Chrome.