Top 615 crawler open source projects

Weibo Analyst
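Toolbox for analyzing Chinese-language social media (Weibo) comments. Features: 1. Weibo comment crawling; 2. word segmentation and keyword extraction; 3. word clouds and word-frequency statistics; 4. sentiment analysis; 5. topic clustering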
Runoob Pdf
Crawls the Runoob (菜鸟教程) tutorial site and converts it to PDF (python_crawer_by_chrome)
Comicbook
This project is no longer maintained; join the group for details: https://t.me/onecomicbook
Iclr2020 Openreviewdata
Script that crawls meta data from ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
Dotcommon
What do people have in their dotfiles?
Opensearchserver
Open-source Enterprise Grade Search Engine Software
Mmjpg
👩 Crawler for photo sets of women (part 1)
Bilili
🍻 bilibili video (including bangumi) and danmaku downloader
Signature algorithm
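Request-signing or encryption algorithms for various apps, mini programs, and websites. Currently covers: Ziroom (自如), Xiaohongshu (小红书), Danke Apartment (蛋壳公寓), luckin coffee, and bangkokair (Bangkok Airways)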
Iclr2019 Openreviewdata
Script that crawls meta data from ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
Netease Music Cracker
🎵 Converts cache files of downloadable NetEase Cloud Music tracks into MP3 files
Spider Flow
A new-generation crawler platform: define crawl workflows graphically and build crawlers without writing code.
Jivesearch
A search engine that doesn't track you.
Webster
A reliable high-level web crawling & scraping framework for Node.js.
Fictiondown
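Novel downloading | novel crawling | Qidian (起点) | Biquge (笔趣阁) | Markdown export | txt export | epub conversion | ad filtering | automatic proofreading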
Tsrtc
Taiwan Stock Exchange Real Time Crawler
Instagramcrawler
A non-API Python program to crawl public photos, posts, or followers
Freshonions Torscraper
Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
Scavenger
Crawler (Bot) searching for credential leaks on different paste sites.
Xcrawler
A fast, concise, and powerful PHP crawler framework
Ttbot
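A Toutiao (今日头条) bot that supports user login, follow, unfollow, fetching followings and followers, posting articles, posting to Wukong Q&A, liking, commenting, and collecting various types of news; implemented with the Toutiao web API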
91porn Api
🌭💦 Unrestricted online API for a 91porn crawler (permanently valid, passphrase updated daily), plus an online web preview
Zhihu Login
Simulated Zhihu login; supports CAPTCHA extraction and saving cookies
91porn Crawler
🌭💦 Online API for a 91porn crawler (permanently valid), plus an online web preview
Tsec
Crawler for Taiwan listed and OTC stocks (Taiwan Stock Exchange Crawler)
Dom Crawler
The DomCrawler component eases DOM navigation for HTML and XML documents.
Crawlerforreader
A local web-novel crawler for Android, based on jsoup and XPath
Scylla
Intelligent proxy pool for Humans™ (Maintainer needed)
Supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
Toapi
Every web site provides APIs.
Go Dork
The fastest dork scanner written in Go.
Hquery.php
An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.
Ghcrawler
Crawl GitHub APIs and store the discovered orgs, repos, commits, ...
Weixin Spider
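An efficient crawler for WeChat official accounts that collects historical articles, article comments, and read and 在看 ("Wow") counts, and keeps the data updated; it includes a visual web interface and can be deployed on Windows servers. Built on Python 3 with flask, mysql, redis, mitmproxy, pywin32, and others.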
Sasila
A flexible, friendly crawler framework
Gospider
A crawler framework written in Go; users only need to define page rules, and a web management interface is provided. Built on colly.
Crawlertutorial
A minimal crawler tutorial (fetch, parse, search, multiprocessing, API), using PTT as the example
Scrapy Crawlera
Crawlera middleware for Scrapy
Dotnetspider
DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling & scraping framework
Hacker News Digest
📰 A responsive interface of Hacker News with summaries and thumbnails.
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Sitemap Generator
Easily create XML sitemaps for your website.
Rcrawler
An R web crawler and scraper
Line Bot Tutorial
line-bot-tutorial, built with Python and Flask
Bt Btt
Introduction to the magnet-link site U3C3, plus domain updates
Weibo terminator workflow
Updated version of weibo_terminator; this workflow version aims to get the job done!
Tumblr crawler
A multi-threaded crawler for Tumblr.
Spidy
The simple, easy-to-use command-line web crawler.
Skycaiji
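SkyCaiji (蓝天采集器) is free crawler software for collecting and publishing data. Developed with PHP and MySQL, it can be deployed on cloud servers, can collect almost every type of web page, integrates seamlessly with all kinds of CMS site builders, publishes data in real time without login, and runs fully automatically with no manual intervention. It is a fully cross-platform cloud crawler system among web big-data collection software.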
lightnovel epub
🍭 epub generator for (light) novels; supported sites: 轻之国度, 轻小说文库
galer
A fast tool to fetch URLs from HTML attributes by crawl-in.
octopus
Recursive and multi-threaded broken link checker