Weibo AnalystSocial media (Weibo) comments analyzing toolbox in Chinese 微博评论分析工具, 实现功能: 1.微博评论数据爬取; 2.分词与关键词提取; 3.词云与词频统计; 4.情感分析; 5.主题聚类
Comicbook本项目不再维护,详情可加群了解 https://t.me/onecomicbook
Iclr2020 OpenreviewdataScript that crawls meta data from ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
DotcommonWhat do people have in their dotfiles?
Bilili🍻 bilibili video (including bangumi) and danmaku downloader | B站视频(含番剧)、弹幕下载器
Signature algorithm各种App、小程序、网站的请求签名或加密算法。 现已有:自如、小红书、蛋壳公寓、luckin coffee(瑞幸咖啡)、bangkokair(曼谷航空)
Iclr2019 OpenreviewdataScript that crawls meta data from ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
Webstera reliable high-level web crawling & scraping framework for Node.js.
Fictiondown小说下载|小说爬取|起点|笔趣阁|导出Markdown|导出txt|转换epub|广告过滤|自动校对
Tsrtc台灣股票即時爬蟲。Taiwan Stock Exchange Real Time Crawler
InstagramcrawlerA non API python program to crawl public photos, posts or followers
Freshonions TorscraperFresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
ScavengerCrawler (Bot) searching for credential leaks on different paste sites.
Vaultswiss army knife for hackers
Ttbot今日头条机器人,支持用户登陆、关注、取消关注、获取关注粉丝、发文、发悟空问答、点赞、评论、采集各种类型新闻讯息等,使用今日头条网页版API实现
91porn Api🌭💦 91porn爬虫在线无限制API接口(永久有效,口令每日更新) 及 在线web预览
AutoscraperA Smart, Automatic, Fast and Lightweight Web Scraper for Python
Tsec台灣上市上櫃股票爬蟲 Taiwan Stock Exchange Crawler
Dom CrawlerThe DomCrawler component eases DOM navigation for HTML and XML documents.
ScyllaIntelligent proxy pool for Humans™ (Maintainer needed)
SupercrawlerA web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
ToapiEvery web site provides APIs.
Go DorkThe fastest dork scanner written in Go.
Hquery.phpAn extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.
GhcrawlerCrawl GitHub APIs and store the discovered orgs, repos, commits, ...
Weixin Spider微信公众号爬虫,公众号历史文章,文章评论,文章阅读及在看数据,可视化web页面,可部署于Windows服务器。基于Python3之flask/mysql/redis/mitmproxy/pywin32等实现,高效微信爬虫,微信公众号爬虫,历史文章,文章评论,数据更新。
Gospidergolang实现的爬虫框架,使用者只需关心页面规则,提供web管理界面。基于colly开发。
DotnetspiderDotnetSpider, a .NET standard web crawling library. It is lightweight, efficient and fast high-level web crawling & scraping framework
Gopa[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
ArachniWeb Application Security Scanner Framework
SpidyThe simple, easy to use command line web crawler.
Skycaiji蓝天采集器是一款免费的数据采集发布爬虫软件,采用php+mysql开发,可部署在云服务器,几乎能采集所有类型的网页,无缝对接各类CMS建站程序,免登录实时发布数据,全自动无需人工干预!是网页大数据采集软件中完全跨平台的云端爬虫系统
lightnovel epub🍭 epub generator for (light)novels (轻) 小说 epub 生成器,支持站点:轻之国度、轻小说文库
galerA fast tool to fetch URLs from HTML attributes by crawl-in.
octopusRecursive and multi-threaded broken link checker