ha-multiscrape: Home Assistant custom component for scraping multiple values (HTML, XML, or JSON) from a single HTTP request, with a separate sensor/attribute for each value. Supports (login) form-submit functionality.
Stars: ✭ 103 (-91.73%)
proxycrawl-python: ProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (-95.91%)
robotstxt: robots.txt file parsing and checking for R
Stars: ✭ 65 (-94.78%)
document-dl: Command line program to download documents from web portals.
Stars: ✭ 14 (-98.88%)
scrapy-distributed: A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.
Stars: ✭ 38 (-96.95%)
scrapman: Retrieve real HTML (with JavaScript executed) from a URL; ultra fast, and supports loading multiple pages in parallel
Stars: ✭ 21 (-98.31%)
Crawlab: Distributed web crawler admin platform for managing spiders regardless of language or framework.
Stars: ✭ 8,392 (+573.52%)
flink-crawler: Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-96.15%)
Scraper-Projects: 🕸 List of mini projects that involve web scraping 🕸
Stars: ✭ 25 (-97.99%)
Zeiver: A scraper, downloader, and recorder for static open directories.
Stars: ✭ 14 (-98.88%)
TorScrapper: A scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)
Stars: ✭ 24 (-98.07%)
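TikTokDownloader PyWebIO: 🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous Douyin/TikTok data scraping tool that supports API calls and online batch parsing and downloading.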
Stars: ✭ 919 (-26.24%)
Bilili: 🍻 bilibili video (including bangumi) and danmaku downloader
Stars: ✭ 379 (-69.58%)
Lizard: 💐 Full Amazon Automatic Download
Stars: ✭ 41 (-96.71%)
facebook-discussion-tk: A collection of tools to (semi-)automatically collect and analyze data from online discussions on Facebook groups and pages.
Stars: ✭ 33 (-97.35%)
lightnovel epub: 🍭 EPUB generator for (light) novels. Supported sites: 轻之国度, 轻小说文库
Stars: ✭ 89 (-92.86%)
Rcrawler: An R web crawler and scraper
Stars: ✭ 274 (-78.01%)
Bt Btt: Introduction to the magnet-link site U3C3, plus domain name updates
Stars: ✭ 261 (-79.05%)
Scrapedin: LinkedIn Scraper (currently working 2020)
Stars: ✭ 453 (-63.64%)
Maman: Rust web crawler that saves pages to Redis
Stars: ✭ 39 (-96.87%)
Sasila: A flexible, friendly crawler framework
Stars: ✭ 286 (-77.05%)
Toapi: Every web site provides APIs.
Stars: ✭ 3,209 (+157.54%)
Gospider: A crawler framework implemented in Go; users only need to define page rules. Provides a web management interface. Built on colly.
Stars: ✭ 285 (-77.13%)
91porn Api: 🌭💦 91porn crawler: unrestricted online API (permanently valid, passphrase updated daily) and online web preview
Stars: ✭ 341 (-72.63%)
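scraper: Image scraping and download tool. Quickly scrapes and downloads images, photos, and illustrations uploaded by designers/users on 站酷 (https://www.zcool.com.cn/) and CNU 视觉 (http://www.cnu.cc/).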
Stars: ✭ 64 (-94.86%)
Scrapple: A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (-62.76%)
Signature algorithm: Request-signing or encryption algorithms for various apps, mini programs, and websites. Currently covers: 自如, 小红书, 蛋壳公寓, luckin coffee (瑞幸咖啡), bangkokair (Bangkok Airways)
Stars: ✭ 380 (-69.5%)
Spider Flow: A new-generation crawler platform: define crawler flows graphically and build crawlers without writing code.
Stars: ✭ 365 (-70.71%)
Haipproxy: 💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Stars: ✭ 4,993 (+300.72%)
Xsrfprobe: The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.
Stars: ✭ 532 (-57.3%)
Webster: A reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (-70.79%)
Netdiscovery: NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.
Stars: ✭ 573 (-54.01%)
Dataflowkit: Extract structured data from web sites. Web site scraping.
Stars: ✭ 456 (-63.4%)
Douyin: API of DouYin for Humans, used to crawl popular videos and music
Stars: ✭ 580 (-53.45%)
Goscraper: Golang pkg to quickly return a preview of a webpage (title/description/images)
Stars: ✭ 72 (-94.22%)
Go jobs: A look at the Golang job market
Stars: ✭ 526 (-57.78%)
Xxl Crawler: A distributed web crawler framework (XXL-CRAWLER).
Stars: ✭ 561 (-54.98%)
Fictiondown: Novel download | novel scraping | 起点 | 笔趣阁 | export to Markdown | export to txt | convert to EPUB | ad filtering | automatic proofreading
Stars: ✭ 362 (-70.95%)
Icrawler: A multi-threaded crawler framework with many built-in image crawlers.
Stars: ✭ 629 (-49.52%)
Scrapyrt: HTTP API for Scrapy spiders
Stars: ✭ 637 (-48.88%)
Creeper: 🐾 Creeper - The Next Generation Crawler Framework (Go)
Stars: ✭ 762 (-38.84%)
Imagescraper: ✂️ High performance, multi-threaded image scraper
Stars: ✭ 630 (-49.44%)
Grab Site: The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Stars: ✭ 680 (-45.43%)
Jd Autobuy: Python crawler for automatic 京东 (JD.com) login and online flash purchasing of products
Stars: ✭ 1,174 (-5.78%)
Gospider: Gospider - Fast web spider written in Go
Stars: ✭ 785 (-37%)
Nodespider: [DEPRECATED] Simple, flexible, delightful web crawler/spider package
Stars: ✭ 33 (-97.35%)
Zhihu Crawler: zhihu-crawler is a high-performance, Java-based distributed crawler project that supports free HTTP proxy pools and horizontal scaling
Stars: ✭ 890 (-28.57%)
Torbot: Dark Web OSINT Tool
Stars: ✭ 821 (-34.11%)
Beanbun: Beanbun is a multi-process web crawler framework written in PHP, open and highly extensible, based on Workerman.
Stars: ✭ 1,096 (-12.04%)
Pypergrabber: Fetches PubMed article IDs (PMIDs) from an email inbox, then crawls PubMed, Google Scholar, and Sci-Hub for the respective PDF files.
Stars: ✭ 14 (-98.88%)
Pypatent: Search for and retrieve US Patent and Trademark Office Patent Data
Stars: ✭ 31 (-97.51%)