Scrapyrt: HTTP API for Scrapy spiders
Stars: ✭ 637 (+3438.89%)
ARGUS: ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and can crawl a broad range of websites. On those websites, ARGUS can perform tasks such as scraping text or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (+277.78%)
scrapy-distributed: A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.
Stars: ✭ 38 (+111.11%)
double-agent: A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+583.33%)
Scrapy Selenium: Scrapy middleware to handle JavaScript pages using Selenium
Stars: ✭ 550 (+2955.56%)
Dotnetcrawler: DotnetCrawler is a straightforward, lightweight web crawling/scraping library for .NET Core with Entity Framework Core output. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+455.56%)
scrapy-fieldstats: A Scrapy extension to log item coverage when the spider shuts down
Stars: ✭ 17 (-5.56%)
Vault: Swiss army knife for hackers
Stars: ✭ 346 (+1822.22%)
Elves: 🎊 Design and implementation of a lightweight crawler framework.
Stars: ✭ 315 (+1650%)
Awesome Scrapy: A curated list of awesome packages, articles, and other cool resources from the Scrapy community.
Stars: ✭ 360 (+1900%)
Webhubbot: Python + Scrapy + MongoDB. 5 million data items per day! 💥 The world's largest website.
Stars: ✭ 5,427 (+30050%)
Scrapy Redis: Redis-based components for Scrapy.
Stars: ✭ 4,998 (+27666.67%)
Linkedin: LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker and Scrapy
Stars: ✭ 309 (+1616.67%)
Alltheplaces: A set of spiders and scrapers to extract location information from places that post their location on the internet.
Stars: ✭ 277 (+1438.89%)
Funpyspidersearchengine: Word2vec personalized search (different results for each user) + Scrapy 2.3.0 (data crawling) + ElasticSearch 7.9.1 (data storage and an external RESTful API) + Django 3.1.1 search
Stars: ✭ 782 (+4244.44%)
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+1438.89%)
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+17422.22%)
Ferret: Declarative web scraping
Stars: ✭ 4,837 (+26772.22%)
Files: Docs and files for ScrapydWeb, Scrapyd, Scrapy, and other projects
Stars: ✭ 390 (+2066.67%)
Tweetscraper: TweetScraper is a simple crawler/spider for Twitter Search that does not use the API
Stars: ✭ 694 (+3755.56%)
Webster: A reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (+1922.22%)
Lulu: [Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+4283.33%)
ip proxy pool: Dynamically generates spiders with Scrapy to crawl and check free proxy IPs on the internet.
Stars: ✭ 39 (+116.67%)
Spidermon: Scrapy extension for monitoring spider execution.
Stars: ✭ 309 (+1616.67%)
Fbcrawl: A Facebook crawler
Stars: ✭ 536 (+2877.78%)
Sasila: A flexible, friendly crawler framework
Stars: ✭ 286 (+1488.89%)
Haipproxy: 💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Stars: ✭ 4,993 (+27638.89%)
Seeker: Seeker - another job board aggregator.
Stars: ✭ 16 (-11.11%)
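Python Spider: Douban Movie Top 250; crawling JSON data and pictures of women from Douyu; Taobao; Youyuan; using CrawlSpider to crawl partial basic profile information of matchmaking users on Hongniang, plus distributed crawling of Hongniang with storage in Redis; small crawler demos; Selenium; crawling Duodian; developing APIs with Django; crawling Youyuan profile information; simulated login to Zhihu, GitHub, and Tuchong; crawling full-site data from the Duodian store; crawling historical articles from WeChat official accounts; crawling articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat official accounts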
Stars: ✭ 615 (+3316.67%)
Dataflowkit: Extract structured data from websites. Website scraping.
Stars: ✭ 456 (+2433.33%)
PttImageSpider: PTT image downloader (crawls the images from an entire board and uses each article title as the folder name) (uses Scrapy)
Stars: ✭ 16 (-11.11%)
Happy Spiders: 🔧 🔩 🔨 A curated collection of crawler-related tools, simulated-login techniques, proxy IPs, Scrapy template code, and more.
Stars: ✭ 261 (+1350%)
Spidy: The simple, easy-to-use command line web crawler.
Stars: ✭ 257 (+1327.78%)
Icrawler: A multi-threaded crawler framework with many built-in image crawlers.
Stars: ✭ 629 (+3394.44%)
Douban Crawler: A crawler for https://douban.com
Stars: ✭ 13 (-27.78%)
Scrapple: A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+2477.78%)
House Renting: Possibly the best practice of Scrapy 🕷 and renting a house 🏡
Stars: ✭ 741 (+4016.67%)
Crawly: Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+2344.44%)
scrapyr: A simple & tiny Scrapy clustering solution, considered a drop-in replacement for scrapyd
Stars: ✭ 50 (+177.78%)
bots-zoo: No description or website provided.
Stars: ✭ 59 (+227.78%)
Isp Data Pollution: ISP Data Pollution to Protect Private Browsing History with Obfuscation
Stars: ✭ 425 (+2261.11%)
toutiao: Crawler for the Toutiao technology news API
Stars: ✭ 17 (-5.56%)
Scrapy Finance: [OUTDATED] Scrapy spiders to crawl financial text data 📚 📜 for training word vectors 🚀
Stars: ✭ 17 (-5.56%)