Scrapoxy
Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!
Stars: ✭ 1,322 (+370.46%)
Mutual labels: crawler, scrapy, proxy
Scrapple
A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+65.12%)
Mutual labels: crawler, scrapy, scraping
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on dotnet core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extendable to your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-64.41%)
Mutual labels: crawler, scrapy, scraping
Easy Scraping Tutorial
Simple but useful Python web scraping tutorial code.
Stars: ✭ 583 (+107.47%)
Mutual labels: crawler, scrapy, scraping
Marmot
💐Marmot | Web Crawler/HTTP protocol Download Package 🐭
Stars: ✭ 186 (-33.81%)
Mutual labels: crawler, scrapy, proxy
double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (-56.23%)
Mutual labels: scraping, scrapy
RARBG-scraper
With Selenium headless browsing and CAPTCHA solving
Stars: ✭ 38 (-86.48%)
Mutual labels: scraping, scrapy
InstaBot
Simple and friendly bot for Instagram, using Selenium and Scrapy with Python.
Stars: ✭ 32 (-88.61%)
Mutual labels: scraping, scrapy
proxi
Proxy pool. Finds and checks proxies, with a REST API for querying results. Can find over 25k proxies in under 5 minutes.
Stars: ✭ 32 (-88.61%)
Mutual labels: scraping, scrapy
Filesensor
Dynamic sensitive-file detection tool based on a crawler
Stars: ✭ 227 (-19.22%)
Mutual labels: crawler, scrapy
torchestrator
Spin up Tor containers and then proxy HTTP requests via these Tor instances
Stars: ✭ 32 (-88.61%)
Mutual labels: scraping, scrapy
scrapy facebooker
Collection of Scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (-92.17%)
Mutual labels: scraping, scrapy
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (-1.42%)
Mutual labels: crawler, scraping
Ppspider
Web spider framework built on puppeteer. Supports task queues and task scheduling via decorators, convenient data storage with nedb / mongodb, and data visualization with user interaction.
Stars: ✭ 237 (-15.66%)
Mutual labels: crawler, proxy
scrapy-fieldstats
A Scrapy extension to log item coverage when the spider shuts down
Stars: ✭ 17 (-93.95%)
Mutual labels: scraping, scrapy
Ecommercecrawlers
Gitee repository: AJay13/ECommerceCrawlers
GitHub repository: DropsDevopsOrg/ECommerceCrawlers
Project showcase: http://wechat.doonsec.com
Stars: ✭ 3,073 (+993.59%)
Mutual labels: crawler, scrapy
scrapy-distributed
A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.
Stars: ✭ 38 (-86.48%)
Mutual labels: scraping, scrapy
ptt-web-crawler
Crawler for the web version of PTT
Stars: ✭ 20 (-92.88%)
Mutual labels: crawler, scrapy
papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-94.66%)
Mutual labels: crawler, scraping
memes-api
API for scraping common meme sites
Stars: ✭ 17 (-93.95%)
Mutual labels: scraping, scrapy