AccordBox / Awesome Scrapy

A curated list of awesome packages, articles, and other cool resources from the Scrapy community.

Projects that are alternatives of or similar to Awesome Scrapy

ptt-web-crawler
Crawler for the web version of PTT (a Taiwanese bulletin board system)
Stars: ✭ 20 (-94.44%)
Mutual labels:  scrapy
python-Reptile
python-Reptile
Stars: ✭ 31 (-91.39%)
Mutual labels:  scrapy
Alltheplaces
A set of spiders and scrapers to extract location information from places that post their location on the internet.
Stars: ✭ 277 (-23.06%)
Mutual labels:  scrapy
memes-api
API for scraping common meme sites
Stars: ✭ 17 (-95.28%)
Mutual labels:  scrapy
ARGUS
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (-81.11%)
Mutual labels:  scrapy
Douban Crawler
A crawler for https://douban.com
Stars: ✭ 13 (-96.39%)
Mutual labels:  scrapy
dannyAVgleDownloader
Downloader for the well-known site Avgle
Stars: ✭ 27 (-92.5%)
Mutual labels:  scrapy
Elves
🎊 Design and implementation of a lightweight crawler framework.
Stars: ✭ 315 (-12.5%)
Mutual labels:  scrapy
PttImageSpider
PTT image downloader: scrapes every image from a board, using the article title as the folder name (built with Scrapy)
Stars: ✭ 16 (-95.56%)
Mutual labels:  scrapy
Happy Spiders
🔧 🔩 🔨 A curated collection of crawler-related tools, simulated-login techniques, proxy IPs, Scrapy template code, and more.
Stars: ✭ 261 (-27.5%)
Mutual labels:  scrapy
policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-93.89%)
Mutual labels:  scrapy
scrapyr
A simple and tiny Scrapy clustering solution, intended as a drop-in replacement for scrapyd
Stars: ✭ 50 (-86.11%)
Mutual labels:  scrapy
tripadvisor-scraper
TripAdvisor scraper
Stars: ✭ 63 (-82.5%)
Mutual labels:  scrapy
douban-spider
A Douban movie crawler based on the Scrapy framework
Stars: ✭ 25 (-93.06%)
Mutual labels:  scrapy
Scrapy Crawlera
Crawlera middleware for Scrapy
Stars: ✭ 281 (-21.94%)
Mutual labels:  scrapy
scrapy-pipelines
A collection of pipelines for Scrapy
Stars: ✭ 16 (-95.56%)
Mutual labels:  scrapy
ip proxy pool
Dynamically generates spiders to crawl and validate free proxy IPs found on the internet, using Scrapy.
Stars: ✭ 39 (-89.17%)
Mutual labels:  scrapy
Vault
Swiss army knife for hackers
Stars: ✭ 346 (-3.89%)
Mutual labels:  scrapy
Linkedin
LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker, and Scrapy
Stars: ✭ 309 (-14.17%)
Mutual labels:  scrapy
Tieba spider
Baidu Tieba crawler (based on Scrapy and MySQL)
Stars: ✭ 257 (-28.61%)
Mutual labels:  scrapy

Awesome Scrapy

A curated list of awesome packages, articles, and other cool resources from the Scrapy community. Scrapy is a fast high-level web crawling & scraping framework for Python.
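For readers new to the framework, here is a minimal, self-contained spider sketch. The target site (quotes.toscrape.com, the sandbox commonly used in Scrapy tutorials) and the CSS selectors are purely illustrative.

    import scrapy


    class QuotesSpider(scrapy.Spider):
        """Minimal example: crawl a paginated site and yield structured items."""
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/"]

        def parse(self, response):
            # Yield one item per quote block on the page
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("small.author::text").get(),
                }
            # Follow the pagination link, if any, with the same callback
            next_page = response.css("li.next a::attr(href)").get()
            if next_page is not None:
                yield response.follow(next_page, callback=self.parse)

Save it as quotes_spider.py and run scrapy runspider quotes_spider.py -o quotes.json; no project scaffolding is required.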

Table of Contents

Apps

Visual Web Scraping

  • Portia Visual scraping for Scrapy

Distributed Spider

Scrapy Service

  • scrapyscript Run a Scrapy spider programmatically from a script or a Celery task - no project required.

  • scrapyd A service daemon to run Scrapy spiders

  • scrapyd-client Command line client for Scrapyd server

  • python-scrapyd-api A Python wrapper for working with Scrapyd's API (see the sketch after this list).

  • SpiderKeeper A scalable admin UI for spider services

  • scrapyrt An HTTP server which provides an API for scheduling Scrapy spiders and making requests with them.
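As a rough sketch of how these services fit together, the snippet below schedules and monitors a crawl on a running Scrapyd server through python-scrapyd-api. The server URL, project name ("myproject"), and spider name ("quotes") are placeholders; it assumes the project has already been deployed (for example with scrapyd-deploy from scrapyd-client), and the method names follow the wrapper's documented API.

    from scrapyd_api import ScrapydAPI

    # Connect to a Scrapyd daemon (default port is 6800)
    scrapyd = ScrapydAPI("http://localhost:6800")

    # Schedule a crawl; Scrapyd returns a job id for tracking
    job_id = scrapyd.schedule("myproject", "quotes")
    print("scheduled job:", job_id)

    # Inspect pending / running / finished jobs for the project
    jobs = scrapyd.list_jobs("myproject")
    print("running jobs:", jobs["running"])

    # Cancel the job again if necessary
    scrapyd.cancel("myproject", job_id)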

Monitor

Avoid Ban

  • HttpProxyMiddleware A Scrapy middleware used to change the HTTP proxy from time to time.

  • scrapy-proxies Processes Scrapy requests using a random proxy from a list, to avoid IP bans and improve crawling speed.

  • scrapy-rotating-proxies Use multiple proxies with Scrapy (see the settings sketch after this list).

  • scrapy-random-useragent Scrapy Middleware to set a random User-Agent for every Request.

  • scrapy-fake-useragent Random User-Agent middleware based on fake-useragent

  • scrapy-crawlera Crawlera routes requests through a pool of IPs, throttling access by introducing delays and discarding IPs from the pool when they get banned from certain domains, or have other problems.
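A typical way to wire the proxy- and User-Agent-rotation middlewares above into a project is a few lines in settings.py. The sketch below follows the scrapy-rotating-proxies and scrapy-fake-useragent READMEs; the proxy addresses are placeholders, and middleware paths and priority numbers may differ between versions.

    # settings.py (excerpt)

    # Proxies for scrapy-rotating-proxies to rotate through;
    # dead or banned proxies are detected and taken out of rotation.
    ROTATING_PROXY_LIST = [
        "http://proxy1.example.com:8000",
        "http://proxy2.example.com:8031",
    ]

    DOWNLOADER_MIDDLEWARES = {
        # scrapy-rotating-proxies
        "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
        "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
        # scrapy-fake-useragent: pick a random User-Agent for every request
        "scrapy_fake_useragent.middleware.RandomUserAgentMiddleware": 400,
        # disable Scrapy's built-in User-Agent middleware so the random one takes effect
        "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    }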

Data Processing

Process JavaScript

Other Useful Extensions

  • scrapy-djangoitem Scrapy extension to write scraped items using Django models

  • scrapy-deltafetch Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls (see the settings sketch after this list).

  • scrapy-crawl-once This package provides a Scrapy middleware which allows you to avoid re-crawling pages that were already downloaded in previous crawls.

  • scrapy-magicfields Scrapy middleware to add extra fields to items, such as timestamps, response fields, spider attributes, etc.

  • scrapy-pagestorage A Scrapy extension to store request and response information in a storage service.
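For the incremental-crawling extensions above, configuration again lives in settings.py. The sketch below follows the scrapy-deltafetch and scrapy-crawl-once READMEs; you would normally enable only one of the two, and the exact paths and priorities may vary between versions.

    # settings.py (excerpt)

    # Option A: scrapy-deltafetch - skip requests for pages whose items
    # were already extracted in a previous crawl.
    SPIDER_MIDDLEWARES = {
        "scrapy_deltafetch.DeltaFetch": 100,
    }
    DELTAFETCH_ENABLED = True

    # Option B: scrapy-crawl-once - skip requests that were already downloaded.
    # It only acts on requests explicitly marked in the spider with
    # request.meta["crawl_once"] = True.
    # SPIDER_MIDDLEWARES = {"scrapy_crawl_once.CrawlOnceMiddleware": 100}
    # DOWNLOADER_MIDDLEWARES = {"scrapy_crawl_once.CrawlOnceMiddleware": 50}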

Resources

Articles

Exercises

Video

Book

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].