howie6879 / Ruia

Licence: apache-2.0
Async Python 3.6+ web scraping micro-framework based on asyncio

Programming Languages

python

Projects that are alternatives of or similar to Ruia

Fooproxy
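A robust, efficient, score-based and target-specific IP proxy pool with an API service. You can plug in your own collectors to crawl proxy IPs, and it builds a separate database of valid proxies for each of your crawler's target sites. Supports MongoDB 4.0; uses Python 3.7. (Scored IP proxy pool; custom proxy data crawlers can be added at any time.)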
Stars: ✭ 195 (-85.72%)
Mutual labels:  asyncio, crawler, spider, aiohttp
Gain
Web crawling framework based on asyncio.
Stars: ✭ 2,002 (+46.56%)
Mutual labels:  asyncio, crawler, spider, aiohttp
Owllook
owllook: a novel search engine
Stars: ✭ 2,163 (+58.35%)
Mutual labels:  asyncio, spider, aiohttp
Ok ip proxy pool
🍿 Proxy IP pool for crawlers, written in Python 🍟 A pretty OK IP proxy pool
Stars: ✭ 196 (-85.65%)
Mutual labels:  crawler, spider, aiohttp
yutto
🧊 A cute and willful Bilibili video downloader (bilili V2)
Stars: ✭ 383 (-71.96%)
Mutual labels:  spider, aiohttp, asyncio
Car Prices
Golang crawler that scrapes the used-car product catalog of Autohome (汽车之家)
Stars: ✭ 57 (-95.83%)
Mutual labels:  crawler, spider
Beanbun
Beanbun is a multi-process web crawler framework written in PHP. It is open, highly extensible, and built on Workerman.
Stars: ✭ 1,096 (-19.77%)
Mutual labels:  crawler, spider
Douyinsdk
Douyin SDK for data collection; crawling is no longer just a dream
Stars: ✭ 99 (-92.75%)
Mutual labels:  crawler, spider
Raven Aiohttp
An aiohttp transport for raven-python
Stars: ✭ 92 (-93.27%)
Mutual labels:  asyncio, aiohttp
Crawlab
Distributed web crawler admin platform for managing spiders, regardless of language or framework.
Stars: ✭ 8,392 (+514.35%)
Mutual labels:  crawler, spider
Arachnid
Powerful web scraping framework for Crystal
Stars: ✭ 68 (-95.02%)
Mutual labels:  crawler, spider
Gopa Abandoned
GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)
Stars: ✭ 98 (-92.83%)
Mutual labels:  crawler, spider
Awesome Python Primer
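An index of high-quality Chinese-language resources for self-taught Python beginners, covering books, docs, and videos, for crawling, web development, data analysis, and machine learning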
Stars: ✭ 57 (-95.83%)
Mutual labels:  crawler, spider
Photon
Incredibly fast crawler designed for OSINT.
Stars: ✭ 8,332 (+509.96%)
Mutual labels:  crawler, spider
Hproxy
hproxy: an asynchronous IP proxy pool for crawlers that aims to make getting proxies as convenient as possible.
Stars: ✭ 62 (-95.46%)
Mutual labels:  asyncio, crawler
Avbook
Adult video management system with crawlers for avmoo, javbus, and javlibrary; an online adult video library and magnet link database (Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database)
Stars: ✭ 8,133 (+495.39%)
Mutual labels:  crawler, spider
Pyfailsafe
Simple failure handling. Failsafe implementation in Python
Stars: ✭ 70 (-94.88%)
Mutual labels:  asyncio, aiohttp
Python Dependency Injector
Dependency injection framework for Python
Stars: ✭ 1,203 (-11.93%)
Mutual labels:  asyncio, aiohttp
Crawler examples
Some classic web crawler projects.
Stars: ✭ 74 (-94.58%)
Mutual labels:  crawler, spider
Puppeteer Walker
a puppeteer walker 🕷 🕸
Stars: ✭ 78 (-94.29%)
Mutual labels:  crawler, spider

Ruia

🕸️ Async Python 3.6+ web scraping micro-framework based on asyncio.

⚡ Write less, run faster.

Overview

Ruia is an async web scraping micro-framework written with asyncio and aiohttp. It aims to make crawling URLs as convenient as possible.
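
To illustrate why an asyncio-based crawler can be fast, here is a minimal sketch that uses only the Python standard library. It is not Ruia's API: `fake_fetch`, `crawl`, and the `example.com` URLs are hypothetical names chosen for illustration, and the "request" is simulated with `asyncio.sleep` so the snippet runs without network access.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.1):
    """Hypothetical stand-in for an HTTP request; sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


async def crawl(urls, max_concurrency=5):
    """Fetch all URLs concurrently, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        async with semaphore:
            return await fake_fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


URLS = [f"https://example.com/page/{i}" for i in range(10)]

start = time.perf_counter()
pages = asyncio.run(crawl(URLS))
elapsed = time.perf_counter() - start

print(f"fetched {len(pages)} pages in {elapsed:.2f}s")
```

Because the waits overlap, ten simulated requests of 0.1 s each finish in roughly two batches rather than one after another. A real crawler would replace `fake_fetch` with an actual HTTP client call.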

Features

  • Easy: Declarative programming
  • Fast: Powered by asyncio
  • Extensible: Middlewares and plugins
  • Powerful: JavaScript support
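
As a rough idea of what "declarative" means here, the sketch below declares the fields to extract as class attributes and lets a base class do the work. It is a standard-library illustration only, not Ruia's implementation: `RegexField`, `Item`, `PageItem`, and `from_html` are hypothetical names, and regular expressions stand in for the CSS or XPath selectors a real scraping framework would typically use.

```python
import re


class RegexField:
    """Hypothetical field: extracts the first regex group from an HTML string."""

    def __init__(self, pattern, default=None):
        self.pattern = re.compile(pattern, re.S)
        self.default = default

    def extract(self, html):
        match = self.pattern.search(html)
        return match.group(1).strip() if match else self.default


class Item:
    """Hypothetical base class: collects every RegexField declared on a subclass."""

    @classmethod
    def from_html(cls, html):
        values = {}
        # Walk the class hierarchy so inherited fields are picked up too.
        for klass in reversed(cls.__mro__):
            for name, attr in vars(klass).items():
                if isinstance(attr, RegexField):
                    values[name] = attr.extract(html)
        return values


class PageItem(Item):
    # The "what" is declared here; the "how" lives in Item and RegexField.
    title = RegexField(r"<title>(.*?)</title>")
    heading = RegexField(r"<h1[^>]*>(.*?)</h1>")
    author = RegexField(r'<meta name="author" content="(.*?)"', default="unknown")


HTML = (
    "<html><head><title> Demo page </title></head>"
    "<body><h1 class='x'>Hello</h1></body></html>"
)
result = PageItem.from_html(HTML)
print(result)
```

The design point is that adding a new field to `PageItem` requires one line and no extra parsing code.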

Installation

# For Linux & Mac
pip install -U ruia[uvloop]

# For Windows
pip install -U ruia

# New features
pip install git+https://github.com/howie6879/ruia
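
After installing, you may want to confirm whether the optional uvloop extra is present in your environment. The snippet below is a small check using only the standard library; `uvloop_available` is a hypothetical helper name, and the snippet only reports availability. It does not show how Ruia itself selects an event loop.

```python
import importlib.util
import sys


def uvloop_available():
    """Return True when the optional uvloop package can be found."""
    return importlib.util.find_spec("uvloop") is not None


backend = "uvloop" if uvloop_available() else "asyncio default loop"
print(f"platform={sys.platform}, event loop backend available: {backend}")
```

uvloop does not support Windows, which is why the Windows install command omits the extra.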

Tutorials

  1. Overview
  2. Installation
  3. Define Data Items
  4. Spider Control
  5. Request & Response
  6. Customize Middleware
  7. Write a Plugin

TODO

  • [x] Cache for debugging, to avoid running into request limits: ruia-cache
  • [x] An easy way to debug scripts: ruia-shell
  • [ ] Distributed crawling/scraping

Contribution

Ruia is still under development. Feel free to open issues and pull requests:

  • Report or fix bugs
  • Request or publish plugins
  • Write or fix documentation
  • Add test cases

Notice: we use black to format the code.

Thanks
