imWildCat / Scylla

Licence: apache-2.0
Intelligent proxy pool for Humans™ (Maintainer needed)

An intelligent proxy pool for humans. It supports Python 3.8+ only. Key features:

  • Automatic proxy IP crawling and validation
  • Easy-to-use JSON API
  • Simple but beautiful web-based user interface (e.g. geographical distribution of proxies)
  • Get started with as little as one command
  • Simple HTTP forward proxy server
  • Scrapy and requests integration with as little as one line of code
  • Headless browser crawling

For those who prefer Chinese, please read the Chinese documentation (中文文档).

Get started

Installation

Install with Docker (highly recommended)

docker run -d -p 8899:8899 -p 8081:8081 -v /var/www/scylla:/var/www/scylla --name scylla wildcat/scylla:latest

Install directly via pip

pip install scylla
scylla --help
scylla # Run the crawler and web server for JSON API

Install from source

git clone https://github.com/imWildCat/scylla.git
cd scylla

pip install -r requirements.txt

yarn install
make assets-build

python -m scylla

For Windows users who cannot install sanic because uvloop does not currently support Windows:

export SANIC_NO_UVLOOP=true
export SANIC_NO_UJSON=true
pip3 install sanic

If this also fails, you will need to manually install sanic from source.

Usage

The examples below assume the service is running locally (localhost) on port 8899.

Note: The first time you use Scylla, you may need to wait 1 to 2 minutes for proxy IPs to be populated in the database.

JSON API

Proxy IP List

http://localhost:8899/api/v1/proxies

Optional URL parameters:

Parameter   Default value   Description
page        1               The page number
limit       20              The number of proxies shown on each page
anonymous   any             Whether to show anonymous proxies. Possible values: true (only anonymous proxies); false (only transparent proxies)
https       any             Whether to show HTTPS proxies. Possible values: true (only HTTPS proxies); false (only HTTP proxies)
countries   None            Filter proxies by country. Format example: US, or for multiple countries: US,GB
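
The optional parameters above can be combined into a single query string. The following is a minimal sketch using only the Python standard library. The helper name `build_proxies_url` and the `API_URL` constant are illustrative assumptions and are not part of Scylla itself:

```python
from urllib.parse import urlencode

# Assumed local endpoint, taken from the usage section; adjust host and port as needed.
API_URL = "http://localhost:8899/api/v1/proxies"


def build_proxies_url(page=1, limit=20, anonymous=None, https=None, countries=None):
    """Build a proxy-list URL (hypothetical helper, not a Scylla function)."""
    params = {"page": page, "limit": limit}
    # Leaving a filter as None omits it, which corresponds to the "any" default.
    if anonymous is not None:
        params["anonymous"] = "true" if anonymous else "false"
    if https is not None:
        params["https"] = "true" if https else "false"
    if countries:
        # Multiple countries are joined with commas, e.g. US,GB.
        params["countries"] = ",".join(countries)
    # safe="," keeps the comma readable instead of percent-encoding it.
    return f"{API_URL}?{urlencode(params, safe=',')}"


print(build_proxies_url(https=True, countries=["US", "GB"]))
```

Passing `None` for a filter leaves it out of the URL, so the server-side default applies.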

Sample result:

{
    "proxies": [{
        "id": 599,
        "ip": "91.229.222.163",
        "port": 53281,
        "is_valid": true,
        "created_at": 1527590947,
        "updated_at": 1527593751,
        "latency": 23.0,
        "stability": 0.1,
        "is_anonymous": true,
        "is_https": true,
        "attempts": 1,
        "https_attempts": 0,
        "location": "54.0451,-0.8053",
        "organization": "AS57099 Boundless Networks Limited",
        "region": "England",
        "country": "GB",
        "city": "Malton"
    }, {
        "id": 75,
        "ip": "75.151.213.85",
        "port": 8080,
        "is_valid": true,
        "created_at": 1527590676,
        "updated_at": 1527593702,
        "latency": 268.0,
        "stability": 0.3,
        "is_anonymous": true,
        "is_https": true,
        "attempts": 1,
        "https_attempts": 0,
        "location": "32.3706,-90.1755",
        "organization": "AS7922 Comcast Cable Communications, LLC",
        "region": "Mississippi",
        "country": "US",
        "city": "Jackson"
    },
    ...
    ],
    "count": 1025,
    "per_page": 20,
    "page": 1,
    "total_page": 52
}
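
A response shaped like the sample above can be reduced to a list of usable addresses. This sketch assumes the field names shown in the sample result (`proxies`, `is_valid`, `latency`, `ip`, `port`). The function name `proxy_addresses` and the `max_latency` threshold are hypothetical, and latency is compared in whatever unit the API reports:

```python
def proxy_addresses(payload, max_latency=None):
    """Return "ip:port" strings for valid proxies, lowest latency first.

    Hypothetical helper; `payload` is the decoded JSON of the proxy list.
    """
    selected = [
        p for p in payload.get("proxies", [])
        if p.get("is_valid") and (max_latency is None or p["latency"] <= max_latency)
    ]
    selected.sort(key=lambda p: p["latency"])
    return [f"{p['ip']}:{p['port']}" for p in selected]


# Trimmed copy of the sample result, keeping only the fields used here.
sample = {
    "proxies": [
        {"ip": "75.151.213.85", "port": 8080, "is_valid": True, "latency": 268.0},
        {"ip": "91.229.222.163", "port": 53281, "is_valid": True, "latency": 23.0},
    ],
    "count": 1025,
    "per_page": 20,
    "page": 1,
    "total_page": 52,
}

print(proxy_addresses(sample))
print(proxy_addresses(sample, max_latency=100))
```

With Scylla running, the payload could be obtained with `json.load(urllib.request.urlopen(url))` from the standard library instead of the hard-coded sample.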

System Statistics

http://localhost:8899/api/v1/stats

Sample result:

{
    "median": 181.2566407083,
    "valid_count": 1780,
    "total_count": 9528,
    "mean": 174.3290085201
}
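
Because the database takes a minute or two to fill on first use, a client may want to poll the statistics endpoint before requesting proxies. This sketch assumes the `valid_count` and `total_count` fields from the sample above. The function names, the timeout, and the polling interval are illustrative choices, and the stats fetcher is passed in as a callable so the example runs without a live server:

```python
import time


def valid_ratio(stats):
    """Fraction of known proxies that are currently valid (0.0 if none are known)."""
    total = stats.get("total_count", 0)
    return stats.get("valid_count", 0) / total if total else 0.0


def wait_for_valid_proxies(fetch_stats, minimum=1, timeout=180.0, interval=5.0,
                           sleep=time.sleep):
    """Poll `fetch_stats()` until at least `minimum` proxies are valid.

    Returns the last stats dict, or None if `timeout` seconds pass first.
    Hypothetical helper, not part of Scylla.
    """
    deadline = time.monotonic() + timeout
    while True:
        stats = fetch_stats()
        if stats.get("valid_count", 0) >= minimum:
            return stats
        if time.monotonic() >= deadline:
            return None
        sleep(interval)


# Simulated responses: an empty database first, then the sample statistics.
responses = iter([
    {"valid_count": 0, "total_count": 0},
    {"median": 181.2566407083, "valid_count": 1780, "total_count": 9528,
     "mean": 174.3290085201},
])
stats = wait_for_valid_proxies(lambda: next(responses), sleep=lambda seconds: None)
print(stats["valid_count"], round(valid_ratio(stats), 3))
```

In real use, `fetch_stats` could be something like `lambda: json.load(urllib.request.urlopen("http://localhost:8899/api/v1/stats"))`, with the default `sleep` left in place.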

HTTP Forward Proxy Server

By default, Scylla starts an HTTP forward proxy server on port 8081. Whenever an HTTP request arrives, the server randomly selects a recently updated proxy from the database and forwards the request through it.

Note: HTTPS requests are not supported at present.

An example of using this proxy server with curl:

curl http://api.ipify.org -x http://127.0.0.1:8081

You could also use this feature with requests:

import requests

requests.get('http://api.ipify.org', proxies={'http': 'http://127.0.0.1:8081'})
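
The same forward proxy can also be used without third-party packages. This is a minimal sketch based on the standard library's `urllib.request.ProxyHandler`. The function name `build_opener_via_scylla` is hypothetical, and the address assumes the default port 8081 mentioned above:

```python
import urllib.request

# Assumed default address of Scylla's forward proxy server.
FORWARD_PROXY = "http://127.0.0.1:8081"


def build_opener_via_scylla(proxy_url=FORWARD_PROXY):
    """Return an opener that routes plain HTTP traffic through the forward proxy."""
    # Only the "http" scheme is mapped, since HTTPS requests are not supported.
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_opener_via_scylla()

# With Scylla running locally, uncomment the next line to send a request:
# print(opener.open("http://api.ipify.org", timeout=10).read().decode())
```

Building a dedicated opener, rather than installing it globally, keeps other requests in the same process off the proxy.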

Web UI

Open http://localhost:8899 in your browser to see the Web UI of this project.

Proxy IP List

http://localhost:8899/

[Screenshot: proxy IP list]

Global Geographical Distribution Map

http://localhost:8899/#/geo

[Screenshot: geographical distribution of proxies]

API Documentation

Please read Module Index.

Roadmap

Please see Projects.

Development and Contribution

git clone https://github.com/imWildCat/scylla.git
cd scylla

pip install -r requirements.txt

npm install # or `yarn install`
make assets-build

Testing

To run the tests locally, use the following commands:

pip install -r tests/requirements-test.txt
pytest tests/

You are welcome to add more test cases to make this project more robust.

Naming of This Project

The name Scylla comes from a group of memory chips in the American TV series Prison Break. The project was named as a tribute to the series.

Help

How to install Python Scylla on CentOS7

Donation

If you find this project useful, please consider donating to it.

No matter the amount, your donation will encourage the author to keep developing new features! 🎉 Thank you!

You can donate in the following ways:

PayPal

[PayPal donation button]

Alipay or WeChat Pay

[Alipay and WeChat Pay donation image]

License

Apache License 2.0. For more details, please read the LICENSE file.