Jiramew / Spoon

🥄 A package for building site-specific proxy pools.

Projects that are alternatives to or similar to Spoon

Ok ip proxy pool
🍿 A crawler proxy IP pool (proxy pool) in Python 🍟 a fairly decent IP proxy pool
Stars: ✭ 196 (+13.29%)
Mutual labels:  crawler, spider, proxy, ip, proxypool
Proxy pool
A Python crawler proxy IP pool (proxy pool)
Stars: ✭ 13,964 (+7971.68%)
Mutual labels:  crawler, spider, redis, proxy, proxypool
Haipproxy
💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Stars: ✭ 4,993 (+2786.13%)
Mutual labels:  crawler, spider, redis, distributed
Proxybroker
Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭
Stars: ✭ 2,767 (+1499.42%)
Mutual labels:  crawler, proxy, proxies, proxypool
Free proxy website
A collection of websites offering free SOCKS/HTTPS/HTTP proxies
Stars: ✭ 119 (-31.21%)
Mutual labels:  crawler, spider, proxy, ip
Fooproxy
A robust and efficient score-based, site-targeted IP proxy pool plus API service. You can plug in your own collectors to crawl proxy IPs, generating a separate database of valid proxies for each of your crawler's target sites. Supports MongoDB 4.0; uses Python 3.7. (Scored IP proxy pool; custom proxy data crawlers can be added at any time.)
Stars: ✭ 195 (+12.72%)
Mutual labels:  crawler, spider, proxypool
Jlitespider
A lite distributed Java spider framework :-)
Stars: ✭ 151 (-12.72%)
Mutual labels:  crawler, spider, distributed
Ppspider
A web spider built on Puppeteer: flexible task-queue management and scheduling via decorators, convenient data persistence (nedb/MongoDB), and data visualization with user interaction
Stars: ✭ 237 (+36.99%)
Mutual labels:  crawler, spider, proxy
Scrapy Redis
Redis-based components for Scrapy.
Stars: ✭ 4,998 (+2789.02%)
Mutual labels:  crawler, redis, distributed
Xxl Crawler
A distributed web crawler framework (XXL-CRAWLER)
Stars: ✭ 561 (+224.28%)
Mutual labels:  crawler, spider, distributed
Fp Server
A free proxy server based on Tornado and Scrapy that continuously crawls and serves proxies; build your own local proxy pool
Stars: ✭ 154 (-10.98%)
Mutual labels:  spider, proxy, proxypool
Marmot
💐Marmot | Web Crawler/HTTP protocol Download Package 🐭
Stars: ✭ 186 (+7.51%)
Mutual labels:  crawler, spider, proxy
X Proxies
Usable IP proxies, crawled from several proxy websites.
Stars: ✭ 53 (-69.36%)
Mutual labels:  redis, ip, proxies
Zi5book
A distributed crawl of every Kindle e-book on book.zi5.me, organized by author and book title; each book comes in both mobi and epub formats
Stars: ✭ 191 (+10.4%)
Mutual labels:  spider, redis, distributed
Lizard
💐 Full Amazon Automatic Download
Stars: ✭ 41 (-76.3%)
Mutual labels:  crawler, spider, distributed
Baiducrawler
A sample of using proxies to crawl Baidu search results.
Stars: ✭ 116 (-32.95%)
Mutual labels:  crawler, proxy, proxies
Proxypool
An Efficient ProxyPool with Getter, Tester and Server
Stars: ✭ 3,050 (+1663.01%)
Mutual labels:  redis, proxy, proxypool
Netdiscovery
NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.
Stars: ✭ 573 (+231.21%)
Mutual labels:  crawler, spider, redis
Pspider
A simple, easy-to-use Python crawler framework. QQ group: 597510560
Stars: ✭ 1,611 (+831.21%)
Mutual labels:  crawler, spider, proxies
Go spider
[Crawler framework (Golang)] An awesome concurrent Go crawler (spider) framework. The crawler is flexible and modular; it can easily be extended into a customized crawler, or you can use just the default crawl components.
Stars: ✭ 1,745 (+908.67%)
Mutual labels:  crawler, spider

Spoon - A package for building site-specific proxy pools.

Spoon is a library for building a distributed proxy pool for each site you assign.
It runs on Python 3 only.

Install

Simply run pip install spoonproxy, or clone the repo and add it to your PYTHONPATH.

Run

Spoon-server

Make sure Redis is running; the default connection is host: localhost, port: 6379, and you can point Spoon at a different Redis instance.
As example.py in spoon_server/example shows, you can assign many different proxy providers:

from spoon_server.proxy.fetcher import Fetcher
from spoon_server.main.proxy_pipe import ProxyPipe
from spoon_server.proxy.kuai_provider import KuaiProvider
from spoon_server.proxy.xici_provider import XiciProvider
from spoon_server.database.redis_config import RedisConfig
from spoon_server.main.checker import CheckerBaidu


def main_run():
    # Redis backend that stores the proxies (note the non-default port)
    redis = RedisConfig("127.0.0.1", 21009)
    # Fetch proxies from KuaiProvider and XiciProvider, validate them
    # against https://www.baidu.com with CheckerBaidu, and store the
    # results in Redis
    p1 = ProxyPipe(url_prefix="https://www.baidu.com",
                   fetcher=Fetcher(use_default=False),
                   database=redis,
                   checker=CheckerBaidu()).set_fetcher([KuaiProvider()]).add_fetcher([XiciProvider()])
    p1.start()


if __name__ == '__main__':
    main_run()

Different checkers let you validate results precisely; CheckerBaidu, for instance, accepts a proxy only when the fetched page looks like Baidu's homepage:

import re

from spoon_server.main.checker import Checker


class CheckerBaidu(Checker):
    def checker_func(self, html=None):
        # Responses may arrive as raw bytes; decode before matching
        if isinstance(html, bytes):
            html = html.decode('utf-8')
        # "百度一下,你就知道" is the Baidu homepage title, so a match means
        # the proxy reached the real page
        return re.search(r".*百度一下,你就知道.*", html) is not None
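
The same hook works for any target site. Below is a hypothetical checker, a minimal sketch: the class name and the "Example Domain" marker are illustrative, and the import path for Checker is assumed from the CheckerBaidu import above.

import re

# ASSUMPTION: the Checker base class lives next to CheckerBaidu
from spoon_server.main.checker import Checker


class CheckerExample(Checker):
    def checker_func(self, html=None):
        # Decode raw bytes before matching, as CheckerBaidu does
        if isinstance(html, bytes):
            html = html.decode('utf-8', errors='ignore')
        # Accept the proxy only if the page carries a marker proving the
        # target rendered correctly ("Example Domain" is a placeholder)
        return html is not None and re.search(r"Example Domain", html) is not None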

As spoon_server/example/example_multi.py demonstrates, you can use multiprocessing to run several queues that fetch and validate proxies in parallel, and you can assign different providers to different URLs; a minimal sketch of that pattern follows.
The table after the sketch lists the default proxy providers; you can also write your own (see the provider sketch after the table).
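
This sketch is not the shipped example_multi.py; it reuses the API from the example above and assumes ProxyPipe.start() blocks, so each target URL gets its own process. The target/provider pairings are illustrative.

from multiprocessing import Process

from spoon_server.proxy.fetcher import Fetcher
from spoon_server.main.proxy_pipe import ProxyPipe
from spoon_server.proxy.kuai_provider import KuaiProvider
from spoon_server.proxy.xici_provider import XiciProvider
from spoon_server.database.redis_config import RedisConfig
from spoon_server.main.checker import CheckerBaidu


def run_pipe(url_prefix, provider_classes):
    # Instantiate providers inside the child process to avoid pickling
    # provider instances across the process boundary
    providers = [cls() for cls in provider_classes]
    redis = RedisConfig("127.0.0.1", 21009)
    # One process per pipe: each owns its own fetch/validate queue
    ProxyPipe(url_prefix=url_prefix,
              fetcher=Fetcher(use_default=False),
              database=redis,
              checker=CheckerBaidu()).set_fetcher(providers).start()


if __name__ == '__main__':
    # Illustrative pairings: different providers feed different targets
    targets = [("https://www.baidu.com", [KuaiProvider]),
               ("https://www.baidu.com/s", [XiciProvider])]
    workers = [Process(target=run_pipe, args=t) for t in targets]
    for w in workers:
        w.start()
    for w in workers:
        w.join()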

name description
WebProvider Get proxies from an HTTP API
FileProvider Get proxies from a file
GouProvider http://www.goubanjia.com
KuaiProvider http://www.kuaidaili.com
SixProvider http://m.66ip.cn
UsProvider https://www.us-proxy.org
WuyouProvider http://www.data5u.com
XiciProvider http://www.xicidaili.com
IP181Provider http://www.ip181.com
XunProvider http://www.xdaili.cn
PlpProvider https://list.proxylistplus.com
IP3366Provider http://www.ip3366.net
BusyProvider https://proxy.coderbusy.com
NianProvider http://www.nianshao.me
PdbProvider http://proxydb.net
ZdayeProvider http://ip.zdaye.com
YaoProvider http://www.httpsdaili.com/
FeilongProvider http://www.feilongip.com/
IP31Provider https://31f.cn/http-proxy/
XiaohexiaProvider http://www.xiaohexia.cn/
CoolProvider https://www.cool-proxy.net/
NNtimeProvider http://nntime.com/
ListendeProvider https://www.proxy-listen.de/
IhuanProvider https://ip.ihuan.me/
IphaiProvider http://www.iphai.com/
MimvpProvider(@NeedCaptcha) https://proxy.mimvp.com/
GPProvider(@NeedProxy if you're in China) http://www.gatherproxy.com
FPLProvider(@NeedProxy if you're in China) https://free-proxy-list.net
SSLProvider(@NeedProxy if you're in China) https://www.sslproxies.org
NordProvider(@NeedProxy if you're in China) https://nordvpn.com
PremProvider(@NeedProxy if you're in China) https://premproxy.com
YouProvider(@Deprecated) http://www.youdaili.net
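
There is no custom-provider example in this README, so treat the following as a rough sketch: the Provider base class, its import path, and the getter hook are all assumptions; mirror a bundled provider such as kuai_provider.py in spoon_server/proxy before relying on it.

import urllib.request

# ASSUMPTION: a Provider base class sits alongside the bundled providers;
# check spoon_server/proxy/ for the real base class and hook names.
from spoon_server.proxy.provider import Provider


class TextListProvider(Provider):
    # Hypothetical provider that reads ip:port lines from a plain-text URL
    def __init__(self, url="http://example.com/proxies.txt"):
        self.url = url

    def getter(self):
        # ASSUMPTION: providers yield "ip:port" strings one at a time
        with urllib.request.urlopen(self.url, timeout=10) as resp:
            for line in resp.read().decode("utf-8").splitlines():
                line = line.strip()
                if line:
                    yield line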

Spoon-web

A simple Django web API demo; you can use any web server and write your own API.
Simply run python manager.py runserver **.**.**.**:*****
The demo's APIs include:

name description
http://127.0.0.1:21010/api/v1/get_keys Get all keys from Redis.
http://127.0.0.1:21010/api/v1/fetchone_from?target=www.google.com&filter=65 Get one usable proxy.
target: the target url
filter: minimum number of successful re-validations
http://127.0.0.1:21010/api/v1/fetchall_from?target=www.google.com&filter=65 Get all usable proxies.
http://127.0.0.1:21010/api/v1/fetch_hundred_recent?target=www.baidu.com&filter=5 Get recently added full-scored proxies.
target: the target url
filter: time window in seconds
http://127.0.0.1:21010/api/v1/fetch_stale?num=100 Get recently added proxies that have not been checked yet.
num: the number of proxies you want
http://127.0.0.1:21010/api/v1/fetch_recent?target=www.baidu.com Get recently validated proxies.
target: the target url
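
For instance, a client might grab a proxy from the demo API and route a request through it right away. This sketch uses the requests library; the endpoint and parameters come from the table above, but the plain-text "ip:port" response body is an assumption, so adapt the parsing to what your server actually returns.

import requests

API = "http://127.0.0.1:21010/api/v1"


def get_proxy(target="www.google.com", min_checks=65):
    # fetchone_from returns one proxy validated for the given target
    resp = requests.get(API + "/fetchone_from",
                        params={"target": target, "filter": min_checks},
                        timeout=5)
    resp.raise_for_status()
    # ASSUMPTION: the body is a plain "ip:port" string; adjust if the
    # demo returns JSON instead
    return resp.text.strip()


if __name__ == '__main__':
    proxy = get_proxy()
    # Route a request through the proxy we just fetched
    r = requests.get("https://www.google.com",
                     proxies={"https": "http://" + proxy},
                     timeout=10)
    print(r.status_code)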