cwjokaka / Ok_ip_proxy_pool
License: MIT
🍿 Web-crawler proxy IP pool (proxy pool) in Python 🍟: a decent IP proxy pool
Stars: ✭ 196
Projects that are alternatives to, or similar to, Ok_ip_proxy_pool
Proxy pool
Python web-crawler proxy IP pool (proxy pool)
Stars: ✭ 13,964 (+7024.49%)
Mutual labels: crawler, spider, flask, proxy, proxypool
Fooproxy
A robust and efficient scored, target-specific IP proxy pool + API service. You can plug in your own collectors to crawl proxy IPs and build a separate database of proxies validated against each target site of your crawler. Supports MongoDB 4.0; uses Python 3.7. (Scored IP proxy pool; custom proxy data crawlers can be added at any time.)
Stars: ✭ 195 (-0.51%)
Mutual labels: async, crawler, spider, aiohttp, proxypool
Spoon
🥄 A package for building specific Proxy Pool for different Sites.
Stars: ✭ 173 (-11.73%)
Mutual labels: crawler, spider, proxy, ip, proxypool
Free proxy website
A collection of websites offering free SOCKS/HTTPS/HTTP proxies
Stars: ✭ 119 (-39.29%)
Mutual labels: crawler, spider, proxy, ip
Proxybroker
Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭
Stars: ✭ 2,767 (+1311.73%)
Mutual labels: crawler, proxy, proxypool
Gain
Web crawling framework based on asyncio.
Stars: ✭ 2,002 (+921.43%)
Mutual labels: crawler, spider, aiohttp
Ppspider
A web spider built on Puppeteer, with decorator-based task queues and task scheduling, nedb/mongodb storage, and data visualization with user interaction
Stars: ✭ 237 (+20.92%)
Mutual labels: crawler, spider, proxy
Proxypool
An Efficient ProxyPool with Getter, Tester and Server
Stars: ✭ 3,050 (+1456.12%)
Mutual labels: flask, proxy, proxypool
Weixin Spider
WeChat Official Account crawler: fetches an account's historical articles, article comments, read counts and "Wow" (在看) counts, with a web visualization page; deployable on a Windows server. Built on Python 3 with Flask/MySQL/Redis/mitmproxy/pywin32 for efficient WeChat crawling with continuous data updates.
Stars: ✭ 287 (+46.43%)
Mutual labels: crawler, spider, flask
Ruia
Async Python 3.6+ web scraping micro-framework based on asyncio
Stars: ✭ 1,366 (+596.94%)
Mutual labels: crawler, spider, aiohttp
Awesome Python Primer
A curated index of quality Chinese-language resources for self-learning Python, including books, documentation, and videos, covering crawlers, web development, data analysis, and machine learning
Stars: ✭ 57 (-70.92%)
Mutual labels: crawler, spider, flask
Marmot
💐Marmot | Web Crawler/HTTP protocol Download Package 🐭
Stars: ✭ 186 (-5.1%)
Mutual labels: crawler, spider, proxy
Nodespider
[DEPRECATED] Simple, flexible, delightful web crawler/spider package
Stars: ✭ 33 (-83.16%)
Mutual labels: async, crawler, spider
Jianso movie
🎬 Movie resource crawler and movie poster scraping scripts; Flask|Nginx|wsgi
Stars: ✭ 114 (-41.84%)
Mutual labels: crawler, sqlite, flask
Fp Server
Free proxy server that continuously crawls and serves proxies, based on Tornado and Scrapy; run your own local proxy pool
Stars: ✭ 154 (-21.43%)
Mutual labels: spider, proxy, proxypool
Scrapingoutsourcing
ScrapingOutsourcing focuses on sharing crawler code, aiming for one update per week
Stars: ✭ 164 (-16.33%)
Mutual labels: crawler, spider
Linkedin Profile Scraper
🕵️♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (-12.76%)
Mutual labels: crawler, spider
ok_ip_proxy_pool 😁
A decent IP proxy pool, built for my own use first~
Requirements
- Python 3.7
Features
- Crawls and validates proxies asynchronously 🚀
- Scores each proxy's availability by weight (+1 when it passes validation, -1 when it fails) 🎭
- Uses SQLite, so no database server needs to be installed 🛴
- Free proxy sources currently supported: 免费代理 / 全网 / 66 / 西刺 / 快代理 / 云代理 / IP海
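The weighted scoring above can be sketched in a few lines. This is a minimal illustration with hypothetical names (`Proxy`, `update_weight`); the project's actual fields and thresholds may differ:

```python
# Minimal sketch of the +1/-1 availability scoring; `Proxy` and
# `update_weight` are hypothetical names, not the project's real API.
class Proxy:
    def __init__(self, ip: str, port: int, weight: int = 0):
        self.ip = ip
        self.port = port
        self.weight = weight  # availability score

def update_weight(proxy: Proxy, passed: bool) -> int:
    # +1 when the proxy passes validation, -1 when it fails
    proxy.weight += 1 if passed else -1
    return proxy.weight

p = Proxy('1.2.3.4', 8080)
update_weight(p, True)   # score rises to 1
update_weight(p, False)  # score falls back to 0
```

A pool built this way can simply prefer proxies with the highest weight and evict those whose weight drops below some floor.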
Download & Install
- Get the source:
git clone git@github.com:cwjokaka/ok_ip_proxy_pool.git
- Install the dependencies:
pip install -r requirements.txt
Configuration

```python
# Proxy spider settings
SPIDER = {
    'crawl_interval': 120,  # interval between crawl rounds, in seconds
    'list': [               # spider classes to use (class names)
        'Spider66Ip',
        'SpiderQuanWangIp',
        'SpiderXiciIp',
        'SpiderKuaiDaiLiIp',
        'SpiderYunDaiLiIp',
        'SpiderIpHaiIp',
        'SpiderMianFeiDaiLiIp'
    ]
}

# Validator settings
VALIDATOR = {
    'test_url': 'http://www.baidu.com',  # URL used to check availability
    'request_timeout': 4,                # validation timeout, in seconds
    'validate_interval': 60              # interval between validation rounds, in seconds
}

# Anonymity-check settings
ANONYMITY_VALIDATOR = {
    'http_test_url': 'http://httpbin.org/get',   # anonymity-check URL
    'https_test_url': 'https://httpbin.org/get',
    'request_timeout': 4,                        # maximum timeout, in seconds
    'interval': 180                              # interval between checks, in seconds
}

# Database settings
DB = {
    'db_name': 'proxy.db',
    'table_name': 'proxy'
}

# Web server settings (Flask)
WEB_SERVER = {
    'host': '0.0.0.0',
    'port': '8080'
}

# Request headers used by the spiders
HEADERS = {
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36",
}
```
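To illustrate how `test_url` and `request_timeout` come into play, here is a simplified synchronous sketch using only the standard library. The project itself validates asynchronously, and `check_proxy` is a hypothetical name, not the project's actual API:

```python
import urllib.request

def check_proxy(proxy_url: str,
                test_url: str = 'http://www.baidu.com',
                timeout: int = 4) -> bool:
    """Return True if `test_url` is reachable through the given proxy."""
    handler = urllib.request.ProxyHandler({'http': proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # timeout, refused connection, unreachable or misbehaving proxy, etc.
        return False
```

A validator loop would call something like this for every stored proxy each `validate_interval` seconds and adjust the proxy's weight accordingly.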
Run
python main.py
API Usage
API | Method | Description |
---|---|---|
/ | GET | Index page |
/get | GET | Fetch one proxy |
/get_all | GET | Fetch all proxies |
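With the service running, a client can pull a proxy over plain HTTP. A hedged sketch follows: the host and port come from the WEB_SERVER settings, and the plain-text `ip:port` response format is an assumption, not documented behavior:

```python
import urllib.request

def api_url(host: str, port: str, path: str) -> str:
    # Build an endpoint URL from the WEB_SERVER settings.
    return f'http://{host}:{port}{path}'

url = api_url('127.0.0.1', '8080', '/get')
# With the service running locally, something like this would fetch a proxy
# (assuming a plain-text "ip:port" response body):
# with urllib.request.urlopen(url, timeout=4) as resp:
#     proxy = resp.read().decode()
```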
Extending the Proxy Spiders
To add a custom proxy spider:
- Open src/spider/spiders.py
- Add your spider class: inherit from AbsSpider, implement its do_crawl, get_page_range, and get_urls methods, and override other methods as needed.
- Decorate the class with @spider_register
- Add the class name to SPIDER['list'] in the setting.py configuration file
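The steps above can be sketched as follows. This is a self-contained illustration: `AbsSpider` and `spider_register` here are minimal stand-ins for the project's real base class and decorator in src/spider/spiders.py, and the site, URLs, and parsing result are hypothetical:

```python
# Stand-ins for the project's real base class and registration decorator.
SPIDER_REGISTRY = []

def spider_register(cls):
    SPIDER_REGISTRY.append(cls)
    return cls

class AbsSpider:
    def do_crawl(self):
        raise NotImplementedError
    def get_page_range(self):
        raise NotImplementedError
    def get_urls(self):
        raise NotImplementedError

@spider_register
class SpiderMyFreeIp(AbsSpider):
    """Hypothetical spider for a made-up free-proxy site."""
    def get_page_range(self):
        return range(1, 3)  # pages 1 and 2
    def get_urls(self):
        return [f'http://example.com/free/{p}' for p in self.get_page_range()]
    def do_crawl(self):
        # Real code would fetch each URL and parse ip:port pairs out of the page.
        return [('127.0.0.1', 8080)]
```

After registration, adding the class name `'SpiderMyFreeIp'` to SPIDER['list'] in setting.py would let the crawl loop pick it up.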
Finally
Fork | Star | Issue, all three are welcome 😘