ZKeeer / Ipproxy

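IP proxies for web crawlers: scrapes proxy IPs from nine websites, then validates, cleans, stores, and updates them, and adds a calling interface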

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Ipproxy

Spoon
🥄 A package for building specific Proxy Pool for different Sites.
Stars: ✭ 173 (+27.21%)
Mutual labels:  spider, proxies
Pspider
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Stars: ✭ 1,611 (+1084.56%)
Mutual labels:  spider, proxies
Free proxy website
A collection of websites for obtaining free socks/https/http proxies
Stars: ✭ 119 (-12.5%)
Mutual labels:  spider
Guwen Spider
A complete Node.js serial crawler that scrapes more than 30,000 pages.
Stars: ✭ 129 (-5.15%)
Mutual labels:  spider
Proxy
A simple tool for fetching usable proxies from several websites.
Stars: ✭ 124 (-8.82%)
Mutual labels:  proxies
Pddspider
Pinduoduo crawler that scrapes all products, reviews, and related information
Stars: ✭ 121 (-11.03%)
Mutual labels:  spider
Feapder
feapder is a Python crawler framework that supports distributed crawling, batch collection, task-loss protection, and rich alerting
Stars: ✭ 110 (-19.12%)
Mutual labels:  spider
Copybook
Crawls every novel on a novel website, stores them in a database, and uses the crawled data to build your own novel website
Stars: ✭ 117 (-13.97%)
Mutual labels:  spider
Lambdaattack
Minecraft bot for servers. Currently supports stress testing. More features are planned
Stars: ✭ 133 (-2.21%)
Mutual labels:  proxies
Douban crawler
Douban backup project
Stars: ✭ 124 (-8.82%)
Mutual labels:  spider
Weibo Topic Spider
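Weibo super-topic crawler with Weibo word-frequency statistics, sentiment analysis, and simple classification; newly added crawling of the pneumonia super topic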
Stars: ✭ 128 (-5.88%)
Mutual labels:  spider
Apiproject
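[https://www.sofineday.com], golang project scaffolding that integrates best practices (gin + gorm + go-redis + mongo + cors + jwt + JSON logging library zap (supports log collection to kafka or mongo) + message queue kafka + WeChat/Alipay payment gopay + API encryption + API reverse proxy + go modules dependency management + headless crawler chromedp + makefile + binary compression + livereload hot reloading)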
Stars: ✭ 124 (-8.82%)
Mutual labels:  spider
Barbatunnel
A layer that hides, redirects, forwards, and re-encrypts internet packets to keep VPNs, proxies, and other P2P software hidden from firewalls. Free implementation of HTTP-Tunnel, UDP-Tunnel, port forwarding, port redirecting, and packet re-encryption that can work in the network data-link layer and transport layer
Stars: ✭ 128 (-5.88%)
Mutual labels:  proxies
Wechat article
Scrapes WeChat official account articles
Stars: ✭ 121 (-11.03%)
Mutual labels:  spider
Digger
Digger is a powerful and flexible web crawler implemented by pure golang
Stars: ✭ 130 (-4.41%)
Mutual labels:  spider
Decryptlogin
APIs for logging in to some websites using requests.
Stars: ✭ 1,861 (+1268.38%)
Mutual labels:  spider
Ippsample
IPP sample implementations.
Stars: ✭ 123 (-9.56%)
Mutual labels:  proxies
Yspider
yspider -- a lightweight crawler system
Stars: ✭ 125 (-8.09%)
Mutual labels:  spider
Bilibili User Information Spider
Crawler for the information of 300 million Bilibili users (mid, nickname, gender, following, followers, level)
Stars: ✭ 136 (+0%)
Mutual labels:  spider
Mm131
Image crawler for the MM131 website 🚨
Stars: ✭ 129 (-5.15%)
Mutual labels:  spider

IPProxy

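IP proxies for web crawlers: scrapes proxy IPs from nine websites, then validates, cleans, stores, and updates them, and adds a calling interface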



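So far this has only been tested on Windows 10 64-bit with Python 3.5 and on Ubuntu Server 16.04.1 LTS 64-bit with Python 3.5.
For machines with different specs, change the maximum thread count in Config.py. See the Config.py section below for details.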
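
To illustrate what a thread cap of this kind does, here is a minimal sketch that validates a batch of proxies with a bounded worker pool. It is not the project's own code: `MAX_THREADS`, `check_all`, and the `check` callback are hypothetical names, and the real logic lives in the repository.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 16  # hypothetical stand-in for the MaxThreads setting in Config.py


def check_all(proxies, check, max_threads=MAX_THREADS):
    """Run check(proxy) for every proxy using at most max_threads workers.

    Returns the proxies for which check returned a truthy value,
    in their original order.
    """
    proxies = list(proxies)  # materialize so the input can be iterated twice
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(check, proxies))
    return [proxy for proxy, ok in zip(proxies, results) if ok]
```

A lower cap keeps CPU, memory, and socket use down on a weak machine at the cost of a slower full pass; a higher cap does the opposite.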

How to use

See demo.py

Util.Refresh(): call this function explicitly to update the database with new data

Util.Get(): returns one usable proxy. The proxy returned by Util.Get() looks like this:
{'http': 'http://115.159.152.130:81', 'https': 'https://115.159.152.130:81'}
requests can use it directly: requests.get(url, proxies=Util.Get(), headers={})
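
Free proxies fail often, so callers usually want to retry with a fresh proxy. The sketch below shows one way to do that. `fetch_via_proxy` is a hypothetical helper, not part of this project; it takes the proxy getter and the HTTP function as arguments so it makes no assumptions about how `Util` is imported.

```python
def fetch_via_proxy(url, get_proxy, http_get, attempts=3, timeout=10):
    """Request url through up to `attempts` proxies; return the first response."""
    last_error = None
    for _ in range(attempts):
        # Expected shape: {'http': 'http://host:port', 'https': 'https://host:port'}
        proxies = get_proxy()
        try:
            return http_get(url, proxies=proxies, timeout=timeout)
        except Exception as exc:  # dead or slow proxy: try the next one
            last_error = exc
    raise RuntimeError("no working proxy after %d attempts" % attempts) from last_error
```

With this project and requests available, the call would presumably be `fetch_via_proxy(url, Util.Get, requests.get)`, since `requests.get` accepts both `proxies` and `timeout` keyword arguments.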

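Config.py section:

MaxThreads sets the maximum number of threads. If your machine is low-spec, set it to 16 or 32 and let it run slowly; if you are confident in your machine (an i7 or Xeon, plenty of RAM, fast network bandwidth), you can set it to 1024.
If you have other proxy websites to add, add them to the Url_Regular dictionary.
Each entry pairs a proxy site URL with its regular expression. The regex must capture the IP and the port separately, so that matches look like [(192.168.1.1, 80), (192.168.1.1, 90),]
Only the first page of each site is scraped. To scrape pages beyond the first, add each page's link and regex; for example, add the links for pages 1, 2, ... of a site, each with its regex, to the Url_Regular dictionary.
Before adding a regex, make sure it passes in the Webmaster Tools (站长工具) online regular expression tester.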
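
As a sketch of what "capture the IP and port separately" means in practice, the example below uses a regex with two capture groups so that `re.findall` returns (ip, port) tuples. The URL, the HTML snippet, and the name `URL_REGULAR` are made up for illustration; the actual dictionary layout in Config.py may differ.

```python
import re

# Hypothetical entry: page URL -> regex with one capture group for the IP
# and one for the port. The inner (?:...) group is non-capturing.
URL_REGULAR = {
    "http://proxy.example/free/": r"<td>(\d{1,3}(?:\.\d{1,3}){3})</td>\s*<td>(\d{1,5})</td>",
}

# Made-up table rows standing in for a downloaded proxy list page.
SAMPLE_HTML = """
<tr><td>192.168.1.1</td><td>80</td></tr>
<tr><td>192.168.1.1</td>
    <td>90</td></tr>
"""


def extract_proxies(html, pattern):
    """Return a list of (ip, port) string tuples found in html."""
    return re.findall(pattern, html)


pairs = extract_proxies(SAMPLE_HTML, URL_REGULAR["http://proxy.example/free/"])
print(pairs)  # [('192.168.1.1', '80'), ('192.168.1.1', '90')]
```

With exactly two capture groups, `re.findall` yields one tuple per match, which is the shape the note above asks for. A pattern with a single group would return plain strings instead.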


Data sources:

http://www.kuaidaili.com/free/
http://www.66ip.cn/
http://www.xicidaili.com/nn/
http://www.ip3366.net/free/
http://www.proxy360.cn/Region/China
http://www.mimiip.com/
http://www.data5u.com/free/index.shtml
http://www.ip181.com/
http://www.kxdaili.com/
Feel free to add any proxy websites you know of, so everyone can share resources.

Logical structure:


Issues and pull requests are welcome. The code is rough, so experts, please go easy on it.