fengzhizi715 / Proxypool

Licence: apache-2.0
A proxy IP pool for web crawlers


ProxyPool


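  • What ProxyPool does: it collects free, usable IP proxy data from the web. A crawler first scrapes proxy data, then checks whether each proxy is usable; usable proxies are stored in the database. This process repeats at regular intervals.

  • Technology: Spring Boot + RxJava 2.x + MongoDB, etc. Front end: layUI + jQuery, etc.

  • Overview: the project has two modules, proxypool and proxypool-web. The core work of scraping data from the web is done by the proxypool module; parser classes for additional web pages can be added under the site package. The proxypool-web module is a sample module built on top of the proxypool module.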
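The scrape, check, store cycle described above can be sketched in plain Java. This is a minimal illustration, not the project's code: the class ProxyPoolSketch, its methods, the regex-based parsing, and the in-memory list standing in for MongoDB are all hypothetical, and the usability check is passed in as a predicate so that a real check (for example, a test request through the proxy) could be swapped in.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names throughout; this is not the proxypool module's actual API.
class ProxyPoolSketch {

    // Matches "ip:port" pairs such as 10.0.0.1:8080 in raw page text.
    private static final Pattern IP_PORT =
            Pattern.compile("(\\d{1,3}(?:\\.\\d{1,3}){3}):(\\d{2,5})");

    /** Step 1: extract proxy candidates from the text of a scraped page. */
    static List<InetSocketAddress> parse(String pageText) {
        List<InetSocketAddress> found = new ArrayList<>();
        Matcher m = IP_PORT.matcher(pageText);
        while (m.find()) {
            int port = Integer.parseInt(m.group(2));
            if (port <= 65535) { // skip values that are not valid ports
                found.add(InetSocketAddress.createUnresolved(m.group(1), port));
            }
        }
        return found;
    }

    /** Steps 2 and 3: store only candidates that pass the check and are not stored yet. */
    static int checkAndStore(List<InetSocketAddress> candidates,
                             Predicate<InetSocketAddress> isUsable,
                             List<InetSocketAddress> store) {
        int stored = 0;
        for (InetSocketAddress candidate : candidates) {
            if (isUsable.test(candidate) && !store.contains(candidate)) {
                store.add(candidate);
                stored++;
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        List<InetSocketAddress> store = new ArrayList<>(); // stand-in for the database
        List<InetSocketAddress> candidates = parse("10.0.0.1:8080 10.0.0.2:3128");
        // Stand-in check; a real one would try a request through the proxy.
        int stored = checkAndStore(candidates, p -> p.getPort() == 8080, store);
        System.out.println(stored); // prints 1
    }
}
```

Repeating the cycle at an interval would then amount to calling these steps from a scheduler.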

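1. Usage

  • Use only the scraping logic of the proxypool module in the ProxyPool project. It has no UI, can be used in any project, and is non-intrusive.

For a Java project built with Gradle, jcenter() is not used by default, so it must be configured in the build.gradle of the relevant module: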

repositories {
    mavenCentral()
    jcenter()
}

Gradle:

compile 'com.cv4j.proxy:proxypool:1.1.13'

  • Clone the repository locally and run the proxypool-web module, which includes a UI.

Prerequisites:

1) Install MongoDB locally.

2) Configure application.properties in the proxypool-web module. Reference settings:

spring.data.mongodb.uri=mongodb://localhost:27017/proxypool
spring.data.mongodb.uri=mongodb://username:password@{host}:27017/proxypool (with username and password)

3) Create the database and collections:

database: proxypool
collections: Proxy_Resource, Resource_Plan, Proxy, Job_Log, Sys_Sequence

4) Insert the default data into the collections.

Proxy_Resource:

{
    "_id" : ObjectId("5a48578737a340d5c48a84af"),
    "_class" : "com.cv4j.proxy.web.dto.ProxyResource",
    "resId" : 1,
    "webName" : "西刺国内高匿代理",
    "webUrl" : "http://www.xicidaili.com/nn/1.html",
    "pageCount" : 100,
    "prefix" : "http://www.xicidaili.com/nn/",
    "suffix" : ".html",
    "parser" : "com.cv4j.proxy.site.xicidaili.XicidailiProxyListPageParser",
    "addTime" : NumberLong(1515114009516),
    "modTime" : NumberLong(1515114009516)
}
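The prefix, suffix, and pageCount fields suggest that list pages are addressed as prefix + page number + suffix, which matches the webUrl value above. The sketch below illustrates that reading; the class and method names are hypothetical, and 1-based page numbering is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the real project may build its page URLs differently.
class PageUrlSketch {

    /** Builds prefix + pageNumber + suffix for pages 1..pageCount (assumed 1-based). */
    static List<String> pageUrls(String prefix, String suffix, int pageCount) {
        List<String> urls = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            urls.add(prefix + page + suffix);
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> urls = pageUrls("http://www.xicidaili.com/nn/", ".html", 100);
        System.out.println(urls.get(0)); // prints http://www.xicidaili.com/nn/1.html
        System.out.println(urls.size()); // prints 100
    }
}
```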

Sys_Sequence:

{
    "_id" : ObjectId("5a4f2baf87ccb25df57b096b"),
    "colName" : "Proxy_Resource",
    "sequence" : 2
}

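5) Run. Start the application as you would any Spring Boot project. The web endpoints are:

  • Parsing resources: http://{host}:{port}/proxypool/resourcelist

  • Resource plans, that is, the target pages the current job will scrape: http://{host}:{port}/proxypool/planlist

  • View the proxy data scraped into the local database: http://{host}:{port}/proxypool/proxylist (UI built with easyUI), http://{host}:{port}/proxypool/proxys (UI built with layUI)

  • Get the proxies currently available in the database: http://{host}:{port}/proxypool/proxys/{count}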
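As an illustration of calling the last endpoint from another program, here is a minimal client sketch using the JDK's java.net.http package (Java 11+). The class name, the host localhost, and the port 8080 are assumptions. The response format is not documented here, so the body is returned as raw text.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client; only the URL path comes from the endpoint list above.
class ProxyPoolClientSketch {

    /** Builds the URI for fetching up to `count` currently usable proxies. */
    static URI proxiesUri(String host, int port, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        return URI.create("http://" + host + ":" + port + "/proxypool/proxys/" + count);
    }

    /** Sends a GET request and returns the raw response body. */
    static String fetchProxies(String host, int port, int count) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(proxiesUri(host, port, count))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only builds the URI; call fetchProxies(...) against a running instance.
        System.out.println(proxiesUri("localhost", 8080, 10));
        // prints http://localhost:8080/proxypool/proxys/10
    }
}
```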

6) Scheduled scraping job. The proxypool-web module currently configures one job that runs automatically every three hours:

com.cv4j.proxy.web.job.ScheduleJobs.cronJob()
application.properties: cronJob.schedule = 0 0 0/3 * * ?
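Spring cron expressions have six fields: second, minute, hour, day of month, month, and day of week. Assuming the schedule is the six-field expression 0 0 0/3 * * ?, the hour field 0/3 means "starting at hour 0, every 3 hours". The sketch below expands that field to show when the job would fire. It is a hypothetical helper that handles only the start/step form, not a cron parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; Spring performs the real cron parsing.
class CronHourSketch {

    /** Expands a "start/step" field into the values it matches in [0, maxExclusive). */
    static List<Integer> expand(String field, int maxExclusive) {
        String[] parts = field.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected start/step, got: " + field);
        }
        int start = Integer.parseInt(parts[0]);
        int step = Integer.parseInt(parts[1]);
        if (start < 0 || step <= 0) {
            throw new IllegalArgumentException("invalid start or step in: " + field);
        }
        List<Integer> values = new ArrayList<>();
        for (int v = start; v < maxExclusive; v += step) {
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) {
        String[] fields = "0 0 0/3 * * ?".split(" ");
        // fields[2] is the hour field; a day has hours 0..23.
        System.out.println(expand(fields[2], 24));
        // prints [0, 3, 6, 9, 12, 15, 18, 21]
    }
}
```

With second and minute both 0, the job fires exactly on the hour at each of those times.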

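2. Free online demo:

3. Professional crawler

A professional crawler framework developed by the author: NetDiscovery

4. Contact:

WeChat: fengzhizi715

Java与Android技术栈 (Java and Android Tech Stack): original technical articles are published every week. You are welcome to scan the official account's QR code below and follow it; we look forward to growing and improving together with you.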

License

Copyright (C) 2017 - present, Tony Shen.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.