zhangyingwei / Cockroach

Licence: apache-2.0
Yet another Java content-fetching (read: crawling) tool

Programming Languages

java

Projects that are alternatives of or similar to Cockroach

Csdn Spider
Crawls blog posts from CSDN
Stars: ✭ 89 (-20.54%)
Mutual labels:  spider
Ruia
Async Python 3.6+ web scraping micro-framework based on asyncio
Stars: ✭ 1,366 (+1119.64%)
Mutual labels:  spider
Crawler Detect
🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
Stars: ✭ 1,549 (+1283.04%)
Mutual labels:  spider
Zhihuspider
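A crawler for Zhihu users' public profile information that can also crawl follow relationships between users; written in Python, using proxies and multiple threads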
Stars: ✭ 92 (-17.86%)
Mutual labels:  spider
Douyinsdk
Douyin SDK for data collection: crawling Douyin is no longer a dream
Stars: ✭ 99 (-11.61%)
Mutual labels:  spider
Animesearcher
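Aggregates video and danmaku (bullet-comment) resources from third-party sites to give freeloaders the best experience for watching anime and TV series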
Stars: ✭ 101 (-9.82%)
Mutual labels:  spider
Zhihu Spider
A Zhihu crawler that periodically tracks question data and pushes trending topics on a schedule
Stars: ✭ 87 (-22.32%)
Mutual labels:  spider
Jobs Search
🕷 A collection of crawlers for job-listing sites; branches are updated irregularly
Stars: ✭ 111 (-0.89%)
Mutual labels:  spider
Luoo.spider
🤖 A spider and server for Luoo.qy
Stars: ✭ 99 (-11.61%)
Mutual labels:  spider
Daily scripts
Small everyday scripts: lazy people have more fun.
Stars: ✭ 105 (-6.25%)
Mutual labels:  spider
Spider
🕷 Spider applications for several websites, based on a proxy pool (supports HTTP & WebSocket)
Stars: ✭ 93 (-16.96%)
Mutual labels:  spider
Gopa Abandoned
GOPA, a spider written in Go.(NOTE: this project moved to https://github.com/infinitbyte/gopa )
Stars: ✭ 98 (-12.5%)
Mutual labels:  spider
Nl2lf
Resources for "Natural Language to Logical Form": a collection of research materials on converting natural language into logical forms.
Stars: ✭ 105 (-6.25%)
Mutual labels:  spider
Ant nest
Simple, clear and fast Web Crawler framework built on python3.6+, powered by asyncio.
Stars: ✭ 90 (-19.64%)
Mutual labels:  spider
Not Your Average Web Crawler
A web crawler (for bug hunting) that gathers more than you can imagine.
Stars: ✭ 107 (-4.46%)
Mutual labels:  spider
Spider
A plain and simple spider
Stars: ✭ 88 (-21.43%)
Mutual labels:  spider
Pspider
A simple distributed crawler framework
Stars: ✭ 102 (-8.93%)
Mutual labels:  spider
Baiduspider
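BaiduSpider, a crawler for Baidu search results. It currently supports Baidu web search, image search, Zhidao search, video search, news search, Wenku search, Jingyan search and Baike search.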
Stars: ✭ 105 (-6.25%)
Mutual labels:  spider
Hive
lots of spiders
Stars: ✭ 110 (-1.79%)
Mutual labels:  spider
Skycaiji
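Skycaiji (蓝天采集器) is free data-collection and publishing crawler software developed with PHP + MySQL. It can be deployed on cloud servers, can collect almost any type of web page, integrates seamlessly with various CMS site-building programs, and publishes data in real time without logging in, fully automatically with no manual intervention. It is a fully cross-platform cloud crawler system for collecting web data at scale.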
Stars: ✭ 1,514 (+1251.79%)
Mutual labels:  spider

cockroach crawler: yet another Java crawler implementation

This project has been refactored into cockroach2.

Introduction

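cockroach [小强, Chinese slang for a cockroach]: I don't know why I picked this name back then. It is long and hard to remember, and because I kept misspelling it I lost a lot of time while coding.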

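This project is yet another hole I have dug for myself (that is, another side project). I have dug plenty of them, so one more or one less hardly matters.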

A small, flexible and robust content-fetching (read: crawling) framework; let's call it a framework for now.

How simple is it? A few lines of code are enough to create a content-fetching (crawler) program.

Dependencies

<dependency>
    <groupId>com.github.zhangyingwei</groupId>
    <artifactId>cockroach-core</artifactId>
    <version>1.0.6-Beta</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.github.zhangyingwei/cockroach-annotation -->
<dependency>
    <groupId>com.github.zhangyingwei</groupId>
    <artifactId>cockroach-annotation</artifactId>
    <version>1.0.6-Beta</version>
</dependency>

Code:

@EnableAutoConfiguration
public class CockroachApplicationTest {
    public static void main(String[] args) throws Exception {
        // Create an empty task queue
        TaskQueue queue = TaskQueue.of();
        // Seed it with the start URL
        queue.push(new Task("http://blog.zhangyingwei.com"));
        // Start the crawler; with no custom store configured, results are printed
        CockroachApplication.run(CockroachApplicationTest.class, queue);
    }
}

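That's right, it really is that simple. This crawler fetches the content of the page http://blog.zhangyingwei.com and prints the result. For result handling, the program uses the PrintStore class by default, which prints every result.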
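To make the flow described above concrete, here is a minimal, self-contained sketch of a queue → fetch → store pipeline. It does not use Cockroach: `Task`, `TaskQueue`, `TaskResponse`, `IStore` and `PrintStore` below are simplified, hypothetical stand-ins that only borrow the README's names, and the fetch step is stubbed so the example runs without network access.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

public class PipelineSketch {
    // Hypothetical stand-in for a crawl task: just the URL to fetch.
    record Task(String url) {}

    // Hypothetical stand-in for a fetched result.
    record TaskResponse(Task task, String content) {}

    // Hypothetical result handler, playing the role IStore plays in the README.
    interface IStore {
        void store(TaskResponse response);
    }

    // Default handler: print every result, like the PrintStore described above.
    static class PrintStore implements IStore {
        @Override
        public void store(TaskResponse response) {
            System.out.println(response.task().url() + " -> " + response.content());
        }
    }

    // Hypothetical FIFO queue of pending tasks.
    static class TaskQueue {
        private final Deque<Task> tasks = new ArrayDeque<>();

        static TaskQueue of() { return new TaskQueue(); }

        void push(Task task) { tasks.addLast(task); }

        Task poll() { return tasks.pollFirst(); }
    }

    // Drain the queue: fetch each task and hand the response to the store.
    static void run(TaskQueue queue, Function<Task, String> fetcher, IStore store) {
        Task task;
        while ((task = queue.poll()) != null) {
            store.store(new TaskResponse(task, fetcher.apply(task)));
        }
    }

    public static void main(String[] args) {
        TaskQueue queue = TaskQueue.of();
        queue.push(new Task("http://blog.zhangyingwei.com"));
        // Stubbed fetcher so the sketch runs offline; a real crawler would issue an HTTP GET here.
        Function<Task, String> fakeFetch = t -> "<html>stub for " + t.url() + "</html>";
        run(queue, fakeFetch, new PrintStore());
    }
}
```

Keeping the store behind a one-method interface is what makes the "print by default, replace when needed" behaviour cheap: swapping `PrintStore` for another implementation changes where results go without touching the queue or fetch logic.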

Scala & Kotlin

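Scala and Kotlin, two of the major JVM languages in use today, both interoperate well with Java, so the framework can be called from them directly. Still, here are a couple of demos.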

Scala

/**
  * Created by zhangyw on 2017/12/25.
  */
class TTTStore extends IStore{
    override def store(taskResponse: TaskResponse): Unit = {
        println("ttt store")
    }
}

object TTTStore{}
/**
  * Created by zhangyw on 2017/12/25.
  */
@EnableAutoConfiguration
@ThreadConfig(num = 1)
@Store(classOf[TTTStore])
object MainApplication {
    def main(args: Array[String]): Unit = {
        println("hello scala spider")
        val queue = TaskQueue.of()
        queue.push(new Task("http://blog.zhangyingwei.com"))
        CockroachApplication.run(MainApplication.getClass(),queue)
    }
}

Kotlin

class TTTStore :IStore{
    override fun store(response: TaskResponse) {
        println("ttt store")
    }
}
/**
 * Created by zhangyw on 2017/12/25.
 */
@EnableAutoConfiguration
@ThreadConfig(num = 1)
@Store(TTTStore::class)
object MainApplication {
    @JvmStatic
    fun main(args: Array<String>) {
        println("hello kotlin spider")
        val queue = TaskQueue.of()
        queue.push(Task("http://blog.zhangyingwei.com"))
        CockroachApplication.run(MainApplication::class.java, queue)
    }
}
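Both demos configure the crawler purely through annotations on the entry class (`@ThreadConfig(num = 1)` and `@Store(...)`). The sketch below shows, in plain Java, one way a launcher could read such annotations via reflection and fall back to defaults when they are absent. The annotations here are hypothetical look-alikes declared locally, not Cockroach's own, and the defaults (1 thread, a print-style store) are assumptions for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class AnnotationConfigSketch {
    // Hypothetical look-alike of @ThreadConfig: number of worker threads.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ThreadConfig {
        int num() default 1;
    }

    // Hypothetical result-handler interface.
    interface IStore {
        void store(String result);
    }

    // Assumed default store: prints every result.
    static class PrintStore implements IStore {
        public void store(String result) { System.out.println(result); }
    }

    // Hypothetical look-alike of @Store: which result handler to instantiate.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Store {
        Class<? extends IStore> value();
    }

    // Settings resolved from the entry class.
    record Settings(int threads, Class<? extends IStore> storeClass) {}

    // Read the annotations reflectively, falling back to defaults when one is missing.
    static Settings resolve(Class<?> entry) {
        ThreadConfig tc = entry.getAnnotation(ThreadConfig.class);
        Store st = entry.getAnnotation(Store.class);
        int threads = (tc != null) ? tc.num() : 1;
        Class<? extends IStore> storeClass = (st != null) ? st.value() : PrintStore.class;
        return new Settings(threads, storeClass);
    }

    // A custom store, analogous to TTTStore in the demos.
    static class QuietStore implements IStore {
        public void store(String result) { /* discard the result */ }
    }

    @ThreadConfig(num = 4)
    @Store(QuietStore.class)
    static class ConfiguredApp {}

    static class PlainApp {}

    public static void main(String[] args) throws Exception {
        Settings s = resolve(ConfiguredApp.class);
        System.out.println(s.threads() + " thread(s), store = " + s.storeClass().getSimpleName());
        // Instantiate the configured store through its no-arg constructor.
        IStore store = s.storeClass().getDeclaredConstructor().newInstance();
        store.store("hello");
        System.out.println(resolve(PlainApp.class));
    }
}
```

Reading configuration from annotations keeps the entry point down to a single `run(...)` call, which matches how the demos are written; whether Cockroach resolves its annotations exactly this way is not stated in this README.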

Contact

License

Licensed under the Apache 2.0 license.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].