crazyacking / zeekEye

Licence: other
A Fast and Powerful Scraping and Web Crawling Framework.

Programming Languages

java
68154 projects - #9 most used programming language

Projects that are alternatives to or similar to zeekEye

wb wx zh tt
Crawlers for Sina Weibo, WeChat, Zhihu, and Toutiao; supports Sina login by solving the CAPTCHA to obtain a cookie.
Stars: ✭ 16 (-55.56%)
Mutual labels:  spider, weibo
WeiboCrawler
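A cookie-free Weibo crawler that can continuously crawl the profile information, posts, and post comments and reposts of one or more Sina Weibo users.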
Stars: ✭ 45 (+25%)
Mutual labels:  weibo, weibo-spider
Weibo Topic Spider
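Weibo super-topic crawler with word-frequency statistics, sentiment analysis, and simple classification; newly adds data crawled from the pneumonia super topic.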
Stars: ✭ 128 (+255.56%)
Mutual labels:  spider, weibo
weibo-scraper
Simple Weibo Scraper
Stars: ✭ 50 (+38.89%)
Mutual labels:  weibo, weibo-spider
Happy Spiders
🔧 🔩 🔨 A curated collection of crawler-related tools, simulated-login techniques, proxy IPs, Scrapy template code, and more.
Stars: ✭ 261 (+625%)
Mutual labels:  spider, weibo
Decryptlogin
APIs for logging in to some websites using requests.
Stars: ✭ 1,861 (+5069.44%)
Mutual labels:  spider, weibo
weibo topic
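Collects Weibo topic keywords and personal Weibo posts, with one-click deletion of Weibo posts; obtains the cookie with selenium and handles requests with requests.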
Stars: ✭ 28 (-22.22%)
Mutual labels:  spider, weibo
documentDownloader
download document from book118 for free
Stars: ✭ 72 (+100%)
Mutual labels:  spider
web-data-extractor
Extracting and parsing structured data with jQuery Selector, XPath or JsonPath from common web format like HTML, XML and JSON.
Stars: ✭ 52 (+44.44%)
Mutual labels:  spider
L-Spider
A DHT Spider that allows you to sniff torrents and magnets. You can download them directly.
Stars: ✭ 64 (+77.78%)
Mutual labels:  spider
zhihu
Search your Zhihu favorites: browse the contents of all your collections at a glance and run full-text search over them.
Stars: ✭ 39 (+8.33%)
Mutual labels:  spider
FofaMap
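FofaMap is a cross-platform FOFA data collector developed in Python 3. It supports website icon queries, batch queries, and custom queries of FOFA data, and can automatically deduplicate the results and generate a corresponding Excel sheet. The Spring Festival special edition can also call Nuclei to run vulnerability scans against targets, giving you a head start in bug hunting.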
Stars: ✭ 118 (+227.78%)
Mutual labels:  spider
LoginSharePay
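LoginSharePay integrates login and sharing for QQ, Weibo, and WeChat, including WeChat Pay. It is simple to configure, easy to use, and quick to apply in an app, saving developers a great deal of time.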
Stars: ✭ 62 (+72.22%)
Mutual labels:  weibo
txt2img
A copycat long-Weibo tool
Stars: ✭ 13 (-63.89%)
Mutual labels:  weibo
gitextender
Git Extender is a plugin for JetBrains products, such as IntelliJ IDEA, that offers the option of updating all local branches that track a remote one, for all git roots in the project
Stars: ✭ 15 (-58.33%)
Mutual labels:  intellij
V2EX Spider
V2EX crawler
Stars: ✭ 21 (-41.67%)
Mutual labels:  spider
talospider
talospider - A simple, lightweight scraping micro-framework
Stars: ✭ 57 (+58.33%)
Mutual labels:  spider
NeteaseApi
NetEase Cloud Music API (third-party)
Stars: ✭ 13 (-63.89%)
Mutual labels:  spider
WeiboPictureWorkflow
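Weibo image-hosting Alfred Workflow. Warning: Weibo changed its login method, so this workflow is currently unusable; no fix date is set, and iPic is recommended instead.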
Stars: ✭ 23 (-36.11%)
Mutual labels:  weibo
weibo spider
A Django-based Weibo repost analysis system
Stars: ✭ 14 (-61.11%)
Mutual labels:  weibo-spider

zeekEye

A Fast and Powerful Scraping and Web Crawling Framework

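zeekEye is a lightweight vertical crawler that targets, but is not limited to, Sina Weibo. It is written in Java, is based on the hetrix architecture, and uses the Apache HTTPClient 4.0 networking package.

Feature overview:

  • Data storage: data is stored in a MySQL database, with support for concurrent multi-threaded operations.

  • Implemented features: simulated Weibo login, crawling Weibo user information and user comments, data extraction, table creation, and data composition analysis. More to come...

  • Planned: mutual-follow recommendation, sentiment analysis, and data clustering.

------ Forks are welcome!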


Running

  git clone https://github.com/crazyacking/zeekEye.git
  cd zeekEye
  mvn compile
  mvn exec:java -Dexec.mainClass="SpiderStarter" 
  ...

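The default editor is IntelliJ IDEA 14.1.4 and the development environment is jdk1.7.0. Before compiling and running, first use IntelliJ IDEA to export the project source as a jar package.

API (How to Use)

project config

  conf/spider.properties is the configuration file for all project-level parameters, including the database connection address, the number of parallel threads, and the upper limit on the number of items to crawl.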
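
As a rough illustration of how such a file could be read, the sketch below loads settings with java.util.Properties. The class name SpiderConfig, the key names (db.url, spider.threads, spider.maxPages), and the fallback values are all assumptions made for this example; the real keys are whatever conf/spider.properties in the repository defines.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical holder for values read from conf/spider.properties.
// Key names and fallback values are assumptions, not the project's real ones.
final class SpiderConfig {
    final String dbUrl;   // database connection address
    final int threads;    // number of parallel crawler threads
    final int maxPages;   // upper limit on items to crawl

    private SpiderConfig(String dbUrl, int threads, int maxPages) {
        this.dbUrl = dbUrl;
        this.threads = threads;
        this.maxPages = maxPages;
    }

    // Builds a config from already-loaded properties, falling back to defaults.
    static SpiderConfig fromProperties(Properties props) {
        String dbUrl = props.getProperty("db.url", "jdbc:mysql://localhost:3306/zeekeye").trim();
        int threads = Integer.parseInt(props.getProperty("spider.threads", "4").trim());
        int maxPages = Integer.parseInt(props.getProperty("spider.maxPages", "1000").trim());
        return new SpiderConfig(dbUrl, threads, maxPages);
    }

    // Reads key=value pairs from any character source, e.g. a FileReader.
    static SpiderConfig load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return fromProperties(props);
    }

    public static void main(String[] args) throws IOException {
        // Inline sample so the sketch runs without the real file on disk.
        String sample = "db.url=jdbc:mysql://localhost:3306/weibo\nspider.threads=8\n";
        SpiderConfig config = load(new StringReader(sample));
        System.out.println(config.dbUrl + " threads=" + config.threads
                + " maxPages=" + config.maxPages);
    }
}
```

To read the real file, pass a java.io.FileReader opened on conf/spider.properties to load instead of the inline sample.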

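weibo-Spider (options)

"options" contains the following fields:

  • maxSockets - the maximum number of parallel threads in the thread pool. Defaults to 4.
  • userAgent - the User-Agent string sent to the remote server. Defaults to `Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US)`
  • pool - a hashed thread pool containing the agents for these requests. If omitted, the globally configured maxSockets is used.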
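
One way to picture these fields in Java is the sketch below. It is only an illustration: the class SpiderOptions and its method resolvePool are hypothetical names, and using a fixed-size ExecutorService to stand in for "pool" is an assumption rather than the project's actual design. Only the two default values come from the list above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical options holder mirroring the fields listed above.
final class SpiderOptions {
    static final int DEFAULT_MAX_SOCKETS = 4;
    static final String DEFAULT_USER_AGENT =
            "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US)";

    int maxSockets = DEFAULT_MAX_SOCKETS;
    String userAgent = DEFAULT_USER_AGENT;
    ExecutorService pool; // null means "no custom pool supplied"

    // Returns the caller-supplied pool, or lazily creates one sized by maxSockets.
    ExecutorService resolvePool() {
        if (pool == null) {
            pool = Executors.newFixedThreadPool(maxSockets);
        }
        return pool;
    }

    public static void main(String[] args) {
        final SpiderOptions options = new SpiderOptions();
        options.maxSockets = 2; // at most two tasks run at the same time
        ExecutorService workers = options.resolvePool();
        for (int i = 0; i < 4; i++) {
            final int id = i;
            workers.submit(new Runnable() {
                @Override
                public void run() {
                    System.out.println("task " + id + " would send UA: " + options.userAgent);
                }
            });
        }
        workers.shutdown(); // lets queued tasks finish, then stops the threads
    }
}
```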

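Adding a Route Handler

spider.route(hosts, pattern)

The parameters are as follows:

  • hosts - a string, or an array of strings: the URL of the target host.

Queuing a URL for the crawler to fetch.

spider.get(url), where 'url' is the web URL to fetch.

Extending / Updating the Cache

The cache currently provides the following methods:

  • get(url, cb) - if the url already exists, returns the body of the url through the cb callback; otherwise returns 'null'.
    • cb - always of the form `function(retval) {...}`
  • getHeaders(url, cb) - if the url already exists, returns the headers of the url; otherwise returns null.
    • cb - always of the form `function(retval) {...}`
  • set(url, headers, body) - sets/saves the headers and body of the url.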
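
The three methods above could be sketched in Java roughly as follows. This is a minimal in-memory illustration, not the project's cache: the names MemoryCache and CacheCallback are hypothetical, and the callback interface simply stands in for the `function(retval) {...}` form. A missing url is reported by passing null to the callback.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical callback type standing in for `function(retval) {...}`.
interface CacheCallback<T> {
    void call(T retval);
}

// In-memory cache sketch; the class name and storage choice are assumptions.
final class MemoryCache {
    private static final class Entry {
        final Map<String, String> headers;
        final String body;

        Entry(Map<String, String> headers, String body) {
            this.headers = headers;
            this.body = body;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<String, Entry>();

    // Passes the cached body to cb, or null if the url is not cached.
    void get(String url, CacheCallback<String> cb) {
        Entry entry = entries.get(url);
        cb.call(entry == null ? null : entry.body);
    }

    // Passes the cached headers to cb, or null if the url is not cached.
    void getHeaders(String url, CacheCallback<Map<String, String>> cb) {
        Entry entry = entries.get(url);
        cb.call(entry == null ? null : entry.headers);
    }

    // Stores a read-only copy of the headers together with the body.
    void set(String url, Map<String, String> headers, String body) {
        Map<String, String> copy =
                Collections.unmodifiableMap(new HashMap<String, String>(headers));
        entries.put(url, new Entry(copy, body));
    }

    public static void main(String[] args) {
        MemoryCache cache = new MemoryCache();
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("Content-Type", "text/html");
        cache.set("http://example.com/", headers, "<html></html>");
        cache.get("http://example.com/", new CacheCallback<String>() {
            @Override
            public void call(String retval) {
                System.out.println("body: " + retval);
            }
        });
    }
}
```

A ConcurrentHashMap is used here because the crawler is described as multi-threaded; a persistent store could replace it behind the same three methods.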

Setting the Verbosity / Log Level

spider.log(level) - here level is a string, which can be "debug", "info", or "error"
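
For illustration, the three level names could be mapped onto java.util.logging as in the sketch below. The class SpiderLog and the particular mapping (debug to FINE, error to SEVERE) are assumptions of this example, not something the project documents.

```java
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

// Maps the level names from the text onto java.util.logging levels.
// The mapping itself is an assumption of this sketch.
final class SpiderLog {
    private static final Logger LOGGER = Logger.getLogger("spider");

    // Converts "debug", "info", or "error" (case-insensitive) to a Level.
    static Level parse(String level) {
        String name = level.trim().toLowerCase(Locale.ROOT);
        if (name.equals("debug")) {
            return Level.FINE;
        }
        if (name.equals("info")) {
            return Level.INFO;
        }
        if (name.equals("error")) {
            return Level.SEVERE;
        }
        throw new IllegalArgumentException("Unknown log level: " + level);
    }

    // Hypothetical counterpart of spider.log(level).
    static void log(String level) {
        LOGGER.setLevel(parse(level));
    }

    public static void main(String[] args) {
        log("error");
        LOGGER.info("below the configured threshold, so not emitted");
        LOGGER.severe("at the configured threshold, so emitted");
    }
}
```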

Source Code

The source code of zeekEye is made available for study purposes only. Neither it, its source code, nor its byte code may be modified and recompiled for public use by anyone except us.

We do accept and encourage private modifications with the intent for said modifications to be added to the official public version.

Feedback and Suggestions

Thank you for reading this help document. If you have good suggestions, your feedback is welcome.
