hellokaton / elves

Licence: MIT license

🎊 Design and implementation of a lightweight crawler framework.

Programming Languages

java

Projects that are alternatives to, or similar to, elves

163Music

163music spider by scrapy.

Stars: ✭ 60 (-81.37%)

Mutual labels: spider, scrapy

Py Elasticsearch Django

A search engine at the scale of tens of millions of records, built with Python.

Stars: ✭ 207 (-35.71%)

Mutual labels: spider, scrapy

Scrapydweb

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.

Stars: ✭ 2,385 (+640.68%)

Mutual labels: spider, scrapy

Fp Server

Free proxy server, continuously crawling and providing proxies, based on Tornado and Scrapy. Lets you build your own proxy pool locally.

Stars: ✭ 154 (-52.17%)

Mutual labels: spider, scrapy

scrapy helper

Dynamically configurable crawler.

Stars: ✭ 84 (-73.91%)

Mutual labels: spider, scrapy

Scrapingoutsourcing

ScrapingOutsourcing focuses on sharing crawler code, aiming to add a new one each week.

Stars: ✭ 164 (-49.07%)

Mutual labels: spider, scrapy

devsearch

A web search engine built with Python which uses TF-IDF and PageRank to sort search results.

Stars: ✭ 52 (-83.85%)

Mutual labels: spider, scrapy

Scrapy demo

All kinds of Scrapy demos.

Stars: ✭ 128 (-60.25%)

Mutual labels: spider, scrapy

Spider job

Crawler for job-listing site data.

Stars: ✭ 234 (-27.33%)

Mutual labels: spider, scrapy

Spiderkeeper

Admin UI for Scrapy; an open-source Scrapinghub.

Stars: ✭ 2,562 (+695.65%)

Mutual labels: spider, scrapy

Python3 Spider

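Python crawlers in practice: simulated login to major sites, including but not limited to slider CAPTCHA verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️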

Stars: ✭ 2,129 (+561.18%)

Mutual labels: spider, scrapy

small-spider-project

Everyday crawlers.

Stars: ✭ 14 (-95.65%)

Mutual labels: spider, scrapy

Awesome Web Scraper

A collection of awesome web scrapers and crawlers.

Stars: ✭ 147 (-54.35%)

Mutual labels: spider, scrapy

Marmot

💐Marmot | Web Crawler/HTTP protocol Download Package 🐭

Stars: ✭ 186 (-42.24%)

Mutual labels: spider, scrapy

Taobaoscrapy

😩 Tool for Taobao/Tmall | A childhood toy, now outdated

Stars: ✭ 146 (-54.66%)

Mutual labels: spider, scrapy

Goribot

[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.

Stars: ✭ 190 (-40.99%)

Mutual labels: spider, scrapy

Crawlab Lite

Lite version of Crawlab, a crawler management platform.

Stars: ✭ 122 (-62.11%)

Mutual labels: spider, scrapy

Feapder

feapder is a Python crawler framework that supports distributed crawling, batch collection, task-loss prevention, and rich alerting.

Stars: ✭ 110 (-65.84%)

Mutual labels: spider, scrapy

Gerapy

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

Stars: ✭ 2,601 (+707.76%)

Mutual labels: spider, scrapy

Web-Iota

Iota is a web scraper which can find all of the images and links/suburls on a webpage

Stars: ✭ 60 (-81.37%)

Mutual labels: spider, scrapy

Elves

The design and implementation of a lightweight crawler framework, with an accompanying blog post analysis.

Features

Event-driven
Easy to customize
Multi-threaded execution
CSS selector and XPath support
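
To give a feel for what XPath-based extraction looks like in general, here is a minimal sketch that uses only the JDK's javax.xml.xpath package. It does not use the Elves API: the class name XPathTitles, the sample expression, and the markup it targets are assumptions for illustration, and the input must be well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of Elves): extracts link texts with plain JDK XPath.
class XPathTitles {

    /** Returns the trimmed text of every <a class="title"> found inside a <table>. */
    static List<String> titles(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//table//a[@class='title']",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (XPathExpressionException e) {
            // Raised for an invalid expression or unparsable input.
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }
}
```

Real-world HTML is often not well-formed XML, which is one reason crawler frameworks usually delegate parsing to an HTML-tolerant library instead of the JDK parser.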

Maven coordinates

<dependency>
    <groupId>io.github.biezhi</groupId>
    <artifactId>elves</artifactId>
    <version>0.0.2</version>
</dependency>

If you want to run the project source locally, make sure you are using Java 8 and have the Lombok plugin installed.

Architecture diagram

Call flow diagram

Quick start

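Building a crawler takes the following steps:

Write a spider class that extends Spider
Set the list of URLs to crawl
Implement the Spider's parse method
Add a Pipeline to process the data filtered by parse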
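
The flow behind these steps can be sketched in plain Java. This is a simplified, single-threaded stand-in and not the Elves implementation: MiniCrawler, ParseResult, and the stubbed fetcher are hypothetical names introduced only to show how start URLs, a parse step, and pipelines fit together.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical miniature of the start URLs -> parse -> pipeline flow (not the Elves API).
class MiniCrawler {

    /** What a parse step produces: extracted items plus follow-up URLs to visit. */
    static final class ParseResult {
        final List<String> items = new ArrayList<>();
        final List<String> nextUrls = new ArrayList<>();
    }

    private final Deque<String> queue = new ArrayDeque<>();
    private final List<Consumer<List<String>>> pipelines = new ArrayList<>();
    private final Function<String, String> fetcher;      // URL -> page body (stubbed by the caller)
    private final Function<String, ParseResult> parser;  // page body -> items and next URLs

    MiniCrawler(Function<String, String> fetcher, Function<String, ParseResult> parser) {
        this.fetcher = fetcher;
        this.parser = parser;
    }

    MiniCrawler startUrls(String... urls) {
        for (String url : urls) {
            queue.add(url);
        }
        return this;
    }

    MiniCrawler addPipeline(Consumer<List<String>> pipeline) {
        pipelines.add(pipeline);
        return this;
    }

    /** Drains the queue, skipping URLs already visited; returns the number of pages parsed. */
    int run() {
        Set<String> seen = new HashSet<>();
        int pages = 0;
        while (!queue.isEmpty()) {
            String url = queue.poll();
            if (!seen.add(url)) {
                continue;
            }
            ParseResult result = parser.apply(fetcher.apply(url));
            for (Consumer<List<String>> pipeline : pipelines) {
                pipeline.accept(result.items);
            }
            queue.addAll(result.nextUrls);
            pages++;
        }
        return pages;
    }

    public static void main(String[] args) {
        // In-memory "site": each body is "item,item|nextUrl" (the next URL may be empty).
        Map<String, String> site = new HashMap<>();
        site.put("page1", "A,B|page2");
        site.put("page2", "C|");

        MiniCrawler crawler = new MiniCrawler(site::get, body -> {
            ParseResult result = new ParseResult();
            String[] parts = body.split("\\|", -1);
            for (String item : parts[0].split(",")) {
                result.items.add(item);
            }
            if (!parts[1].isEmpty()) {
                result.nextUrls.add(parts[1]);
            }
            return result;
        });
        crawler.startUrls("page1").addPipeline(items -> System.out.println("items: " + items));
        System.out.println("pages parsed: " + crawler.run());
    }
}
```

A real framework would replace the stubbed fetcher with an HTTP downloader and run the loop on worker threads; the sketch keeps everything in one thread so the control flow stays visible.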

For example:

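// JDK and Lombok imports used below. Element and Elements are assumed to be jsoup's.
// The Elves classes (Spider, Config, Pipeline, Request, Response, Result, Elves) come
// from the io.github.biezhi:elves dependency; their imports are omitted here.
import java.util.List;
import java.util.stream.Collectors;

import lombok.extern.slf4j.Slf4j;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

@Slf4j // provides the `log` field used in onStart (assumes Lombok, as required above)
public class DoubanSpider extends Spider {

    public DoubanSpider(String name) {
        super(name);
        // Seed URLs: one Douban movie tag page per genre.
        this.startUrls(
            "https://movie.douban.com/tag/爱情",
            "https://movie.douban.com/tag/喜剧",
            "https://movie.douban.com/tag/动画",
            "https://movie.douban.com/tag/动作",
            "https://movie.douban.com/tag/史诗",
            "https://movie.douban.com/tag/犯罪");
    }

    @Override
    public void onStart(Config config) {
        // Register a pipeline that receives every item produced by parse.
        this.addPipeline((Pipeline<List<String>>) (item, request) -> log.info("Saving to file: {}", item));
    }

    public Result parse(Response response) {
        Result<List<String>> result   = new Result<>();
        Elements             elements = response.body().css("#content table .pl2 a");

        // Collect the movie titles on the current page.
        List<String> titles = elements.stream().map(Element::text).collect(Collectors.toList());
        result.setItem(titles);

        // Get the next page URL and queue it with the same parse callback.
        Elements nextEl = response.body().css("#content > div > div.article > div.paginator > span.next > a");
        if (null != nextEl && nextEl.size() > 0) {
            String  nextPageUrl = nextEl.get(0).attr("href");
            Request nextReq     = this.makeRequest(nextPageUrl, this::parse);
            result.addRequest(nextReq);
        }
        return result;
    }

}

public static void main(String[] args) {
    DoubanSpider doubanSpider = new DoubanSpider("Douban Movies");
    Elves.me(doubanSpider, Config.me()).start();
}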
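
The example passes the href of the "next" link straight to makeRequest. Whether relative links are resolved by the framework is not stated here, so if a site emits relative hrefs they may need to be resolved against the current page URL first. A minimal sketch using the JDK's java.net.URI follows; the NextPageUrls class and the example.com URLs are hypothetical.

```java
import java.net.URI;

// Hypothetical helper (not part of Elves): turns an href into an absolute URL.
class NextPageUrls {

    /**
     * Resolves href against the URL of the page it was found on.
     * Absolute hrefs are returned unchanged; relative ones are resolved and normalized.
     * Throws IllegalArgumentException if either argument is not a valid URI
     * (for example, if it contains unescaped spaces).
     */
    static String resolve(String currentPageUrl, String href) {
        return URI.create(currentPageUrl).resolve(href.trim()).toString();
    }
}
```

For instance, resolving "page2" against "https://example.com/tag/drama/page1" yields "https://example.com/tag/drama/page2", while an href that is already absolute passes through untouched.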

Spider examples

License

MIT
