
spk / Maman

License: MIT
Rust Web Crawler saving pages on Redis

Programming Languages

rust

Projects that are alternatives to or similar to Maman

Crawlab Lite
Lite version of Crawlab, a lightweight crawler management platform
Stars: ✭ 122 (+212.82%)
Mutual labels:  crawler, spider, web-crawler
Pspider
A simple and easy-to-use Python crawler framework. QQ group: 597510560
Stars: ✭ 1,611 (+4030.77%)
Mutual labels:  crawler, spider, web-crawler
Crawlab
Distributed web crawler admin platform for managing spiders, regardless of language or framework
Stars: ✭ 8,392 (+21417.95%)
Mutual labels:  crawler, spider, web-crawler
Spider Flow
A new-generation crawler platform that defines crawler workflows graphically, so crawlers can be built without writing code.
Stars: ✭ 365 (+835.9%)
Mutual labels:  crawler, spider, web-crawler
flink-crawler
Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (+23.08%)
Mutual labels:  crawler, spider, web-crawler
Abot
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Stars: ✭ 1,961 (+4928.21%)
Mutual labels:  crawler, spider, web-crawler
Spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+1582.05%)
Mutual labels:  crawler, spider, web-crawler
Zhihu Crawler People
A simple distributed crawler for Zhihu and data analysis
Stars: ✭ 182 (+366.67%)
Mutual labels:  crawler, spider, web-crawler
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+610.26%)
Mutual labels:  crawler, spider, web-crawler
Awesome Crawler
A collection of awesome web crawlers and spiders in different languages
Stars: ✭ 4,793 (+12189.74%)
Mutual labels:  crawler, spider, web-crawler
Xxl Crawler
A distributed web crawler framework (XXL-CRAWLER)
Stars: ✭ 561 (+1338.46%)
Mutual labels:  crawler, spider
Fbcrawl
A Facebook crawler
Stars: ✭ 536 (+1274.36%)
Mutual labels:  crawler, spider
Xsrfprobe
The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.
Stars: ✭ 532 (+1264.1%)
Mutual labels:  crawler, spider
Go jobs
An overview of the Golang job market
Stars: ✭ 526 (+1248.72%)
Mutual labels:  crawler, spider
Douyin
API of DouYin for humans, used to crawl popular videos and music
Stars: ✭ 580 (+1387.18%)
Mutual labels:  crawler, spider
Netdiscovery
NetDiscovery is a general-purpose crawler framework/middleware built on frameworks such as Vert.x and RxJava 2.
Stars: ✭ 573 (+1369.23%)
Mutual labels:  crawler, spider
Newcrawler
Free web scraping tool written in Java
Stars: ✭ 589 (+1410.26%)
Mutual labels:  crawler, spider
Icrawler
A multi-threaded crawler framework with many built-in image crawlers.
Stars: ✭ 629 (+1512.82%)
Mutual labels:  crawler, spider
Haipproxy
💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Stars: ✭ 4,993 (+12702.56%)
Mutual labels:  crawler, spider
Baiduimagespider
A super-lightweight Baidu image crawler
Stars: ✭ 591 (+1415.38%)
Mutual labels:  crawler, spider

Maman

Maman is a Rust Web Crawler saving pages on Redis.

Pages are sent to the list <MAMAN_ENV>:queue:maman using the Sidekiq job format:

{
  "class": "Maman",
  "jid": "b4a577edbccf1d805744efa9",
  "retry": true,
  "created_at": 1461789979,
  "enqueued_at": 1461789979,
  "args": {
    "document": "<html><body><a href='#' /><a href='/new' /></html>",
    "urls": ["https://example.net/new"],
    "headers": {"content-type": "text/html"},
    "url": "https://example.net/"
  }
}
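
To show how these jobs can be consumed, here is a minimal, hypothetical consumer sketch (not part of Maman). It assumes the redis and serde_json crates, MAMAN_ENV=development, and the default REDIS_URL, and pops one job from the queue with BRPOP:

// Hypothetical consumer (not part of Maman): pops one page job from the
// Redis list Maman pushes to and prints fields from the Sidekiq payload.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = redis::Client::open("redis://127.0.0.1/")?;
    let mut con = client.get_connection()?;

    // BRPOP blocks until a job is available (timeout 0 = wait forever).
    let (_list, payload): (String, String) = redis::cmd("BRPOP")
        .arg("development:queue:maman")
        .arg(0)
        .query(&mut con)?;

    // The page data lives under "args", as in the example above.
    let job: serde_json::Value = serde_json::from_str(&payload)?;
    println!("crawled url: {}", job["args"]["url"]);
    println!("extracted links: {}", job["args"]["urls"]);
    Ok(())
}

Since the payload follows the Sidekiq job format, a regular Sidekiq worker with class Maman could also consume the same queue directly.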

Dependencies

  • Redis

Installation

With cargo

cargo install maman

With just

PREFIX=~/.local just install

Usage

maman URL [LIMIT] [MIME_TYPES]

LIMIT must be an integer; the default is 0, meaning no limit.
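
For example, to crawl https://example.net/ and stop after 100 pages (the MIME_TYPES value here is an assumption about the expected format):

maman https://example.net/ 100 text/html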

Environment variables

Defaults

  • MAMAN_ENV=development
  • REDIS_URL="redis://127.0.0.1/"

Others

  • RUST_LOG=maman=info
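
These can be set inline when launching a crawl; for example (the Redis hostname here is hypothetical):

MAMAN_ENV=production REDIS_URL="redis://redis.example:6379/" RUST_LOG=maman=info maman https://example.net/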

LICENSE

The MIT License

Copyright (c) 2016-2020 Laurent Arnoud [email protected]


