spk / Maman
Licence: mit
Rust Web Crawler saving pages on Redis
Stars: ✭ 39
Projects that are alternatives of or similar to Maman
Crawlab Lite
Lite version of Crawlab, a lightweight crawler management platform.
Stars: ✭ 122 (+212.82%)
Mutual labels: crawler, spider, web-crawler
Pspider
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Stars: ✭ 1,611 (+4030.77%)
Mutual labels: crawler, spider, web-crawler
Crawlab
Distributed web crawler admin platform for managing spiders regardless of language or framework.
Stars: ✭ 8,392 (+21417.95%)
Mutual labels: crawler, spider, web-crawler
Spider Flow
A next-generation crawler platform that defines crawl flows graphically, so crawlers can be built without writing code.
Stars: ✭ 365 (+835.9%)
Mutual labels: crawler, spider, web-crawler
flink-crawler
Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (+23.08%)
Mutual labels: crawler, spider, web-crawler
Abot
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Stars: ✭ 1,961 (+4928.21%)
Mutual labels: crawler, spider, web-crawler
Spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+1582.05%)
Mutual labels: crawler, spider, web-crawler
Zhihu Crawler People
A simple distributed crawler for zhihu && data analysis
Stars: ✭ 182 (+366.67%)
Mutual labels: crawler, spider, web-crawler
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+610.26%)
Mutual labels: crawler, spider, web-crawler
Awesome Crawler
A collection of awesome web crawler,spider in different languages
Stars: ✭ 4,793 (+12189.74%)
Mutual labels: crawler, spider, web-crawler
Xxl Crawler
A distributed web crawler framework (XXL-CRAWLER).
Stars: ✭ 561 (+1338.46%)
Mutual labels: crawler, spider
Xsrfprobe
The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.
Stars: ✭ 532 (+1264.1%)
Mutual labels: crawler, spider
Douyin
API of DouYin for Humans used to Crawl Popular Videos and Musics
Stars: ✭ 580 (+1387.18%)
Mutual labels: crawler, spider
Netdiscovery
NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.
Stars: ✭ 573 (+1369.23%)
Mutual labels: crawler, spider
Icrawler
A multi-thread crawler framework with many builtin image crawlers provided.
Stars: ✭ 629 (+1512.82%)
Mutual labels: crawler, spider
Haipproxy
💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Stars: ✭ 4,993 (+12702.56%)
Mutual labels: crawler, spider
Maman
Maman is a Rust Web Crawler saving pages on Redis.
Pages are sent to the Redis list <MAMAN_ENV>:queue:maman using the Sidekiq job format:
{
"class": "Maman",
"jid": "b4a577edbccf1d805744efa9",
"retry": true,
"created_at": 1461789979, "enqueued_at": 1461789979,
"args": {
"document":"<html><body><a href='#' /><a href='/new' /></html>",
"urls": ["https://example.net/new"],
"headers": {"content-type": "text/html"},
"url": "https://example.net/"
}
}
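The job layout above can be sketched in Rust using only the standard library. This is an illustration, not Maman's actual code: the `Page` struct and the `queue_key`, `json_escape`, and `job_json` helpers are hypothetical names, and the caller is assumed to supply the job ID and timestamp.

```rust
/// Name of the Redis list that receives crawled pages,
/// following the `<MAMAN_ENV>:queue:maman` pattern.
fn queue_key(env: &str) -> String {
    format!("{}:queue:maman", env)
}

/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical container for one crawled page (not Maman's own type).
struct Page<'a> {
    url: &'a str,
    document: &'a str,
    content_type: &'a str,
    urls: &'a [&'a str],
}

/// Serialize a page into the job shape shown above.
/// `jid` and `now` (Unix seconds) are supplied by the caller.
fn job_json(page: &Page, jid: &str, now: u64) -> String {
    let urls: Vec<String> = page
        .urls
        .iter()
        .map(|u| format!("\"{}\"", json_escape(u)))
        .collect();
    format!(
        r#"{{"class":"Maman","jid":"{jid}","retry":true,"created_at":{now},"enqueued_at":{now},"args":{{"document":"{doc}","urls":[{urls}],"headers":{{"content-type":"{ct}"}},"url":"{url}"}}}}"#,
        jid = json_escape(jid),
        now = now,
        doc = json_escape(page.document),
        urls = urls.join(","),
        ct = json_escape(page.content_type),
        url = json_escape(page.url),
    )
}
```

A consumer would read entries from the list named by `queue_key("development")` and parse them as JSON. In real code a JSON library would be a more robust choice than hand-built strings.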
Dependencies
Installation
With cargo
cargo install maman
With just
PREFIX=~/.local just install
Usage
maman URL [LIMIT] [MIME_TYPES]
LIMIT must be an integer. The default is 0, meaning no limit.
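The argument handling described above could look roughly like the following sketch. It is not taken from Maman's source: `CrawlArgs` and `parse_args` are hypothetical names, and treating every argument after LIMIT as one MIME type is an assumption, since the format of MIME_TYPES is not specified here.

```rust
/// Hypothetical parsed form of `maman URL [LIMIT] [MIME_TYPES]`.
#[derive(Debug, PartialEq)]
struct CrawlArgs {
    url: String,
    /// 0 means no limit.
    limit: u64,
    /// Assumption: each remaining argument is one MIME type.
    mime_types: Vec<String>,
}

/// Parse the arguments that follow the program name.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<CrawlArgs, String> {
    let mut args = args.into_iter();
    // URL is the only required argument.
    let url = args
        .next()
        .ok_or_else(|| "usage: maman URL [LIMIT] [MIME_TYPES]".to_string())?;
    // LIMIT is optional and defaults to 0 (no limit).
    let limit = match args.next() {
        Some(raw) => raw
            .parse::<u64>()
            .map_err(|e| format!("LIMIT must be an integer: {}", e))?,
        None => 0,
    };
    let mime_types = args.collect();
    Ok(CrawlArgs { url, limit, mime_types })
}
```

In a binary this would typically be called as `parse_args(std::env::args().skip(1))`.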
Environment variables
Defaults
- MAMAN_ENV=development
- REDIS_URL="redis://127.0.0.1/"
Others
- RUST_LOG=maman=info
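As a minimal sketch of how the two defaults above might be resolved, the following reads each variable through a lookup function and falls back to the documented default. `Settings` and `settings_from` are hypothetical names used only for illustration. The lookup is passed in as a closure so the logic can be exercised without touching the real process environment.

```rust
/// Hypothetical settings resolved from the environment.
struct Settings {
    maman_env: String,
    redis_url: String,
}

/// Resolve settings through `lookup`, using the documented defaults
/// when a variable is not set.
fn settings_from<F>(lookup: F) -> Settings
where
    F: Fn(&str) -> Option<String>,
{
    Settings {
        maman_env: lookup("MAMAN_ENV").unwrap_or_else(|| "development".to_string()),
        redis_url: lookup("REDIS_URL").unwrap_or_else(|| "redis://127.0.0.1/".to_string()),
    }
}
```

With the real environment, the call would be `settings_from(|name| std::env::var(name).ok())`.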
LICENSE
The MIT License
Copyright (c) 2016-2020 Laurent Arnoud