openzim / zimfarm

Licence: GPL-3.0 license
Farm operated by bots to grow and harvest new zim files

Programming Languages

Python
Vue
JavaScript
Shell
Dockerfile
HTML
CSS

Projects that are alternatives of or similar to zimfarm

fabex
Block explorer for Hyperledger Fabric
Stars: ✭ 26 (-55.17%)
Mutual labels:  distributed-systems
mindware
An efficient open-source AutoML system for automating machine learning lifecycle, including feature engineering, neural architecture search, and hyper-parameter tuning.
Stars: ✭ 34 (-41.38%)
Mutual labels:  distributed-systems
Pubbie
A high performance pubsub client/server implementation for .NET Core
Stars: ✭ 122 (+110.34%)
Mutual labels:  distributed-systems
little-raft
The lightest distributed consensus library. Run your own replicated state machine! ❤️
Stars: ✭ 316 (+444.83%)
Mutual labels:  distributed-systems
distributed-dev-learning
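Collects and organizes commonly used distributed development techniques, with demos for easy learning. Topics include data sharding, consensus algorithms, consistent hashing, distributed transactions and how non-intrusive distributed tracing is implemented.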
Stars: ✭ 39 (-32.76%)
Mutual labels:  distributed-systems
interview
Everything you need to become a better software engineer.
Stars: ✭ 14 (-75.86%)
Mutual labels:  distributed-systems
moleculer-java
Java implementation of the Moleculer microservices framework
Stars: ✭ 39 (-32.76%)
Mutual labels:  distributed-systems
matrixone
Hyperconverged cloud-edge native database
Stars: ✭ 1,057 (+1722.41%)
Mutual labels:  distributed-systems
research
research, notes & ideas on various subjects
Stars: ✭ 54 (-6.9%)
Mutual labels:  distributed-systems
huffleraft
Replicated key-value store driven by the raft consensus protocol 🚵
Stars: ✭ 32 (-44.83%)
Mutual labels:  distributed-systems
nsq-0.3.7
Annotated edition of nsq, based on version 0.3.7
Stars: ✭ 45 (-22.41%)
Mutual labels:  distributed-systems
go-chassis-config
Pull and push configs in a distributed configuration management service. Migrated to go-archaius: https://github.com/go-chassis/go-archaius/pull/87
Stars: ✭ 23 (-60.34%)
Mutual labels:  distributed-systems
learning-computer-science
Learning data structures, algorithms, machine learning and various computer science constructs by programming practice from resources around the web.
Stars: ✭ 28 (-51.72%)
Mutual labels:  distributed-systems
dockerfiles
Dockerfiles for various things
Stars: ✭ 37 (-36.21%)
Mutual labels:  docker-images
pat-helland-and-me
Materials related to my talk "Pat Helland and Me"
Stars: ✭ 14 (-75.86%)
Mutual labels:  distributed-systems
docker
docker image of Monica
Stars: ✭ 89 (+53.45%)
Mutual labels:  docker-images
ripple
Simple shared surface streaming application
Stars: ✭ 17 (-70.69%)
Mutual labels:  distributed-systems
Learning-Notes
Some notes on learning C++, Go, UNIX, databases and distributed systems
Stars: ✭ 24 (-58.62%)
Mutual labels:  distributed-systems
MIT6.824-2021
4 labs + 2 challenges + 4 docs
Stars: ✭ 594 (+924.14%)
Mutual labels:  distributed-systems
NodeDial
A distributed, key-value NoSQL database 🌌
Stars: ✭ 13 (-77.59%)
Mutual labels:  distributed-systems

ZIM Farm

The ZIM farm (zimfarm) is a semi-decentralised software solution for building ZIM files efficiently: it scrapes web content, packages it into ZIM files and uploads the result to an online ZIM file repository.

How does it work?

The Zimfarm platform is a combination of different tools:

dispatcher

The dispatcher is a central database and API that records recipes (metadata describing the ZIM files to produce) and tasks. It includes a scheduler that decides, based on each recipe, when a ZIM file should be recreated, and a dispatcher that creates tasks and assigns them to workers.
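The scheduler's decision can be pictured with a small Python sketch. It assumes a hypothetical, simplified recipe that carries only a name, a rebuild period and the time of its last run; the real Zimfarm recipe schema and scheduling rules are not described here and are likely richer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Recipe:
    """Hypothetical, simplified recipe: a name and a rebuild period."""
    name: str
    period: timedelta                    # assumed: how often the ZIM should be rebuilt
    last_run: Optional[datetime] = None  # None means the recipe has never run


def is_due(recipe: Recipe, now: datetime) -> bool:
    """True when the recipe's ZIM file should be recreated."""
    if recipe.last_run is None:
        return True
    return now - recipe.last_run >= recipe.period


def due_recipes(recipes: List[Recipe], now: datetime) -> List[Recipe]:
    """One scheduler pass: recipes whose period elapsed, longest-waiting first."""
    due = [r for r in recipes if is_due(r, now)]
    # Never-run recipes sort first (datetime.min), then the oldest last_run.
    return sorted(due, key=lambda r: r.last_run or datetime.min)
```

In this sketch, the dispatcher side would then create one task per returned recipe and hand it to a worker.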

frontend

The frontend, available at farm.openzim.org, is a simple consumer of the API.

It is used to create, clone and edit recipes, but also to monitor the evolution of tasks and workers.

Anybody can use it in read-only mode.

workers

Workers are always-running computers that are assigned ZIM creation tasks by the dispatcher. If you are interested in providing us with worker resources, please read these instructions.

A worker is made of two software components:

worker-manager

The manager declares its available resources and configuration, and receives the tasks the dispatcher assigns to it. It is a very low-resource container whose job is to spawn task-worker containers.

task-worker

The task-worker is responsible for running a specific task. It is also a very low-resource container but, unlike the manager, one is spawned for each task assigned to the worker (the manager sets the concurrency based on available resources).

The task-worker's role is to start and monitor the scraper container for the task, and to spawn uploader containers for both the created ZIM files and the logs.
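How a manager might decide whether it has room for one more task-worker is sketched below. The `Resources` type, its units and the subtract-and-compare rule are assumptions for illustration, not the actual worker-manager logic.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource triple (CPU cores, memory bytes, disk bytes)."""
    cpu: int
    memory: int
    disk: int

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu - other.cpu,
                         self.memory - other.memory,
                         self.disk - other.disk)

    def fits_in(self, other: "Resources") -> bool:
        """True if every component of self is within other."""
        return (self.cpu <= other.cpu
                and self.memory <= other.memory
                and self.disk <= other.disk)


def can_start(total: Resources, running: Iterable[Resources],
              requested: Resources) -> bool:
    """Check whether a new task-worker fits next to the running ones."""
    left = total
    for used in running:
        left = left - used  # subtract what each running task reserved
    return requested.fits_in(left)
```

Under this rule, concurrency is not a fixed number: the manager keeps spawning task-workers as long as the next task's requirements fit in what is left.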

uploader

The uploader is instantiated by the task-worker to upload each created ZIM file individually, as well as the scraper container's log.
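A minimal sketch of how the set of uploads for a finished task could be derived: one job per `*.zim` file in the output folder plus one for the scraper log. The function name, the `(kind, path)` job format and the log path are hypothetical; in the Zimfarm, each such job would be handled by its own uploader container.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def plan_uploads(output_dir: Path, log_file: Path) -> List[Tuple[str, Path]]:
    """One upload job per created ZIM file, plus one for the scraper log."""
    jobs = [("zim", p) for p in sorted(output_dir.glob("*.zim"))]
    jobs.append(("log", log_file))
    return jobs


# Demo in a throwaway folder: only *.zim files become ZIM upload jobs.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    for name in ("example_en.zim", "example_fr.zim", "notes.txt"):
        (out / name).touch()
    jobs = plan_uploads(out, out / "scraper.log")
    kinds = [kind for kind, _ in jobs]
    names = [path.name for _, path in jobs]
```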

The uploader supports both SCP and SFTP. We are currently using SFTP for all uploads due to a slight speed gain.

The uploader is very fast and convenient (it can watch files and resume interrupted uploads) but currently works only with files.
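Resuming an interrupted upload boils down to skipping the bytes the remote side already has and sending the rest. The sketch below shows that idea with file-like objects; `resume_upload` is a hypothetical helper, and the real uploader's transfer code (over SCP or SFTP) is not reproduced here.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # assumed chunk size (1 MiB)


def resume_upload(src: BinaryIO, dst: BinaryIO, already_sent: int,
                  chunk_size: int = CHUNK_SIZE) -> int:
    """Copy the part of `src` that `dst` does not hold yet; return bytes sent.

    `already_sent` is how many bytes the remote copy already has, e.g. the
    size of a partially uploaded file. `dst` must be positioned at its end.
    """
    src.seek(already_sent)  # skip what was transferred before the interruption
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        sent += len(chunk)
    return sent


# Demo: in-memory streams stand in for the local file and the remote copy.
local = io.BytesIO(b"hello world")
remote = io.BytesIO()
remote.write(b"hello")  # an earlier attempt stopped after 5 bytes
sent = resume_upload(local, remote, already_sent=len(remote.getvalue()), chunk_size=4)
```

Only the missing 6 bytes are sent, and the remote copy ends up identical to the local file.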

receiver

The receiver is a jailed OpenSSH server that receives scraper logs and ZIM files. ZIM files go through a quarantine where the zimcheck tool checks them; invalid ZIM files are then put aside, while valid ones are moved to the public download server.
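The quarantine step can be sketched as: run a check on each received ZIM file, then move it either to the public directory or aside. Here the check is injected as a callable; in the real receiver it is the zimcheck tool (judging it by its exit status is an assumption of this sketch). The function and directory names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def release_from_quarantine(zim: Path, check: Callable[[Path], bool],
                            public_dir: Path, invalid_dir: Path) -> Path:
    """Move a received ZIM to public_dir if `check` accepts it, else aside.

    `check` stands in for running zimcheck on the file.
    """
    target_dir = public_dir if check(zim) else invalid_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(zim), str(target_dir / zim.name)))


# Demo with a toy check (non-empty file == valid), not real ZIM validation.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    quarantine = root / "quarantine"
    quarantine.mkdir()
    (quarantine / "good.zim").write_bytes(b"data")
    (quarantine / "bad.zim").write_bytes(b"")

    def toy_check(path: Path) -> bool:
        return path.stat().st_size > 0

    dests = []
    for name in ("good.zim", "bad.zim"):
        moved = release_from_quarantine(quarantine / name, toy_check,
                                        root / "public", root / "invalid")
        dests.append(moved.relative_to(root).as_posix())
```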

scrapers

Scrapers are the tools used to actually convert a scraping request (recorded in a Zimfarm recipe) into one or several ZIM files.

The most important one is the MediaWiki scraper, called mwoffliner, but there are many others, for Stack Exchange, Project Gutenberg, PhET and more.

Scrapers are not part of the Zimfarm. They are completely independent projects, and the requirements for integrating one into the Zimfarm are minimal:

  • Runs entirely from a Docker image
  • Takes its arguments on the command line
  • Lets the ZIM output folder be set via an argument
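To show what these three requirements buy the Zimfarm, here is a hedged sketch that turns a recipe into a `docker run` command line. The image name, arguments and output flag are hypothetical, and the `--flag=value` form is an assumption; each scraper names its own output option.

```python
import shlex
from typing import List, Sequence


def build_scraper_command(image: str, args: Sequence[str], output_flag: str,
                          host_output: str,
                          container_output: str = "/output") -> List[str]:
    """Build a `docker run` command for a scraper meeting the three rules.

    `output_flag` is scraper-specific and passed in rather than guessed.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{host_output}:{container_output}",  # share the output folder with the host
        image,                                      # rule 1: everything lives in the image
        *args,                                      # rule 2: recipe arguments on the command line
        f"{output_flag}={container_output}",        # rule 3: output folder set by an argument
    ]


cmd = build_scraper_command(
    image="example/scraper:latest",  # hypothetical image
    args=["--name=example"],         # hypothetical recipe arguments
    output_flag="--output",          # hypothetical output flag
    host_output="/srv/zims",
)
print(shlex.join(cmd))
```

This prints `docker run --rm -v /srv/zims:/output example/scraper:latest --name=example --output=/output`. Because nothing scraper-specific is needed beyond the image, the arguments and the name of the output option, any scraper meeting the requirements can be driven the same way.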

How do I request a ZIM file?

ZIM file requests are handled in the zim-requests repository.

If there's already a scraper for the website you want to convert to ZIM, someone with editor access to the Zimfarm will create the recipe and in a few days, a ZIM file should be available.
