G-Research / armada

Licence: Apache-2.0 license
A multi-cluster batch queuing system for high-throughput workloads on Kubernetes.

Programming Languages

Go, C#, TypeScript, Python, Makefile, CSS



Armada

Armada is a multi-Kubernetes cluster batch job scheduler.

Users submit jobs, each expressed as a Kubernetes pod spec plus Armada-specific metadata, to a central Armada server. Armada stores jobs in user- or project-specific queues backed by a specialized high-throughput storage layer, and manages several Kubernetes worker clusters to which queued jobs are dispatched.
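For illustration, a submission might look like the following job file (a hedged sketch: the queue name, job-set name, and image are invented, and the exact schema is defined by the Armada documentation for your version):

```yaml
# jobs.yaml - a standard Kubernetes pod spec wrapped in Armada metadata.
queue: team-a            # the queue this job is enqueued into (illustrative name)
jobSetId: example-set-1  # groups related jobs for tracking (illustrative name)
jobs:
  - priority: 0
    podSpec:
      restartPolicy: Never
      containers:
        - name: sleep
          image: busybox
          command: ["sleep"]
          args: ["60s"]
          resources:
            requests:
              cpu: 150m
              memory: 64Mi
            limits:
              cpu: 150m
              memory: 64Mi
```

A file like this would typically be submitted with the `armadactl` CLI (e.g., `armadactl submit ./jobs.yaml`); consult the Armada docs for the exact commands and schema.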

Armada is designed to operate at scale and to address the following issues:

  1. A single Kubernetes cluster cannot be scaled indefinitely, and managing very large Kubernetes clusters is challenging. Hence, Armada is a multi-cluster scheduler built on top of several single-cluster schedulers, e.g., the vanilla scheduler or Volcano.
  2. Achieving very high throughput using the in-cluster storage backend, etcd, is challenging. Hence, queueing and scheduling are performed partly out-of-cluster using a specialized storage layer (i.e., Armada does not rely primarily on etcd).
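The out-of-cluster queueing idea in point 2 can be sketched as a toy model in Go (this is an illustration of the pattern, not Armada's actual implementation): jobs are buffered in per-queue FIFOs held by a central server, and executors lease batches of jobs to run on their cluster, so etcd never sees the queued backlog.

```go
package main

import "fmt"

// Job is a minimal stand-in for a queued job (illustrative only).
type Job struct {
	ID    string
	Queue string
}

// Server buffers jobs out-of-cluster, keyed by queue name.
type Server struct {
	queues map[string][]Job
}

func NewServer() *Server {
	return &Server{queues: make(map[string][]Job)}
}

// Submit appends a job to its queue in FIFO order.
func (s *Server) Submit(j Job) {
	s.queues[j.Queue] = append(s.queues[j.Queue], j)
}

// Lease hands out up to n jobs from the named queue, removing them
// from the backlog; an executor would then run these on its cluster.
func (s *Server) Lease(queue string, n int) []Job {
	q := s.queues[queue]
	if n > len(q) {
		n = len(q)
	}
	leased := q[:n]
	s.queues[queue] = q[n:]
	return leased
}

func main() {
	s := NewServer()
	for i := 0; i < 5; i++ {
		s.Submit(Job{ID: fmt.Sprintf("job-%d", i), Queue: "team-a"})
	}
	batch := s.Lease("team-a", 3)
	fmt.Println(len(batch), batch[0].ID) // prints: 3 job-0
}
```

In the real system the backlog lives in a dedicated high-throughput storage layer rather than in memory, but the division of labour is the same: the server owns the queues, and clusters only ever hold the jobs they have leased.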

Further, Armada is designed primarily for machine learning, AI, and data analytics workloads, and to:

  • Manage compute clusters composed of tens of thousands of nodes in total.
  • Schedule a thousand or more pods per second, on average.
  • Enqueue tens of thousands of jobs over a few seconds.
  • Divide resources fairly between users.
  • Provide visibility for users and admins.
  • Ensure near-constant uptime.
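The fair-division goal can be illustrated with a toy proportional fair-share calculation in Go (a sketch only: Armada's real scheduler is considerably more sophisticated, and the queue names and weights here are invented):

```go
package main

import "fmt"

// FairShare splits total capacity among queues in proportion to their
// priority weights. This is a toy proportional model, not Armada's
// actual fair-share algorithm.
func FairShare(total float64, weights map[string]float64) map[string]float64 {
	var sum float64
	for _, w := range weights {
		sum += w
	}
	shares := make(map[string]float64, len(weights))
	for q, w := range weights {
		shares[q] = total * w / sum
	}
	return shares
}

func main() {
	// 1000 CPU cores split between two queues weighted 1:3.
	shares := FairShare(1000, map[string]float64{"team-a": 1, "team-b": 3})
	fmt.Println(shares["team-a"], shares["team-b"]) // prints: 250 750
}
```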

Armada is a CNCF Sandbox project in production at G Research and is actively developed.

For an overview of Armada, see this video.

Documentation

For an overview of the architecture and design of Armada, and instructions for submitting jobs, see:

For instructions on how to set up and develop Armada, see:

For API reference, see:

We expect readers of the documentation to have a basic understanding of Docker and Kubernetes; see, e.g., the following links:
