
kube-HPC / Hkube

License: MIT
🐟 High Performance Computing over Kubernetes - Core Repo 🎣

Programming Languages

javascript

Projects that are alternatives of or similar to Hkube

Venona
Codefresh runtime-environment agent
Stars: ✭ 31 (-85.51%)
Mutual labels:  pipeline, cluster
Flowr
Robust and efficient workflows using a simple language agnostic approach
Stars: ✭ 73 (-65.89%)
Mutual labels:  pipeline, cluster
Graphview
Flutter GraphView is used to display data in graph structures. It can display Tree layout, Directed and Layered graph. Useful for Family Tree, Hierarchy View.
Stars: ✭ 152 (-28.97%)
Mutual labels:  algorithm, cluster
Jenkinsdocs
Jenkins practice documentation. Latest site: http://www.idevops.site
Stars: ✭ 200 (-6.54%)
Mutual labels:  pipeline
Java String Similarity
Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...
Stars: ✭ 2,403 (+1022.9%)
Mutual labels:  algorithm
Shifu
An end-to-end machine learning and data mining framework on Hadoop
Stars: ✭ 207 (-3.27%)
Mutual labels:  pipeline
Cpp Timsort
A C++ implementation of timsort
Stars: ✭ 211 (-1.4%)
Mutual labels:  algorithm
Mathmodel
Graduate and undergraduate mathematical modeling: award-winning papers from mathematical modeling competitions, mathematical modeling algorithms, LaTeX paper templates, algorithm mind maps, reference books, Matlab tutorials, and PPTs
Stars: ✭ 3,834 (+1691.59%)
Mutual labels:  algorithm
Oq Engine
OpenQuake's Engine for Seismic Hazard and Risk Analysis
Stars: ✭ 207 (-3.27%)
Mutual labels:  cluster
Bezier
Algorithm to draw smooth bezier curves through a set of points
Stars: ✭ 207 (-3.27%)
Mutual labels:  algorithm
Algorithms Notes
Notes and code for Algorithms (Fourth Edition)
Stars: ✭ 206 (-3.74%)
Mutual labels:  algorithm
Lightautoml
LAMA - automatic model creation framework
Stars: ✭ 196 (-8.41%)
Mutual labels:  pipeline
Tiup
A component manager for TiDB
Stars: ✭ 207 (-3.27%)
Mutual labels:  cluster
Cluster Lifecycle Manager
Cluster Lifecycle Manager (CLM) to provision and update multiple Kubernetes clusters
Stars: ✭ 200 (-6.54%)
Mutual labels:  cluster
Bulk Writer
Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Stars: ✭ 210 (-1.87%)
Mutual labels:  pipeline
K3d
Little helper to run Rancher Lab's k3s in Docker
Stars: ✭ 3,090 (+1343.93%)
Mutual labels:  cluster
Flowcraft
FlowCraft: a component-based pipeline composer for omics analysis using Nextflow. 🐳📦
Stars: ✭ 208 (-2.8%)
Mutual labels:  pipeline
Interviewroom
Contains all important data structure and algorithms problems asked in interviews
Stars: ✭ 207 (-3.27%)
Mutual labels:  algorithm
Whispers
Identify hardcoded secrets and dangerous behaviours
Stars: ✭ 66 (-69.16%)
Mutual labels:  pipeline
Swiftlcs
Swift implementation of the longest common subsequence (LCS) algorithm.
Stars: ✭ 207 (-3.27%)
Mutual labels:  algorithm

HKube

HKube is a cloud-native, open-source framework for running distributed pipelines of algorithms on Kubernetes.

HKube optimally utilizes a pipeline's resources based on user priorities and heuristics.

Features

  • Distributed pipeline of algorithms

    • Receives a DAG as input and automatically parallelizes your algorithms over the cluster.
    • Manages the complications of distributed processing, keeping your code simple (even single-threaded).
  • Language Agnostic - As a container-based framework, HKube is designed to let you write your algorithms in any language.

  • Batch Algorithms - Run instances of the same algorithm as a batch in order to accelerate running time.

  • Optimize Hardware Utilization

    • Containers are automatically placed based on their resource requirements and other constraints, without sacrificing availability.
    • Mixes critical and best-effort workloads in order to drive up utilization and save resources.
    • Efficient execution and clustering by heuristics that combine pipeline and algorithm metrics with user requirements.
  • Build API - Just upload your code; you don't have to worry about building containers or integrating them with the HKube API.

  • Cluster Debugging

    • Debug a part of a pipeline based on previous results.
    • Debug a single algorithm in your IDE while the rest of the algorithms run in the cluster.
  • Jupyter Integration - Scale your Jupyter tasks with HKube.

User Guide

Installation

Dependencies

HKube runs on top of Kubernetes, so before running HKube we have to install its prerequisites.

Helm

  1. Add the HKube Helm repository to helm:

    helm repo add hkube http://hkube.io/helm/
    
  2. Configure a Docker registry for builds.
    Create a values.yaml file with custom Helm values:

build_secret:
# pull secret is only needed if docker hub is not accessible
  pull:
    registry: ''
    namespace: ''
    username: ''
    password: ''
# enter your docker hub / other registry credentials
  push:
    registry: '' # can be left empty for docker hub
    namespace: '' # registry namespace - usually your username
    username: ''
    password: ''
  3. Install the HKube chart:

    helm install hkube/hkube  -f ./values.yaml --name my-release
    

This command installs HKube in a minimal configuration for development. Check production-deployment.

APIs

There are three ways to communicate with HKube: the Dashboard, the REST API, and the CLI.

UI Dashboard

The Dashboard is a web-based HKube user interface. It supports every feature HKube has to offer.

[Dashboard UI screenshot]

REST API

HKube exposes its functionality through a REST API.

CLI

hkubectl is the HKube command-line tool.

hkubectl [type] [command] [name]

# More information
hkubectl --help

Download the latest version of hkubectl:

curl -Lo hkubectl https://github.com/kube-HPC/hkubectl/releases/latest/download/hkubectl-linux \
&& chmod +x hkubectl \
&& sudo mv hkubectl /usr/local/bin/

For macOS, replace hkubectl-linux with hkubectl-macos.
For Windows, download hkubectl-win.exe.

Configure hkubectl for your running Kubernetes cluster:

# Config
hkubectl config set endpoint ${KUBERNETES-MASTER-IP}

hkubectl config set rejectUnauthorized false

Make sure kubectl is configured to your cluster.

HKube requires certain pods to run with privileged security permissions; consult your Kubernetes installation documentation to see how this is done.

API Usage Example

The Problem

We want to solve the following problem, with a given input and a desired output:

  • Input: Two numbers N, k.
  • Desired Output: A number M such that M = k * (1 + 2 + ... + N) = k * N * (N + 1) / 2.

For example: N=5, k=2 will result in M = 2 * (1 + 2 + 3 + 4 + 5) = 30.

Solution

We will solve the problem by running a distributed pipeline of three algorithms: Range, Multiply and Reduce.

Range Algorithm

Creates an array of the numbers 1 through N.

 N = 5
 5 -> [1,2,3,4,5]

Multiply Algorithm

Multiplies each item of the array received from the Range algorithm by k.

k = 2
[1,2,3,4,5] * (2) -> [2,4,6,8,10]

Reduce Algorithm

The algorithm waits until all instances of the Multiply algorithm finish, then sums the received data.

[2,4,6,8,10] -> 30
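
To sanity-check the expected numbers before wiring anything into HKube, here is a plain, single-process Python sketch of the same Range -> Multiply -> Reduce dataflow (illustration only, not HKube code):

N, k = 5, 2

range_out = list(range(1, N + 1))          # Range:    5 -> [1, 2, 3, 4, 5]
multiply_out = [x * k for x in range_out]  # Multiply: one instance per item
reduce_out = sum(multiply_out)             # Reduce:   waits for all, then sums

print(reduce_out)  # 30 == k * N * (N + 1) / 2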

Building a Pipeline

We will implement the algorithms using various languages and construct a pipeline from them using HKube.

[Pipeline example diagram]

Pipeline Descriptor

The pipeline descriptor is a JSON object that describes the pipeline and defines the links between nodes through their dependencies.

{
  "name": "numbers",
  "nodes": [
    {
      "nodeName": "Range",
      "algorithmName": "range",
      "input": ["@flowInput.data"]
    },
    {
      "nodeName": "Multiply",
      "algorithmName": "multiply",
      "input": ["#@Range", "@flowInput.mul"]
    },
    {
      "nodeName": "Reduce",
      "algorithmName": "reduce",
      "input": ["@Multiply"]
    }
  ],
  "flowInput": {
    "data": 5,
    "mul": 2
  }
}

Note the flowInput: data = N = 5, mul = k = 2

Node dependencies

HKube allows special signs in node inputs to define the pipeline execution flow.

In our case we used:

(@)  —  References an input parameter for the algorithm.

(#)  —  Executes nodes in parallel and reduces the results into a single node.

(#@) — Combining # and @ creates batch processing over a node's results (see the sketch after this list).
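
As an illustration of these signs, here is a toy resolver in Python (not HKube internals; the results dict stands in for values HKube would fetch from its storage layer):

results = {"flowInput": {"data": 5, "mul": 2}, "Range": [1, 2, 3, 4, 5]}

def resolve(token):
    if token.startswith("#@"):        # batch: fan out over an array result
        return [f"task(input={item})" for item in resolve(token[1:])]
    if token.startswith("@"):         # reference: walk a dotted path
        node, *path = token[1:].split(".")
        value = results[node]
        for key in path:
            value = value[key]
        return value
    return token                      # plain literal input

print(resolve("@flowInput.mul"))  # 2
print(resolve("#@Range"))         # ['task(input=1)', ..., 'task(input=5)']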


JSON Breakdown

We created a pipeline named numbers.

    "name":"numbers"

The pipeline is defined by three nodes.

"nodes":[
    {
            "nodeName":"Range",
            "algorithmName":"range",
            "input":["@flowInput.data"]
        },
        {
            "nodeName":"Multiply",
            "algorithmName":"multiply",
            "input":["#@Range","@flowInput.mul"]
        },
        {
            "nodeName":"Reduce",
            "algorithmName":"reduce",
            "input":["@Multiply"]
        },
    ]

In HKube, the linkage between nodes is done by defining the algorithm inputs. Multiply will run after Range because of the input dependency between them.

Keep in mind that HKube transports the results between nodes automatically; for this it currently supports two types of transport layers: object storage and file system.


The flowInput is the place to define the Pipeline inputs:

"flowInput":{
    "data":5,
    "mul":2
}

In our case we used a numeric type, but it can be any JSON type (object, string, etc.).

Advanced Options

There are more features that can be defined from the descriptor file.

"webhooks": {
    "progress": "http://my-url-to-progress",
      "result": "http://my-url-to-result"
    },
  "priority": 3,
  "triggers":
      {
      "pipelines":[],
        "cron":{}
      }
  "options":{
      "batchTolerance": 80,
      "concurrentPipelines": 2,
      "ttl": 3600,
      "progressVerbosityLevel":"info"
  }
  • webhooks - There are two types of webhooks: progress and result.

    You can also fetch the same data from the REST API.

    • progress: {jobId}/api/v1/exec/status
    • result: {jobId}/api/v1/exec/results
  • priority - HKube supports five priority levels, where five is the highest. These priorities, together with the metrics HKube gathers, help decide which algorithms should run first.

  • triggers - There are two types of triggers that HKube currently supports: cron and pipeline.

    • cron - HKube can schedule your stored pipelines based on a cron pattern.

      Check a cron editor in order to construct your cron expression.

    • pipeline - You can set your pipeline to run each time one or more other pipelines finish successfully.
  • options - Additional options that can be configured:

    • Batch Tolerance - A threshold that controls what percentage of the batch tasks may fail before the entire pipeline fails.
    • Concurrency - Pipeline concurrency defines the number of instances of the pipeline that are allowed to run at the same time.
    • TTL - Time to live (TTL) limits the lifetime of a pipeline in the cluster. A stop will be sent if the pipeline runs for more than ttl seconds.
    • Verbosity Level - Controls what type of progress events the client is notified about. The severity levels ascend from least important to most important: trace, debug, info, warn, error, critical.

Algorithm

The pipeline is built from algorithms, each containerized with Docker.

There are two ways to integrate your algorithm into HKube:

  • Seamless Integration - As described above, HKube can automatically build your Docker image with HKube's websocket wrapper.
  • Code writing - To add an algorithm to HKube manually, you need to wrap your algorithm with the HKube wrapper. HKube already has wrappers for Python, JavaScript, Java, and .NET Core.

Implementing the Algorithms

We will create the algorithms to solve the problem. HKube currently supports two languages for auto-build: Python and JavaScript.

Important notes:

  • Installing dependencies - During the container build, HKube will search for the requirements.txt file and will try to install the packages with the pip package manager.
  • Advanced operations - HKube can build the algorithm from just a start function, but for advanced operations such as one-time initiation and graceful stopping you also have to implement two other functions, init and stop (a minimal skeleton follows).
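
A minimal Python skeleton of those entry points might look like the following (the exact wrapper contract is an assumption here; consult the wrapper documentation for your language):

# Hypothetical skeleton: the wrapper calls init once, start per task,
# and stop on graceful shutdown. Signatures assumed, not verified.
state = {}

def init(args):
    # One-time initiation: load a model, open connections, etc.
    state['model'] = 'loaded'

def start(args):
    # Per-task entry point: args['input'] holds the resolved node inputs.
    return args['input']

def stop(args):
    # Graceful stop: release whatever init acquired.
    state.clear()
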
Range (Python)
def start(args):
    print('algorithm: range start')
    input = args['input'][0]            # first input: @flowInput.data
    return list(range(1, input + 1))    # 5 -> [1, 2, 3, 4, 5]

The start method is called with the args parameter; the inputs to the algorithm appear in its input property.

The input property is an array, so we take the first element ("input": ["@flowInput.data"]; as you can see, we placed data as the first argument).
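
As a quick local check (outside HKube, with hand-built args simulating what the wrapper would pass):

def start(args):
    n = args['input'][0]              # @flowInput.data
    return list(range(1, n + 1))

# Simulate the wrapper's call for flowInput.data = 5
assert start({'input': [5]}) == [1, 2, 3, 4, 5]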

Multiply (Python)
def start(args):
    print('algorithm: multiply start')
    input = args['input'][0]   # one item of Range's array (batch fan-out)
    mul = args['input'][1]     # @flowInput.mul
    return input * mul

We sent two parameters: "input": ["#@Range", "@flowInput.mul"]. The first is the output from Range, an array of numbers; because we use the batch sign (#), each Multiply instance gets one item from that array. The second parameter is the mul value from the flowInput object.
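
To make the fan-out concrete, here is a local simulation (plain Python, not the HKube wrapper) of how each batch instance is invoked with one array item plus the shared mul value:

def start(args):
    return args['input'][0] * args['input'][1]

# '#@Range' spawns one instance per item of Range's output.
range_output, mul = [1, 2, 3, 4, 5], 2
batch_results = [start({'input': [item, mul]}) for item in range_output]
print(batch_results)  # [2, 4, 6, 8, 10]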

Reduce (Javascript)
module.exports.start = args => {
  console.log('algorithm: reduce start');
  const input = args.input[0];                   // array gathered from all Multiply instances
  return input.reduce((acc, cur) => acc + cur);  // sum -> 30
};

We placed ["@Multiply"] in the input parameter; HKube collects all the data from the Multiply instances and sends it as an array in the first input parameter.

Integrate Algorithms

After creating the algorithms, we will integrate them using the CLI.

This can also be done through the Dashboard.

Create a YAML (or JSON) file that defines the algorithm:

# range.yml
name: range
env: python # can be python or javascript
resources:
  cpu: 0.5
  gpu: 1 # if not needed just remove it from the file
  mem: 512Mi

code:
  path: /path-to-algorithm/range.tar.gz
  entryPoint: main.py

Add it with the CLI:

hkubectl algorithm apply --f range.yml

Keep in mind that we have to do this for each of the algorithms.

Integrate Pipeline

Create a YAML (or JSON) file that defines the pipeline:

# numbers.yml
name: numbers
nodes:
  - nodeName: Range
    algorithmName: range
    input:
      - '@flowInput.data'
  - nodeName: Multiply
    algorithmName: multiply
    input:
      - '#@Range'
      - '@flowInput.mul'
  - nodeName: Reduce
    algorithmName: reduce
    input:
      - '@Multiply'
flowInput:
  data: 5
  mul: 2

Raw - Ad-hoc pipeline run

To run our pipeline as raw data:

hkubectl exec raw --f numbers.yml

Stored - Storing the pipeline descriptor for later runs

First we store the pipeline:

hkubectl pipeline store --f numbers.yml

Then you can execute it (if a flowInput was stored with it):

# flowInput stored
hkubectl exec stored numbers

To execute the pipeline with different input, create a YAML (or JSON) file with a flowInput key:

# otherFlowInput.yml
flowInput:
  data: 500
  mul: 200

Then you can execute it by pipeline name:

# Executes pipeline "numbers" with data=500, mul=200
hkubectl exec stored numbers --f otherFlowInput.yml

Monitor Pipeline Results

When a pipeline is executed, HKube returns a jobId.

# Job ID returned after execution.
result:
  jobId: numbers:a56c97cb-5d62-4990-817c-04a8b0448b7c.numbers

This unique identifier can be used to query this specific pipeline execution.

  • Stop pipeline execution: hkubectl exec stop <jobId> [reason]

  • Track pipeline status: hkubectl exec status <jobId>

  • Track pipeline result: hkubectl exec result <jobId>
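
If you prefer the REST API over hkubectl, a polling sketch might look like the following (the base URL, exact path layout, and response fields are assumptions for your deployment; the status and results endpoints are the ones listed under webhooks above):

import time
import requests  # third-party: pip install requests

# Assumed base URL; replace with your HKube api-server address.
BASE = "http://<your-hkube-endpoint>/api/v1/exec"

def wait_for_result(job_id, poll_seconds=5):
    """Poll status until the job leaves the active state, then fetch results."""
    while True:
        status = requests.get(f"{BASE}/status/{job_id}").json()
        # 'status' field name and terminal values assumed for illustration.
        if status.get("status") in ("completed", "failed", "stopped"):
            return requests.get(f"{BASE}/results/{job_id}").json()
        time.sleep(poll_seconds)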
