All Projects → apache → incubator-liminal

apache / incubator-liminal

Licence: Apache-2.0 license
Apache Liminals goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.

Programming Languages

python
139335 projects - #7 most used programming language
Dockerfile
14818 projects
shell
77523 projects

Projects that are alternatives of or similar to incubator-liminal

Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (+2101.71%)
Mutual labels:  big-data, ml
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+2377.78%)
Mutual labels:  big-data, ml
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+2767.52%)
Mutual labels:  big-data, ml
leetspeek
Open and collaborative content from leet hackers!
Stars: ✭ 11 (-90.6%)
Mutual labels:  big-data, ml
astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (-32.48%)
Mutual labels:  airflow, workflows
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (+30.77%)
Mutual labels:  big-data, ml
awesome-AI-kubernetes
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-18.8%)
Mutual labels:  big-data, ml
Books
整理一些书籍 ,包含 C&C++ 、git 、Java、Keras 、Linux 、NLP 、Python 、Scala 、TensorFlow 、大数据 、推荐系统、数据库、数据挖掘 、机器学习 、深度学习 、算法等。
Stars: ✭ 222 (+89.74%)
Mutual labels:  big-data, ml
Dataengineeringproject
Example end to end data engineering project.
Stars: ✭ 82 (-29.91%)
Mutual labels:  airflow, big-data
cli
Polyaxon Core Client & CLI to streamline MLOps
Stars: ✭ 18 (-84.62%)
Mutual labels:  ml, workflows
siembol
An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (+30.77%)
Mutual labels:  big-data
colabs
This repository holds the Google Colabs for the EdX TinyML Specialization
Stars: ✭ 73 (-37.61%)
Mutual labels:  ml
mozilla-sprint-2018
DEPRECATED & Materials Moved: This sprint was to focus on brainstorming for the Joint Roadmap for Open Science Tools.
Stars: ✭ 24 (-79.49%)
Mutual labels:  workflows
CS-7641-assignments
CS 7641 - All the code
Stars: ✭ 135 (+15.38%)
Mutual labels:  ml
hana-ml-samples
This project provides code examples for SAP HANA Predictive and Machine Learning scenarios and is educational content. It covers simple Predictive Analysis Library SQL examples as well as complete SAP HANA design-time “ML scenario”-application content or HANA-ML Python Notebook examples.
Stars: ✭ 67 (-42.74%)
Mutual labels:  ml
kglib
TypeDB-ML is the Machine Learning integrations library for TypeDB
Stars: ✭ 523 (+347.01%)
Mutual labels:  ml
rastercube
rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-87.18%)
Mutual labels:  big-data
zpy
Synthetic data for computer vision. An open source toolkit using Blender and Python.
Stars: ✭ 251 (+114.53%)
Mutual labels:  ml
flutter-vision
iOS and Android app built with Flutter and Firebase. Includes Firebase ML Vision, Firestore, and Storage
Stars: ✭ 45 (-61.54%)
Mutual labels:  ml
guess-js.github.io
The website of Guess.js
Stars: ✭ 16 (-86.32%)
Mutual labels:  ml

Apache Liminal

Apache Liminal is an end-to-end platform for data engineers & scientists, allowing them to build, train and deploy machine learning models in a robust and agile way.

The platform provides the abstractions and declarative capabilities for data extraction & feature engineering followed by model training and serving. Liminal's goal is to operationalize the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production, freeing them from engineering and non-functional tasks, and allowing them to focus on machine learning code and artifacts.

Basics

Using simple YAML configuration, create your own schedule data pipelines (a sequence of tasks to perform), application servers, and more.

Getting Started

A simple getting stated guide for Liminal can be found here

Apache Liminal Documentation

Full documentation of Apache Liminal can be found here

High Level Architecture

High level architecture documentation can be found here

Example YAML config file

---
name: MyLiminalStack
owner: Bosco Albert Baracus
volumes:
  - volume: myvol1
    local:
      path: /Users/me/myvol1
images:
  - image: my_python_task_img
    type: python
    source: write_inputs
  - image: my_parallelized_python_task_img
    source: write_outputs
  - image: my_server_image
    type: python_server
    source: myserver
    endpoints:
      - endpoint: /myendpoint1
        module: my_server
        function: myendpoint1func
pipelines:
  - pipeline: my_pipeline
    start_date: 1970-01-01
    timeout_minutes: 45
    schedule: 0 * 1 * *
    metrics:
      namespace: TestNamespace
      backends: [ 'cloudwatch' ]
    tasks:
      - task: my_python_task
        type: python
        description: static input task
        image: my_python_task_img
        env_vars:
          NUM_FILES: 10
          NUM_SPLITS: 3
        mounts:
          - mount: mymount
            volume: myvol1
            path: /mnt/vol1
        cmd: python -u write_inputs.py
      - task: my_parallelized_python_task
        type: python
        description: parallelized python task
        image: my_parallelized_python_task_img
        env_vars:
          FOO: BAR
        executors: 3
        mounts:
          - mount: mymount
            volume: myvol1
            path: /mnt/vol1
        cmd: python -u write_inputs.py
services:
  - service: my_python_server
    description: my python server
    image: my_server_image

Installation

  1. Install this repository (HEAD)
   pip install git+https://github.com/apache/incubator-liminal.git
  1. Optional: set LIMINAL_HOME to path of your choice (if not set, will default to ~/liminal_home)
echo 'export LIMINAL_HOME=</path/to/some/folder>' >> ~/.bash_profile && source ~/.bash_profile

Authoring pipelines

This involves at minimum creating a single file called liminal.yml as in the example above.

If your pipeline requires custom python code to implement tasks, they should be organized like this

If your pipeline introduces imports of external packages which are not already a part of the liminal framework (i.e. you had to pip install them yourself), you need to also provide a requirements.txt in the root of your project.

Testing the pipeline locally

When your pipeline code is ready, you can test it by running it locally on your machine.

  1. Ensure you have The Docker engine running locally, and enable a local Kubernetes cluster:

Kubernetes configured

And allocate it at least 3 CPUs (under "Resources" in the Docker preference UI).

If you want to execute your pipeline on a remote kubernetes cluster, make sure the cluster is configured using:

kubectl config set-context <your remote kubernetes cluster>
  1. Build the docker images used by your pipeline.

In the example pipeline above, you can see that tasks and services have an "image" field - such as "my_static_input_task_image". This means that the task is executed inside a docker container, and the docker container is created from a docker image where various code and libraries are installed.

You can take a look at what the build process looks like, e.g. here

In order for the images to be available for your pipeline, you'll need to build them locally:

cd </path/to/your/liminal/code>
liminal build

You'll see that a number of outputs indicating various docker images built.

  1. Create a kubernetes local volume
    In case your Yaml includes working with volumes please first run the following command:
cd </path/to/your/liminal/code>
liminal create
  1. Deploy the pipeline:
cd </path/to/your/liminal/code>
liminal deploy

Note: after upgrading liminal, it's recommended to issue the command

liminal deploy --clean

This will rebuild the airlfow docker containers from scratch with a fresh version of liminal, ensuring consistency.

  1. Start the server
liminal start
  1. Stop the server
liminal stop
  1. Display the server logs
liminal logs --follow/--tail

Number of lines to show from the end of the log:
liminal logs --tail=10

Follow log output:
liminal logs --follow
  1. Navigate to http://localhost:8080/admin

  2. You should see your pipeline The pipeline is scheduled to run according to the json schedule: 0 * 1 * * field in the .yml file you provided.

  3. To manually activate your pipeline:

  • Click your pipeline and then click "trigger DAG"
  • Click "Graph view" You should see the steps in your pipeline getting executed in "real time" by clicking "Refresh" periodically.

Pipeline activation

Contributing

More information on contributing can be found here

Community

The Liminal community holds a public call every Monday

Running Tests (for contributors)

When doing local development and running Liminal unit-tests, make sure to set LIMINAL_STAND_ALONE_MODE=True

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].