maiot-io / Zenml

Licence: apache-2.0
ZenML 🙏: MLOps framework to create reproducible ML pipelines for production machine learning.

Programming Languages

python

Projects that are alternatives to, or similar to, Zenml

Orchest
A new kind of IDE for Data Science.
Stars: ✭ 694 (-31.89%)
Mutual labels:  data-science, pipelines
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+382.73%)
Mutual labels:  data-science, pipelines
Iceci
IceCI is a continuous integration system designed for Kubernetes from the ground up.
Stars: ✭ 29 (-97.15%)
Mutual labels:  pipelines, devops-tools
Polyaxon
Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)
Stars: ✭ 2,966 (+191.07%)
Mutual labels:  data-science, pipelines
Mlj.jl
A Julia machine learning framework
Stars: ✭ 982 (-3.63%)
Mutual labels:  data-science, pipelines
Cv Pretrained Model
A collection of computer vision pre-trained models.
Stars: ✭ 995 (-2.36%)
Mutual labels:  data-science
Machine Learning From Scratch
Succinct Machine Learning algorithm implementations from scratch in Python, solving real-world problems (Notebooks and Book). Examples of Logistic Regression, Linear Regression, Decision Trees, K-means clustering, Sentiment Analysis, Recommender Systems, Neural Networks and Reinforcement Learning.
Stars: ✭ 42 (-95.88%)
Mutual labels:  data-science
Data Polygamy
Data Polygamy is a topology-based framework that allows users to query for statistically significant relationships between spatio-temporal data sets.
Stars: ✭ 39 (-96.17%)
Mutual labels:  data-science
Ugfraud
An Unsupervised Graph-based Toolbox for Fraud Detection
Stars: ✭ 38 (-96.27%)
Mutual labels:  data-science
Tadw
An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).
Stars: ✭ 43 (-95.78%)
Mutual labels:  data-science
Rcongresso
R package for accessing data from the national congress.
Stars: ✭ 42 (-95.88%)
Mutual labels:  data-science
Mathematicavsr
Example projects, code, and documents for comparing Mathematica with R.
Stars: ✭ 41 (-95.98%)
Mutual labels:  data-science
Pixiedust
Python Helper library for Jupyter Notebooks
Stars: ✭ 998 (-2.06%)
Mutual labels:  data-science
Susi
SuSi: Python package for unsupervised, supervised and semi-supervised self-organizing maps (SOM)
Stars: ✭ 42 (-95.88%)
Mutual labels:  data-science
Scala Plotly Client
Visualise your data from Scala using Plotly
Stars: ✭ 39 (-96.17%)
Mutual labels:  data-science
Sklearn Porter
Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
Stars: ✭ 1,014 (-0.49%)
Mutual labels:  data-science
Chef Plugin
A Jenkins plugin to run chef-client on a remote host
Stars: ✭ 38 (-96.27%)
Mutual labels:  devops-tools
Ds Take Home
My solution to the book A Collection of Data Science Take-Home Challenges
Stars: ✭ 1,004 (-1.47%)
Mutual labels:  data-science
Computervision Recipes
Best Practices, code samples, and documentation for Computer Vision.
Stars: ✭ 8,214 (+706.08%)
Mutual labels:  data-science
Attaca
Robust, distributed version control for large files.
Stars: ✭ 41 (-95.98%)
Mutual labels:  data-science

ZenML.io • docs.ZenML.io • Quickstart • GitHub Community • Join Slack

Join our Slack Community and become part of the ZenML family.
Give us a GitHub star to show your love!

Why?

ZenML is built for ML practitioners who are ramping up their ML workflows towards production. We built ZenML because we could not find an easy framework that translates the patterns observed in the research phase with Jupyter notebooks into a production-ready ML environment. Here is what's hard to replicate in production:

  • It's hard to version data, code, configuration, and models.
  • It's difficult to reproduce experiments across environments.
  • There is no gold standard for organizing ML code and managing technical debt as complexity grows.
  • It's a struggle to establish a reliable link between training and deployment.
  • It's arduous to track metadata and artifacts that are produced.

ZenML is not here to replace the great tools that solve the individual problems above. Rather, it uses them as integrations to expose a coherent, simple path to getting any ML model into production.

What is ZenML?

ZenML is an extensible, open-source MLOps framework for creating production-ready Machine Learning pipelines - in a simple way.

A ZenML user breaks down their ML development into individual Steps, each representing a single task in the ML development process. A sequence of steps put together is a Pipeline. Each pipeline contains a Datasource, which represents a snapshot of a versioned dataset in time. Lastly, every pipeline (and indeed almost every step) can run in Backends, which specify how and where a step is executed.

By developing in pipelines, ML practitioners give themselves a platform to transition from research to production from the very beginning, and are also helped in the research phase by the powerful automations introduced by ZenML.
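
To make the vocabulary above concrete, here is a toy model of how a Datasource, Steps, a Pipeline, and a Backend relate to each other. This is a minimal sketch in plain Python, not ZenML's API: every class name, field, and method below is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Datasource:
    """A named snapshot of a dataset (toy stand-in, no real versioning)."""
    name: str
    rows: List[dict]


@dataclass
class Step:
    """One task in the ML workflow, e.g. split, preprocess, or train."""
    name: str
    fn: Callable[[list], list]


@dataclass
class LocalBackend:
    """Decides how and where a step runs; here, simply in-process."""

    def execute(self, step: Step, data: list) -> list:
        return step.fn(data)


@dataclass
class Pipeline:
    """A datasource plus an ordered sequence of steps."""
    datasource: Datasource
    steps: List[Step] = field(default_factory=list)

    def run(self, backend: LocalBackend) -> list:
        data = self.datasource.rows
        for step in self.steps:
            # The backend, not the pipeline, controls execution of each step.
            data = backend.execute(step, data)
        return data


ds = Datasource("toy", [{"x": 1}, {"x": 2}, {"x": 3}])
pipeline = Pipeline(ds, [
    Step("drop_small", lambda rows: [r for r in rows if r["x"] > 1]),
    Step("double", lambda rows: [{"x": r["x"] * 2} for r in rows]),
])
result = pipeline.run(LocalBackend())
print(result)
```

Swapping `LocalBackend` for another object with the same `execute` method would change where the steps run without touching the pipeline definition, which is the separation the Backend concept describes.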

Quickstart

The quickest way to get started is to create a simple pipeline. The dataset used here is the Pima Indians Diabetes Dataset (originally from the National Institute of Diabetes and Digestive and Kidney Diseases).

Step 0: Installation

ZenML is available for easy installation into your environment via PyPI:

pip install zenml

Alternatively, if you're feeling brave, feel free to install the bleeding edge. NOTE: Do so at your own risk; no guarantees given!

pip install git+https://github.com/maiot-io/[email protected] --upgrade

Step 1: Initialize a ZenML repo from within a git repo

zenml init

Step 2: Assemble, run and evaluate your pipeline locally

from zenml.datasources import CSVDatasource
from zenml.pipelines import TrainingPipeline
from zenml.steps.evaluator import TFMAEvaluator
from zenml.steps.split import RandomSplit
from zenml.steps.preprocesser import StandardPreprocesser
from zenml.steps.trainer import TFFeedForwardTrainer

training_pipeline = TrainingPipeline(name='Quickstart')

# Add a datasource. This will automatically track and version it.
ds = CSVDatasource(name='Pima Indians Diabetes Dataset',
                   path='gs://zenml_quickstart/diabetes.csv')
training_pipeline.add_datasource(ds)

# Add a random 70/30 train-eval split
training_pipeline.add_split(RandomSplit(split_map={'train': 0.7, 'eval': 0.3}))

# StandardPreprocesser() has sane defaults for normal preprocessing methods
training_pipeline.add_preprocesser(
    StandardPreprocesser(
        features=['times_pregnant', 'pgc', 'dbp', 'tst', 'insulin', 'bmi',
                  'pedigree', 'age'],
        labels=['has_diabetes'],
        overwrite={'has_diabetes': {
            'transform': [{'method': 'no_transform', 'parameters': {}}]}}
    ))

# Add a trainer
training_pipeline.add_trainer(TFFeedForwardTrainer(
    loss='binary_crossentropy',
    last_activation='sigmoid',
    output_units=1,
    metrics=['accuracy'],
    epochs=20))

# Add an evaluator
training_pipeline.add_evaluator(
    TFMAEvaluator(slices=[['has_diabetes']],
                  metrics={'has_diabetes': ['binary_crossentropy',
                                            'binary_accuracy']}))

# Run the pipeline locally
training_pipeline.run()

While the above is great to get a quick flavor of ZenML, a more practical way to start is to follow our guide to convert your legacy codebase into ZenML code here.
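
For intuition about the `split_map={'train': 0.7, 'eval': 0.3}` argument in the quickstart, the following plain-Python sketch shows one way a random fractional split can work. It is an illustration under assumptions, not ZenML's `RandomSplit` implementation: the function `random_split` and its `seed` parameter are hypothetical.

```python
import random


def random_split(rows, split_map, seed=42):
    """Shuffle rows and cut them into named partitions by fraction."""
    if abs(sum(split_map.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable

    splits, start = {}, 0
    names = list(split_map)
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(shuffled)  # the last split takes the remainder
        else:
            end = start + round(split_map[name] * len(shuffled))
        splits[name] = shuffled[start:end]
        start = end
    return splits


splits = random_split(range(100), {'train': 0.7, 'eval': 0.3})
print(len(splits['train']), len(splits['eval']))
```

Fixing the seed keeps the partition identical across runs, which matters when experiments need to be reproduced.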

Leverage powerful integrations

Once code is organized into a ZenML pipeline, you can supercharge your ML development through powerful integrations. Some of the benefits you get are:

Work locally but switch seamlessly to the cloud

Switching from local experiments to cloud-based pipelines doesn't need to be complex.

From local to cloud with one parameter

Versioning galore

For every pipeline, ZenML ensures that:

✅ Code is versioned
✅ Data is versioned
✅ Models are versioned
✅ Configurations are versioned
ZenML declarative config
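
One common way to version a declarative configuration is to fingerprint its canonical form, so that identical settings always map to the same identifier. The sketch below illustrates that general idea with the standard library only. It is an assumption about how such versioning can be done, not a description of ZenML's internals, and `config_fingerprint` is a hypothetical helper.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Return a stable SHA-256 hex digest for a JSON-serializable config."""
    # Sorting keys makes the digest independent of dictionary ordering.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"split": {"train": 0.7, "eval": 0.3}, "epochs": 20}
b = {"epochs": 20, "split": {"eval": 0.3, "train": 0.7}}  # same content, different order
c = {**a, "epochs": 30}  # one setting changed

same = config_fingerprint(a) == config_fingerprint(b)
changed = config_fingerprint(a) != config_fingerprint(c)
print(same, changed)
```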

Automatically detect schema

# See the schema of your data
training_pipeline.view_schema()

Automatic schema detection
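
As a rough picture of what detecting a schema involves, the sketch below infers a type per column from a small CSV sample using only the standard library. It is not what `view_schema()` does internally: the helpers `infer_type` and `infer_schema` are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of 'int', 'float', 'string' that fits all values."""
    for type_name, caster in (("int", int), ("float", float)):
        try:
            for v in values:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Map each CSV column name to an inferred type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {col: infer_type([r[col] for r in rows]) for col in rows[0]}


# Made-up sample rows; the 'group' column is purely illustrative.
sample = "age,bmi,group\n50,33.6,a\n31,26.6,b\n"
schema = infer_schema(sample)
print(schema)
```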

View statistics

# See statistics of train and eval
training_pipeline.view_statistics()
ZenML statistics visualization
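
Comparing summary statistics of the train and eval splits is a quick check that the two are drawn from similar distributions. The sketch below computes a few such statistics by hand. The `summarize` helper is hypothetical and the numbers are made up; it only illustrates the kind of per-split summary a statistics view typically reports.

```python
import statistics


def summarize(values):
    """Basic descriptive statistics for one numeric feature."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


# Made-up 'age' values for each split.
train_ages = [50, 31, 32, 21, 33]
eval_ages = [30, 26, 29]

train_summary = summarize(train_ages)
eval_summary = summarize(eval_ages)
for name, summary in (("train", train_summary), ("eval", eval_summary)):
    print(name, summary)
```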

Evaluate the model using built-in evaluators

# Creates a notebook for evaluation
training_pipeline.evaluate()
Tensorboard built-in

Compare training pipelines

# `repo` is assumed to be the ZenML repository object for this project
repo.compare_training_pipelines()

ZenML built-in pipeline comparison

Distribute preprocessing to the cloud

Leverage distributed compute powered by Apache Beam:

training_pipeline.add_preprocesser(
    StandardPreprocesser(...).with_backend(
      ProcessingDataFlowBackend(
        project=GCP_PROJECT,
        num_workers=10,
    ))
)
ZenML distributed processing

Train on spot instances

Easily train on spot (preemptible) instances to save 80% on cost.

training_pipeline.run(
  OrchestratorGCPBackend(
    preemptible=True,  # reduce costs by using preemptible instances
    machine_type='n1-standard-4',
    gpu='nvidia-tesla-k80',
    gpu_count=1,
    ...
  )
  ...
)

Deploy models automatically

Automatically deploy each model with powerful Deployment integrations like Cortex.

training_pipeline.add_deployment(
    CortexDeployer(
        api_spec=api_spec,
        predictor=PythonPredictor,
    )
)

The best part is that ZenML is easily extensible and can be molded to your use case. You can create your own custom logic, or open a PR and contribute to the ZenML community so that everyone can benefit.

Community

Our community is the backbone of making ZenML a success! We are currently actively maintaining two main channels for community discussions:

  • Our Slack Channel: Chat with us here.
  • The GitHub Community: Create your first thread here.

Contributing

We would love to receive your contributions! Check our Contributing Guide for more details on how best to contribute.

Copyright

ZenML is distributed under the terms of the Apache License Version 2.0. A complete version of the license is available in the LICENSE.md in this repository.

Any contribution made to this project will be licensed under the Apache License Version 2.0.

Credit

ZenML is built on the shoulders of giants: we leverage, and would like to give credit to, existing open-source libraries like TFX. The goal of our framework is neither to replace these libraries nor to diminish their usage. ZenML is simply an opinionated, higher-level interface focused purely on ease of use and coherent, intuitive design. You can read more about why we started building ZenML on our blog.
