kelvins / Awesome Mlops

Awesome MLOps

A curated list of awesome MLOps tools.

Inspired by awesome-python.


AutoML

Tools for performing AutoML.

  • AutoGluon - Automates machine learning tasks, enabling you to easily achieve strong predictive performance.
  • AutoKeras - An AutoML system based on Keras that aims to make machine learning accessible to everyone.
  • AutoPyTorch - Automatic architecture search and hyperparameter optimization for PyTorch.
  • AutoSKLearn - Automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
  • H2O AutoML - Automates the machine learning workflow, including automatic training and tuning of models.
  • MLBox - Powerful automated machine learning library for Python.

CI/CD for Machine Learning

Tools for performing CI/CD for Machine Learning.

  • CML - Open-source library for implementing CI/CD in machine learning projects.

Cron Job Monitoring

Tools for monitoring cron jobs (recurring jobs).

  • Cronitor - Monitor any cron job or scheduled task.
  • HealthchecksIO - Simple and effective cron job monitoring.

Data Exploration

Tools for performing data exploration.

  • Apache Zeppelin - Notebook that enables data-driven, interactive data analytics and collaborative documents.
  • Google Colab - Hosted Jupyter notebook service that requires no setup to use.
  • Jupyter Notebook - Web-based notebook environment for interactive computing.
  • JupyterLab - The next-generation user interface for Project Jupyter.
  • Jupytext - Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts.
  • Polynote - The polyglot notebook with first-class Scala support.

Data Management

Tools for performing data management.

  • Arrikto - Dead simple, ultra fast storage for the hybrid Kubernetes world.
  • DVC - Management and versioning of datasets and machine learning models.
  • Intake - A lightweight set of tools for loading and sharing data in data science projects.
  • lakeFS - Repeatable, atomic and versioned data lake on top of object storage.

Data Processing

Tools related to data processing and data pipelines.

  • Airflow - Platform to programmatically author, schedule, and monitor workflows.
  • Dagster - A data orchestrator for machine learning, analytics, and ETL.
  • Hadoop - Framework that allows for the distributed processing of large data sets across clusters.
  • Spark - Unified analytics engine for large-scale data processing.

Data Validation

Tools related to data validation.

  • Cerberus - Lightweight, extensible data validation library for Python.
  • JSON Schema - A vocabulary that allows you to annotate and validate JSON documents.

Data Visualization

Tools for data visualization, reports and dashboards.

  • Count - SQL/drag-and-drop querying and visualisation tool based on notebooks.
  • Dash - Analytical Web Apps for Python, R, Julia, and Jupyter.
  • Data Studio - Reporting solution for power users who want to go beyond the data and dashboards of Google Analytics.
  • Facets - Visualizations for understanding and analyzing machine learning datasets.
  • Metabase - The simplest, fastest way to get business intelligence and analytics to everyone.
  • Redash - Connect to any data source, easily visualize, dashboard and share your data.
  • Superset - Modern, enterprise-ready business intelligence web application.
  • Tableau - Powerful, fast-growing data visualization tool used in the business intelligence industry.

Feature Store

Feature store tools for data serving.

  • Butterfree - A tool for building feature stores. Transform your raw data into beautiful features.
  • ByteHub - An easy-to-use feature store. Optimized for time-series data.
  • Feast - End-to-end open source feature store for machine learning.

Hyperparameter Tuning

Tools and libraries to perform hyperparameter tuning.

  • Hyperas - A very simple wrapper for convenient hyperparameter optimization.
  • Hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
  • Katib - Kubernetes-based system for hyperparameter tuning and neural architecture search.
  • Optuna - Open source hyperparameter optimization framework to automate hyperparameter search.
  • Scikit Optimize - Simple and efficient library to minimize expensive and noisy black-box functions.
  • Tune - Python library for experiment execution and hyperparameter tuning at any scale.

Knowledge Sharing

Tools for sharing knowledge with the entire team/company.

  • Knowledge Repo - Knowledge sharing platform for data scientists and other technical professions.
  • Kyso - One place for data insights so your entire team can learn from your data.

Machine Learning Platform

Complete machine learning platform solutions.

  • Algorithmia - Securely govern your machine learning operations with a healthy ML lifecycle.
  • Allegro AI - Transform ML/DL research into products. Faster.
  • Bodywork - Deploys machine learning projects developed in Python to Kubernetes.
  • CNVRG - An end-to-end machine learning platform to build and deploy AI models at scale.
  • Cubonacci - Intuitive code-first MLOps platform that streamlines the end-to-end machine learning workflow.
  • DAGsHub - A platform built on open source tools for data, model and pipeline management.
  • Dataiku - Platform democratizing access to data and enabling enterprises to build their own path to AI.
  • DataRobot - AI platform that democratizes data science and automates the end-to-end ML at scale.
  • Domino - One place for your data science tools, apps, results, models, and knowledge.
  • Gradient - Multicloud CI/CD and MLOps platform for machine learning teams.
  • H2O - Open source leader in AI with a mission to democratize AI for everyone.
  • Hopsworks - Open-source platform for developing and operating machine learning models at scale.
  • Iguazio - Data science platform that automates MLOps with end-to-end machine learning pipelines.
  • KNIME - Create and productionize data science using one easy and intuitive environment.
  • Kubeflow - Making deployments of ML workflows on Kubernetes simple, portable and scalable.
  • LynxKite - A complete graph data science platform for very large graphs and other datasets.
  • ML Workspace - All-in-one web-based IDE specialized for machine learning and data science.
  • Modzy - AI platform and marketplace offering scalable, secure, and ready-to-deploy AI models.
  • Neu.ro - MLOps platform that integrates open-source and proprietary tools into client-oriented systems.
  • Pachyderm - Combines data lineage with end-to-end pipelines on Kubernetes, engineered for the enterprise.
  • Polyaxon - A platform for reproducible and scalable machine learning and deep learning on Kubernetes.
  • SageMaker - Fully managed service that provides the ability to build, train, and deploy ML models quickly.
  • Valohai - Takes you from POC to production while managing the whole model lifecycle.

Model Interpretability

Tools for performing model interpretability/explainability.

  • Alibi - Open-source Python library enabling ML model inspection and interpretation.
  • InterpretML - A toolkit to help understand models and enable responsible machine learning.
  • LIME - Explaining the predictions of any machine learning classifier.
  • SHAP - A game theoretic approach to explain the output of any machine learning model.

Model Lifecycle

Tools for managing model lifecycle (tracking experiments, parameters and metrics).

  • Comet - Track your datasets, code changes, experimentation history, and models.
  • MLflow - Open source platform for the machine learning lifecycle.
  • ModelDB - Open source ML model versioning, metadata, and experiment management.
  • Neptune AI - The most lightweight experiment management tool that fits any workflow.
  • Replicate - Library that uploads files and metadata (like hyperparameters) to S3 or GCS.
  • Sacred - A tool to help you configure, organize, log and reproduce experiments.

Model Serving

Tools for serving models in production.

  • BentoML - Open-source platform for high-performance ML model serving.
  • BudgetML - Deploy an ML inference service on a budget in less than 10 lines of code.
  • Cortex - Machine learning model serving infrastructure.
  • GraphPipe - Machine learning model deployment made simple.
  • KFServing - Kubernetes custom resource definition for serving ML models on arbitrary frameworks.
  • PredictionIO - Event collection, deployment of algorithms, evaluation, querying predictive results via APIs.
  • Seldon - Take your ML projects from POC to production with maximum efficiency and minimal risk.
  • Streamlit - Lets you create apps for your ML projects with deceptively simple Python scripts.
  • TensorFlow Serving - Flexible, high-performance serving system for ML models, designed for production.
  • TorchServe - A flexible and easy to use tool for serving PyTorch models.

Optimization Tools

Optimization tools related to model scalability in production.

  • Dask - Provides advanced parallelism for analytics, enabling performance at scale for the tools you love.
  • DeepSpeed - Deep learning optimization library that makes distributed training easy, efficient, and effective.
  • Fiber - Python distributed computing library for modern computer clusters.
  • Horovod - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
  • Mahout - Distributed linear algebra framework and mathematically expressive Scala DSL.
  • MLlib - Apache Spark's scalable machine learning library.
  • Modin - Speed up your Pandas workflows by changing a single line of code.
  • Petastorm - Enables single machine or distributed training and evaluation of deep learning models.
  • RAPIDS - Gives the ability to execute end-to-end data science and analytics pipelines entirely on GPUs.
  • Ray - Fast and simple framework for building and running distributed applications.
  • Singa - Apache top-level project focused on distributed training of DL and ML models.
  • TPOT - Automated ML tool that optimizes machine learning pipelines using genetic programming.

Simplification Tools

Tools related to machine learning simplification and standardization.

  • Hermione - Helps data scientists set up better-organized code more quickly and simply.
  • Koalas - Pandas API on Apache Spark. Makes data scientists more productive when interacting with big data.
  • Ludwig - Allows users to train and test deep learning models without the need to write code.
  • PyCaret - Open source, low-code machine learning library in Python.
  • TrainGenerator - A web app to generate template code for machine learning.
  • Turi Create - Simplifies the development of custom machine learning models.

Visual Analysis and Debugging

Tools for performing visual analysis and debugging of ML/DL models.

  • Manifold - A model-agnostic visual debugging tool for machine learning.
  • Netron - Visualizer for neural network, deep learning, and machine learning models.
  • Yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection.

Workflow Tools

Tools and frameworks to create workflows or pipelines in the machine learning context.

  • Argo - Open source container-native workflow engine for orchestrating parallel jobs on Kubernetes.
  • Couler - Unified interface for constructing and managing workflows on different workflow engines.
  • Flyte - Makes it easy to create concurrent, scalable, and maintainable workflows for machine learning.
  • Kale - Aims to simplify the data science experience of deploying Kubeflow Pipelines workflows.
  • Kedro - Library that implements software engineering best practices for data and ML pipelines.
  • Luigi - Python module that helps you build complex pipelines of batch jobs.
  • Metaflow - Human-friendly lib that helps scientists and engineers build and manage data science projects.
  • MLRun - Generic mechanism for data scientists to build, run, and monitor ML tasks and pipelines.
  • Prefect - A workflow management system, designed for modern infrastructure.
  • ZenML - An extensible open-source MLOps framework to create reproducible pipelines.

Resources

Where to discover new tools and discuss existing ones.

Articles

Books

Events

Other Lists

Podcasts

Slack

Websites

Contributing

All contributions are welcome! Please take a look at the contribution guidelines first.