
MLBazaar / MLBlocks

Licence: MIT License
A library for composing end-to-end tunable machine learning pipelines.

Programming Languages: Python, Makefile

Projects that are alternatives of or similar to MLBlocks

scalaj
scala/java interoperability, as deep as you want
Stars: ✭ 20 (-78.72%)
Mutual labels:  primitives
WebSocketPipe
System.IO.Pipelines API adapter for System.Net.WebSockets
Stars: ✭ 17 (-81.91%)
Mutual labels:  pipelines
scrapy-pipelines
A collection of pipelines for Scrapy
Stars: ✭ 16 (-82.98%)
Mutual labels:  pipelines
allennlp-optuna
⚡️ AllenNLP plugin for adding subcommands to use Optuna, making hyperparameter optimization easy
Stars: ✭ 33 (-64.89%)
Mutual labels:  hyperparameters
gee
🏵 Gee is a tool that writes stdin to each file and to stdout. It is similar to the tee command, but offers more convenience functions. It is written in Go.
Stars: ✭ 65 (-30.85%)
Mutual labels:  pipelines
painless-continuous-delivery
A cookiecutter for projects with continuous delivery baked in.
Stars: ✭ 46 (-51.06%)
Mutual labels:  pipelines
django-slack-oauth
Handles OAuth and stores slack token
Stars: ✭ 51 (-45.74%)
Mutual labels:  pipelines
pipeline-editor
Cloud Pipelines Editor is a web app that allows the users to build and run Machine Learning pipelines without having to set up development environment.
Stars: ✭ 22 (-76.6%)
Mutual labels:  pipelines
tibanna
Tibanna helps you run your genomic pipelines on Amazon cloud (AWS). It is used by the 4DN DCIC (4D Nucleome Data Coordination and Integration Center) to process data. Tibanna supports CWL/WDL (w/ docker), Snakemake (w/ conda) and custom Docker/shell command.
Stars: ✭ 61 (-35.11%)
Mutual labels:  pipelines
julia-workshop
"Integrating Julia in real-world, distributed pipelines" for JuliaCon 2017
Stars: ✭ 39 (-58.51%)
Mutual labels:  pipelines
codechain-primitives-js
JavaScript functions and classes for CodeChain primitives
Stars: ✭ 13 (-86.17%)
Mutual labels:  primitives
NewmanPostman VSTS Task
A task for Azure DevOps Pipelines to run newman tests.
Stars: ✭ 31 (-67.02%)
Mutual labels:  pipelines
paradox
ParamHelpers Next Generation
Stars: ✭ 23 (-75.53%)
Mutual labels:  hyperparameters
k8s-knative-gitlab-harbor
Build container images with Knative + Gitlab + Harbor inside Kops cluster running on AWS
Stars: ✭ 23 (-75.53%)
Mutual labels:  pipelines
tfx-kubeflow-pipelines
Kubeflow pipelines built on top of Tensorflow TFX library
Stars: ✭ 17 (-81.91%)
Mutual labels:  pipelines
prime-re.github.io
Open resource exchange platform for non-human primate neuroimaging
Stars: ✭ 13 (-86.17%)
Mutual labels:  pipelines
codeflare
Simplifying the definition and execution, scaling and deployment of pipelines on the cloud.
Stars: ✭ 163 (+73.4%)
Mutual labels:  pipelines
nbtransom
Machines and people collaborating together through Jupyter notebooks.
Stars: ✭ 17 (-81.91%)
Mutual labels:  pipelines
devops-101-workshop
Serves as documentation, starter code, and companion guide for a DevOps 101 workshop using the JFrog platform.
Stars: ✭ 36 (-61.7%)
Mutual labels:  pipelines
dspatch
The Refreshingly Simple Cross-Platform C++ Dataflow / Pipelining / Stream Processing / Reactive Programming Framework
Stars: ✭ 124 (+31.91%)
Mutual labels:  pipelines

An Open Source Project from the Data to AI Lab (DAI-Lab), at MIT


Pipelines and Primitives for Machine Learning and Data Science.



MLBlocks

Overview

MLBlocks is a simple framework for composing end-to-end tunable Machine Learning Pipelines by seamlessly combining tools from any Python library through a simple, common and uniform interface.

Features include:

  • Build Machine Learning Pipelines combining any Machine Learning Library in Python.
  • Access a repository with hundreds of primitives and pipelines, carefully curated by Machine Learning and Domain experts, that are ready to be used with little to no Python code to write.
  • Extract machine-readable information about which hyperparameters can be tuned and within which ranges, allowing automated integration with Hyperparameter Optimization tools like BTB.
  • Complex multi-branch pipelines and DAG configurations, with an unlimited number of inputs and outputs per primitive.
  • Easily save and load Pipelines using JSON Annotations.
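The multi-input/multi-output and JSON features above can be pictured with a small pure-Python sketch. It does not use MLBlocks itself: the `Scale` and `Sum` primitives, the `REGISTRY`, the `run_pipeline` helper and the spec layout are all hypothetical, assuming a design in which each step reads its inputs from, and writes its output to, a shared context dictionary by variable name.

```python
import json


# Hypothetical toy primitives; real MLBlocks primitives come from annotated libraries.
class Scale:
    def __init__(self, factor=1.0):
        self.factor = factor

    def produce(self, X):
        return [x * self.factor for x in X]


class Sum:
    def produce(self, a, b):
        # Combine the outputs of two branches element-wise.
        return [x + y for x, y in zip(a, b)]


# Hypothetical registry mapping primitive names to classes.
REGISTRY = {'toy.Scale': Scale, 'toy.Sum': Sum}

# A JSON-serializable spec: two branches both read 'X', and a third step merges them.
SPEC = {
    'steps': [
        {'primitive': 'toy.Scale', 'hyperparameters': {'factor': 2.0},
         'inputs': {'X': 'X'}, 'output': 'doubled'},
        {'primitive': 'toy.Scale', 'hyperparameters': {'factor': 10.0},
         'inputs': {'X': 'X'}, 'output': 'tenfold'},
        {'primitive': 'toy.Sum', 'hyperparameters': {},
         'inputs': {'a': 'doubled', 'b': 'tenfold'}, 'output': 'y'},
    ]
}


def run_pipeline(spec, **context):
    """Run each step in order, reading inputs from and writing outputs to `context`."""
    for step in spec['steps']:
        block = REGISTRY[step['primitive']](**step['hyperparameters'])
        # Map the primitive's argument names to variable names in the context.
        kwargs = {arg: context[var] for arg, var in step['inputs'].items()}
        context[step['output']] = block.produce(**kwargs)
    return context


# Round-trip the spec through JSON to mimic saving and loading a pipeline.
loaded = json.loads(json.dumps(SPEC))
result = run_pipeline(loaded, X=[1.0, 2.0, 3.0])
print(result['y'])  # [12.0, 24.0, 36.0]
```

Because steps refer to data by name rather than by position, any step can consume any earlier output, which is what makes branching layouts possible. Real MLBlocks pipelines also have a fit phase, which this stateless sketch omits.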

Install

Requirements

MLBlocks has been developed and tested on Python 3.6, 3.7 and 3.8.

Install with pip

The easiest and recommended way to install MLBlocks is using pip:

```bash
pip install mlblocks
```

This will pull and install the latest stable release from PyPI.

If you want to install from source or contribute to the project please read the Contributing Guide.

MLPrimitives

In order to be usable, MLBlocks requires a compatible primitives library.

The official library, which is required to follow the MLBlocks tutorial below, is MLPrimitives. You can install it with this command:

```bash
pip install mlprimitives
```

Quickstart

Below is a short example of how to use MLBlocks to solve the Adult Census Dataset classification problem with a pipeline that combines primitives from MLPrimitives, scikit-learn and xgboost.

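```python
from mlblocks import MLPipeline
from mlprimitives.datasets import load_dataset

# Load the Adult Census dataset and split it into train and test partitions.
dataset = load_dataset('census')
X_train, X_test, y_train, y_test = dataset.get_splits(1)

# Each primitive is referenced by name; the pipeline chains them in order.
primitives = [
    'mlprimitives.custom.preprocessing.ClassEncoder',
    'mlprimitives.custom.feature_extraction.CategoricalEncoder',
    'sklearn.impute.SimpleImputer',
    'xgboost.XGBClassifier',
    'mlprimitives.custom.preprocessing.ClassDecoder'
]
pipeline = MLPipeline(primitives)

# Fit on the training data, then predict on the held-out test data.
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

# Evaluate the predictions with the dataset's own scoring function.
dataset.score(y_test, predictions)
```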
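Each string in the `primitives` list above is a dotted primitive name. As a rough illustration of how such a name, for example `'sklearn.impute.SimpleImputer'`, can be turned into a Python object, here is a hypothetical `import_object` helper built on the standard `importlib` module. MLBlocks resolves names through the primitive annotations shipped by a library such as MLPrimitives, so treat this as a simplified sketch rather than its actual loading code.

```python
import importlib


def import_object(dotted_path):
    """Import and return the object named by 'package.module.Name' (hypothetical helper)."""
    module_name, _, attribute = dotted_path.rpartition('.')
    if not module_name:
        raise ValueError(f'Not a dotted path: {dotted_path!r}')
    # Import the containing module, then fetch the named attribute from it.
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Standard-library examples; 'sklearn.impute.SimpleImputer' would resolve the same way
# if scikit-learn is installed.
OrderedDict = import_object('collections.OrderedDict')
join = import_object('os.path.join')
print(OrderedDict.__name__)  # OrderedDict
```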

What's Next?

If you want to learn more about how to tune the pipeline hyperparameters, save and load the pipelines using JSON annotations or build complex multi-branched pipelines, please check our documentation site.
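As a rough picture of what tuning against machine-readable hyperparameter ranges involves, the sketch below runs a plain random search over a hypothetical tunable specification. The `TUNABLE` layout, the `sample` and `random_search` helpers and the toy `score` objective are assumptions for illustration only; they are not the MLBlocks or BTB API, and a library such as BTB would replace the uniform sampling with its own tuners.

```python
import random

# Hypothetical tunable spec: name -> type and range, similar in spirit to the
# hyperparameter information a pipeline can expose.
TUNABLE = {
    'max_depth': {'type': 'int', 'range': [1, 10]},
    'learning_rate': {'type': 'float', 'range': [0.01, 0.3]},
}


def sample(spec, rng):
    """Draw one candidate configuration uniformly within each declared range."""
    candidate = {}
    for name, info in spec.items():
        low, high = info['range']
        if info['type'] == 'int':
            candidate[name] = rng.randint(low, high)  # inclusive bounds
        else:
            candidate[name] = rng.uniform(low, high)
    return candidate


def score(params):
    """Toy objective standing in for a validation score (higher is better)."""
    return -abs(params['max_depth'] - 6) - abs(params['learning_rate'] - 0.1)


def random_search(spec, objective, iterations=50, seed=0):
    """Evaluate `iterations` random candidates and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float('-inf')
    for _ in range(iterations):
        params = sample(spec, rng)
        value = objective(params)
        if value > best_score:
            best_params, best_score = params, value
    return best_params, best_score


best_params, best_score = random_search(TUNABLE, score)
print(best_params, best_score)
```

In a real workflow the objective would apply each candidate to the pipeline, fit it and return a held-out score; the search loop itself stays the same.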

Also do not forget to have a look at the notebook tutorials!

Citing MLBlocks

If you use MLBlocks for your research, please consider citing our related papers.

For the current design of MLBlocks and its usage within the larger Machine Learning Bazaar project at the MIT Data to AI Lab, please see:

Micah J. Smith, Carles Sala, James Max Kanter, and Kalyan Veeramachaneni. "The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development." arXiv Preprint 1905.08942. 2019.

@article{smith2019mlbazaar,
  author = {Smith, Micah J. and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan},
  title = {The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development},
  journal = {arXiv e-prints},
  year = {2019},
  eid = {arXiv:1905.08942},
  pages = {arXiv:1905.08942},
  archivePrefix = {arXiv},
  eprint = {1905.08942},
}

For the first version of MLBlocks, from 2015, which was designed only for multi-table, multi-entity temporal data, please refer to Bryan Collazo’s thesis:

With the recent availability of a multitude of libraries and tools, we decided it was time to integrate them and expand the library to address other data types (images, text, graphs and time series) and to integrate with deep learning libraries.
