
ADALabUCSD / cerebro-system

License: Apache-2.0
Data System for Optimized Deep Learning Model Selection

Programming Languages

python

Projects that are alternatives to or similar to cerebro-system

skrobot
skrobot is a Python module for designing, running and tracking Machine Learning experiments / tasks. It is built on top of scikit-learn framework.
Stars: ✭ 22 (+46.67%)
Mutual labels:  model-selection, hyperparameter-tuning
mltb
Machine Learning Tool Box
Stars: ✭ 25 (+66.67%)
Mutual labels:  hyperparameter-tuning
Hypernets
A General Automated Machine Learning framework to simplify the development of End-to-end AutoML toolkits in specific domains.
Stars: ✭ 221 (+1373.33%)
Mutual labels:  hyperparameter-tuning
naturalselection
A general-purpose pythonic genetic algorithm.
Stars: ✭ 17 (+13.33%)
Mutual labels:  hyperparameter-tuning
open-box
Generalized and Efficient Blackbox Optimization System.
Stars: ✭ 64 (+326.67%)
Mutual labels:  hyperparameter-tuning
Machine-learning
This repository will contain all the stuffs required for beginners in ML and DL do follow and star this repo for regular updates
Stars: ✭ 27 (+80%)
Mutual labels:  model-selection
Yellowbrick
Visual analysis and diagnostic tools to facilitate machine learning model selection.
Stars: ✭ 3,439 (+22826.67%)
Mutual labels:  model-selection
pathpy
pathpy is an OpenSource python package for the modeling and analysis of pathways and temporal networks using higher-order and multi-order graphical models
Stars: ✭ 124 (+726.67%)
Mutual labels:  model-selection
irace
Iterated Racing for Automatic Algorithm Configuration
Stars: ✭ 26 (+73.33%)
Mutual labels:  hyperparameter-tuning
sklearndf
DataFrame support for scikit-learn.
Stars: ✭ 54 (+260%)
Mutual labels:  model-selection
pyAudioProcessing
Audio feature extraction and classification
Stars: ✭ 165 (+1000%)
Mutual labels:  hyperparameter-tuning
mlr3tuning
Hyperparameter optimization package of the mlr3 ecosystem
Stars: ✭ 44 (+193.33%)
Mutual labels:  hyperparameter-tuning
differential-privacy-bayesian-optimization
This repo contains the underlying code for all the experiments from the paper: "Automatic Discovery of Privacy-Utility Pareto Fronts"
Stars: ✭ 22 (+46.67%)
Mutual labels:  hyperparameter-tuning
scikit-hyperband
A scikit-learn compatible implementation of hyperband
Stars: ✭ 68 (+353.33%)
Mutual labels:  hyperparameter-tuning
BAS
BAS R package https://merliseclyde.github.io/BAS/
Stars: ✭ 36 (+140%)
Mutual labels:  model-selection
Tpot
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Stars: ✭ 8,378 (+55753.33%)
Mutual labels:  model-selection
map-floodwater-satellite-imagery
This repository focuses on training semantic segmentation models to predict the presence of floodwater for disaster prevention. Models were trained using SageMaker and Colab.
Stars: ✭ 21 (+40%)
Mutual labels:  hyperparameter-tuning
sl3
💪 🤔 Modern Super Learning with Machine Learning Pipelines
Stars: ✭ 93 (+520%)
Mutual labels:  model-selection
diviner
Diviner is a serverless machine learning and hyper parameter tuning platform
Stars: ✭ 19 (+26.67%)
Mutual labels:  hyperparameter-tuning
maggy
Distribution transparent Machine Learning experiments on Apache Spark
Stars: ✭ 83 (+453.33%)
Mutual labels:  hyperparameter-tuning

Cerebro

Cerebro is a data system for optimized deep learning model selection. It uses a novel parallel execution strategy called Model Hopper Parallelism (MOP) to execute end-to-end deep learning model selection workloads in a more resource-efficient manner. Detailed technical information about Cerebro can be found in our Technical Report.
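As a rough illustration of the "hopping" idea, the sketch below builds a toy round-robin schedule. It assumes that each worker holds one data partition, that a model trains on one worker per sub-epoch, and that an epoch ends once every model has visited every worker; none of this is spelled out in this README, and the function name `mop_schedule` is hypothetical. It is not Cerebro's actual scheduler.

```python
def mop_schedule(num_models, num_workers):
    """Toy model-hopping schedule for one epoch (illustrative assumption only).

    Returns a list with one dict per sub-epoch, mapping worker index to the
    model index trained there. A worker trains at most one model at a time,
    and each model visits every worker exactly once over the epoch.
    """
    # Use a square "slot" grid so that models never collide on a worker.
    n = max(num_models, num_workers)
    schedule = []
    for step in range(n):
        assignment = {}
        for model in range(num_models):
            slot = (model + step) % n
            # Slots beyond the last worker mean the model idles this sub-epoch.
            if slot < num_workers:
                assignment[slot] = model
        schedule.append(assignment)
    return schedule


if __name__ == "__main__":
    # Three models hopping across two workers.
    for step, assignment in enumerate(mop_schedule(3, 2)):
        print(f"sub-epoch {step}: {assignment}")
```

The rotation keeps every worker busy whenever there are at least as many models as workers, which is the intuition behind the resource-efficiency claim above; the real system may order the hops differently.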

Install

Prerequisites: You MUST be running Python >= 3.6 with TensorFlow >= 2.3 and Apache Spark >= 2.4. You will need to install these separately, along with a pyspark version that matches your Spark version. For most users, everything except Spark itself (for which you will need to follow the Spark installation instructions) can be installed with

pip install tensorflow==2.3

and

pip install pyspark==<your spark version>
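To confirm that the installed packages meet the minimums listed above, a small check along these lines may help. It is a sketch, not part of Cerebro: the helper names are hypothetical, it assumes Python 3.8+ for `importlib.metadata`, it assumes TensorFlow was installed under the distribution name `tensorflow`, and it uses the pyspark version as a stand-in for the Spark version.

```python
import re
import sys
from importlib import metadata  # standard library from Python 3.8 onward

# Minimum versions quoted from the prerequisites; keys are PyPI distribution names.
MINIMUMS = {"tensorflow": (2, 3), "pyspark": (2, 4)}


def parse_version(version):
    """Return the leading dotted-numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum):
    """True if the version string is at least the given minimum tuple."""
    return parse_version(version) >= tuple(minimum)


def check_prerequisites():
    """Map each prerequisite name to True (satisfied) or False (missing or too old)."""
    report = {"python": sys.version_info[:2] >= (3, 6)}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = False
            continue
        report[name] = meets_minimum(installed, minimum)
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

The check reads package metadata rather than importing TensorFlow or pyspark, so it stays fast and does not start a JVM.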

Note that pyspark itself can run in local (single-node) mode without a separate Spark installation. If you are only trying Cerebro out and are not using a cluster, you can run

sudo apt-get update
sudo apt-get install -y openjdk-8-jdk
pip install pyspark==3.2.0

This alone should be sufficient for running the examples. To use a cluster with multiple machines, however, you will eventually need a full Spark installation.
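A local-mode session can then be started from Python roughly as follows. This is a minimal sketch assuming pyspark and a JDK are installed as above; the helper names and the application name are hypothetical, and nothing here is specific to Cerebro's own API.

```python
def local_master_url(num_cores=None):
    """Build a Spark local-mode master URL: 'local[*]' or 'local[N]'."""
    if num_cores is None:
        return "local[*]"  # use all cores available on this machine
    if num_cores < 1:
        raise ValueError("num_cores must be a positive integer")
    return f"local[{num_cores}]"


def make_local_session(app_name="cerebro-local-check", num_cores=None):
    """Create (or reuse) a SparkSession running in local mode."""
    # Imported lazily so that this module loads even where pyspark is absent.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(local_master_url(num_cores))
        .appName(app_name)
        .getOrCreate()
    )


# Example usage (requires pyspark and Java):
#   spark = make_local_session(num_cores=2)
#   print(spark.range(5).count())
#   spark.stop()
```

Switching to a real cluster later should mainly be a matter of replacing the master URL with the one for your Spark deployment.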

Cerebro: The best way to install Cerebro is via pip, although the PyPI release may not contain the latest changes. WARNING: if you are using Spark/PySpark 3.x, you must use the alternative installation method described below.

pip install -U cerebro-dl

Alternatively, you can clone the repository and run the provided Makefile

git clone https://github.com/ADALabUCSD/cerebro-system.git && cd cerebro-system && make

Documentation

Detailed documentation about the system can be found here.

Acknowledgement

This project was/is supported in part by a Hellman Fellowship, the NIDDK of the NIH under award number R01DK114945, and an NSF CAREER Award.

We used the following projects when building Cerebro.

  • Horovod: Cerebro's Apache Spark implementation uses code from Horovod's implementation for Apache Spark.
  • Petastorm: We use Petastorm to read Apache Parquet data from remote storage (e.g., HDFS).

Publications

If you use this software for research, please cite the following paper:

@inproceedings{nakandala2019cerebro,
  title={Cerebro: Efficient and Reproducible Model Selection on Deep Learning Systems},
  author={Nakandala, Supun and Zhang, Yuhao and Kumar, Arun},
  booktitle={Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning},
  pages={1--4},
  year={2019}
}