
crawles / Automl_service

License: MIT
Deploy AutoML as a service using Flask

Projects that are alternatives of or similar to Automl service

Allensdk
code for reading and processing Allen Institute for Brain Science data
Stars: ✭ 200 (-0.99%)
Mutual labels:  jupyter-notebook
Urban Informatics And Visualization
Urban Informatics and Visualization (UC Berkeley CP255)
Stars: ✭ 200 (-0.99%)
Mutual labels:  jupyter-notebook
Zaoqi Python
WeChat official account: Zaoqi Python (早起Python)
Stars: ✭ 202 (+0%)
Mutual labels:  jupyter-notebook
Pyqstrat
A fast, extensible, transparent python library for backtesting quantitative strategies.
Stars: ✭ 200 (-0.99%)
Mutual labels:  jupyter-notebook
Pycon2017 Optimizing Pandas
Materials for PyCon 2017 presentation on optimizing Pandas code
Stars: ✭ 201 (-0.5%)
Mutual labels:  jupyter-notebook
Echomods
Open source ultrasound processing modules and building blocks
Stars: ✭ 200 (-0.99%)
Mutual labels:  jupyter-notebook
Sdc Lane And Vehicle Detection Tracking
OpenCV in Python for lane line and vehicle detection/tracking in autonomous cars
Stars: ✭ 200 (-0.99%)
Mutual labels:  jupyter-notebook
Geostatspy
GeostatsPy: a Python package for spatial data analytics and geostatistics, largely a reimplementation of GSLIB, the Geostatistical Library (Deutsch and Journel, 1992). I hope this resource is helpful. Prof. Michael Pyrcz
Stars: ✭ 200 (-0.99%)
Mutual labels:  jupyter-notebook
Lsstc Dsfp Sessions
Lecture slides, Jupyter notebooks, and other material from the LSSTC Data Science Fellowship Program
Stars: ✭ 201 (-0.5%)
Mutual labels:  jupyter-notebook
Data Science Online
Stars: ✭ 202 (+0%)
Mutual labels:  jupyter-notebook
Keras deep clustering
How to do Unsupervised Clustering with Keras
Stars: ✭ 202 (+0%)
Mutual labels:  jupyter-notebook
Trump Lies
Tutorial: Web scraping in Python with Beautiful Soup
Stars: ✭ 201 (-0.5%)
Mutual labels:  jupyter-notebook
Fastpages
An easy-to-use blogging platform with enhanced support for Jupyter Notebooks.
Stars: ✭ 2,888 (+1329.7%)
Mutual labels:  jupyter-notebook
Rosetta
Tools, wrappers, etc... for data science with a concentration on text processing
Stars: ✭ 200 (-0.99%)
Mutual labels:  jupyter-notebook
Csa Inpainting
Coherent Semantic Attention for image inpainting(ICCV 2019)
Stars: ✭ 202 (+0%)
Mutual labels:  jupyter-notebook
Spark Practice
Apache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (-0.99%)
Mutual labels:  jupyter-notebook
Cs231n
my assignment solutions for CS231n Convolutional Neural Networks for Visual Recognition
Stars: ✭ 201 (-0.5%)
Mutual labels:  jupyter-notebook
Release
Deep Reinforcement Learning for de-novo Drug Design
Stars: ✭ 201 (-0.5%)
Mutual labels:  jupyter-notebook
Face toolbox keras
A collection of deep learning frameworks ported to Keras for face analysis.
Stars: ✭ 202 (+0%)
Mutual labels:  jupyter-notebook
Joyful Pandas
A pandas tutorial in Chinese (pandas中文教程)
Stars: ✭ 2,788 (+1280.2%)
Mutual labels:  jupyter-notebook

AutoML Service

Deploy automated machine learning (AutoML) as a service using Flask, for both pipeline training and pipeline serving.

The framework implements a fully automated time series classification pipeline, automating feature engineering, model selection, and model optimization using the Python libraries TPOT and tsfresh.

Check out the blog post for more info.

Resources:

  • TPOT – automated feature preprocessing and model optimization tool
  • tsfresh – automated time series feature engineering and selection
  • Flask – a web development microframework for Python

Architecture

The application exposes both model training and model prediction through a RESTful API. For model training, input data and labels are sent via a POST request, a pipeline is trained, and model predictions are then accessible via a prediction route.

Each trained pipeline is stored under a unique key, so live predictions can be made on the same data using different feature-construction and modeling pipelines.

An automated pipeline for time-series classification.

The model training logic is exposed as a REST endpoint. Raw, labeled training data is uploaded via a POST request and an optimal model is developed.

Raw, unlabeled data is uploaded via a POST request and model predictions are returned.
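The two routes could be wired up along these lines. This is a minimal sketch: the route names match the README's usage examples, but the training and prediction internals are placeholders, not the project's actual implementation.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
pipelines = {}  # trained pipelines, keyed by model id

@app.route('/train_pipeline', methods=['POST'])
def train_pipeline():
    raw_data = request.files['raw_data'].read()
    labels = request.files['labels'].read()
    # the real app runs feature engineering (tsfresh) and model
    # search (TPOT) on raw_data/labels, then stores the pipeline
    model_id = len(pipelines) + 1
    pipelines[model_id] = {'modelId': model_id}  # placeholder pipeline
    return jsonify({'modelId': model_id})

@app.route('/serve_prediction', methods=['POST'])
def serve_prediction():
    raw_data = request.files['raw_data'].read()
    # the real app looks up the requested pipeline by its key
    # and returns predictions for raw_data
    return jsonify({'predictions': []})

# app.run(host='0.0.0.0', port=8080)
```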

Using the app

View the Jupyter Notebook for an example.

Deploying

# deploy locally
python automl_service.py
# deploy on cloud foundry
cf push

Usage

Train a pipeline:

import json
import requests

train_url = 'http://0.0.0.0:8080/train_pipeline'
train_files = {'raw_data': open('data/data_train.json', 'rb'),
               'labels':   open('data/label_train.json', 'rb'),
               'params':   open('parameters/train_parameters_model2.yml', 'rb')}

# POST request to train the pipeline
r_train = requests.post(train_url, files=train_files)
result_df = json.loads(r_train.json())

returns:

{'featureEngParams': {'default_fc_parameters': "['median', 'minimum', 'standard_deviation', 
                                                 'sum_values', 'variance', 'maximum', 
                                                 'length', 'mean']",
                      'impute_function': 'impute',
                      ...},
 'mean_cv_accuracy': 0.865,
 'mean_cv_roc_auc': 0.932,
 'modelId': 1,
 'modelType': "Pipeline(steps=[('stackingestimator', StackingEstimator(estimator=LinearSVC(...))),
                               ('logisticregression', LogisticRegressionClassifier(solver='liblinear',...))])",
 'trainShape': [1647, 8],
 'trainTime': 1.953}

Serve pipeline predictions:

import pandas as pd
import requests

serve_url = 'http://0.0.0.0:8080/serve_prediction'
test_files = {'raw_data': open('data/data_test.json', 'rb'),
              'params':   open('parameters/test_parameters_model2.yml', 'rb')}

# POST request to serve predictions from the trained pipeline
r_test = requests.post(serve_url, files=test_files)
result = pd.read_json(r_test.json()).set_index('id')
example_id  prediction
1           0.853
2           0.991
3           0.060
4           0.995
5           0.003
...         ...

View all trained models:

import json
import requests

r = requests.get('http://0.0.0.0:8080/models')
pipelines = json.loads(r.json())
{'1':
    {'mean_cv_accuracy': 0.873,
     'modelType': "RandomForestClassifier(...)",
     ...},
 '2':
    {'mean_cv_accuracy': 0.895,
     'modelType': "GradientBoostingClassifier(...)",
     ...},
 '3':
    {'mean_cv_accuracy': 0.859,
     'modelType': "LogisticRegressionClassifier(...)",
     ...},
...}

Running the tests

Supply the service host as a command-line argument.

# use local app
py.test --host http://0.0.0.0:8080
# use cloud-deployed app
py.test --host http://ROUTE-HERE
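The --host option above is not built into pytest, so the test suite has to register it. Here is a minimal conftest.py sketch of how that could be done; the fixture name 'host' and the default URL are assumptions, not taken from the project.

```python
# conftest.py
import pytest

def pytest_addoption(parser):
    # register the --host command-line option used by the test suite
    parser.addoption('--host', action='store',
                     default='http://0.0.0.0:8080',
                     help='base URL of the running AutoML service')

@pytest.fixture
def host(request):
    # tests request this fixture to get the chosen host URL
    return request.config.getoption('--host')
```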

Scaling the architecture

For production, I would suggest splitting training and serving into separate applications and incorporating a facade API. It would also be best to use a shared cache, such as Redis or Pivotal Cloud Cache, so that other applications and multiple instances of the pipeline can access the trained model. Here is a potential architecture.

A scalable model training and model serving architecture.
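Sharing trained pipelines through a cache could look roughly like this. It is a sketch only: the PipelineCache class and the 'pipeline:<id>' key scheme are illustrative, and any object with get/set methods (e.g. a redis.Redis client) can be plugged in.

```python
import pickle

class PipelineCache:
    """Store and fetch pickled pipelines in a shared key-value cache."""

    def __init__(self, client):
        # client: any object with get/set, e.g. redis.Redis()
        self.client = client

    def save(self, model_id, pipeline):
        # serialize the fitted pipeline under a per-model key
        self.client.set(f'pipeline:{model_id}', pickle.dumps(pipeline))

    def load(self, model_id):
        blob = self.client.get(f'pipeline:{model_id}')
        return pickle.loads(blob) if blob is not None else None

# for local experimentation, a dict can stand in for the Redis client
class DictClient(dict):
    def set(self, key, value):
        self[key] = value
    def get(self, key):
        return dict.get(self, key)

cache = PipelineCache(DictClient())
cache.save(1, {'model': 'demo'})
print(cache.load(1))  # {'model': 'demo'}
```

With a real Redis backend, every app instance pointed at the same server sees the same pipelines, so serving can scale horizontally while training writes to one place.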

Author

Chris Rawles
