
ClimbsRocks / Auto_ml

License: MIT
[UNMAINTAINED] Automated machine learning for analytics & production

Programming Languages

python

Projects that are alternatives to or similar to Auto ml

Mljar Supervised
Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning 🚀
Stars: ✭ 961 (-38.36%)
Mutual labels:  data-science, scikit-learn, automl, xgboost, hyperparameter-optimization, feature-engineering, lightgbm, automated-machine-learning
Tpot
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Stars: ✭ 8,378 (+437.4%)
Mutual labels:  data-science, scikit-learn, automl, xgboost, hyperparameter-optimization, feature-engineering, automated-machine-learning, gradient-boosting
Hyperactive
A hyperparameter optimization and data collection toolbox for convenient and fast prototyping of machine-learning models.
Stars: ✭ 182 (-88.33%)
Mutual labels:  artificial-intelligence, data-science, scikit-learn, xgboost, hyperparameter-optimization, feature-engineering, automated-machine-learning
Autodl
Automated Deep Learning without ANY human intervention. 1st Solution for AutoDL [email protected]
Stars: ✭ 854 (-45.22%)
Mutual labels:  artificial-intelligence, data-science, deeplearning, automl, feature-engineering, lightgbm, automated-machine-learning
Hyperparameter hunter
Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
Stars: ✭ 648 (-58.43%)
Mutual labels:  artificial-intelligence, data-science, scikit-learn, xgboost, hyperparameter-optimization, feature-engineering, lightgbm
Lale
Library for Semi-Automated Data Science
Stars: ✭ 198 (-87.3%)
Mutual labels:  artificial-intelligence, data-science, scikit-learn, automl, hyperparameter-optimization, automated-machine-learning
My Data Competition Experience
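A summary of my experience from multiple Top 5 finishes in machine learning and big data competitions. Packed with practical content; help yourself, no thanks needed.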
Stars: ✭ 271 (-82.62%)
Mutual labels:  data-science, automl, xgboost, hyperparameter-optimization, feature-engineering, lightgbm
Autogluon
AutoGluon: AutoML for Text, Image, and Tabular Data
Stars: ✭ 3,920 (+151.44%)
Mutual labels:  data-science, scikit-learn, automl, hyperparameter-optimization, automated-machine-learning
AutoTabular
Automatic machine learning for tabular data. ⚡🔥⚡
Stars: ✭ 51 (-96.73%)
Mutual labels:  scikit-learn, xgboost, lightgbm, feature-engineering, automl
Nni
An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Stars: ✭ 10,698 (+586.21%)
Mutual labels:  data-science, automl, hyperparameter-optimization, feature-engineering, automated-machine-learning
Igel
A delightful machine learning tool that allows you to train, test, and use models without writing code
Stars: ✭ 2,956 (+89.61%)
Mutual labels:  artificial-intelligence, data-science, scikit-learn, automl, machine-learning-library
Machinejs
[UNMAINTAINED] Automated machine learning: just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml
Stars: ✭ 412 (-73.57%)
Mutual labels:  data-science, scikit-learn, automl, machine-learning-library, automated-machine-learning
Automl alex
State-of-the-art Automated Machine Learning python library for Tabular Data
Stars: ✭ 132 (-91.53%)
Mutual labels:  data-science, automl, xgboost, hyperparameter-optimization, machine-learning-library
Lightautoml
LAMA - automatic model creation framework
Stars: ✭ 196 (-87.43%)
Mutual labels:  data-science, automl, feature-engineering, automated-machine-learning, gradient-boosting
Featuretools
An open source python library for automated feature engineering
Stars: ✭ 5,891 (+277.87%)
Mutual labels:  data-science, scikit-learn, automl, feature-engineering, automated-machine-learning
Mlbox
MLBox is a powerful Automated Machine Learning python library.
Stars: ✭ 1,199 (-23.09%)
Mutual labels:  data-science, automl, xgboost, lightgbm, automated-machine-learning
mindware
An efficient open-source AutoML system for automating the machine learning lifecycle, including feature engineering, neural architecture search, and hyper-parameter tuning.
Stars: ✭ 34 (-97.82%)
Mutual labels:  hyperparameter-optimization, feature-engineering, automl, automated-machine-learning
Autoviz
Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Stars: ✭ 310 (-80.12%)
Mutual labels:  scikit-learn, automl, xgboost, automated-machine-learning
Pba
Efficient Learning of Augmentation Policy Schedules
Stars: ✭ 461 (-70.43%)
Mutual labels:  artificial-intelligence, data-science, automl, automated-machine-learning
Auto Sklearn
Automated Machine Learning with scikit-learn
Stars: ✭ 5,916 (+279.47%)
Mutual labels:  scikit-learn, automl, hyperparameter-optimization, automated-machine-learning

auto_ml

Automated machine learning for production and analytics


Installation

  • pip install auto_ml

Getting started

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset

df_train, df_test = get_boston_dataset()

column_descriptions = {
    'MEDV': 'output',
    'CHAS': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

ml_predictor.train(df_train)

ml_predictor.score(df_test, df_test.MEDV)

Show off some more features!

auto_ml is designed for production. Here's an example that serializes and loads the trained model, then gets predictions from the loaded model (which also accepts single dictionaries). This is roughly the process you'd follow to deploy the trained model.

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset
from auto_ml.utils_models import load_ml_model

# Load data
df_train, df_test = get_boston_dataset()

# Tell auto_ml which column is 'output'
# Also note columns that aren't purely numerical
# Examples include ['nlp', 'date', 'categorical', 'ignore']
column_descriptions = {
  'MEDV': 'output'
  , 'CHAS': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

ml_predictor.train(df_train)

# Score the model on test data
test_score = ml_predictor.score(df_test, df_test.MEDV)

# auto_ml is specifically tuned for running in production
# It can get predictions on an individual row (passed in as a dictionary)
# A single prediction like this takes ~1 millisecond
# Here we will demonstrate saving the trained model, and loading it again
file_name = ml_predictor.save()

trained_model = load_ml_model(file_name)

# .predict and .predict_proba take in either:
# A pandas DataFrame
# A list of dictionaries
# A single dictionary (optimized for speed in production environments)
predictions = trained_model.predict(df_test)
print(predictions)
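The save, load, and single-row prediction steps above can be illustrated without auto_ml itself. This is a minimal sketch under stated assumptions: it uses the standard-library `pickle` module and a toy linear model stored as plain data, whereas `ml_predictor.save()` and `load_ml_model()` handle auto_ml's own serialization. It also times repeated single-dictionary predictions, the kind of check you might run to see how the ~1 millisecond figure holds up on your own hardware. `predict_one`, `save_and_load`, and `avg_latency_ms` are hypothetical names, and the weights are arbitrary.

```python
import os
import pickle
import tempfile
import time
from typing import Any, Dict

Model = Dict[str, Any]  # hypothetical toy model: plain data, so it pickles anywhere


def predict_one(model: Model, row: Dict[str, float]) -> float:
    """Score one row (a dict) with a stored intercept and per-feature weights."""
    return model["intercept"] + sum(
        weight * row.get(name, 0.0) for name, weight in model["weights"].items()
    )


def save_and_load(model: Model) -> Model:
    """Pickle the model to a temporary file, then load it back as a deployment would."""
    fd, path = tempfile.mkstemp(suffix=".pkl")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(model, f)
        with open(path, "rb") as f:
            return pickle.load(f)
    finally:
        os.remove(path)


def avg_latency_ms(model: Model, row: Dict[str, float], repeats: int = 1000) -> float:
    """Average wall-clock time of one single-row prediction, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        predict_one(model, row)
    return (time.perf_counter() - start) * 1000 / repeats


# Arbitrary weights on two Boston-dataset column names, for illustration only
model = {"intercept": 22.5, "weights": {"RM": 5.0, "LSTAT": -0.5}}
loaded = save_and_load(model)
print(predict_one(loaded, {"RM": 6.0, "LSTAT": 10.0}))  # prints 47.5
print(avg_latency_ms(loaded, {"RM": 6.0, "LSTAT": 10.0}))
```

Storing the toy model as plain data keeps the pickle portable. Pickling a custom class instead requires that class to be importable wherever the file is loaded.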

3rd Party Packages: Deep Learning with TensorFlow & Keras, XGBoost, LightGBM, CatBoost

auto_ml has all of these awesome libraries integrated! Generally, just pass one of them in for model_names, for example: ml_predictor.train(data, model_names=['DeepLearningClassifier'])

Available options are:

  • DeepLearningClassifier and DeepLearningRegressor
  • XGBClassifier and XGBRegressor
  • LGBMClassifier and LGBMRegressor
  • CatBoostClassifier and CatBoostRegressor

All of these libraries are ready for production: each makes a single prediction in roughly 1 millisecond, and each can be serialized to disk after training and loaded into a new environment.

Depending on your machine, they can occasionally be difficult to install, so they are not included in auto_ml's default installation. You are responsible for installing them yourself. auto_ml will run fine without them installed (we check what's installed before choosing which algorithm to use).
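If you want to know ahead of time which of these options your environment can support before passing `model_names`, the standard library's `importlib.util.find_spec` locates a module without importing it. This is a hedged sketch, not auto_ml's internal check. The import names in `OPTIONAL_BACKENDS` (in particular, using `tensorflow` as the marker for the deep learning models) and the `available_model_names` helper are assumptions.

```python
import importlib.util
from typing import Dict, List

# Assumed import names for the optional libraries, mapped to model-name prefixes
OPTIONAL_BACKENDS: Dict[str, str] = {
    "tensorflow": "DeepLearning",
    "xgboost": "XGB",
    "lightgbm": "LGBM",
    "catboost": "CatBoost",
}


def available_model_names(kind: str) -> List[str]:
    """Return model names whose backing library is importable.

    `kind` is "Classifier" or "Regressor".
    """
    names = []
    for module, prefix in OPTIONAL_BACKENDS.items():
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module) is not None:
            names.append(prefix + kind)
    return names


print(available_model_names("Regressor"))  # depends on what is installed
```

A non-empty result could then be passed as `model_names` to `ml_predictor.train`.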

Feature Responses

Get linear-model-esque interpretations from non-linear models. See the docs for more information and caveats.
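The docs describe auto_ml's exact method and its caveats. As a rough illustration of the idea only (an assumption, not auto_ml's algorithm), here is one way to get a slope-like number from any model: nudge one feature up by a small fraction of its standard deviation and average the resulting change in predictions. `feature_response` and the `scale` parameter are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List

Row = Dict[str, float]


def feature_response(
    predict: Callable[[Row], float], rows: List[Row], feature: str, scale: float = 0.1
) -> float:
    """Average change in prediction when `feature` rises by scale * its std dev.

    Hypothetical sketch; not auto_ml's implementation.
    """
    delta = scale * pstdev([r[feature] for r in rows])
    changes = []
    for r in rows:
        nudged = dict(r)  # copy so the original row is untouched
        nudged[feature] = r[feature] + delta
        changes.append(predict(nudged) - predict(r))
    return mean(changes)
```

For a linear model the result is the coefficient times `delta`. For a non-linear model it is an average local effect, which is why such numbers need the caveats the docs mention.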

Classification

Binary and multiclass classification are both supported. Note that for now, labels must be integers (0 and 1 for binary classification). auto_ml will automatically detect whether it is a binary or multiclass classification problem; you just have to pass in ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=column_descriptions)
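Because labels must currently be integers, string labels need to be mapped before training. The following is a minimal sketch of that preprocessing, plus a binary-versus-multiclass check like the one described above. `encode_labels` and `problem_type` are hypothetical helpers; auto_ml does its own detection internally.

```python
from typing import Dict, List, Tuple


def encode_labels(labels: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map labels to integers 0..k-1, using sorted order so the mapping is reproducible."""
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def problem_type(int_labels: List[int]) -> str:
    """Return 'binary' for exactly two distinct classes, 'multiclass' for more."""
    n_classes = len(set(int_labels))
    if n_classes < 2:
        raise ValueError("Need at least two classes to train a classifier")
    return "binary" if n_classes == 2 else "multiclass"


y, mapping = encode_labels(["no", "yes", "yes", "no"])
print(y, mapping, problem_type(y))  # [0, 1, 1, 0] {'no': 0, 'yes': 1} binary
```

Keep the mapping alongside the saved model so integer predictions can be translated back to the original labels.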

Feature Learning

Also known as "finally found a way to make this deep learning stuff useful for my business". Deep Learning is great at learning important features from your data. But the way it turns these learned features into a final prediction is relatively basic. Gradient boosting is great at turning features into accurate predictions, but it doesn't do any feature learning.

In auto_ml, you can now automatically use both types of models for what they're great at. If you pass feature_learning=True, fl_data=some_dataframe to .train(), we will do exactly that: train a deep learning model on your fl_data. Rather than asking it for predictions (the standard stacking approach), we'll use its penultimate layer to get its 10 most useful features. Then we'll train a gradient boosted model (or any other model of your choice) on those features plus all the original features.

On some problems, we've seen this lead to a 5% gain in accuracy, while still making predictions in 1-4 milliseconds, depending on model complexity.

ml_predictor.train(df_train, feature_learning=True, fl_data=df_fl_data)

Currently, this feature supports only regression and binary classification. The rest of auto_ml also supports multiclass classification.
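The data flow described above can be sketched in plain Python: compute learned features from the deep model's penultimate layer, append them to the original features, and pass the result on to the downstream model. This sketch rests on heavy assumptions. A single fixed dense ReLU layer with hand-picked weights stands in for the trained network's penultimate layer, it produces 2 learned features rather than 10 for brevity, and `dense_relu` and `augment_with_learned_features` are hypothetical names.

```python
from typing import List, Sequence


def dense_relu(x: Sequence[float], weights: List[List[float]], biases: Sequence[float]) -> List[float]:
    """One dense layer with ReLU; stands in for a trained network's penultimate layer."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def augment_with_learned_features(
    rows: List[List[float]], weights: List[List[float]], biases: Sequence[float]
) -> List[List[float]]:
    """Append the learned features to each row's original features."""
    return [list(x) + dense_relu(x, weights, biases) for x in rows]


W = [[1.0, 1.0], [1.0, -1.0]]  # hand-picked, not trained
b = [0.0, 0.0]
print(augment_with_learned_features([[2.0, 3.0]], W, b))  # [[2.0, 3.0, 5.0, 0.0]]
```

The augmented rows would then go to the gradient boosted model. With feature_learning=True, auto_ml handles all of these steps for you.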

Categorical Ensembling

Ever wanted to train one model for every store/customer, but didn't want to maintain hundreds of thousands of independent models? With ml_predictor.train_categorical_ensemble(), we will handle that for you. You'll still have just one consistent API, ml_predictor.predict(data), but behind this single API will be one model for each category you included in your training data.

Just tell us which column holds the category you want to split on, and we'll handle the rest. As always, saving the model, loading it in a different environment, and getting speedy predictions live in production is baked right in.

ml_predictor.train_categorical_ensemble(df_train, categorical_column='store_name')
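Conceptually, a categorical ensemble is a router: it trains one model per category value, then dispatches each incoming row on its category. The sketch below is a hypothetical and deliberately trivial version of that design. Each per-category "model" is just the mean target, and the fallback to a global mean for categories unseen in training is an assumption. It illustrates the single-API idea only, not how auto_ml trains or stores its sub-models.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List

Row = Dict[str, Any]


class CategoricalEnsembleSketch:
    """One 'model' per category behind a single predict() call (hypothetical)."""

    def __init__(self, categorical_column: str, output_column: str) -> None:
        self.categorical_column = categorical_column
        self.output_column = output_column
        self.models: Dict[Any, float] = {}
        self.fallback = 0.0

    def train(self, rows: List[Row]) -> "CategoricalEnsembleSketch":
        # Group target values by category, then "train" one mean model per group
        groups: Dict[Any, List[float]] = defaultdict(list)
        for row in rows:
            groups[row[self.categorical_column]].append(row[self.output_column])
        self.models = {category: mean(ys) for category, ys in groups.items()}
        self.fallback = mean([row[self.output_column] for row in rows])
        return self

    def predict(self, row: Row) -> float:
        # Route the row to its category's model; unseen categories use the global mean
        return self.models.get(row[self.categorical_column], self.fallback)


rows = [
    {"store_name": "a", "sales": 10},
    {"store_name": "a", "sales": 20},
    {"store_name": "b", "sales": 30},
]
ensemble = CategoricalEnsembleSketch("store_name", "sales").train(rows)
print(ensemble.predict({"store_name": "a"}), ensemble.predict({"store_name": "new"}))  # 15 20
```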

More details available in the docs

http://auto-ml.readthedocs.io/en/latest/

Advice

Before you go any further, try running the code. Load up some data (either a DataFrame, or a list of dictionaries, where each dictionary is a row of data). Make a column_descriptions dictionary that tells us which attribute name in each row represents the value we're trying to predict. Pass all that into auto_ml, and see what happens!

Everything else in these docs assumes you have done at least the above. Start there and everything else will build on top. But this part gets you the output you're probably interested in, without unnecessary complexity.
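Before that first run, a quick sanity check on your inputs can catch typos early. The following sketch uses two hypothetical helpers. `to_rows` coerces a single dictionary, a list of dictionaries, or a pandas DataFrame (duck-typed via `DataFrame.to_dict(orient="records")`) into a list of row dictionaries. `output_column` checks `column_descriptions` and returns the one output column. It assumes the accepted types are 'output' plus those named in the example comments ('nlp', 'date', 'categorical', 'ignore'). auto_ml may accept others, so treat that set as an assumption.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Assumed set: 'output' plus the non-numerical types named in the example comments
ALLOWED_TYPES = {"output", "nlp", "date", "categorical", "ignore"}


def to_rows(data: Any) -> List[Row]:
    """Coerce a single dict, a list of dicts, or a DataFrame into a list of row dicts."""
    if isinstance(data, dict):
        return [data]  # one row
    if isinstance(data, list):
        return list(data)  # already one dict per row
    if hasattr(data, "to_dict"):
        return data.to_dict(orient="records")  # duck-typed pandas DataFrame
    raise TypeError(f"Unsupported input type: {type(data).__name__}")


def output_column(column_descriptions: Dict[str, str]) -> str:
    """Validate column_descriptions and return the name of the single 'output' column."""
    unknown = {c: t for c, t in column_descriptions.items() if t not in ALLOWED_TYPES}
    if unknown:
        raise ValueError(f"Unrecognized column types: {unknown}")
    outputs = [c for c, t in column_descriptions.items() if t == "output"]
    if len(outputs) != 1:
        raise ValueError(f"Expected exactly one 'output' column, found {len(outputs)}")
    return outputs[0]


rows = to_rows([{"MEDV": 24.0, "CHAS": 0}, {"MEDV": 21.6, "CHAS": 1}])
target = output_column({"MEDV": "output", "CHAS": "categorical"})
print(target, all(target in row for row in rows))  # MEDV True
```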

Docs

The full docs are available at https://auto_ml.readthedocs.io. Again, though, I'd strongly recommend running this on an actual dataset before referencing the docs any further.

What this project does

Automates the whole machine learning process, making it super easy to use both for analytics and for getting real-time predictions in production.

In buzzword terms, this project automates:

  • Analytics (pass in data, and auto_ml will tell you the relationship of each variable to what it is you're trying to predict).
  • Feature Engineering (particularly around dates, and NLP).
  • Robust Scaling (scaling all values into the range 0 to 1 in a way that is robust to outliers and works with sparse data).
  • Feature Selection (picking only the features that actually prove useful).
  • Data formatting (turning a DataFrame or a list of dictionaries into a sparse matrix, one-hot encoding categorical variables, taking the natural log of y for regression problems, etc).
  • Model Selection (which model works best for your problem; we try roughly a dozen apiece for classification and regression problems, including favorites like XGBoost if it's installed on your machine).
  • Hyperparameter Optimization (what hyperparameters work best for that model).
  • Big Data (feed it lots of data; it's fairly efficient with resources).
  • Unicorns (you could conceivably train it to predict what is a unicorn and what is not).
  • Ice Cream (mmm, tasty...).
  • Hugs (this makes it much easier to do your job, hopefully leaving you more time to hug those you care about).
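To illustrate what "robust" scaling into the 0 to 1 range can mean, here is one common approach. It is an assumption, not necessarily what auto_ml does internally. It replaces min/max with a lower and an upper percentile and clips anything outside them, so a single extreme value cannot squash the rest of the column. The 5th/95th percentile bounds and the helper names are arbitrary choices, and this dense-list sketch does not cover the sparse-data handling mentioned above.

```python
import math
from typing import List, Sequence


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile (q in [0, 100]) of already-sorted values."""
    pos = (len(sorted_values) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def robust_scale(values: Sequence[float], low_q: float = 5, high_q: float = 95) -> List[float]:
    """Map values into [0, 1] using percentile bounds instead of min/max, clipping outliers."""
    s = sorted(values)
    lo, hi = percentile(s, low_q), percentile(s, high_q)
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant columns
    return [min(1.0, max(0.0, (v - lo) / span)) for v in values]


print(robust_scale([1.0, 2.0, 3.0, 4.0, 100.0]))
```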

Running the tests

If you've cloned the source code and are making any changes (highly encouraged!), or just want to make sure everything works in your environment, run nosetests -v tests.

CI is also set up, so if you're developing on this, you can just open a PR, and the tests will run automatically on Travis-CI.

The tests are relatively comprehensive, though as with everything with auto_ml, I happily welcome your contributions here!
