
pfnet-research / autogbt-alt

License: MIT
An experimental Python package that reimplements AutoGBT using LightGBM and Optuna.

Programming Languages

Python, Shell

Projects that are alternatives to or similar to autogbt-alt

Lightautoml
LAMA - automatic model creation framework
Stars: ✭ 196 (+157.89%)
Mutual labels:  kaggle, automl, gradient-boosting
Mlbox
MLBox is a powerful Automated Machine Learning python library.
Stars: ✭ 1,199 (+1477.63%)
Mutual labels:  kaggle, lightgbm, automl
Lightgbm
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Stars: ✭ 13,293 (+17390.79%)
Mutual labels:  kaggle, lightgbm, gradient-boosting
Apartment-Interest-Prediction
Predict people's interest in renting specific NYC apartments. The challenge combines structured data, geolocation, time data, free text and images.
Stars: ✭ 17 (-77.63%)
Mutual labels:  kaggle, lightgbm, gradient-boosting
Auto ml
[UNMAINTAINED] Automated machine learning for analytics & production
Stars: ✭ 1,559 (+1951.32%)
Mutual labels:  lightgbm, automl, gradient-boosting
Kaggler
Code for Kaggle Data Science Competitions
Stars: ✭ 614 (+707.89%)
Mutual labels:  kaggle, automl
Benchmarks
Comparison tools
Stars: ✭ 139 (+82.89%)
Mutual labels:  kaggle, lightgbm
Machine Learning Workflow With Python
A comprehensive walkthrough of ML techniques with Python: define the problem, specify inputs & outputs, data collection, exploratory data analysis, data preprocessing, model design, training, and evaluation.
Stars: ✭ 157 (+106.58%)
Mutual labels:  kaggle, gradient-boosting
Open Solution Home Credit
Open solution to the Home Credit Default Risk challenge 🏡
Stars: ✭ 397 (+422.37%)
Mutual labels:  kaggle, lightgbm
Kaggle Competition Favorita
5th place solution for Kaggle competition Favorita Grocery Sales Forecasting
Stars: ✭ 169 (+122.37%)
Mutual labels:  kaggle, lightgbm
Chefboost
A Lightweight Decision Tree Framework supporting regular algorithms: ID3, C4.5, CART, CHAID and Regression Trees; some advanced techniques: Gradient Boosting (GBDT, GBRT, GBM), Random Forest and AdaBoost, with categorical feature support, for Python
Stars: ✭ 176 (+131.58%)
Mutual labels:  kaggle, gradient-boosting
Hungabunga
HungaBunga: Brute-Force all sklearn models with all parameters using .fit .predict!
Stars: ✭ 614 (+707.89%)
Mutual labels:  kaggle, automl
fast retraining
Show how to perform fast retraining with LightGBM in different business cases
Stars: ✭ 56 (-26.32%)
Mutual labels:  kaggle, lightgbm
Machinejs
[UNMAINTAINED] Automated machine learning- just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml
Stars: ✭ 412 (+442.11%)
Mutual labels:  kaggle, automl
decision-trees-for-ml
Building Decision Trees From Scratch In Python
Stars: ✭ 61 (-19.74%)
Mutual labels:  lightgbm, gradient-boosting
kaggle-recruit-restaurant
🏆 Kaggle 8th place solution
Stars: ✭ 102 (+34.21%)
Mutual labels:  kaggle, lightgbm
MSDS696-Masters-Final-Project
Earthquake Prediction Challenge with LightGBM and XGBoost
Stars: ✭ 58 (-23.68%)
Mutual labels:  kaggle, lightgbm
HumanOrRobot
a solution for competition of kaggle `Human or Robot`
Stars: ✭ 16 (-78.95%)
Mutual labels:  kaggle, lightgbm
Open Solution Mapping Challenge
Open solution to the Mapping Challenge 🌎
Stars: ✭ 291 (+282.89%)
Mutual labels:  kaggle, lightgbm
docker-kaggle-ko
A Docker image dedicated to machine learning/deep learning (PyTorch, TensorFlow). It adds Korean fonts, Korean NLP packages (konlpy), morphological analyzers, timezone configuration, and more.
Stars: ✭ 46 (-39.47%)
Mutual labels:  kaggle, lightgbm

About

This is an experimental Python package that reimplements AutoGBT using LightGBM and Optuna. AutoGBT is an automatically tuned machine learning classifier that won first prize at the NeurIPS'18 AutoML Challenge. AutoGBT has the following features:

  • Automatic Hyperparameter Tuning: the hyperparameters of LightGBM are automatically optimized,
  • Automatic Feature Engineering: simple feature engineering is applied for categorical and datetime features, and
  • Automatic Sampling: data rows are subsampled to handle imbalanced and large datasets.

This implementation differs from the original AutoGBT in three ways:

  1. it uses Optuna instead of Hyperopt for the hyperparameter tuning of LightGBM,
  2. it optimizes the k-fold cross-validation AUC score (see the sketch below), and
  3. it provides a simplified scikit-learn-like API.
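
As a rough illustration of points 1 and 2, here is a minimal, self-contained sketch of tuning LightGBM with Optuna against a k-fold cross-validation AUC objective. This is not the package's actual internals; the parameter names and search ranges below are illustrative assumptions.

import lightgbm as lgb
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    # Illustrative search space; the real search space lives inside autogbt.
    params = {
        'num_leaves': trial.suggest_int('num_leaves', 16, 128),
        'learning_rate': trial.suggest_float('learning_rate', 1e-3, 0.3, log=True),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
    }
    model = lgb.LGBMClassifier(**params)
    # The objective is the mean k-fold cross-validation AUC.
    return cross_val_score(model, X, y, cv=5, scoring='roc_auc').mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20)
print('best CV AUC: %.3f' % study.best_value)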

Installation

$ pip install git+https://github.com/pfnet-research/autogbt-alt.git

or

$ pip install git+ssh://[email protected]/pfnet-research/autogbt-alt.git

Usage

Basic Usage: LightGBM with Automatic Hyperparameter Tuning

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from autogbt import AutoGBTClassifier

# Load a small binary-classification dataset and hold out 10% for validation.
X, y = load_breast_cancer(return_X_y=True)
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.1)

# Fit with automatic hyperparameter tuning; best_score is the tuned
# k-fold cross-validation AUC.
model = AutoGBTClassifier()
model.fit(train_X, train_y)
print('valid AUC: %.3f' % (roc_auc_score(valid_y, model.predict(valid_X))))
print('CV AUC: %.3f' % (model.best_score))

Feature Engineering

from autogbt import Preprocessor

# Continues the example above: applies the package's simple feature
# engineering for categorical and datetime features. train_frac and
# test_frac presumably control row-sampling fractions during preprocessing.
preprocessor = Preprocessor(train_frac=0.5, test_frac=0.5)
train_X, valid_X, train_y = preprocessor.transform(train_X, valid_X, train_y)
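
The transformed data can then be fed to the classifier as usual. A short continuation, reusing the variable names from the snippets above:

model = AutoGBTClassifier()
model.fit(train_X, train_y)
print('valid AUC: %.3f' % roc_auc_score(valid_y, model.predict(valid_X)))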

Training with Sampling

from autogbt import TrainDataSampler

# Subsample rows during training and tuning to cope with large or
# imbalanced datasets. train_X and train_y follow from the snippets
# above; test_X is any held-out feature matrix.
sampler = TrainDataSampler(train_frac=0.5, valid_frac=0.5)
model = AutoGBTClassifier(sampler=sampler)
model.fit(train_X, train_y)
model.predict(test_X)
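
The sampler's internals are not documented here, but the train_frac/valid_frac parameters suggest plain random row subsampling. A self-contained sketch of that idea (illustrative only, not the actual TrainDataSampler implementation):

import numpy as np

def subsample_rows(X, y, frac, seed=0):
    # Keep a random fraction of rows, e.g. to shrink a large training set.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=int(len(X) * frac), replace=False)
    return X[idx], y[idx]

small_X, small_y = subsample_rows(train_X, train_y, frac=0.5)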

Experimental Evaluation

Please see the benchmark directory for details.

Comparison against Vanilla XGBoost and LightGBM

The default values are used for all hyperparameters of AutoGBT, XGBoost and LightGBM.
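
For context, a vanilla baseline of this kind can be sketched as follows, here using the toy dataset from the Usage section rather than the benchmark datasets (the actual benchmark scripts are in the benchmark directory):

from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score

# Default hyperparameters, scored by k-fold cross-validation AUC.
for name, model in [('LightGBM', LGBMClassifier()), ('XGBoost', XGBClassifier())]:
    auc = cross_val_score(model, X, y, cv=5, scoring='roc_auc').mean()
    print('%s CV AUC: %.3f' % (name, auc))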

Airline Dataset

model      duration [s]         CV AUC
AutoGBT    6515.254±340.231     0.900±0.001
XGBoost    78.561±7.265         0.872±0.000
LightGBM   34.000±2.285         0.891±0.000

Amazon Challenge

model      duration [s]      CV AUC
AutoGBT    359.834±29.188    0.832±0.002
XGBoost    2.558±0.661       0.749±0.002
LightGBM   1.789±0.165       0.834±0.002

Avazu CTR Prediction

model      duration [s]         CV AUC
AutoGBT    20322.601±676.702    0.744±0.000
XGBoost    OoM                  OoM
LightGBM   OoM                  OoM

(OoM: the run failed with an out-of-memory error.)

Bank Marketing Data Set

model      duration [s]     CV AUC
AutoGBT    372.090±32.857   0.925±0.001
XGBoost    2.683±0.204      0.912±0.001
LightGBM   2.406±0.236      0.927±0.001

Parameter Comparison

Performance was measured for various train_frac and n_trials settings; see the benchmark directory for the full results. A sketch of such a sweep is shown below.
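
A minimal sketch of the sweep, assuming AutoGBTClassifier accepts an n_trials argument (as the benchmark parameters suggest) and reusing train_X and train_y from the Usage section:

from autogbt import AutoGBTClassifier, TrainDataSampler

# Grid over the sampling fraction and the number of tuning trials.
for train_frac in (0.25, 0.5, 1.0):
    for n_trials in (10, 20, 40):
        sampler = TrainDataSampler(train_frac=train_frac, valid_frac=0.5)
        model = AutoGBTClassifier(n_trials=n_trials, sampler=sampler)
        model.fit(train_X, train_y)
        print('train_frac=%.2f n_trials=%d CV AUC=%.3f'
              % (train_frac, n_trials, model.best_score))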

Testing

$ ./test.sh

Reference

Jobin Wilson, Amit Kumar Meher, Bivin Vinodkumar Bindu, Manoj Sharma, Vishakha Pareek, Santanu Chaudhury, and Brejesh Lall. AutoGBT: Automatically Optimized Gradient Boosting Trees for Classifying Large Volume High Cardinality Data Streams under Concept-Drift. 2018. https://github.com/flytxtds/AutoGBT
