
Azure / fast_retraining

License: MIT
Shows how to perform fast retraining with LightGBM in different business cases.

Programming Languages: Jupyter Notebook, Python, Shell

Projects that are alternatives to or similar to fast_retraining

Lightgbm
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Stars: ✭ 13,293 (+23637.5%)
Mutual labels:  kaggle, gbdt, gbm, lightgbm, gbrt
JLBoost.jl
A 100%-Julia implementation of Gradient-Boosting Regression Tree algorithms
Stars: ✭ 65 (+16.07%)
Mutual labels:  xgboost, gbdt, lightgbm, gbrt
stackgbm
🌳 Stacked Gradient Boosting Machines
Stars: ✭ 24 (-57.14%)
Mutual labels:  xgboost, gbdt, gbm, lightgbm
Xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
Stars: ✭ 22,017 (+39216.07%)
Mutual labels:  xgboost, gbdt, gbm, gbrt
RobustTrees
[ICML 2019, 20 min long talk] Robust Decision Trees Against Adversarial Examples
Stars: ✭ 62 (+10.71%)
Mutual labels:  xgboost, gbdt, gbm, gbrt
Kaggle-Competition-Sberbank
Code sharing for a top 1% finish (22/3270) in the Kaggle competition Sberbank Russian Housing Market: https://www.kaggle.com/c/sberbank-russian-housing-market
Stars: ✭ 31 (-44.64%)
Mutual labels:  kaggle, xgboost, lightgbm
decision-trees-for-ml
Building Decision Trees From Scratch In Python
Stars: ✭ 61 (+8.93%)
Mutual labels:  xgboost, gbm, lightgbm
Apartment-Interest-Prediction
Predict people's interest in renting specific NYC apartments. The challenge combines structured data, geolocalization, time data, free text and images.
Stars: ✭ 17 (-69.64%)
Mutual labels:  kaggle, xgboost, lightgbm
docker-kaggle-ko
A Docker image dedicated to machine learning/deep learning (PyTorch, TensorFlow). Adds Korean fonts, Korean NLP packages (konlpy), morphological analyzers, timezone configuration, and more.
Stars: ✭ 46 (-17.86%)
Mutual labels:  kaggle, xgboost, lightgbm
HumanOrRobot
A solution for the Kaggle competition `Human or Robot`
Stars: ✭ 16 (-71.43%)
Mutual labels:  kaggle, xgboost, lightgbm
MSDS696-Masters-Final-Project
Earthquake Prediction Challenge with LightGBM and XGBoost
Stars: ✭ 58 (+3.57%)
Mutual labels:  kaggle, xgboost, lightgbm
Benchmarks
Comparison tools
Stars: ✭ 139 (+148.21%)
Mutual labels:  kaggle, xgboost, lightgbm
HyperGBM
A full pipeline AutoML tool for tabular data
Stars: ✭ 172 (+207.14%)
Mutual labels:  xgboost, gbm, lightgbm
Open Solution Home Credit
Open solution to the Home Credit Default Risk challenge 🏡
Stars: ✭ 397 (+608.93%)
Mutual labels:  kaggle, xgboost, lightgbm
Mlbox
MLBox is a powerful Automated Machine Learning Python library.
Stars: ✭ 1,199 (+2041.07%)
Mutual labels:  kaggle, xgboost, lightgbm
kaggle getting started
Kaggle getting started competition examples
Stars: ✭ 18 (-67.86%)
Mutual labels:  kaggle, xgboost
kaggle-plasticc
Solution to Kaggle's PLAsTiCC Astronomical Classification Competition
Stars: ✭ 50 (-10.71%)
Mutual labels:  kaggle, lightgbm
autogbt-alt
An experimental Python package that reimplements AutoGBT using LightGBM and Optuna.
Stars: ✭ 76 (+35.71%)
Mutual labels:  kaggle, lightgbm
Kaggle Competition Favorita
5th place solution for the Kaggle competition Favorita Grocery Sales Forecasting
Stars: ✭ 169 (+201.79%)
Mutual labels:  kaggle, lightgbm
kaggle-berlin
Material from the Kaggle Berlin meetup group!
Stars: ✭ 36 (-35.71%)
Mutual labels:  kaggle, xgboost

Fast Retraining

In this repo we compare two of the fastest boosted decision tree libraries: XGBoost and LightGBM. We evaluate them on datasets from several domains and of different sizes.

On July 25, 2017, we published a blog post evaluating both libraries and discussing the benchmark results. The post is Lessons Learned From Benchmarking Fast Machine Learning Algorithms.
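At its core, each benchmark times a fit call for each library under comparable settings. The following is a minimal, self-contained sketch of that pattern; the synthetic dataset and hyperparameters are illustrative assumptions, not the settings used in the experiments.

```python
import time

import lightgbm as lgb
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (illustrative; not one of the benchmark datasets).
X, y = make_classification(n_samples=100_000, n_features=40, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# XGBoost with exact greedy split finding (the 'xgb' configuration in the tables below).
t0 = time.time()
xgb_model = xgb.XGBClassifier(n_estimators=100, max_depth=6, tree_method="exact")
xgb_model.fit(X_train, y_train)
xgb_time = time.time() - t0

# LightGBM, which is histogram-based and grows trees leaf-wise.
t0 = time.time()
lgb_model = lgb.LGBMClassifier(n_estimators=100, num_leaves=63)
lgb_model.fit(X_train, y_train)
lgb_time = time.time() - t0

print(f"xgb: {xgb_time:.2f}s  lgb: {lgb_time:.2f}s  ratio: {xgb_time / lgb_time:.2f}")
```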

Installation and Setup

The installation instructions can be found here.

Project

The folder experiments contains the different experiments of the project. We developed six experiments with the CPU and GPU versions of the libraries (a sketch of the typical CPU/GPU parameter choices follows below).

  • Airline
  • BCI
  • Football
  • Planet Kaggle
  • Fraud Detection
  • HIGGS

The folder experiments/libs contains the common code for the project.
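The CPU and GPU runs differ mainly in a handful of library parameters. The dictionaries below are an illustrative sketch of how the three configurations in the tables (xgb, xgb_hist, lgb) are typically selected; they mirror the library options of that era and are assumptions, not a copy of the repo's notebooks.

```python
# Illustrative parameter choices for the three configurations benchmarked below.
cpu_params = {
    "xgb":      {"tree_method": "exact"},      # XGBoost, exact greedy splits
    "xgb_hist": {"tree_method": "hist"},       # XGBoost, histogram-based splits
    "lgb":      {"device": "cpu"},             # LightGBM (histogram-based by design)
}
gpu_params = {
    "xgb":      {"tree_method": "gpu_exact"},  # GPU exact (available in 2017-era XGBoost)
    "xgb_hist": {"tree_method": "gpu_hist"},   # GPU histogram
    "lgb":      {"device": "gpu"},             # requires LightGBM's GPU (OpenCL) build
}
```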

Benchmark

The following table summarizes the training times (in seconds) and the time ratios of the benchmarks performed in the experiments:

| Dataset | Experiment | Data size | Features | xgb time: CPU (GPU) | xgb_hist time: CPU (GPU) | lgb time: CPU (GPU) | ratio xgb/lgb: CPU (GPU) | ratio xgb_hist/lgb: CPU (GPU) |
|---|---|---|---|---|---|---|---|---|
| Football | Link CPU, Link GPU | 19,673 | 46 | 2.27 (7.09) | 2.47 (4.58) | 0.58 (0.97) | 3.90 (7.26) | 4.25 (4.69) |
| Fraud Detection | Link CPU, Link GPU | 284,807 | 30 | 4.34 (5.80) | 2.01 (1.64) | 0.66 (0.29) | 6.58 (19.74) | 3.04 (5.58) |
| BCI | Link CPU, Link GPU | 20,497 | 2,048 | 11.51 (12.93) | 41.84 (42.69) | 7.31 (2.76) | 1.57 (4.67) | 5.72 (15.43) |
| Planet Kaggle | Link CPU, Link GPU | 40,479 | 2,048 | 313.89 (-) | 2115.28 (2028.43) | 194.57 (317.68) | 1.61 (-) | 10.87 (6.38) |
| HIGGS | Link CPU, Link GPU | 11,000,000 | 28 | 2996.16 (-) | 121.21 (114.88) | 119.34 (71.87) | 25.10 (-) | 1.01 (1.59) |
| Airline | Link CPU, Link GPU | 115,069,017 | 13 | - (-) | 1242.09 (1271.91) | 1056.20 (645.40) | - (-) | 1.17 (1.97) |
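The ratio columns divide the XGBoost time by the LightGBM time, so values above 1 mean LightGBM was faster. The ratios were evidently computed from the unrounded timings, so recomputing them from the rounded values shown in the table matches only approximately:

```python
# Fraud Detection row, CPU times in seconds (rounded values from the table).
xgb_cpu, xgb_hist_cpu, lgb_cpu = 4.34, 2.01, 0.66
print(round(xgb_cpu / lgb_cpu, 2))       # 6.58 -> matches 'ratio xgb/lgb: CPU'
print(round(xgb_hist_cpu / lgb_cpu, 2))  # 3.05 -> table reports 3.04 (rounding)
```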

The next table summarizes the performance results using the F1 score.

| Dataset | Experiment | Data size | Features | xgb F1: CPU (GPU) | xgb_hist F1: CPU (GPU) | lgb F1: CPU (GPU) |
|---|---|---|---|---|---|---|
| Football | Link CPU, Link GPU | 19,673 | 46 | 0.458 (0.470) | 0.460 (0.472) | 0.459 (0.470) |
| Fraud Detection | Link CPU, Link GPU | 284,807 | 30 | 0.824 (0.821) | 0.802 (0.814) | 0.813 (0.811) |
| BCI | Link CPU, Link GPU | 20,497 | 2,048 | 0.110 (0.093) | 0.142 (0.120) | 0.137 (0.138) |
| Planet Kaggle | Link CPU, Link GPU | 40,479 | 2,048 | 0.805 (-) | 0.822 (0.822) | 0.822 (0.821) |
| HIGGS | Link CPU, Link GPU | 11,000,000 | 28 | 0.763 (-) | 0.767 (0.767) | 0.768 (0.767) |
| Airline | Link CPU, Link GPU | 115,069,017 | 13 | - (-) | 0.741 (0.745) | 0.732 (0.745) |
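For reference, this is a minimal sketch of how an F1 score like the ones above can be computed with scikit-learn, reusing the fitted model from the timing sketch earlier; the 0.5 decision threshold is an assumption:

```python
from sklearn.metrics import f1_score

# Reuses lgb_model, X_test, y_test from the timing sketch above.
y_prob = lgb_model.predict_proba(X_test)[:, 1]  # positive-class probabilities
y_pred = (y_prob >= 0.5).astype(int)            # assumed 0.5 decision threshold
print(f"lgb F1: {f1_score(y_test, y_pred):.3f}")
```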

The experiments were run on an Azure NV24 VM with 24 cores and 224 GB of memory. The machine has 4 NVIDIA M60 GPUs. For both the CPU and GPU experiments we used Ubuntu 16.04.

Contributing

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].