
DataCanvasIO / HyperGBM

License: Apache-2.0
A full pipeline AutoML tool for tabular data

Programming Languages

Python, Dockerfile

Projects that are alternatives of or similar to HyperGBM

AutoTabular
Automatic machine learning for tabular data. ⚡🔥⚡
Stars: ✭ 51 (-70.35%)
Mutual labels:  tabular-data, xgboost, lightgbm, automl, catboost
stackgbm
🌳 Stacked Gradient Boosting Machines
Stars: ✭ 24 (-86.05%)
Mutual labels:  xgboost, gbm, lightgbm, ensemble-learning, catboost
Awesome Decision Tree Papers
A collection of research papers on decision, classification and regression trees with implementations.
Stars: ✭ 1,908 (+1009.3%)
Mutual labels:  xgboost, lightgbm, ensemble-learning, catboost
aws-machine-learning-university-dte
Machine Learning University: Decision Trees and Ensemble Methods
Stars: ✭ 119 (-30.81%)
Mutual labels:  tabular-data, xgboost, lightgbm, catboost
Auto ml
[UNMAINTAINED] Automated machine learning for analytics & production
Stars: ✭ 1,559 (+806.4%)
Mutual labels:  xgboost, lightgbm, automl
mlforecast
Scalable machine 🤖 learning for time series forecasting.
Stars: ✭ 96 (-44.19%)
Mutual labels:  xgboost, lightgbm, dask
datascienv
datascienv is a package that sets up your environment with all dependencies in a single line of code; it also includes pyforest, which provides a single-line import of all required ML libraries.
Stars: ✭ 53 (-69.19%)
Mutual labels:  xgboost, lightgbm, catboost
Autogluon
AutoGluon: AutoML for Text, Image, and Tabular Data
Stars: ✭ 3,920 (+2179.07%)
Mutual labels:  tabular-data, ensemble-learning, automl
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+3188.37%)
Mutual labels:  gbm, ensemble-learning, automl
Mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Stars: ✭ 2,308 (+1241.86%)
Mutual labels:  xgboost, lightgbm, dask
recsys2019
The complete code and notebooks used for the ACM Recommender Systems Challenge 2019
Stars: ✭ 26 (-84.88%)
Mutual labels:  xgboost, lightgbm, catboost
Mlbox
MLBox is a powerful Automated Machine Learning python library.
Stars: ✭ 1,199 (+597.09%)
Mutual labels:  xgboost, lightgbm, automl
Mljar Supervised
Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning 🚀
Stars: ✭ 961 (+458.72%)
Mutual labels:  xgboost, lightgbm, automl
JLBoost.jl
A 100%-Julia implementation of Gradient-Boosting Regression Tree algorithms
Stars: ✭ 65 (-62.21%)
Mutual labels:  xgboost, lightgbm, catboost
My Data Competition Experience
A summary of my experience placing in the Top 5 of multiple machine learning and big data competitions; packed with practical takeaways.
Stars: ✭ 271 (+57.56%)
Mutual labels:  xgboost, lightgbm, automl
fast retraining
Show how to perform fast retraining with LightGBM in different business cases
Stars: ✭ 56 (-67.44%)
Mutual labels:  xgboost, gbm, lightgbm
Kaggle-Competition-Sberbank
Code for a top 1% (22/3270) finish in the Kaggle competition Sberbank Russian Housing Market: https://www.kaggle.com/c/sberbank-russian-housing-market
Stars: ✭ 31 (-81.98%)
Mutual labels:  xgboost, lightgbm, ensemble-learning
decision-trees-for-ml
Building Decision Trees From Scratch In Python
Stars: ✭ 61 (-64.53%)
Mutual labels:  xgboost, gbm, lightgbm
sagemaker-xgboost-container
The Docker container based on the open source XGBoost framework (https://xgboost.readthedocs.io/en/latest/) that allows customers to use their own XGBoost scripts in SageMaker.
Stars: ✭ 93 (-45.93%)
Mutual labels:  xgboost, gbm, distributed-training
ETCI-2021-Competition-on-Flood-Detection
Experiments on Flood Segmentation on Sentinel-1 SAR Imagery with Cyclical Pseudo Labeling and Noisy Student Training
Stars: ✭ 102 (-40.7%)
Mutual labels:  semi-supervised-learning, pseudo-labeling

HyperGBM


Doc | 中文 (Chinese)

We Are Hiring!

We are offering challenging opportunities in Beijing for both professionals and students who are keen on AutoML/NAS. Come be a part of DataCanvas! Please send your CV to [email protected]. (Application deadline: TBD.)

What is HyperGBM

HyperGBM is a full pipeline automated machine learning (AutoML) toolkit designed for tabular data. It covers the complete end-to-end ML processing stages: data cleaning, preprocessing, feature generation and selection, model selection, and hyperparameter optimization.

Overview

HyperGBM optimizes the end-to-end ML processing stages within one search space, which differs from most existing AutoML approaches that tackle only partial stages, such as hyperparameter optimization. This full pipeline optimization process closely resembles a sequential decision process (SDP). HyperGBM therefore combines reinforcement learning, Monte Carlo Tree Search, and evolutionary algorithms with a meta-learner to solve the pipeline optimization problem efficiently.
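
As a rough sketch of how the search strategy can be selected, the snippet below assumes make_experiment accepts a searcher argument with names such as 'mcts', 'evolution' and 'random', as in the underlying Hypernets API (check the documentation for the exact values):

from hypergbm import make_experiment
from hypernets.tabular.datasets import dsutils

train_data = dsutils.load_blood()
# 'mcts' is assumed to select Monte Carlo Tree Search; 'evolution' and
# 'random' are assumed to be the other built-in searcher names
experiment = make_experiment(train_data, target='Class', searcher='mcts')
estimator = experiment.run()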

HyperGBM, as the name indicates, incorporates several gradient boosting tree models (GBMs), namely XGBoost, LightGBM and CatBoost. Moreover, it builds on Hypernets, a general automated machine learning framework, and inherits its advanced capabilities in data cleaning, feature engineering and model ensembling. The search space representation and search algorithms inside HyperGBM are also provided by Hypernets.

HyperGBM also supports full pipeline GPU acceleration, covering all data processing and model training steps. When training with an NVIDIA A100, we observed a 50x performance improvement. More importantly, a model trained on GPU can be deployed to an environment without GPU hardware or software (CUDA and cuML), which greatly reduces the cost of model deployment.
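
Below is a minimal, hedged sketch of GPU training, assuming, as the GPU documentation suggests, that a cuDF DataFrame can be passed directly as train_data; a RAPIDS/CUDA environment is required:

import cudf  # assumes a RAPIDS/CUDA environment is installed

from hypergbm import make_experiment

# loading the data with cuDF keeps data processing and training on the GPU
train_data = cudf.read_csv('blood.csv')  # blood.csv is an illustrative file name
experiment = make_experiment(train_data, target='Class')
estimator = experiment.run()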

Tutorial

Installation

Conda

Install HyperGBM with conda from the channel conda-forge:

conda install -c conda-forge hypergbm

On Windows, we recommend installing pyarrow (required by Hypernets) version 4.0 or earlier together with HyperGBM:

conda install -c conda-forge hypergbm "pyarrow<=4.0"

Pip

Install HyperGBM with different pip options:

  • Typical installation:
pip install hypergbm
  • To run HyperGBM in JupyterLab/Jupyter notebook, install with command:
pip install hypergbm[notebook]
  • To support datasets containing Simplified Chinese text in feature generation:
    • Install the jieba package before running HyperGBM,
    • OR install with the command:
pip install hypergbm[zhcn]
  • Install all of the above with one command:
pip install hypergbm[all]
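
Whichever option you choose, a quick import check confirms the installation (assuming the package exposes __version__, as most PyPI packages do):

python -c "import hypergbm; print(hypergbm.__version__)"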

Examples

  • Use HyperGBM with Python

Users can quickly create and run an experiment with make_experiment, which requires only one input parameter, train_data. The example below uses the blood dataset from hypernets.tabular as train_data. If the target column of the dataset is not named y, it must be set explicitly through the argument target.

Example code:

from hypergbm import make_experiment
from hypernets.tabular.datasets import dsutils

# load the demo dataset, run the AutoML experiment and get the final estimator
train_data = dsutils.load_blood()
experiment = make_experiment(train_data, target='Class')
estimator = experiment.run()
print(estimator)

Running this experiment returns a pipeline with two default steps, data_clean and estimator. In particular, the estimator step is a final model assembled from several candidate models. The output:

Pipeline(steps=[('data_clean',
                 DataCleanStep(...)),
                ('estimator',
                 GreedyEnsemble(...))])
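
Because the returned estimator behaves like a scikit-learn pipeline, it can be used directly for prediction. A short sketch with an illustrative hold-out split (the split and metric are not part of the original example):

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from hypergbm import make_experiment
from hypernets.tabular.datasets import dsutils

# hold out a test set before running the experiment
df = dsutils.load_blood()
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)

experiment = make_experiment(train_data, target='Class')
estimator = experiment.run()

# the fitted pipeline exposes the standard predict() method
y_pred = estimator.predict(test_data.drop(columns=['Class']))
print('accuracy:', accuracy_score(test_data['Class'], y_pred))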

To see more examples, please read Quick Start and Examples.

  • Use HyperGBM with Command line tools

HyperGBM also provides command line tools for model training, evaluation and prediction. The following command displays the command line help:

hypergbm -h

usage: hypergbm [-h] [--log-level LOG_LEVEL] [-error] [-warn] [-info] [-debug]
                [--verbose VERBOSE] [-v] [--enable-gpu ENABLE_GPU] [-gpu] 
                [--enable-dask ENABLE_DASK] [-dask] [--overload OVERLOAD]
                {train,evaluate,predict} ...

An example of training a model on the dataset blood.csv:

hypergbm train --train-file=blood.csv --target=Class --model-file=model.pkl
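
Each sub-command (train, evaluate, predict) has its own set of options; rather than guessing flag names, list them with -h, for example:

hypergbm evaluate -h
hypergbm predict -h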

For more details, please read Quick Start.

HyperGBM related projects

  • Hypernets: A general automated machine learning (AutoML) framework.
  • HyperGBM: A full pipeline AutoML tool integrating various GBM models.
  • HyperDT/DeepTables: An AutoDL tool for tabular data.
  • HyperKeras: An AutoDL tool for Neural Architecture Search and Hyperparameter Optimization on Tensorflow and Keras.
  • Cooka: A lightweight interactive AutoML system.

DataCanvas AutoML Toolkit


DataCanvas

HyperGBM is an open source project created by DataCanvas.
