
DataCanvasIO / HyperGBM

License: Apache-2.0
A full pipeline AutoML tool for tabular data

Programming Languages

Python, Dockerfile

Projects that are alternatives of or similar to HyperGBM

AutoTabular
Automatic machine learning for tabular data. ⚡🔥⚡
Stars: ✭ 51 (-70.35%)
Mutual labels:  tabular-data, xgboost, lightgbm, automl, catboost
stackgbm
🌳 Stacked Gradient Boosting Machines
Stars: ✭ 24 (-86.05%)
Mutual labels:  xgboost, gbm, lightgbm, ensemble-learning, catboost
Awesome Decision Tree Papers
A collection of research papers on decision, classification and regression trees with implementations.
Stars: ✭ 1,908 (+1009.3%)
Mutual labels:  xgboost, lightgbm, ensemble-learning, catboost
aws-machine-learning-university-dte
Machine Learning University: Decision Trees and Ensemble Methods
Stars: ✭ 119 (-30.81%)
Mutual labels:  tabular-data, xgboost, lightgbm, catboost
Auto ml
[UNMAINTAINED] Automated machine learning for analytics & production
Stars: ✭ 1,559 (+806.4%)
Mutual labels:  xgboost, lightgbm, automl
mlforecast
Scalable machine 🤖 learning for time series forecasting.
Stars: ✭ 96 (-44.19%)
Mutual labels:  xgboost, lightgbm, dask
datascienv
datascienv is a package that sets up your environment with all dependencies in a single line of code; it also includes pyforest, which provides a single-line import of all required ML libraries.
Stars: ✭ 53 (-69.19%)
Mutual labels:  xgboost, lightgbm, catboost
Autogluon
AutoGluon: AutoML for Text, Image, and Tabular Data
Stars: ✭ 3,920 (+2179.07%)
Mutual labels:  tabular-data, ensemble-learning, automl
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+3188.37%)
Mutual labels:  gbm, ensemble-learning, automl
Mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Stars: ✭ 2,308 (+1241.86%)
Mutual labels:  xgboost, lightgbm, dask
recsys2019
The complete code and notebooks used for the ACM Recommender Systems Challenge 2019
Stars: ✭ 26 (-84.88%)
Mutual labels:  xgboost, lightgbm, catboost
Mlbox
MLBox is a powerful Automated Machine Learning python library.
Stars: ✭ 1,199 (+597.09%)
Mutual labels:  xgboost, lightgbm, automl
Mljar Supervised
Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning 🚀
Stars: ✭ 961 (+458.72%)
Mutual labels:  xgboost, lightgbm, automl
JLBoost.jl
A 100%-Julia implementation of Gradient-Boosting Regression Tree algorithms
Stars: ✭ 65 (-62.21%)
Mutual labels:  xgboost, lightgbm, catboost
My Data Competition Experience
A summary of my experience placing in the Top 5 of multiple machine learning and big data competitions; packed with practical takeaways.
Stars: ✭ 271 (+57.56%)
Mutual labels:  xgboost, lightgbm, automl
fast retraining
Show how to perform fast retraining with LightGBM in different business cases
Stars: ✭ 56 (-67.44%)
Mutual labels:  xgboost, gbm, lightgbm
Kaggle-Competition-Sberbank
Code for a top 1% (22/3270) finish in the Kaggle competition Sberbank Russian Housing Market: https://www.kaggle.com/c/sberbank-russian-housing-market
Stars: ✭ 31 (-81.98%)
Mutual labels:  xgboost, lightgbm, ensemble-learning
decision-trees-for-ml
Building Decision Trees From Scratch In Python
Stars: ✭ 61 (-64.53%)
Mutual labels:  xgboost, gbm, lightgbm
sagemaker-xgboost-container
The Docker container based on the open source XGBoost framework (https://xgboost.readthedocs.io/en/latest/) that allows customers to use their own XGBoost scripts in SageMaker.
Stars: ✭ 93 (-45.93%)
Mutual labels:  xgboost, gbm, distributed-training
ETCI-2021-Competition-on-Flood-Detection
Experiments on Flood Segmentation on Sentinel-1 SAR Imagery with Cyclical Pseudo Labeling and Noisy Student Training
Stars: ✭ 102 (-40.7%)
Mutual labels:  semi-supervised-learning, pseudo-labeling

HyperGBM


Doc | 中文 (Chinese)

We Are Hiring!

We are offering challenging opportunities in Beijing for both professionals and students who are keen on AutoML/NAS. Come be a part of DataCanvas! Please send your CV to [email protected]. (Application deadline: TBD.)

What is HyperGBM

HyperGBM is a full pipeline automated machine learning (AutoML) toolkit designed for tabular data. It covers the complete end-to-end ML processing stages: data cleaning, preprocessing, feature generation and selection, model selection, and hyperparameter optimization.

Overview

HyperGBM optimizes the end-to-end ML processing stages within one search space, which differs from most existing AutoML approaches that tackle only partial stages, such as hyperparameter optimization. This full pipeline optimization process closely resembles a sequential decision process (SDP). HyperGBM therefore combines reinforcement learning, Monte Carlo Tree Search, and evolutionary algorithms with a meta-learner to solve the pipeline optimization problem efficiently.
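
As a rough sketch of how the search strategy can be selected, the snippet below assumes make_experiment accepts a searcher argument with names such as 'mcts', 'evolution' and 'random', as in the underlying Hypernets API (check the documentation for the exact values):

from hypergbm import make_experiment
from hypernets.tabular.datasets import dsutils

train_data = dsutils.load_blood()
# 'mcts' is assumed to select Monte Carlo Tree Search; 'evolution' and
# 'random' are assumed to be the other built-in searcher names
experiment = make_experiment(train_data, target='Class', searcher='mcts')
estimator = experiment.run()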

HyperGBM, as the name indicates, incorporates several gradient boosting tree models (GBMs), namely XGBoost, LightGBM and CatBoost. Moreover, it builds on Hypernets, a general automated machine learning framework, and inherits its advanced capabilities in data cleaning, feature engineering and model ensembling. The search space representation and search algorithms inside HyperGBM are also provided by Hypernets.

HyperGBM also supports full pipeline GPU acceleration, covering all data processing and model training steps. When training with an NVIDIA A100, we observed a 50x performance improvement. More importantly, a model trained on GPU can be deployed to an environment without GPU hardware or software (CUDA and cuML), which greatly reduces the cost of model deployment.
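
Below is a minimal, hedged sketch of GPU training, assuming, as the GPU documentation suggests, that a cuDF DataFrame can be passed directly as train_data; a RAPIDS/CUDA environment is required:

import cudf  # assumes a RAPIDS/CUDA environment is installed

from hypergbm import make_experiment

# loading the data with cuDF keeps data processing and training on the GPU
train_data = cudf.read_csv('blood.csv')  # blood.csv is an illustrative file name
experiment = make_experiment(train_data, target='Class')
estimator = experiment.run()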

Tutorial

Installation

Conda

Install HyperGBM with conda from the channel conda-forge:

conda install -c conda-forge hypergbm

On Windows, we recommend installing pyarrow (required by Hypernets) version 4.0 or earlier together with HyperGBM:

conda install -c conda-forge hypergbm "pyarrow<=4.0"

Pip

Install HyperGBM with different pip options:

  • Typical installation:
pip install hypergbm
  • To run HyperGBM in JupyterLab/Jupyter notebook, install with command:
pip install hypergbm[notebook]
  • To support datasets containing Simplified Chinese text in feature generation:
    • Install the jieba package before running HyperGBM,
    • OR install with the command:
pip install hypergbm[zhcn]
  • Install all of the above with one command:
pip install hypergbm[all]
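
Whichever option you choose, a quick import check confirms the installation (assuming the package exposes __version__, as most PyPI packages do):

python -c "import hypergbm; print(hypergbm.__version__)"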

Examples

  • Use HyperGBM with Python

Users can quickly create and run an experiment with make_experiment, which requires only one input parameter, train_data. The example below uses the blood dataset from hypernets.tabular as train_data. If the target column of the dataset is not named y, it must be set explicitly through the argument target.

Example code:

from hypergbm import make_experiment
from hypernets.tabular.datasets import dsutils

# load the demo dataset, run the AutoML experiment and get the final estimator
train_data = dsutils.load_blood()
experiment = make_experiment(train_data, target='Class')
estimator = experiment.run()
print(estimator)

Running this experiment returns a pipeline with two default steps, data_clean and estimator. In particular, the estimator step is a final model assembled from several candidate models. The output:

Pipeline(steps=[('data_clean',
                 DataCleanStep(...)),
                ('estimator',
                 GreedyEnsemble(...))])
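
Because the returned estimator behaves like a scikit-learn pipeline, it can be used directly for prediction. A short sketch with an illustrative hold-out split (the split and metric are not part of the original example):

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from hypergbm import make_experiment
from hypernets.tabular.datasets import dsutils

# hold out a test set before running the experiment
df = dsutils.load_blood()
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)

experiment = make_experiment(train_data, target='Class')
estimator = experiment.run()

# the fitted pipeline exposes the standard predict() method
y_pred = estimator.predict(test_data.drop(columns=['Class']))
print('accuracy:', accuracy_score(test_data['Class'], y_pred))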

To see more examples, please read Quick Start and Examples.

  • Use HyperGBM with Command line tools

HyperGBM also provides command line tools for model training, evaluation and prediction. The following command displays the command line help:

hypergbm -h

usage: hypergbm [-h] [--log-level LOG_LEVEL] [-error] [-warn] [-info] [-debug]
                [--verbose VERBOSE] [-v] [--enable-gpu ENABLE_GPU] [-gpu] 
                [--enable-dask ENABLE_DASK] [-dask] [--overload OVERLOAD]
                {train,evaluate,predict} ...

An example of training a model on the dataset blood.csv:

hypergbm train --train-file=blood.csv --target=Class --model-file=model.pkl
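
Each sub-command (train, evaluate, predict) has its own set of options; rather than guessing flag names, list them with -h, for example:

hypergbm evaluate -h
hypergbm predict -h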

For more details, please read Quick Start.

HyperGBM related projects

  • Hypernets: A general automated machine learning (AutoML) framework.
  • HyperGBM: A full pipeline AutoML tool integrating various GBM models.
  • HyperDT/DeepTables: An AutoDL tool for tabular data.
  • HyperKeras: An AutoDL tool for Neural Architecture Search and Hyperparameter Optimization on Tensorflow and Keras.
  • Cooka: A lightweight interactive AutoML system.

DataCanvas AutoML Toolkit


DataCanvas

HyperGBM is an open source project created by DataCanvas.
