ikki407 / Stacking

Licence: mit
Stacked Generalization (Ensemble Learning)

Programming Languages

python

Projects that are alternatives of or similar to Stacking

handson-ml
Jupyter notebooks containing the examples and exercises from the book "Hands-On Machine Learning".
Stars: ✭ 285 (+64.74%)
Mutual labels:  scikit-learn, xgboost, ensemble-learning
Mlbox
MLBox is a powerful Automated Machine Learning python library.
Stars: ✭ 1,199 (+593.06%)
Mutual labels:  prediction, xgboost, stacking
Predicting real estate prices using scikit Learn
Predicting Amsterdam house / real estate prices using Ordinary Least Squares-, XGBoost-, KNN-, Lasso-, Ridge-, Polynomial-, Random Forest-, and Neural Network MLP Regression (via scikit-learn)
Stars: ✭ 78 (-54.91%)
Mutual labels:  xgboost, ensemble-learning
Dc Hi guides
[Data Castle algorithm competition] Premium travel service order prediction, final rank 11
Stars: ✭ 83 (-52.02%)
Mutual labels:  xgboost, stacking
Btctrading
Time Series Forecast with Bitcoin value, to detect upward/down trends with Machine Learning Algorithms
Stars: ✭ 99 (-42.77%)
Mutual labels:  prediction, xgboost
Mlj.jl
A Julia machine learning framework
Stars: ✭ 982 (+467.63%)
Mutual labels:  ensemble-learning, stacking
Tpot
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Stars: ✭ 8,378 (+4742.77%)
Mutual labels:  scikit-learn, xgboost
Mlr3pipelines
Dataflow Programming for Machine Learning in R
Stars: ✭ 96 (-44.51%)
Mutual labels:  ensemble-learning, stacking
Hyperparameter hunter
Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
Stars: ✭ 648 (+274.57%)
Mutual labels:  scikit-learn, xgboost
Auto ml
[UNMAINTAINED] Automated machine learning for analytics & production
Stars: ✭ 1,559 (+801.16%)
Mutual labels:  scikit-learn, xgboost
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+776.3%)
Mutual labels:  scikit-learn, ensemble-learning
Nyoka
Nyoka is a Python library to export ML/DL models into PMML (PMML 4.4.1 Standard).
Stars: ✭ 127 (-26.59%)
Mutual labels:  scikit-learn, xgboost
Mljar Supervised
Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning 🚀
Stars: ✭ 961 (+455.49%)
Mutual labels:  scikit-learn, xgboost
Machine Learning Alpine
Alpine Container for Machine Learning
Stars: ✭ 30 (-82.66%)
Mutual labels:  scikit-learn, xgboost
Mlens
ML-Ensemble – high performance ensemble learning
Stars: ✭ 680 (+293.06%)
Mutual labels:  ensemble-learning, stacking
Xcessiv
A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.
Stars: ✭ 1,255 (+625.43%)
Mutual labels:  scikit-learn, ensemble-learning
Awesome Decision Tree Papers
A collection of research papers on decision, classification and regression trees with implementations.
Stars: ✭ 1,908 (+1002.89%)
Mutual labels:  xgboost, ensemble-learning
Openscoring
REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models
Stars: ✭ 536 (+209.83%)
Mutual labels:  scikit-learn, xgboost
Vecstack
Python package for stacking (machine learning technique)
Stars: ✭ 587 (+239.31%)
Mutual labels:  ensemble-learning, stacking
Dtreeviz
A python library for decision tree visualization and model interpretation.
Stars: ✭ 1,857 (+973.41%)
Mutual labels:  scikit-learn, xgboost

Stacking (stacked generalization)

Overview

ikki407/stacking - Simple and useful stacking library, written in Python.

Users can stack scikit-learn, XGBoost, and Keras models.
A distinguishing feature of this library is that all out-of-fold predictions can be saved for further analysis after training.

Description

Stacking (sometimes called stacked generalization) involves training a learning algorithm to combine the predictions of several other learning algorithms. The basic idea is to train a pool of base classifiers and then use another classifier to combine their predictions, with the aim of reducing the generalization error.

This blog is very helpful for understanding stacking and ensemble learning.
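As a rough illustration of the idea, the sketch below stacks two toy regressors in plain Python. It is not code from this library: the classes MeanModel and SlopeModel, the helper names, the synthetic data, and the grid-searched blend weight used as the second-stage model are all assumptions made for the example, and it uses regression rather than classification to keep the arithmetic short.

```python
import random


def kfold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and split them into disjoint validation folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


class MeanModel:
    """Toy base model 1: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class SlopeModel:
    """Toy base model 2: one-feature least-squares line y = a + b * x."""

    def fit(self, X, y):
        n = len(X)
        mx, my = sum(X) / n, sum(y) / n
        sxx = sum((x - mx) ** 2 for x in X)
        sxy = sum((x - mx) * (t - my) for x, t in zip(X, y))
        self.b_ = sxy / sxx if sxx else 0.0
        self.a_ = my - self.b_ * mx
        return self

    def predict(self, X):
        return [self.a_ + self.b_ * x for x in X]


def out_of_fold_predictions(model_cls, X, y, folds):
    """Predict every row with a model that did not see that row during training."""
    oof = [0.0] * len(X)
    for val_idx in folds:
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = model_cls().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, p in zip(val_idx, model.predict([X[i] for i in val_idx])):
            oof[i] = p
    return oof


def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def fit_blend_weight(oof_a, oof_b, y, steps=100):
    """Second-stage model: pick w in [0, 1] minimising the MSE of w*a + (1-w)*b."""
    best_w, best_err = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        err = mse([w * a + (1 - w) * b for a, b in zip(oof_a, oof_b)], y)
        if err < best_err:
            best_w, best_err = w, err
    return best_w


# Synthetic data: a noisy line (purely illustrative).
rng = random.Random(407)
X = [rng.uniform(0, 10) for _ in range(60)]
y = [2 * x + 1 + rng.gauss(0, 1) for x in X]

folds = kfold_indices(len(X), n_folds=5)
oof_mean = out_of_fold_predictions(MeanModel, X, y, folds)
oof_slope = out_of_fold_predictions(SlopeModel, X, y, folds)

# Stage 2 is fitted on the Stage-1 out-of-fold predictions, not on the raw features.
w = fit_blend_weight(oof_mean, oof_slope, y)
stacked = [w * a + (1 - w) * b for a, b in zip(oof_mean, oof_slope)]
print("blend weight:", w)
print("MSE mean / slope / stacked:", mse(oof_mean, y), mse(oof_slope, y), mse(stacked, y))
```

The second-stage model only ever sees out-of-fold predictions, so it is fitted on predictions that each base model made for rows it was not trained on.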

Usage

See working example:

To run these examples, just run sh run.sh. Note that:

  1. Put the train and test datasets under data/input.

  2. Put the features created from the original dataset under data/output/features.

  3. Define the models for stacking in scripts.py under the scripts folder.

  4. Declare the created features in that script.

  5. Run sh run.sh (python scripts/XXX.py).

Detailed Usage

  1. Set the train dataset with its target data and the test dataset.

    FEATURE_LIST_stage1 = {
                    'train':(
                             INPUT_PATH + 'train.csv',
                             FEATURES_PATH + 'train_log.csv',
                            ),
    
                    'target':(
                             INPUT_PATH + 'target.csv',
                            ),
    
                    'test':(
                             INPUT_PATH + 'test.csv',
                             FEATURES_PATH + 'test_log.csv',
                            ),
                    }
    
  2. Define model classes that inherit from the BaseModel class; these are used in Stage 1, Stage 2, ..., Stage N.

    # For Stage 1
    # XGBoost parameters handed to the XGBClassifier wrapper defined in base.py
    PARAMS_V1 = {
        'colsample_bytree': 0.80,
        'learning_rate': 0.1,
        'eval_metric': 'auc',
        'max_depth': 5,
        'min_child_weight': 1,
        'nthread': 4,
        'objective': 'binary:logistic',
        'seed': 407,
        'silent': 1,
        'subsample': 0.60,
    }

    class ModelV1(BaseModel):
        def build_model(self):
            # Return the estimator to be trained for this model
            return XGBClassifier(params=self.params, num_round=10)

    ...

    # For Stage 2
    # LR is presumably sklearn.linear_model.LogisticRegression (the keys match its arguments)
    PARAMS_V1_stage2 = {
        'penalty': 'l2',
        'tol': 0.0001,
        'C': 1.0,
        'random_state': None,
        'verbose': 0,
        'n_jobs': 8,
    }

    class ModelV1_stage2(BaseModel):
        def build_model(self):
            return LR(**self.params)

    
  3. Train each Stage-1 model for stacking.

    m = ModelV1(name="v1_stage1",
                flist=FEATURE_LIST_stage1,
                params=PARAMS_V1,
                kind='st',
                )
    m.run()
    
    ...
    
  4. Train each Stage-2 model using the predictions of the Stage-1 models.

    FEATURE_LIST_stage2 = {
                'train': (
                         TEMP_PATH + 'v1_stage1_all_fold.csv',
                         TEMP_PATH + 'v2_stage1_all_fold.csv',
                         TEMP_PATH + 'v3_stage1_all_fold.csv',
                         TEMP_PATH + 'v4_stage1_all_fold.csv',
                         ...
                         ),
    
                'target':(
                         INPUT_PATH + 'target.csv',
                         ),
    
                'test': (
                        TEMP_PATH + 'v1_stage1_test.csv',
                        TEMP_PATH + 'v2_stage1_test.csv',
                        TEMP_PATH + 'v3_stage1_test.csv',
                        TEMP_PATH + 'v4_stage1_test.csv',
                        ...                     
                        ),
                }
    
    # Models
    m = ModelV1_stage2(name="v1_stage2",
                    flist=FEATURE_LIST_stage2,
                    params = PARAMS_V1_stage2,
                    kind = 'st',
                    )
    m.run()
    
  5. The final result is saved as v1_stage2_TestInAllTrainingData.csv.
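With many Stage-1 models, the Stage-2 feature list in step 4 can be generated rather than typed by hand. The helper below is a hypothetical sketch, not part of the library: build_stage2_flist is an invented name, and the TEMP_PATH value is a placeholder (the library defines its own path constants). It relies only on the file-name pattern shown in step 4.

```python
def build_stage2_flist(stage1_names, temp_path, input_path):
    """Return a dict shaped like FEATURE_LIST_stage2 for the given Stage-1 model names."""
    return {
        # Out-of-fold predictions of each Stage-1 model become Stage-2 train features.
        'train': tuple(temp_path + name + '_all_fold.csv' for name in stage1_names),
        'target': (input_path + 'target.csv',),
        # Stage-1 test predictions become Stage-2 test features.
        'test': tuple(temp_path + name + '_test.csv' for name in stage1_names),
    }


# Placeholder paths for illustration only.
TEMP_PATH = 'data/output/temp/'
INPUT_PATH = 'data/input/'

names = ['v1_stage1', 'v2_stage1', 'v3_stage1']
FEATURE_LIST_stage2 = build_stage2_flist(names, TEMP_PATH, INPUT_PATH)
for key, paths in FEATURE_LIST_stage2.items():
    print(key, paths)
```

Keeping the model names in one list makes it harder for the train and test tuples to drift out of the same column order.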

Prerequisite

  • (macOS) Install xgboost manually first: pip install xgboost
  • (Optional) Install paratext, a fast CSV loading library

Installation

To install stacking, cd to the stacking folder and run the install command (up-to-date version, recommended):

sudo python setup.py install

You can also install stacking from PyPI:

pip install stacking

Files

Details of scripts

  • base.py:
    • Base models for stacking are defined here (using sklearn.base.BaseEstimator).
    • Some models are defined here, e.g., XGBoost, Keras, Vowpal Wabbit.
    • These models are wrapped in a scikit-learn-like interface (using sklearn.base.ClassifierMixin and sklearn.base.RegressorMixin).
    • That is, each model class has the methods fit(), predict_proba(), and predict().

New user-defined models can be added here.

Scikit-learn models can be used.
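A user-defined model could look roughly like the following sketch. It only mirrors the fit(), predict_proba(), and predict() methods listed for base.py; the class name PriorClassifier and its threshold parameter are hypothetical, and the sketch deliberately omits the scikit-learn base classes to stay dependency-free.

```python
class PriorClassifier:
    """Toy binary classifier: predicts the positive-class rate seen in training."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def fit(self, X, y):
        # y is expected to hold 0/1 labels; the features are ignored by this toy model.
        self.positive_rate_ = sum(y) / len(y)
        return self

    def predict_proba(self, X):
        # One [P(class 0), P(class 1)] row per sample, as scikit-learn classifiers return.
        p = self.positive_rate_
        return [[1.0 - p, p] for _ in X]

    def predict(self, X):
        return [int(row[1] >= self.threshold) for row in self.predict_proba(X)]


clf = PriorClassifier().fit([[0], [1], [2], [3]], [1, 1, 1, 0])
print(clf.predict_proba([[5]]))
print(clf.predict([[5], [6]]))
```

Any object exposing these three methods can presumably be returned from build_model(), although that should be checked against base.py before relying on it.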

The base model accepts the following values for its kind argument:

  • 's': Stacking. Saves the out-of-fold (OOF) predictions ({model_name}_all_fold.csv) and the average of the test predictions from the train-fold models ({model_name}_test.csv). These files are used for the next level of stacking.

  • 't': Trains on all data and predicts the test set ({model_name}_TestInAllTrainingData.csv). No validation data are used in this training.

  • 'st': Stacking, then training on all data and predicting the test set ('s' and 't' combined).

  • 'cv': Cross-validation only, without saving the predictions.
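The options above can be summarised as a small lookup from mode to the files it is described as writing. This is a hypothetical helper for orientation only (expected_outputs is not a function of the library), built from the file-name patterns given in the list.

```python
def expected_outputs(model_name, kind):
    """List the prediction files a run with the given mode is described as saving."""
    stacking_files = [model_name + '_all_fold.csv', model_name + '_test.csv']
    full_train_files = [model_name + '_TestInAllTrainingData.csv']
    outputs = {
        's': stacking_files,                      # OOF + fold-averaged test predictions
        't': full_train_files,                    # model trained on all data
        'st': stacking_files + full_train_files,  # both of the above
        'cv': [],                                 # cross-validation only, nothing saved
    }
    if kind not in outputs:
        raise ValueError("unknown mode: %r" % (kind,))
    return outputs[kind]


for kind in ('s', 't', 'st', 'cv'):
    print(kind, expected_outputs('v1_stage1', kind))
```

For a final-stage model, 't' or 'st' is the mode that yields the submission-style file; intermediate stages need 's' or 'st' so that the next stage has inputs.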

Define the models and their parameters used for stacking. Define the task details at the top of the script. The train and test feature sets are defined here. The CV-fold index also needs to be defined.

Stacking of any depth can be defined.
