
tlverse / sl3

License: GPL-3.0
💪 🤔 Modern Super Learning with Machine Learning Pipelines

Programming Languages

JavaScript · R · HTML · CSS

Projects that are alternatives to, or similar to, sl3

Mlj.jl
A Julia machine learning framework
Stars: ✭ 982 (+955.91%)
Mutual labels:  regression, ensemble-learning, stacking
Stacking-Blending-Voting-Ensembles
This repository contains an example of each of the Ensemble Learning methods: Stacking, Blending, and Voting. The Stacking and Blending examples are implemented from scratch; the Voting example uses the scikit-learn utility.
Stars: ✭ 34 (-63.44%)
Mutual labels:  ensemble-learning, ensemble-model, stacking
modeltime.ensemble
Time Series Ensemble Forecasting
Stars: ✭ 65 (-30.11%)
Mutual labels:  ensemble-learning, r-package, stacking
Mlr
Machine Learning in R
Stars: ✭ 1,542 (+1558.06%)
Mutual labels:  regression, r-package, stacking
BAS
BAS R package https://merliseclyde.github.io/BAS/
Stars: ✭ 36 (-61.29%)
Mutual labels:  regression, model-selection, r-package
Vecstack
Python package for stacking (machine learning technique)
Stars: ✭ 587 (+531.18%)
Mutual labels:  ensemble-learning, stacking
Mlr3pipelines
Dataflow Programming for Machine Learning in R
Stars: ✭ 96 (+3.23%)
Mutual labels:  ensemble-learning, stacking
Automlpipeline.jl
A package that makes it trivial to create and evaluate machine learning pipeline architectures.
Stars: ✭ 223 (+139.78%)
Mutual labels:  ensemble-learning, stacking
Mlens
ML-Ensemble – high-performance ensemble learning
Stars: ✭ 680 (+631.18%)
Mutual labels:  ensemble-learning, stacking
Stacking
Stacked Generalization (Ensemble Learning)
Stars: ✭ 173 (+86.02%)
Mutual labels:  ensemble-learning, stacking
Mlbox
MLBox is a powerful automated machine learning Python library.
Stars: ✭ 1,199 (+1189.25%)
Mutual labels:  regression, stacking
Lightautoml
LAMA - automatic model creation framework
Stars: ✭ 196 (+110.75%)
Mutual labels:  regression, stacking
imbalanced-ensemble
Class-imbalanced / Long-tailed ensemble learning in Python. Modular, flexible, and extensible.
Stars: ✭ 199 (+113.98%)
Mutual labels:  ensemble-learning, ensemble-model
pycobra
Python library implementing ensemble methods for regression and classification, including visualisation tools such as Voronoi tessellations.
Stars: ✭ 111 (+19.35%)
Mutual labels:  regression, ensemble-learning
VOSONDash
R Shiny application for interactive analysis of networks created by vosonSML.
Stars: ✭ 44 (-52.69%)
Mutual labels:  r-package
bird species classification
Supervised classification of bird species 🐦 in high-resolution images, especially Himalayan birds, a diverse set of species with fairly little labelled data
Stars: ✭ 59 (-36.56%)
Mutual labels:  ensemble-learning
LandR
Landscape Ecosystem Modelling in R
Stars: ✭ 14 (-84.95%)
Mutual labels:  r-package
geoknife
R tools for geo-web processing of gridded data via the Geo Data Portal. geoknife slices up gridded data according to overlap with irregular features, such as watersheds, lakes, points, etc.
Stars: ✭ 64 (-31.18%)
Mutual labels:  r-package
WeightedTreemaps
Create Voronoi and Sunburst Treemaps from Hierarchical data
Stars: ✭ 33 (-64.52%)
Mutual labels:  r-package
tbltools
🗜🔒 Tools for Working with Tibbles
Stars: ✭ 34 (-63.44%)
Mutual labels:  r-package

R/sl3: Super Machine Learning with Pipelines


A flexible implementation of the Super Learner ensemble machine learning system

Authors: Jeremy Coyle, Nima Hejazi, Ivana Malenica, Rachael Phillips, and Oleg Sofrygin


What's sl3?

sl3 is an implementation of the Super Learner ensemble machine learning algorithm of van der Laan, Polley, and Hubbard (2007). The Super Learner algorithm performs ensemble learning in one of two fashions:

  1. The discrete Super Learner can be used to select the best prediction algorithm from among a supplied library of machine learning algorithms (“learners” in the sl3 nomenclature) – that is, the discrete Super Learner selects the single learning algorithm that minimizes the cross-validated risk.
  2. The ensemble Super Learner can be used to assign weights to a set of specified learning algorithms (from a user-supplied library of such algorithms) so as to create a combination of these learners that minimizes the cross-validated risk. This notion of weighted combinations has also been referred to as stacked regression (Breiman 1996) and stacked generalization (Wolpert 1992).
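
Both flavors are available through the same interface. Here is a minimal sketch of the distinction, assuming learners like those instantiated in the examples later in this README, with Lrnr_cv_selector supplied as the metalearner to recover the discrete Super Learner:

library(sl3)

# candidate library of learners
lrnrs <- list(Lrnr_mean$new(), Lrnr_glm$new(), Lrnr_ranger$new())

# ensemble Super Learner: the metalearner fits a weighted combination
sl_ensemble <- Lrnr_sl$new(learners = lrnrs)

# discrete Super Learner: the metalearner simply selects the single
# candidate that minimizes the cross-validated risk
sl_discrete <- Lrnr_sl$new(
  learners = lrnrs,
  metalearner = Lrnr_cv_selector$new()
)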

Looking for long-form documentation or a walkthrough of the sl3 package? Don't worry! Just browse the chapter in our book.


Installation

Install the most recent version from the master branch on GitHub via remotes:

remotes::install_github("tlverse/sl3")

Past stable releases may be located via the releases page on GitHub and may be installed by including the appropriate major version tag. For example,

remotes::install_github("tlverse/[email protected]")

To contribute, check out the devel branch and consider submitting a pull request.


Issues

If you encounter any bugs or have any specific feature requests, please file an issue.


Examples

sl3 makes it essentially trivial to apply screening algorithms and learning algorithms, to combine both into a stacked regression model, and to cross-validate this whole process. The best way to understand this is to see the sl3 package in action:

set.seed(49753)
library(tidyverse)
library(data.table)
library(SuperLearner)
library(origami)
library(sl3)

# load example data set
data(cpp)
cpp <- cpp %>%
  dplyr::filter(!is.na(haz)) %>%
  mutate_all(~ replace(., is.na(.), 0))

# use covariates of interest and the outcome to build a task object
covars <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs",
            "sexn")
task <- sl3_Task$new(
  data = cpp,
  covariates = covars,
  outcome = "haz"
)

# set up screeners and learners via built-in functions and pipelines
slscreener <- Lrnr_pkg_SuperLearner_screener$new("screen.glmnet")
glm_learner <- Lrnr_glm$new()
screen_and_glm <- Pipeline$new(slscreener, glm_learner)
SL.glmnet_learner <- Lrnr_pkg_SuperLearner$new(SL_wrapper = "SL.glmnet")

# stack learners into a model (including screeners and pipelines)
learner_stack <- Stack$new(SL.glmnet_learner, glm_learner, screen_and_glm)
stack_fit <- learner_stack$train(task)
preds <- stack_fit$predict()
head(preds)
#>    Lrnr_pkg_SuperLearner_SL.glmnet Lrnr_glm_TRUE
#> 1:                       0.3525946    0.36298498
#> 2:                       0.3525946    0.36298498
#> 3:                       0.2442593    0.25993072
#> 4:                       0.2442593    0.25993072
#> 5:                       0.2442593    0.25993072
#> 6:                       0.0269504    0.05680264
#>    Pipeline(Lrnr_pkg_SuperLearner_screener_screen.glmnet->Lrnr_glm_TRUE)
#> 1:                                                            0.36228209
#> 2:                                                            0.36228209
#> 3:                                                            0.25870995
#> 4:                                                            0.25870995
#> 5:                                                            0.25870995
#> 6:                                                            0.05600958

Parallelization with futures

While it's straightforward to fit a stack of learners (as above), it's also easy to take advantage of sl3's built-in parallelization support. To do this, simply choose a plan() from the future ecosystem.

# let's load the future package and set 4 cores for parallelization
library(future)
plan(multicore, workers = 4L)

# now, let's re-train our Stack in parallel
stack_fit <- learner_stack$train(task)
preds <- stack_fit$predict()
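
One portability caveat: the multicore plan is not available on Windows (or from within RStudio), where future falls back to sequential evaluation. The multisession plan is a cross-platform alternative:

# multisession spawns background R sessions and works on all platforms
plan(multisession, workers = 4L)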

Controlling the number of CV folds

In the above examples, we fit stacks of learners but didn't create a Super Learner ensemble, which uses cross-validation (CV) to build the ensemble model. For the sake of computational expedience, we may also be interested in lowering the number of CV folds (from the default of 10). Let's take a look at how to do both below.

# first, let's instantiate some more learners and create a Super Learner
mean_learner <- Lrnr_mean$new()
rf_learner <- Lrnr_ranger$new()
sl <- Lrnr_sl$new(mean_learner, glm_learner, rf_learner)

# CV folds are controlled in the sl3_Task object; we can lower the number of
# folds simply by specifying this in creating the Task
task <- sl3_Task$new(
  data = cpp,
  covariates = covars,
  outcome = "haz",
  folds = 5L
)

# now, let's fit the Super Learner with just 5-fold CV, then get predictions
sl_fit <- sl$train(task)
sl_preds <- sl_fit$predict()
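
Having trained the Super Learner, it can be informative to inspect how the ensemble was formed. A brief sketch, assuming sl3's loss_squared_error helper (appropriate for the continuous outcome here):

# cross-validated risk estimates for each candidate learner
sl_fit$cv_risk(eval_fun = loss_squared_error)

# metalearner coefficients, i.e., the weights assigned to the candidates
sl_fit$coefficients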

The folds argument to sl3_Task supports both integers (for V-fold CV) and all of the CV schemes supported in the origami package. To see the full list, query ?fold_funs from within R or take a look at origami's online documentation.
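
For example, a non-default CV scheme built with origami's make_folds() can be passed directly as the folds argument. A sketch (the cluster-id column subjid is an assumption about the cpp data):

library(origami)

# 5-fold CV that respects clustering by subject
task_clustered <- sl3_Task$new(
  data = cpp,
  covariates = covars,
  outcome = "haz",
  folds = make_folds(cpp, fold_fun = folds_vfold, V = 5,
                     cluster_ids = cpp$subjid)
)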


Learner Properties

Properties supported by sl3 learners are presented in the following table:

| Learner | binomial | categorical | continuous | cv | density | h2o | ids | importance | offset | preprocessing | sampling | screener | timeseries | weights | wrapper |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Lrnr_arima | x | x | ✓ | x | x | x | x | x | x | x | x | x | ✓ | x | x |
| Lrnr_bartMachine | ✓ | x | ✓ | x | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_bayesglm | ✓ | x | ✓ | x | x | x | x | x | ✓ | x | x | x | x | ✓ | x |
| Lrnr_bilstm | x | x | ✓ | x | x | x | x | x | x | x | x | x | ✓ | x | x |
| Lrnr_bound | ✓ | ✓ | ✓ | x | x | x | x | x | x | x | x | x | x | ✓ | ✓ |
| Lrnr_caret | ✓ | ✓ | ✓ | x | x | x | x | x | x | x | x | x | x | x | ✓ |
| Lrnr_cv | x | x | x | ✓ | x | x | x | x | x | x | x | x | x | x | ✓ |
| Lrnr_cv_selector | ✓ | ✓ | ✓ | x | x | x | x | x | x | x | x | x | x | ✓ | ✓ |
| Lrnr_dbarts | ✓ | x | ✓ | x | x | x | x | x | x | x | x | x | x | ✓ | x |
| Lrnr_define_interactions | x | x | x | x | x | x | x | x | x | ✓ | x | x | x | x | x |
| Lrnr_density_discretize | x | x | x | x | ✓ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_density_hse | x | x | x | x | ✓ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_density_semiparametric | x | x | x | x | ✓ | x | x | x | x | x | ✓ | x | x | x | x |
| Lrnr_earth | ✓ | x | ✓ | x | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_expSmooth | x | x | ✓ | x | x | x | x | x | x | x | x | x | ✓ | x | x |
| Lrnr_gam | ✓ | x | ✓ | x | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_gbm | ✓ | x | ✓ | x | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_glm | ✓ | x | ✓ | x | x | x | x | x | ✓ | x | x | x | x | ✓ | x |
| Lrnr_glm_fast | ✓ | x | ✓ | x | x | x | x | x | ✓ | x | x | x | x | ✓ | x |
| Lrnr_glmnet | ✓ | ✓ | ✓ | x | x | x | ✓ | x | x | x | x | x | x | ✓ | x |
| Lrnr_grf | ✓ | ✓ | ✓ | x | x | x | x | x | x | x | x | x | x | ✓ | x |
| Lrnr_gru_keras | ✓ | ✓ | ✓ | x | x | x | x | x | x | x | x | x | ✓ | x | x |
| Lrnr_gts | x | x | ✓ | x | x | x | x | x | x | x | x | x | ✓ | x | x |
| Lrnr_h2o_glm | ✓ | ✓ | ✓ | x | x | ✓ | x | x | ✓ | x | x | x | x | ✓ | x |
| Lrnr_h2o_grid | ✓ | ✓ | ✓ | x | x | ✓ | x | x | ✓ | x | x | x | x | ✓ | x |
| Lrnr_hal9001 | ✓ | x | ✓ | x | x | x | ✓ | x | x | x | x | x | x | ✓ | x |
| Lrnr_haldensify | x | x | x | x | ✓ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_HarmonicReg | x | x | ✓ | x | x | x | x | x | x | x | x | x | ✓ | x | x |
| Lrnr_hts | x | x | ✓ | x | x | x | x | x | x | x | x | x | ✓ | x | x |
| Lrnr_independent_binomial | x | ✓ | x | x | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_lightgbm | ✓ | ✓ | ✓ | x | x | x | x | ✓ | ✓ | x | x | x | x | ✓ | x |
| Lrnr_lstm_keras | ✓ | ✓ | ✓ | x | x | x | x | x | x | x | x | x | ✓ | x | x |
| Lrnr_mean | ✓ | ✓ | ✓ | x | x | x | x | x | ✓ | x | x | x | x | ✓ | x |
| Lrnr_multiple_ts | x | x | ✓ | x | x | x | x | x | x | x | x | x | ✓ | x | x |
| Lrnr_multivariate | x | ✓ | x | x | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_nnet | ✓ | ✓ | ✓ | x | x | x | x | x | x | x | x | x | x | ✓ | x |
| Lrnr_nnls | x | x | ✓ | x | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_optim | ✓ | ✓ | ✓ | x | x | x | x | x | ✓ | x | x | x | x | ✓ | x |
| Lrnr_pca | x | x | x | x | x | x | x | x | x | ✓ | x | x | x | x | x |
| Lrnr_pkg_SuperLearner | ✓ | x | ✓ | x | x | x | ✓ | x | x | x | x | x | x | ✓ | ✓ |
| Lrnr_pkg_SuperLearner_method | ✓ | x | ✓ | x | x | x | x | x | x | x | x | x | x | ✓ | ✓ |
| Lrnr_pkg_SuperLearner_screener | ✓ | x | ✓ | x | x | x | ✓ | x | x | x | x | x | x | ✓ | ✓ |
| Lrnr_polspline | ✓ | ✓ | ✓ | x | x | x | x | x | x | x | x | x | x | ✓ | x |
| Lrnr_pooled_hazards | x | ✓ | x | x | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_randomForest | ✓ | ✓ | ✓ | x | x | x | x | ✓ | x | x | x | x | x | x | x |
| Lrnr_ranger | ✓ | ✓ | ✓ | x | x | x | x | ✓ | x | x | x | x | x | ✓ | x |
| Lrnr_revere_task | x | x | x | ✓ | x | x | x | x | x | x | x | x | x | x | ✓ |
| Lrnr_rpart | ✓ | ✓ | ✓ | x | x | x | x | x | x | x | x | x | x | ✓ | x |
| Lrnr_rugarch | x | x | ✓ | x | x | x | x | x | x | x | x | x | ✓ | x | x |
| Lrnr_screener_augment | x | x | x | x | x | x | x | x | x | x | x | ✓ | x | x | x |
| Lrnr_screener_coefs | x | x | x | x | x | x | x | x | x | x | x | ✓ | x | x | x |
| Lrnr_screener_correlation | ✓ | ✓ | ✓ | x | x | x | x | x | x | x | x | ✓ | x | x | x |
| Lrnr_screener_importance | x | x | x | x | x | x | x | x | x | x | x | ✓ | x | x | x |
| Lrnr_sl | x | x | x | ✓ | x | x | x | x | x | x | x | x | x | x | ✓ |
| Lrnr_solnp | ✓ | ✓ | ✓ | x | x | x | x | x | ✓ | x | x | x | x | ✓ | x |
| Lrnr_solnp_density | x | x | x | x | ✓ | x | x | x | x | x | x | x | x | x | x |
| Lrnr_stratified | ✓ | x | ✓ | x | x | x | x | x | x | x | x | x | x | x | ✓ |
| Lrnr_subset_covariates | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_svm | ✓ | ✓ | ✓ | x | x | x | x | x | x | x | x | x | x | x | x |
| Lrnr_ts_weights | x | x | x | ✓ | x | x | x | x | x | x | x | x | x | x | ✓ |
| Lrnr_tsDyn | x | x | ✓ | x | x | x | x | x | x | x | x | x | ✓ | x | x |
| Lrnr_xgboost | ✓ | ✓ | ✓ | x | x | x | x | ✓ | ✓ | x | x | x | x | ✓ | x |
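
Rather than scanning the table, the same information can be queried from within R; sl3 provides helpers for listing properties and for finding learners that match a set of properties. A quick sketch:

library(sl3)

# all learner properties known to sl3
sl3_list_properties()

# learners supporting, e.g., binomial outcomes and observation weights
sl3_list_learners(c("binomial", "weights"))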


Contributions

Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.


Citation

After using the sl3 R package, please cite the following:

@software{coyle2021sl3-rpkg,
  author = {Coyle, Jeremy R and Hejazi, Nima S and Malenica, Ivana and
    Phillips, Rachael V and Sofrygin, Oleg},
  title = {{sl3}: Modern Pipelines for Machine Learning and {Super
    Learning}},
  year = {2021},
  howpublished = {\url{https://github.com/tlverse/sl3}},
  note = {{R} package version 1.4.2},
  url = {https://doi.org/10.5281/zenodo.1342293},
  doi = {10.5281/zenodo.1342293}
}

License

© 2017-2021 Jeremy R. Coyle, Nima S. Hejazi, Ivana Malenica, Rachael V. Phillips, Oleg Sofrygin

The contents of this repository are distributed under the GPL-3 license. See file LICENSE for details.


References

Breiman, Leo. 1996. “Stacked Regressions.” Machine Learning 24 (1): 49–64.

van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1).

Wolpert, David H. 1992. “Stacked Generalization.” Neural Networks 5 (2): 241–59.
