fabsig / GPBoost

License: Apache-2.0
Combining tree-boosting with Gaussian process and mixed effects models

Programming Languages

  • C++
  • C
  • Fortran
  • R
  • Python
  • CMake

Projects that are alternatives of or similar to GPBoost

Good Papers
I try my best to keep up to date with cutting-edge knowledge in machine learning / deep learning and natural language processing. These are my notes on some good papers.
Stars: ✭ 248 (-31.11%)
Mutual labels:  gaussian-processes
compv
Insanely fast open-source computer vision library for ARM and x86 devices (up to 50 times faster than OpenCV)
Stars: ✭ 155 (-56.94%)
Mutual labels:  boosting
metafor
A meta-analysis package for R
Stars: ✭ 174 (-51.67%)
Mutual labels:  mixed-effects
TemporalGPs.jl
Fast inference for Gaussian processes in problems involving time. Partly built on results from https://proceedings.mlr.press/v161/tebbutt21a.html
Stars: ✭ 89 (-75.28%)
Mutual labels:  gaussian-processes
boundary-gp
Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features
Stars: ✭ 21 (-94.17%)
Mutual labels:  gaussian-processes
gp
Differentiable Gaussian Process implementation for PyTorch
Stars: ✭ 18 (-95%)
Mutual labels:  gaussian-processes
Stheno.jl
Probabilistic Programming with Gaussian processes in Julia
Stars: ✭ 233 (-35.28%)
Mutual labels:  gaussian-processes
Machine Learning
A repository of resources for understanding the concepts of machine learning/deep learning.
Stars: ✭ 29 (-91.94%)
Mutual labels:  boosting
approxposterior
A Python package for approximate Bayesian inference and optimization using Gaussian processes
Stars: ✭ 36 (-90%)
Mutual labels:  gaussian-processes
ml course
"Learning Machine Learning" Course, Bogotá, Colombia 2019 #LML2019
Stars: ✭ 22 (-93.89%)
Mutual labels:  gaussian-processes
decision-trees-for-ml
Building Decision Trees From Scratch In Python
Stars: ✭ 61 (-83.06%)
Mutual labels:  boosting
pyrff
pyrff: Python implementation of random Fourier feature approximations for Gaussian processes
Stars: ✭ 24 (-93.33%)
Mutual labels:  gaussian-processes
kalman-jax
Approximate inference for Markov Gaussian processes using iterated Kalman smoothing, in JAX
Stars: ✭ 84 (-76.67%)
Mutual labels:  gaussian-processes
Metida.jl
Julia package for fitting mixed-effects models with flexible random/repeated covariance structure.
Stars: ✭ 19 (-94.72%)
Mutual labels:  mixed-effects
mango
Parallel Hyperparameter Tuning in Python
Stars: ✭ 241 (-33.06%)
Mutual labels:  gaussian-processes
Bayesian Optimization
Python code for Bayesian optimization using Gaussian processes
Stars: ✭ 245 (-31.94%)
Mutual labels:  gaussian-processes
go-bayesopt
A library for doing Bayesian Optimization using Gaussian Processes (blackbox optimizer) in Go/Golang.
Stars: ✭ 47 (-86.94%)
Mutual labels:  gaussian-processes
GPJax
A didactic Gaussian process package for researchers in Jax.
Stars: ✭ 159 (-55.83%)
Mutual labels:  gaussian-processes
sciblox
sciblox - Easier Data Science and Machine Learning
Stars: ✭ 48 (-86.67%)
Mutual labels:  boosting
Statistical-Learning-using-R
A statistical learning application consisting of various machine learning algorithms implemented in R, together with in-depth interpretations. Documents and reports on the techniques mentioned below can be found on my RPubs profile.
Stars: ✭ 27 (-92.5%)
Mutual labels:  boosting


GPBoost: Combining Tree-Boosting with Gaussian Process and Mixed Effects Models

Table of Contents

  1. Get started
  2. Modeling background
  3. News
  4. Open issues - contribute
  5. References
  6. License

Get started

GPBoost is a software library for combining tree-boosting with Gaussian process and grouped random effects models (aka mixed effects models or latent Gaussian models). It also allows for applying tree-boosting as well as Gaussian process and (generalized) linear mixed effects models (LMMs and GLMMs) on their own. The GPBoost library is predominantly written in C++; it has a C interface, and there are both a Python package and an R package.
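
For a first impression, the following is a minimal sketch of a typical workflow with the Python package using grouped random effects. All data and parameter values below are simulated and made up for illustration; consult the Python package documentation for the authoritative API.

    import gpboost as gpb
    import numpy as np

    # Simulate example data (for illustration only)
    rng = np.random.default_rng(0)
    n = 1000
    X = rng.uniform(size=(n, 2))             # predictor variables
    group = rng.integers(0, 100, n)          # high-cardinality grouping variable
    b = rng.normal(scale=0.5, size=100)      # group-level random effects
    y = 2 * X[:, 0] + b[group] + rng.normal(scale=0.1, size=n)

    # Random effects part of the model
    gp_model = gpb.GPModel(group_data=group, likelihood="gaussian")

    # Combine it with tree-boosting
    data_train = gpb.Dataset(X, y)
    params = {"learning_rate": 0.05, "max_depth": 3, "verbose": 0}
    bst = gpb.train(params=params, train_set=data_train,
                    gp_model=gp_model, num_boost_round=50)

    # Prediction needs both the features and the grouping variable;
    # the result contains the predicted components (see the package docs)
    pred = bst.predict(data=X, group_data_pred=group)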

For more information, see the modeling background below and the references listed at the end of this document.

Modeling background

The GPBoost library allows for combining tree-boosting with Gaussian process (GP) and grouped random effects models (aka mixed effects models or latent Gaussian models) in order to leverage the advantages and remedy the drawbacks of these two approaches (see below for a list of the advantages and disadvantages of each modeling technique). In particular, the GPBoost / LaGaBoost algorithms are generalizations of classical boosting algorithms, which assume (conditional) independence across samples. Advantages include that (i) this can allow for more efficient learning of predictor functions, which, among other things, can translate into increased prediction accuracy, (ii) it can serve as a solution for high-cardinality categorical variables in tree-boosting, and (iii) it can be used for modeling spatial or spatio-temporal data when, e.g., spatial predictions should vary continuously, or smoothly, over space. Further, the GPBoost / LaGaBoost algorithms are non-linear extensions of classical mixed effects or latent Gaussian models, in which the linear predictor function is replaced by a non-linear function that is learned using tree-boosting (arguably often the best-performing approach for tabular data).

GPBoost and LaGaBoost algorithms

The GPBoost library implements two algorithms for combining tree-boosting with Gaussian process and grouped random effects models:

  • The GPBoost algorithm (Sigrist, 2020) for data with a Gaussian likelihood (conditional distribution of data)
  • The LaGaBoost algorithm (Sigrist, 2021) for data with non-Gaussian likelihoods

For Gaussian likelihoods (GPBoost algorithm), it is assumed that the response variable (aka label) y is the sum of a potentially non-linear mean function F(X) and random effects Zb:

y = F(X) + Zb + xi

where xi is an independent error term and X are predictor variables (aka covariates or features).
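
To make the notation concrete, the following is a small simulation sketch of this model with a single grouped random effect. Here, the incidence matrix Z simply maps each sample to its group, so Zb can be computed by indexing; all names and values are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 500, 25                          # number of samples and of groups
    X = rng.uniform(size=(n, 2))
    F = 2 * X[:, 0] - np.sin(4 * X[:, 1])   # a non-linear mean function F(X)
    group = rng.integers(0, m, n)           # group membership of every sample
    b = rng.normal(scale=0.5, size=m)       # random effects b
    Zb = b[group]                           # Z maps groups to samples
    xi = rng.normal(scale=0.1, size=n)      # independent error term xi
    y = F + Zb + xi                         # y = F(X) + Zb + xi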

For non-Gaussian likelihoods (LaGaBoost algorithm), it is assumed that the response variable y follows some distribution p(y|m) and that a (potentially multivariate) parameter m of this distribution is related to a non-linear function F(X) and random effects Zb:

y ~ p(y|m)
m = G(F(X) + Zb)

where G() is a so-called link function.
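
For binary data, for instance, p(y|m) is a Bernoulli distribution and G() a probit or logit link. The sketch below shows how such a model might be set up with the Python package, assuming the likelihood is selected via the GPModel's likelihood argument; data and parameter values are made up for illustration.

    import gpboost as gpb
    import numpy as np

    # Simulated binary labels (illustrative)
    rng = np.random.default_rng(0)
    n = 500
    X = rng.uniform(size=(n, 2))
    group = rng.integers(0, 25, n)
    y = (X[:, 0] + rng.normal(size=n) > 0.5).astype(np.float64)

    # Bernoulli likelihood with probit link: y ~ p(y|m), m = G(F(X) + Zb)
    gp_model = gpb.GPModel(group_data=group, likelihood="bernoulli_probit")
    bst = gpb.train(params={"learning_rate": 0.05, "verbose": 0},
                    train_set=gpb.Dataset(X, y),
                    gp_model=gp_model, num_boost_round=50)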

In the GPBoost library, the random effects can consist of the following (a short code sketch follows the list):

  • Gaussian processes (including random coefficient processes)
  • Grouped random effects (including nested, crossed, and random coefficient effects)
  • Combinations of the above
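
As a brief illustration of these options, the sketch below shows how such models might be specified with the Python package; the argument names (gp_coords, cov_function, group_data) follow the package's examples, and the data are made up.

    import gpboost as gpb
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    coords = rng.uniform(size=(n, 2))   # spatial coordinates for a GP
    group = rng.integers(0, 10, n)      # grouping variable for random effects

    # A Gaussian process only
    gp_only = gpb.GPModel(gp_coords=coords, cov_function="exponential")

    # Grouped random effects only
    re_only = gpb.GPModel(group_data=group)

    # A combination of a GP and grouped random effects
    combined = gpb.GPModel(gp_coords=coords, cov_function="exponential",
                           group_data=group)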

Learning the above-mentioned models means learning both the covariance parameters (aka hyperparameters) of the random effects and the predictor function F(X). Both the GPBoost and the LaGaBoost algorithms iteratively learn the covariance parameters and add a tree to the ensemble of trees F(X) using a gradient and/or a Newton boosting step. In the GPBoost library, covariance parameters can (currently) be learned using (Nesterov accelerated) gradient descent, Fisher scoring (aka natural gradient descent), and Nelder-Mead. Further, trees are learned using the LightGBM library.
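
The optimizer for the covariance parameters can be chosen by the user. The following is a brief sketch with the Python package (option names as documented there; a sketch, not an exhaustive list of options).

    import gpboost as gpb
    import numpy as np

    group = np.arange(100) % 10                # toy grouping variable
    gp_model = gpb.GPModel(group_data=group)

    # Fisher scoring (aka natural gradient descent)
    gp_model.set_optim_params(params={"optimizer_cov": "fisher_scoring"})

    # Or (Nesterov-accelerated) gradient descent
    gp_model.set_optim_params(params={"optimizer_cov": "gradient_descent",
                                      "lr_cov": 0.1,
                                      "use_nesterov_acc": True})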

See Sigrist (2020) and Sigrist (2021) for more details.

Background on tree-boosting and Gaussian process / grouped random effects models

Tree-boosting has the following advantages and disadvantages:

Advantages of tree-boosting:

  • State-of-the-art prediction accuracy
  • Automatic modeling of non-linearities, discontinuities, and complex high-order interactions
  • Robust to outliers in and multicollinearity among predictor variables
  • Scale-invariant to monotone transformations of predictor variables
  • Automatic handling of missing values in predictor variables

Disadvantages of tree-boosting:

  • Assumes conditional independence of samples
  • Produces discontinuous predictions for, e.g., spatial data
  • Can have difficulty with high-cardinality categorical variables

Gaussian processes (GPs) and grouped random effects models (aka mixed effects models or latent Gaussian models) have the following advantages and disadvantages:

Advantages of GPs / random effects models:

  • Probabilistic predictions, which allow for uncertainty quantification
  • Incorporation of reasonable prior knowledge, e.g. for spatial data: "close samples are more similar to each other than distant samples", and a function should vary continuously / smoothly over space
  • Modeling of dependency which, among other things, can allow for more efficient learning of the fixed effects (predictor) function
  • Grouped random effects can be used for modeling high-cardinality categorical variables

Disadvantages of GPs / random effects models:

  • Zero or a linear prior mean (predictor, fixed effects) function

News

Open issues - contribute

Software issues

Computational issues

  • Add GPU support for Gaussian processes
  • Add CHOLMOD support

Methodological issues

  • Add multivariate models, e.g., using coregionalization
  • Add spatio-temporal Gaussian process models
  • Add possibility to predict latent Gaussian processes and random effects (e.g., random coefficients)
  • Implement additional approaches so that computations scale well (in memory and time) for Gaussian process models and for mixed effects models with more than one grouping variable for non-Gaussian data
  • Support sample weights

References

  • Sigrist, F. (2020). Gaussian Process Boosting.
  • Sigrist, F. (2021). Latent Gaussian Model Boosting.

License

This project is licensed under the terms of the Apache License 2.0. See LICENSE for more information.
