
malmgrek / gammy

License: MIT
🐙 Generalized additive models in Python with a Bayesian twist

Programming Languages

python
139335 projects - #7 most used programming language
Nix
1067 projects

Projects that are alternatives of or similar to gammy

variational-bayes-cs
Scalable sparse Bayesian learning for large CS recovery problems
Stars: ✭ 17 (-73.85%)
Mutual labels:  bayesian-inference
cgpm
Library of composable generative population models which serve as the modeling and inference backend of BayesDB.
Stars: ✭ 24 (-63.08%)
Mutual labels:  bayesian-inference
webmc3
A web interface for exploring PyMC3 traces
Stars: ✭ 46 (-29.23%)
Mutual labels:  bayesian-inference
svae_cf
[ WSDM '19 ] Sequential Variational Autoencoders for Collaborative Filtering
Stars: ✭ 38 (-41.54%)
Mutual labels:  bayesian-inference
metaBMA
Bayesian Model Averaging for Random and Fixed Effects Meta-Analysis
Stars: ✭ 20 (-69.23%)
Mutual labels:  bayesian-inference
TransformVariables.jl
Transformations to constrained variables from ℝⁿ.
Stars: ✭ 52 (-20%)
Mutual labels:  bayesian-inference
pysgmcmc
Bayesian Deep Learning with Stochastic Gradient MCMC Methods
Stars: ✭ 31 (-52.31%)
Mutual labels:  bayesian-inference
EpiModelHIV
Network Models of HIV Transmission Dynamics among MSM and Heterosexuals
Stars: ✭ 20 (-69.23%)
Mutual labels:  mathematical-modelling
statrethink_course_in_pymc3
Statistical Rethinking course in pymc3
Stars: ✭ 141 (+116.92%)
Mutual labels:  bayesian-inference
blangSDK
Blang's software development kit
Stars: ✭ 21 (-67.69%)
Mutual labels:  bayesian-inference
SCICoNE
Single-cell copy number calling and event history reconstruction.
Stars: ✭ 20 (-69.23%)
Mutual labels:  bayesian-inference
PyCBC-Tutorials
Learn how to use PyCBC to analyze gravitational-wave data and do parameter inference.
Stars: ✭ 91 (+40%)
Mutual labels:  bayesian-inference
DP_means
Dirichlet Process K-means
Stars: ✭ 36 (-44.62%)
Mutual labels:  bayesian-inference
tramp
Tree Approximate Message Passing
Stars: ✭ 30 (-53.85%)
Mutual labels:  bayesian-inference
Bridge.jl
A statistical toolbox for diffusion processes and stochastic differential equations. Named after the Brownian Bridge.
Stars: ✭ 99 (+52.31%)
Mutual labels:  bayesian-inference
nestle
Pure Python, MIT-licensed implementation of nested sampling algorithms for evaluating Bayesian evidence.
Stars: ✭ 66 (+1.54%)
Mutual labels:  bayesian-inference
human-memory
Course materials for Dartmouth course: Human Memory (PSYC 51.09)
Stars: ✭ 239 (+267.69%)
Mutual labels:  mathematical-modelling
Landmark_Detection_Robot_Tracking_SLAM-
Simultaneous Localization and Mapping (SLAM) gives you a way to track the location of a robot in the world in real time and to identify the locations of landmarks such as buildings, trees, rocks, and other world features.
Stars: ✭ 14 (-78.46%)
Mutual labels:  bayesian-inference
kendrick
Domain-Specific Modeling for Epidemiology
Stars: ✭ 43 (-33.85%)
Mutual labels:  mathematical-modelling
geostan
Bayesian spatial analysis
Stars: ✭ 40 (-38.46%)
Mutual labels:  bayesian-inference

Gammy – Generalized additive models in Python with a Bayesian twist

A generalized additive model is a predictive mathematical model defined as a sum of terms that are calibrated (fitted) to observation data.
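In symbols, the observation model is y = f₁(x) + f₂(x) + ⋯ + f_k(x) + ε, where each term fᵢ is a parametrized combination of basis functions and ε is additive zero-mean noise.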

Generalized additive models form a surprisingly general framework for building models for both production software and scientific research. This Python package offers tools for building the model terms as decompositions of various basis functions. Terms can be modeled, for example, as Gaussian processes (with reduced dimensionality) with various kernels, as piecewise linear functions, or as B-splines. Of course, very simple terms such as lines and constants are also supported (these are just very simple basis functions).

The uncertainty in the weight parameters is modeled with Bayesian statistical analysis, with the help of the superb package BayesPy. Alternatively, it is possible to fit models using just NumPy.
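As a hypothetical sketch of the NumPy route (the module path and interface below are assumptions mirroring the BayesPy backend; check the package documentation for the exact API):

>> from gammy.models.numpy import GAM as NumpyGAM  # assumed module path
>> numpy_model = NumpyGAM(formula).fit(input_data, y)  # formula and data as in the examples below

The double prompt `>>` marks example lines that the doctest runner skips; this README uses it for sketches and for commands with side effects.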


Installation

The package is available on PyPI:

pip install gammy
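A quick smoke test for the installation:

python -c "import gammy"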

Examples

In this overview, we demonstrate the package's most important features through common usage examples.

Polynomial regression on 'roids

A typical, simple (but sometimes non-trivial) modeling task is to estimate an unknown function from noisy data. First, we import the bare minimum dependencies used in the examples below:

>>> import numpy as np

>>> import gammy
>>> from gammy.models.bayespy import GAM

Let's simulate a dataset:

>>> np.random.seed(42)

>>> n = 30
>>> input_data = 10 * np.random.rand(n)
>>> y = 5 * input_data + 2.0 * input_data ** 2 + 7 + 10 * np.random.randn(n)

The object x is just a convenience tool for defining input data maps as if they were plain NumPy arrays:

>>> from gammy.arraymapper import x
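Indexing and arithmetic on x build lazy maps that are applied to the actual input data only when the model is fitted or evaluated. A small illustration with hypothetical names (the operations themselves appear in the formulae throughout this README):

>> column = x[:, 0]  # maps a 2-D input array to its first column
>> squared = x ** 2  # maps an input array to its elementwise square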

Define and fit the model:

>>> a = gammy.formulae.Scalar(prior=(0, 1e-6))
>>> b = gammy.formulae.Scalar(prior=(0, 1e-6))
>>> bias = gammy.formulae.Scalar(prior=(0, 1e-6))
>>> formula = a * x + b * x ** 2 + bias
>>> model = GAM(formula).fit(input_data, y)

The model attribute model.theta characterizes the Gaussian posterior distribution of the model parameter vector. Compare the posterior means below with the true coefficients 5, 2.0, and 7 used in the simulation; with only 30 noisy observations, the linear and constant terms are recovered rather loosely:

>>> model.mean_theta
[array([3.20130444]), array([2.0420961]), array([11.93437195])]

The variance of the additive zero-mean, normally distributed noise (true value 10² = 100 in this simulation) is estimated automagically:

>>> round(model.inv_mean_tau, 8)
74.51660744

Predicting with the model

>>> model.predict(input_data[:2])
array([ 52.57112684, 226.9460579 ])

Predictions with uncertainty, that is, the posterior predictive mean and variance, can be calculated as follows:

>>> model.predict_variance(input_data[:2])
(array([ 52.57112684, 226.9460579 ]), array([79.35827362, 95.16358131]))
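For example, approximate 95% posterior predictive intervals can be formed from the returned mean and variance:

>>> mean, variance = model.predict_variance(input_data[:2])
>>> lower = mean - 2 * np.sqrt(variance)
>>> upper = mean + 2 * np.sqrt(variance)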

Plotting results

>>> fig = gammy.plot.validation_plot(
...     model,
...     input_data,
...     y,
...     grid_limits=[0, 10],
...     input_maps=[x, x, x],
...     titles=["a", "b", "bias"]
... )

The grey band in the top figure is two times the prediction standard deviation and, in the partial residual plots, two times the respective marginal posterior standard deviation.

It is also possible to plot the estimated Γ-distribution of the noise precision (inverse variance) as well as the 1-D Normal distributions of each individual model parameter.

Plot (prior or posterior) probability density functions of all model parameters:

>>> fig = gammy.plot.gaussian1d_density_plot(model)

Saving the model on hard disk

Saving (these lines touch the file system, hence the non-doctest `>>` prompt):

>> model.save("/home/foobar/test.hdf5")

Loading:

>> model = GAM(formula).load("/home/foobar/test.hdf5")
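A loaded model should reproduce the original predictions, so a quick round-trip sanity check could look like this (again a sketch, not a doctest):

>> loaded = GAM(formula).load("/home/foobar/test.hdf5")
>> np.allclose(loaded.predict(input_data), model.predict(input_data))  # expect True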

Gaussian process regression

Create a fake dataset:

>>> n = 50
>>> input_data = np.vstack((2 * np.pi * np.random.rand(n), np.random.rand(n))).T
>>> y = (
...     np.abs(np.cos(input_data[:, 0])) * input_data[:, 1] +
...     1 + 0.1 * np.random.randn(n)
... )

Define model:

>>> a = gammy.formulae.ExpSineSquared1d(
...     np.arange(0, 2 * np.pi, 0.1),
...     corrlen=1.0,
...     sigma=1.0,
...     period=2 * np.pi,
...     energy=0.99
... )
>>> bias = gammy.Scalar(prior=(0, 1e-6))
>>> formula = a(x[:, 0]) * x[:, 1] + bias
>>> model = gammy.models.bayespy.GAM(formula).fit(input_data, y)

>>> round(model.mean_theta[0][0], 8)
-0.8343458
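As with the polynomial model, the posterior predictive mean and variance are available for new inputs:

>>> new_points = np.array([[0.5, 0.25], [3.0, 0.75]])
>>> mean, variance = model.predict_variance(new_points)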

Plot predictions and partial residuals:

>>> fig = gammy.plot.validation_plot(
...     model,
...     input_data,
...     y,
...     grid_limits=[[0, 2 * np.pi], [0, 1]],
...     input_maps=[x[:, 0:2], x[:, 1]],
...     titles=["Surface estimate", "intercept"]
... )

Plot parameter probability density functions:

>>> fig = gammy.plot.gaussian1d_density_plot(model)

More covariance kernels

The package contains covariance functions for many well-known options, such as the exponential squared, periodic exponential squared, rational quadratic, and Ornstein-Uhlenbeck kernels. Please see the documentation section More on Gaussian Process kernels for a gallery of kernels.
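The 1-D kernel formulae follow a common naming pattern, as with ExpSquared1d and ExpSineSquared1d used in this README. A hypothetical sketch (the two class names below are assumptions extrapolated from that pattern; verify them against the documentation):

>> grid = np.arange(0, 10, 0.1)
>> rq = gammy.RationalQuadratic1d(grid, corrlen=1.0, sigma=1.0, energy=0.99)  # assumed name
>> ou = gammy.OrnsteinUhlenbeck1d(grid, corrlen=1.0, sigma=1.0, energy=0.99)  # assumed name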

Defining custom kernels

Please read the documentation section: Customize Gaussian Process kernels

Spline regression

Constructing B-spline-based 1-D basis functions is also supported. Let's define dummy data:

>>> n = 30
>>> input_data = 10 * np.random.rand(n)
>>> y = 2.0 * input_data ** 2 + 7 + 10 * np.random.randn(n)

Define the model; the grid and the spline order determine the number of basis functions, N = len(grid) + order - 2:

>>> grid = np.arange(0, 11, 2.0)
>>> order = 2
>>> N = len(grid) + order - 2
>>> sigma = 10 ** 2
>>> formula = gammy.BSpline1d(
...     grid,
...     order=order,
...     prior=(np.zeros(N), np.identity(N) / sigma),
...     extrapolate=True
... )(x)
>>> model = gammy.models.bayespy.GAM(formula).fit(input_data, y)

>>> round(model.mean_theta[0][0], 8)
-49.00019115

Plot the validation figure; note that the grid limits extend beyond the data range [0, 10], which works because the formula was defined with extrapolate=True:

>>> fig = gammy.plot.validation_plot(
...     model,
...     input_data,
...     y,
...     grid_limits=[-2, 12],
...     input_maps=[x],
...     titles=["a"]
... )

Plot parameter probability densities:

>>> fig = gammy.plot.gaussian1d_density_plot(model)

Non-linear manifold regression

In this example, we estimate the bivariate "MATLAB function" (the well-known peaks surface) using a Gaussian process model with a Kronecker tensor structure (cf. the similar feature in PyMC3). The main point of the example below is that it is quite straightforward to build models that can learn arbitrary 2D surfaces.

Let us first create some artificial data using the MATLAB function!

>>> n = 100
>>> input_data = np.vstack((
...     6 * np.random.rand(n) - 3, 6 * np.random.rand(n) - 3
... )).T
>>> y = (
...     gammy.utils.peaks(input_data[:, 0], input_data[:, 1]) +
...     4 + 0.3 * np.random.randn(n)
... )

There is support for forming two-dimensional basis functions given two one-dimensional formulas. The new combined basis is essentially the outer product of the given bases, and the prior means and covariances of the underlying weights are constructed using the Kronecker product.

>>> a = gammy.ExpSquared1d(
...     np.arange(-3, 3, 0.1),
...     corrlen=0.5,
...     sigma=4.0,
...     energy=0.99
... )(x[:, 0])  # NOTE: Input map is defined here!
>>> b = gammy.ExpSquared1d(
...     np.arange(-3, 3, 0.1),
...     corrlen=0.5,
...     sigma=4.0,
...     energy=0.99
... )(x[:, 1]) # NOTE: Input map is defined here!
>>> A = gammy.formulae.Kron(a, b)
>>> bias = gammy.formulae.Scalar(prior=(0, 1e-6))
>>> formula = A + bias
>>> model = GAM(formula).fit(input_data, y)

>>> round(model.mean_theta[0][0], 8)
0.37426986
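The fitted surface can be evaluated on a dense grid, for example for plotting:

>>> grid_1d = np.arange(-3, 3, 0.5)
>>> X1, X2 = np.meshgrid(grid_1d, grid_1d)
>>> grid_points = np.vstack((X1.ravel(), X2.ravel())).T
>>> surface = model.predict(grid_points).reshape(X1.shape)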

Note that the same logic could be used to construct higher-dimensional bases; that is, one could define a 3-D formula:

>> formula_3d = gammy.Kron(gammy.Kron(a, b), c)

Plot predictions and partial residuals:

>>> fig = gammy.plot.validation_plot(
...     model,
...     input_data,
...     y,
...     grid_limits=[[-3, 3], [-3, 3]],
...     input_maps=[x, x[:, 0]],
...     titles=["Surface estimate", "intercept"]
... )

Plot parameter probability density functions:

>>> fig = gammy.plot.gaussian1d_density_plot(model)

Testing

The package's unit tests can be run with pytest (from the repository root):

pytest -v

Running this documentation as a doctest:

python -m doctest -v README.md

Package documentation

Documentation of the package with code examples: https://malmgrek.github.io/gammy.
