tare / Gpmicrobiome

Licence: MIT
A novel probabilistic approach to explicitly model overdispersion and sampling zeros in 16S rRNA sequencing data by considering the temporal correlation between nearby time points using Gaussian Processes

Projects that are alternatives of or similar to Gpmicrobiome

lgpr
R-package for interpretable nonparametric modeling of longitudinal data using additive Gaussian processes. Contains functionality for inferring covariate effects and assessing covariate relevances. Various models can be specified using a convenient formula syntax.
Stars: ✭ 22 (+340%)
Mutual labels:  stan
covidseir
Bayesian SEIR model to estimate the effects of social-distancing on COVID-19
Stars: ✭ 23 (+360%)
Mutual labels:  stan
Bda r demos
Bayesian Data Analysis demos for R
Stars: ✭ 409 (+8080%)
Mutual labels:  stan
CausalQueries
Bayesian inference from binary causal models
Stars: ✭ 20 (+300%)
Mutual labels:  stan
notebooks
Collection of (unfinished) notebooks
Stars: ✭ 13 (+160%)
Mutual labels:  stan
EmbracingUncertainty
Material for AMLD 2020 workshop "Bayesian Inference: embracing uncertainty"
Stars: ✭ 23 (+360%)
Mutual labels:  stan
ubms
Fit models to data from unmarked animals using Stan. Uses a similar interface to the R package 'unmarked', while providing the advantages of Bayesian inference and allowing estimation of random effects.
Stars: ✭ 27 (+440%)
Mutual labels:  stan
Rstan
RStan, the R interface to Stan
Stars: ✭ 760 (+15100%)
Mutual labels:  stan
tsbook
Support site for the book 『基礎からわかる時系列分析』 ("Time Series Analysis from the Basics", Gijutsu-Hyoron, 2018).
Stars: ✭ 52 (+940%)
Mutual labels:  stan
Orbit
A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.
Stars: ✭ 346 (+6820%)
Mutual labels:  stan
stan-vim
A Vim plugin for the Stan probabilistic programming language.
Stars: ✭ 41 (+720%)
Mutual labels:  stan
cmdstanr
CmdStanR: the R interface to CmdStan
Stars: ✭ 82 (+1540%)
Mutual labels:  stan
Bayesplot
bayesplot R package for plotting Bayesian models
Stars: ✭ 276 (+5420%)
Mutual labels:  stan
natsclient
NATS 2.x Client Library
Stars: ✭ 37 (+640%)
Mutual labels:  stan
Math
The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
Stars: ✭ 494 (+9780%)
Mutual labels:  stan
stan4bart
Uses Stan sampler and math library to semiparametrically fit linear and multilevel models with additive Bayesian Additive Regression Tree (BART) components.
Stars: ✭ 13 (+160%)
Mutual labels:  stan
Torsten
library of C++ functions that support applications of Stan in Pharmacometrics
Stars: ✭ 38 (+660%)
Mutual labels:  stan
Bda py demos
Bayesian Data Analysis demos for Python
Stars: ✭ 781 (+15520%)
Mutual labels:  stan
Tidybayes
Bayesian analysis + tidy data + geoms (R package)
Stars: ✭ 557 (+11040%)
Mutual labels:  stan
Rstanarm
rstanarm R package for Bayesian applied regression modeling
Stars: ✭ 285 (+5600%)
Mutual labels:  stan

GPMicrobiome

Prerequisites

For more information on Stan and PyStan, please see the documentation at http://mc-stan.org/interfaces/pystan.html.

Command line interface

Usage

The correct command line usage of the program is summarized by the following usage message

$ python gpmicrobiome.py --help 
usage: gpmicrobiome.py [-h] -t TIME_POINTS [-p TIME_POINTS_I] -d COUNT_DATA -o OUTPUT_FILE [-v]

GPMicrobiome

optional arguments: 
  -h, --help                                   show this help message and exit
  -t TIME_POINTS, --time TIME_POINTS           file containing time points of measurements (required)
  -p TIME_POINTS_I, --prediction TIME_POINTS_I file containing prediction time points (optional)
  -d COUNT_DATA, --data COUNT_DATA             file containing read counts (required)
  -o OUTPUT_FILE, --output OUTPUT_FILE         output file for pickling posterior samples (required)
  -v, --version                                show program's version number and exit

The user has to supply either two or three input data files and one output file. The two mandatory input files contain the measurement time points (in days) and the read counts for each species at every time point. The optional input file contains the time points for predictions (interpolation/extrapolation). The posterior samples are written to the output file; an existing file with the same name is overwritten.

The formats of the input files are explained below.

Input data format

For demonstration purposes, let us assume that the input files are named timepoints.tsv, prediction_timepoints.tsv, and data.tsv. The file containing measurement time points (timepoints.tsv) should have T lines, each holding one value that gives a measurement time point (in days). For instance, if there are seven measurements taken daily, then

$ cat timepoints.tsv 
0
1
2
3
4
5
6

Additionally, for the sake of simplicity, let us assume that there are three (M=3) species. Then the file data.tsv containing read counts should have M lines and T tab-separated values per line

$ cat data.tsv 
9421  11123 10032 12132 76321 10923 8023
33134 31203 24103 26190 29893 35023 32310
62310 61032 57904 0 61203 60231 62031

Note that the order of columns in data.tsv should match the order of measurement time points specified in timepoints.tsv.

The optional input file prediction_timepoints.tsv has the same format as timepoints.tsv. For instance, if the goal is to predict compositions at 4.5 and 9 days, then

$ cat prediction_timepoints.tsv 
4.5
9
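The three example files above could also be produced from Python. The following is a minimal sketch, not part of GPMicrobiome: the helper `write_inputs` is a hypothetical name introduced here, and it assumes NumPy is available. It writes one time point per line and one tab-separated row per species, and it rejects count matrices whose number of columns does not match the number of measurement time points.

```python
import os
import tempfile

import numpy as np


def write_inputs(directory, timepoints, counts, prediction_timepoints=None):
    """Write input files in the layout described above (hypothetical helper)."""
    timepoints = np.asarray(timepoints, dtype=float)
    counts = np.asarray(counts, dtype=int)
    # data.tsv needs M rows (species) and T columns (measurement time points).
    if counts.ndim != 2 or counts.shape[1] != timepoints.size:
        raise ValueError("counts must have shape (M, T) with T == len(timepoints)")
    # One time point per line.
    np.savetxt(os.path.join(directory, "timepoints.tsv"), timepoints, fmt="%g")
    # One species per line, tab-separated counts.
    np.savetxt(os.path.join(directory, "data.tsv"), counts, fmt="%d", delimiter="\t")
    if prediction_timepoints is not None:
        np.savetxt(
            os.path.join(directory, "prediction_timepoints.tsv"),
            np.asarray(prediction_timepoints, dtype=float),
            fmt="%g",
        )


# Reproduce the example files in a temporary directory.
example_dir = tempfile.mkdtemp()
example_counts = [
    [9421, 11123, 10032, 12132, 76321, 10923, 8023],
    [33134, 31203, 24103, 26190, 29893, 35023, 32310],
    [62310, 61032, 57904, 0, 61203, 60231, 62031],
]
write_inputs(example_dir, range(7), example_counts, [4.5, 9])
```

The shape check is a design choice of this sketch; it catches a transposed count matrix before the files are handed to the sampler.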

Sampling

If the goal is to estimate the underlying compositions at measurement time points without producing predictions, then the following command should be executed

python gpmicrobiome.py -t timepoints.tsv -d data.tsv -o samples.p

Whereas, if the goal is also to produce predictions, then the following command should be executed

python gpmicrobiome.py -t timepoints.tsv -p prediction_timepoints.tsv -d data.tsv -o samples.p

In both cases, samples.p will contain measurement time points, prediction time points, and posterior samples.

Output handling

The output file samples.p can be read in Python as follows

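import pickle
# The pickle holds measurement time points, prediction time points, and posterior samples
with open('samples.p', 'rb') as f:
    T, T_p, samples = pickle.load(f)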

Then the posterior means of Thetas can be printed as

print(samples['Theta_G'].mean(0).T)
if 'Theta_G_i' in samples:
  print(samples['Theta_G_i'].mean(0).T)

Note that the if statement checks whether predictions were made. The orders of rows and columns correspond to the orders in timepoints.tsv, prediction_timepoints.tsv, and data.tsv.
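Beyond posterior means, it can be useful to look at posterior uncertainty. The sketch below computes a posterior mean and a percentile interval with NumPy. It is an illustration under assumptions: the helper `summarize` is hypothetical, and the array layout (posterior draws, time points, species) is inferred from the `.mean(0).T` call above rather than confirmed by the GPMicrobiome documentation. A synthetic Dirichlet array stands in for `samples['Theta_G']` so that the example runs on its own.

```python
import numpy as np


def summarize(theta, lower=2.5, upper=97.5):
    """Posterior mean and percentile bounds, each transposed to (species, time).

    Assumes theta has shape (draws, time points, species); this layout is an
    assumption of this sketch.
    """
    theta = np.asarray(theta, dtype=float)
    mean = theta.mean(axis=0).T
    lo, hi = np.percentile(theta, [lower, upper], axis=0)
    return mean, lo.T, hi.T


# Synthetic stand-in for samples['Theta_G']: 200 draws, 7 time points, 3 species.
rng = np.random.RandomState(0)
fake_theta = rng.dirichlet([1.0, 3.0, 6.0], size=(200, 7))

mean, lo, hi = summarize(fake_theta)
print(mean.shape)  # rows are species, columns are time points
```

With real output, `fake_theta` would be replaced by `samples['Theta_G']` (or `samples['Theta_G_i']` when predictions were made), after checking that the array shape matches the assumed layout.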

Application programming interface

In addition to the command line interface, GPMicrobiome can be used directly from Python.

Assume that the user has data in numpy arrays T (1D array containing measurement time points), T_p (1D array containing prediction time points, empty array corresponds to the prediction-free case), and counts (2D array containing counts so that rows and columns represent species and time points, respectively). Then the sampling procedure can be done as follows

from gpmicrobiome import stan_init_data, get_samples 
init, data = stan_init_data(counts,T,T_p)
samples = get_samples('gpmicrobiome.stan',data,init)
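If the data live in files in the format described under "Input data format", the arrays for the snippet above could be prepared roughly as follows. This is a sketch under assumptions: `load_inputs` is a hypothetical helper and not part of GPMicrobiome, and the small demo files are made up for illustration. Following the description above, an empty array stands for the prediction-free case.

```python
import os
import tempfile

import numpy as np


def load_inputs(time_path, data_path, prediction_path=None):
    """Load T, T_p, and counts from files (hypothetical helper)."""
    # ndmin keeps the expected dimensionality even for single-line files.
    T = np.loadtxt(time_path, ndmin=1)
    counts = np.loadtxt(data_path, dtype=int, ndmin=2)
    if prediction_path is None:
        T_p = np.array([])  # empty array: no predictions requested
    else:
        T_p = np.loadtxt(prediction_path, ndmin=1)
    # Rows are species, columns are time points.
    if counts.shape[1] != T.size:
        raise ValueError("number of columns in counts must equal len(T)")
    return T, T_p, counts


# Small made-up demo files: three time points, two species.
demo_dir = tempfile.mkdtemp()
time_path = os.path.join(demo_dir, "timepoints.tsv")
data_path = os.path.join(demo_dir, "data.tsv")
with open(time_path, "w") as f:
    f.write("0\n1\n2\n")
with open(data_path, "w") as f:
    f.write("5\t7\t9\n10\t0\t30\n")

T, T_p, counts = load_inputs(time_path, data_path)
```

The resulting arrays can then be passed to `stan_init_data` in the argument order shown above.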