tare / Gpmicrobiome

Licence: mit

A novel probabilistic approach to explicitly model overdispersion and sampling zeros in 16S rRNA sequencing data by considering the temporal correlation between nearby time points using Gaussian Processes

GPMicrobiome

Prerequisites

Python 2.7 (https://www.python.org/)
PyStan (http://pystan.readthedocs.org/en/latest/)
NumPy (http://www.numpy.org/)

For more information on Stan and PyStan, please see the documentation at http://mc-stan.org/interfaces/pystan.html.

Command line interface

Usage

The command line usage of the program is summarized by the following help message:

$ python gpmicrobiome.py --help 
usage: gpmicrobiome.py [-h] -t TIME_POINTS [-p TIME_POINTS_I] -d COUNT_DATA -o OUTPUT_FILE [-v]

GPMicrobiome

optional arguments: 
  -h, --help                                   show this help message and exit
  -t TIME_POINTS, --time TIME_POINTS           file containing time points of measurements (required)
  -p TIME_POINTS_I, --prediction TIME_POINTS_I file containing prediction time points (optional)
  -d COUNT_DATA, --data COUNT_DATA             file containing read counts (required)
  -o OUTPUT_FILE, --output OUTPUT_FILE         output file for pickling posterior samples (required)
  -v, --version                                show program's version number and exit

The user has to supply two or three input files and one output file. The two mandatory input files contain the measurement time points (in days) and the read counts for each species at every time point. The optional input file contains time points for predictions (interpolation/extrapolation). The posterior samples are written to the output file (an existing file is overwritten).

The formats of the input files are explained below.

Input data format

For demonstration purposes, let us assume that the input files are named timepoints.tsv, prediction_timepoints.tsv, and data.tsv. The file containing measurement time points (timepoints.tsv) should have T lines, each holding a single measurement time point (in days). For instance, if seven measurements are taken daily, then

$ cat timepoints.tsv 
0
1
2
3
4
5
6

Additionally, for the sake of simplicity, let us assume that there are three species (M=3). Then the file data.tsv containing read counts should have M lines with T tab-separated values per line

$ cat data.tsv 
9421 11123 10032 12132 76321 10923 8023
33134 31203 24103 26190 29893 35023 32310
62310 61032 57904 0 61203 60231 62031

Note that the order of columns in data.tsv should match the order of measurement time points specified in timepoints.tsv.

The optional input file prediction_timepoints.tsv has the same format as timepoints.tsv. For instance, if the goal is to predict compositions at 4.5 and 9 days, then

$ cat prediction_timepoints.tsv 
4.5
9
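The input files described above can also be produced from Python. The following is a minimal sketch, assuming NumPy is available; the helper name write_inputs is hypothetical and is not part of GPMicrobiome. It writes one time point per line and one tab-separated line of counts per species.

```python
import numpy as np


def write_inputs(time_points, counts, time_path, counts_path):
    """Write time points and read counts in the layout described above.

    Hypothetical helper for illustration; not part of GPMicrobiome.
    """
    time_points = np.asarray(time_points, dtype=float)
    counts = np.asarray(counts, dtype=int)
    # counts must be M x T: one row per species, one column per time point
    if counts.ndim != 2 or counts.shape[1] != time_points.shape[0]:
        raise ValueError("counts must have one column per time point")
    # a 1D array is written as one value per line
    np.savetxt(time_path, time_points, fmt="%g")
    # M lines, each with T tab-separated integer counts
    np.savetxt(counts_path, counts, fmt="%d", delimiter="\t")
```

For example, `write_inputs(range(7), counts, "timepoints.tsv", "data.tsv")` would write both files for an M x 7 array `counts`.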

Sampling

If the goal is to estimate the underlying compositions at the measurement time points without producing predictions, then the following command should be executed

python gpmicrobiome.py -t timepoints.tsv -d data.tsv -o samples.p

If the goal is also to produce predictions, then the following command should be executed

python gpmicrobiome.py -t timepoints.tsv -p prediction_timepoints.tsv -d data.tsv -o samples.p

In both cases, samples.p will contain measurement time points, prediction time points, and posterior samples.

Output handling

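The output file samples.p can be read in Python as follows

import pickle
# samples.p holds measurement time points, prediction time points, and posterior samples
with open('samples.p', 'rb') as f:
    T, T_p, samples = pickle.load(f)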

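Then the posterior means of the Thetas can be printed as

# average over posterior samples (first axis), then transpose
print(samples['Theta_G'].mean(0).T)
# 'Theta_G_i' is present only when prediction time points were supplied
if 'Theta_G_i' in samples:
    print(samples['Theta_G_i'].mean(0).T)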

Note that the if statement is used to check whether predictions were made. The orders of rows and columns correspond to the orders in timepoints.tsv, prediction_timepoints.tsv, and data.tsv.
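Summaries other than the mean can be computed in the same way. The following is a minimal sketch, assuming that each entry of samples is a NumPy array whose first axis indexes the posterior samples (as the use of .mean(0) above suggests); the helper name summarize_posterior and the default percentile bounds are hypothetical choices, not part of GPMicrobiome.

```python
import numpy as np


def summarize_posterior(theta, lower=2.5, upper=97.5):
    """Return the posterior mean and percentile bounds over the first axis.

    Hypothetical helper; assumes axis 0 of theta indexes posterior samples.
    """
    theta = np.asarray(theta, dtype=float)
    mean = theta.mean(axis=0)
    # percentiles are taken over the sample axis, one pair per remaining entry
    low, high = np.percentile(theta, [lower, upper], axis=0)
    # transpose to match the orientation printed in the snippet above
    return mean.T, low.T, high.T
```

With real output this could be called as `mean, low, high = summarize_posterior(samples['Theta_G'])`.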

Application programming interface

In addition to the command line interface, GPMicrobiome can be used directly from Python.

Assume that the user has data in NumPy arrays T (1D array containing measurement time points), T_p (1D array containing prediction time points; an empty array corresponds to the prediction-free case), and counts (2D array containing counts, with rows and columns representing species and time points, respectively). Then the sampling procedure can be done as follows

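from gpmicrobiome import stan_init_data, get_samples
# build the Stan initial values and data from the counts and time points
init, data = stan_init_data(counts, T, T_p)
# draw posterior samples using the Stan model file
samples = get_samples('gpmicrobiome.stan', data, init)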
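The arrays used above can be read from files in the format described under Input data format. The following is a minimal sketch, assuming NumPy is available; the helper name load_inputs is hypothetical and is not part of GPMicrobiome. It only prepares the arrays and does not call stan_init_data or get_samples.

```python
import numpy as np


def load_inputs(time_path, counts_path, prediction_path=None):
    """Load time points, prediction time points, and counts as NumPy arrays.

    Hypothetical helper for illustration; not part of GPMicrobiome.
    """
    # one time point per line
    T = np.loadtxt(time_path, ndmin=1)
    # M lines with T whitespace-separated counts; ndmin=2 keeps the array 2D
    counts = np.loadtxt(counts_path, dtype=int, ndmin=2)
    if counts.shape[1] != T.shape[0]:
        raise ValueError("counts must have one column per time point")
    # an empty array corresponds to the prediction-free case
    if prediction_path is None:
        T_p = np.array([])
    else:
        T_p = np.loadtxt(prediction_path, ndmin=1)
    return T, T_p, counts
```

For example, `T, T_p, counts = load_inputs('timepoints.tsv', 'data.tsv', 'prediction_timepoints.tsv')` yields the three arrays in the layout described above.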
