All Projects → david-barnett → microViz

david-barnett / microViz

Licence: GPL-3.0 license
R package for microbiome data visualization and statistics. Uses phyloseq, vegan and the tidyverse. Docker image available.

Programming Languages

r
7636 projects

Projects that are alternatives of or similar to microViz

qiita
Qiita - A multi-omics databasing effort
Stars: ✭ 103 (+68.85%)
Mutual labels:  microbiome, microbiome-analysis
MicEco
Various functions for analysis of microbial community data
Stars: ✭ 25 (-59.02%)
Mutual labels:  microbiome, microbiome-analysis
q2-qemistree
Hierarchical orderings for mass spectrometry data. Canonically pronounced "chemis-tree".
Stars: ✭ 23 (-62.3%)
Mutual labels:  microbiome
microbiomeMarker
R package for microbiome biomarker discovery
Stars: ✭ 89 (+45.9%)
Mutual labels:  microbiome
metacal
Metagenomics calibration R package
Stars: ✭ 16 (-73.77%)
Mutual labels:  microbiome-analysis
qurro
Visualize differentially ranked features (taxa, metabolites, ...) and their log-ratios across samples
Stars: ✭ 24 (-60.66%)
Mutual labels:  microbiome
schtools
Schloss Lab Tools for Reproducible Microbiome Research 💩
Stars: ✭ 22 (-63.93%)
Mutual labels:  microbiome
RCM
A model-based visualization method for microbiome data
Stars: ✭ 16 (-73.77%)
Mutual labels:  microbiome
Maaslin2
MaAsLin2: Microbiome Multivariate Association with Linear Models
Stars: ✭ 76 (+24.59%)
Mutual labels:  microbiome
calour
exploratory and interactive microbiome analyses based on heatmaps
Stars: ✭ 22 (-63.93%)
Mutual labels:  microbiome
DAtest
Compare different differential abundance and expression methods
Stars: ✭ 34 (-44.26%)
Mutual labels:  microbiome
melonnpan
Model-based Genomically Informed High-dimensional Predictor of Microbial Community Metabolic Profiles
Stars: ✭ 20 (-67.21%)
Mutual labels:  microbiome
mitre
The Microbiome Interpretable Temporal Rule Engine
Stars: ✭ 37 (-39.34%)
Mutual labels:  microbiome
ganon
ganon classifies short DNA sequences against large sets of genomic sequences efficiently, with download and update of references (RefSeq/Genbank), taxonomic (NCBI/GTDB) and hierarchical classification, customized reporting and more
Stars: ✭ 57 (-6.56%)
Mutual labels:  microbiome
empress
A fast and scalable phylogenetic tree viewer for microbiome data analysis
Stars: ✭ 37 (-39.34%)
Mutual labels:  microbiome
microbiomeHD
Cross-disease comparison of case-control gut microbiome studies
Stars: ✭ 58 (-4.92%)
Mutual labels:  microbiome
DRAM
Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
Stars: ✭ 159 (+160.66%)
Mutual labels:  microbiome
ANCOMBC
Differential abundance (DA) and correlation analyses for microbial absolute abundance data
Stars: ✭ 60 (-1.64%)
Mutual labels:  microbiome
RDPTools
Collection of commonly used RDP Tools for easy building
Stars: ✭ 44 (-27.87%)
Mutual labels:  microbiome
redbiom
Sample search by metadata and features
Stars: ✭ 27 (-55.74%)
Mutual labels:  microbiome

microViz

R-CMD-check codecov GitHub R package version GitHub release (including pre-releases) Docker Image Version (latest by date) r-universe microViz status JOSS article Citations Zenodo DOI

Overview

📦 microViz is an R package for analysis and visualization of microbiome sequencing data.

🔨 microViz functions are intended to be easy to use and flexible.

🔬 microViz extends and complements popular microbial ecology packages like phyloseq, vegan, & microbiome.

Upgrading microViz versions ≤ 0.9.7 to version 0.10.0 +

microViz version 0.10.0 introduces changes that may break some code written with older versions of microViz. If you were already using microViz prior to version 0.10.0 please read this document before upgrading to the latest version.

Learn more

📎 This website is the best place for documentation and examples: https://david-barnett.github.io/microViz/

Installation

microViz is not (yet) available from CRAN. You can install microViz from R Universe, or from GitHub.

I recommend you first install the Bioconductor dependencies using the code below.

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("phyloseq", "microbiome", "ComplexHeatmap"), update = FALSE)

Installation of microViz from R Universe

install.packages(
  "microViz",
  repos = c(davidbarnett = "https://david-barnett.r-universe.dev", getOption("repos"))
)

I also highly recommend you install the following suggested CRAN packages.

install.packages("ggraph") # for taxatree_plots()
install.packages("DT") # for tax_fix_interactive()
install.packages("corncob") # for example datasets and beta binomial models

Installation of microViz from GitHub

# Installing from github requires the devtools package
install.packages("devtools")
# Windows users will also need to have RTools installed! http://jtleek.com/modules/01_DataScientistToolbox/02_10_rtools/

# To install the very latest version:
devtools::install_github("david-barnett/microViz")

# To install a specific "release" version of this package
devtools::install_github("david-barnett/[email protected]") 

Installation notes

🍎 macOS users - might need to install xquartz to make the heatmaps work (to do this with homebrew, run the following command in your mac’s Terminal: brew install --cask xquartz

📦 I highly recommend using renv for managing your R package installations across multiple projects.

🐳 For Docker users an image with the main branch installed is available at: https://hub.docker.com/r/barnettdavid/microviz-rocker-verse

📅 microViz is tested to work with R version 4 on Windows, MacOS, and Ubuntu 18 and 20. R version 3.6.* should probably work, but I don’t formally test this.

Interactive ordination exploration

library(microViz)
#> microViz version 0.10.5 - Copyright (C) 2022 David Barnett
#> ! Website: https://david-barnett.github.io/microViz
#> ✔ Useful?  For citation details, run: `citation("microViz")`
#> ✖ Silence? `suppressPackageStartupMessages(library(microViz))`

microViz provides a Shiny app for an easy way to start exploring your microbiome data: all you need is a phyloseq object.

# example data from corncob package
pseq <- corncob::ibd_phylo %>%
  tax_fix() %>%
  phyloseq_validate()
ord_explore(pseq) # gif generated with microViz version 0.7.4 (plays at 1.75x speed)

Example analyses

library(phyloseq)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following object is masked from 'package:testthat':
#> 
#>     matches
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
# get some example data
data("dietswap", package = "microbiome")

# create a couple of numerical variables to use as constraints or conditions
dietswap <- dietswap %>%
  ps_mutate(
    weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = if_else(sex == "female", true = 1, false = 0),
    african = if_else(nationality == "AFR", true = 1, false = 0)
  )
# add a couple of missing values to show how microViz handles missing data
sample_data(dietswap)$african[c(3, 4)] <- NA

Looking at your data

You have quite a few samples in your phyloseq object, and would like to visualize their compositions. Perhaps these example data differ by participant nationality?

dietswap %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 15, other_name = "Other",
    taxon_renamer = function(x) stringr::str_remove(x, " [ae]t rel."),
    palette = distinct_palette(n = 15, add = "grey90"),
    merge_other = FALSE, bar_outline_colour = "darkgrey"
  ) +
  coord_flip() +
  facet_wrap("nationality", nrow = 1, scales = "free") +
  labs(x = NULL, y = NULL) +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
#> Registered S3 method overwritten by 'seriation':
#>   method         from 
#>   reorder.hclust vegan

htmp <- dietswap %>%
  ps_mutate(nationality = as.character(nationality)) %>%
  tax_transform("log2", add = 1, chain = TRUE) %>%
  comp_heatmap(
    taxa = tax_top(dietswap, n = 30), grid_col = NA, name = "Log2p",
    taxon_renamer = function(x) stringr::str_remove(x, " [ae]t rel."),
    colors = heat_palette(palette = viridis::turbo(11)),
    row_names_side = "left", row_dend_side = "right", sample_side = "bottom",
    sample_anno = sampleAnnotation(
      Nationality = anno_sample_cat(
        var = "nationality", col = c(AAM = "grey35", AFR = "grey85"),
        box_col = NA, legend_title = "Nationality", size = grid::unit(4, "mm")
      )
    )
  )

ComplexHeatmap::draw(
  object = htmp, annotation_legend_list = attr(htmp, "AnnoLegends"),
  merge_legends = TRUE
)

Example ordination plot workflow

Ordination methods can also help you to visualize if overall microbial ecosystem composition differs markedly between groups, e.g. BMI.

Here is one option as an example:

  1. Filter out rare taxa (e.g. remove Genera not present in at least 10% of samples) - use tax_filter()
  2. Aggregate the taxa into bacterial families (for example) - use tax_agg()
  3. Transform the microbial data with the centered-log-ratio transformation - use tax_transform()
  4. Perform PCA with the clr-transformed features (equivalent to Aitchison distance PCoA) - use ord_calc()
  5. Plot the first 2 axes of this PCA ordination, colouring samples by group and adding taxon loading arrows to visualize which taxa generally differ across your samples - use ord_plot()
  6. Customise the theme of the ggplot as you like and/or add features like ellipses or annotations
# perform ordination
unconstrained_aitchison_pca <- dietswap %>%
  tax_filter(min_prevalence = 0.1, tax_level = "Genus") %>%
  tax_agg("Family") %>%
  tax_transform("clr") %>%
  ord_calc()
#> Proportional min_prevalence given: 0.1 --> min 23/222 samples.
# ord_calc will automatically infer you want a "PCA" here
# specify explicitly with method = "PCA", or you can pick another method

# create plot
pca_plot <- unconstrained_aitchison_pca %>%
  ord_plot(
    plot_taxa = 1:6, colour = "bmi_group", size = 1.5,
    tax_vec_length = 0.325,
    tax_lab_style = tax_lab_style(max_angle = 90, aspect_ratio = 0.5),
    auto_caption = 8
  )

# customise plot
customised_plot <- pca_plot +
  stat_ellipse(aes(linetype = bmi_group, colour = bmi_group), linewidth = 0.3) + # linewidth not size, since ggplot 3.4.0
  scale_colour_brewer(palette = "Set1") +
  theme(legend.position = "bottom") +
  coord_fixed(ratio = 0.5, clip = "off") # makes rotated labels align correctly

# show plot
customised_plot

PERMANOVA

You visualised your ordinated data in the plot above. Below you can see how to perform a PERMANOVA to test the significance of BMI’s association with overall microbial composition. This example uses the Family-level aitchison distance to correspond with the plot above.

# calculate distances
aitchison_dists <- dietswap %>%
  tax_filter(min_prevalence = 0.1) %>%
  tax_transform("identity", rank = "Family") %>%
  dist_calc("aitchison")
#> Proportional min_prevalence given: 0.1 --> min 23/222 samples.

# the more permutations you request, the longer it takes
# but also the more stable and precise your p-values become
aitchison_perm <- aitchison_dists %>%
  dist_permanova(
    seed = 1234, # for set.seed to ensure reproducibility of random process
    n_processes = 1, n_perms = 99, # you should use at least 999!
    variables = "bmi_group"
  )
#> 2022-11-24 01:58:13 - Starting PERMANOVA with 99 perms with 1 processes
#> 2022-11-24 01:58:13 - Finished PERMANOVA

# view the permanova results
perm_get(aitchison_perm) %>% as.data.frame()
#>            Df  SumOfSqs         R2        F Pr(>F)
#> bmi_group   2  104.0678 0.04177157 4.773379   0.01
#> Residual  219 2387.2862 0.95822843       NA     NA
#> Total     221 2491.3540 1.00000000       NA     NA

# view the info stored about the distance calculation
info_get(aitchison_perm)
#> psExtra info:
#> tax_agg = "Family" tax_trans = "identity" dist_method = "aitchison"

Constrained partial ordination

You could visualise the effect of the (numeric/logical) variables in your permanova directly using the ord_plot function with constraints (and conditions).

perm2 <- aitchison_dists %>%
  dist_permanova(variables = c("weight", "african", "sex"), seed = 321)
#> Dropping samples with missings: 2
#> 2022-11-24 01:58:13 - Starting PERMANOVA with 999 perms with 1 processes
#> 2022-11-24 01:58:15 - Finished PERMANOVA

We’ll visualise the effect of nationality and bodyweight on sample composition, after first removing the effect of sex.

perm2 %>%
  ord_calc(constraints = c("weight", "african"), conditions = "female") %>%
  ord_plot(
    colour = "nationality", size = 2.5, alpha = 0.35,
    auto_caption = 7,
    constraint_vec_length = 1,
    constraint_vec_style = vec_constraint(1.5, colour = "grey15"),
    constraint_lab_style = constraint_lab_style(
      max_angle = 90, size = 3, aspect_ratio = 0.35, colour = "black"
    )
  ) +
  stat_ellipse(aes(colour = nationality), linewidth = 0.2) + # linewidth not size since ggplot 3.4.0
  scale_color_brewer(palette = "Set1") +
  coord_fixed(ratio = 0.35, clip = "off") +
  theme(legend.position = c(0.9, 0.1), legend.background = element_rect())
#> 
#> Centering (mean) and scaling (sd) the constraints and/or conditions:
#>  weight
#>  african
#>  female

Correlation Heatmaps

microViz heatmaps are powered by ComplexHeatmap and annotated with taxa prevalence and/or abundance.

# set up the data with numerical variables and filter to top taxa
psq <- dietswap %>%
  ps_mutate(
    weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = if_else(sex == "female", true = 1, false = 0),
    african = if_else(nationality == "AFR", true = 1, false = 0)
  ) %>%
  tax_filter(
    tax_level = "Genus", min_prevalence = 1 / 10, min_sample_abundance = 1 / 10
  ) %>%
  tax_transform("identity", rank = "Genus")
#> Proportional min_prevalence given: 0.1 --> min 23/222 samples.

# randomly select 30 taxa from the 50 most abundant taxa (just for an example)
set.seed(123)
taxa <- sample(tax_top(psq, n = 50), size = 30)
# actually draw the heatmap
cor_heatmap(
  data = psq, taxa = taxa,
  taxon_renamer = function(x) stringr::str_remove(x, " [ae]t rel."),
  tax_anno = taxAnnotation(
    Prev. = anno_tax_prev(undetected = 50),
    Log2 = anno_tax_box(undetected = 50, trans = "log2", zero_replace = 1)
  )
)

Citation

😇 If you find any part of microViz useful to your work, please consider citing the JOSS article:

Barnett et al., (2021). microViz: an R package for microbiome data visualization and statistics. Journal of Open Source Software, 6(63), 3201, https://doi.org/10.21105/joss.03201

Contributing

Bug reports, questions, suggestions for new features, and other contributions are all welcome. Feel free to create a GitHub Issue or write on the Discussions page. Alternatively you could also contact me (David) on Twitter @_david_barnett_ .

This project is released with a Contributor Code of Conduct and by participating in this project you agree to abide by its terms.

Session info

sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.4.0   dplyr_1.0.10    phyloseq_1.42.0 microViz_0.10.5
#> [5] testthat_3.1.5  devtools_2.4.5  usethis_2.1.6  
#> 
#> loaded via a namespace (and not attached):
#>   [1] circlize_0.4.15        plyr_1.8.8             igraph_1.3.5          
#>   [4] selectr_0.4-2          splines_4.2.1          GenomeInfoDb_1.34.2   
#>   [7] digest_0.6.30          foreach_1.5.2          ca_0.71.1             
#>  [10] htmltools_0.5.3        magick_2.7.3           viridis_0.6.2         
#>  [13] fansi_1.0.3            magrittr_2.0.3         memoise_2.0.1         
#>  [16] cluster_2.1.4          doParallel_1.0.17      remotes_2.4.2         
#>  [19] ComplexHeatmap_2.14.0  Biostrings_2.66.0      matrixStats_0.62.0    
#>  [22] prettyunits_1.1.1      colorspace_2.0-3       corncob_0.3.0         
#>  [25] rvest_1.0.3            xfun_0.35              callr_3.7.3           
#>  [28] crayon_1.5.2           RCurl_1.98-1.9         jsonlite_1.8.3        
#>  [31] survival_3.4-0         iterators_1.0.14       ape_5.6-2             
#>  [34] glue_1.6.2             registry_0.5-1         gtable_0.3.1          
#>  [37] zlibbioc_1.44.0        XVector_0.38.0         GetoptLong_1.0.5      
#>  [40] pkgbuild_1.3.1         Rhdf5lib_1.20.0        shape_1.4.6           
#>  [43] BiocGenerics_0.44.0    scales_1.2.1           DBI_1.1.3             
#>  [46] miniUI_0.1.1.1         Rcpp_1.0.9             gridtext_0.1.5        
#>  [49] viridisLite_0.4.1      xtable_1.8-4           clue_0.3-62           
#>  [52] stats4_4.2.1           profvis_0.3.7          htmlwidgets_1.5.4     
#>  [55] httr_1.4.4             RColorBrewer_1.1-3     ellipsis_0.3.2        
#>  [58] urlchecker_1.0.1       pkgconfig_2.0.3        farver_2.1.1          
#>  [61] utf8_1.2.2             tidyselect_1.2.0       labeling_0.4.2        
#>  [64] rlang_1.0.6            reshape2_1.4.4         later_1.3.0           
#>  [67] munsell_0.5.0          tools_4.2.1            cachem_1.0.6          
#>  [70] cli_3.4.1              generics_0.1.3         ade4_1.7-20           
#>  [73] evaluate_0.18          biomformat_1.26.0      stringr_1.4.1         
#>  [76] fastmap_1.1.0          yaml_2.3.6             processx_3.8.0        
#>  [79] knitr_1.40             fs_1.5.2               purrr_0.3.5           
#>  [82] nlme_3.1-160           mime_0.12              xml2_1.3.3            
#>  [85] brio_1.1.3             compiler_4.2.1         rstudioapi_0.14       
#>  [88] curl_4.3.3             png_0.1-7              tibble_3.1.8          
#>  [91] stringi_1.7.8          highr_0.9              ps_1.7.2              
#>  [94] lattice_0.20-45        Matrix_1.5-3           commonmark_1.8.1      
#>  [97] markdown_1.4           vegan_2.6-4            microbiome_1.20.0     
#> [100] permute_0.9-7          multtest_2.54.0        vctrs_0.5.1           
#> [103] pillar_1.8.1           lifecycle_1.0.3        rhdf5filters_1.10.0   
#> [106] GlobalOptions_0.1.2    data.table_1.14.6      bitops_1.0-7          
#> [109] seriation_1.4.0        httpuv_1.6.6           R6_2.5.1              
#> [112] promises_1.2.0.1       TSP_1.2-1              gridExtra_2.3         
#> [115] IRanges_2.32.0         sessioninfo_1.2.2      codetools_0.2-18      
#> [118] MASS_7.3-58.1          assertthat_0.2.1       pkgload_1.3.2         
#> [121] rhdf5_2.42.0           rjson_0.2.21           withr_2.5.0           
#> [124] S4Vectors_0.36.0       GenomeInfoDbData_1.2.9 ggtext_0.1.2          
#> [127] mgcv_1.8-41            parallel_4.2.1         grid_4.2.1            
#> [130] tidyr_1.2.1            rmarkdown_2.18         Cairo_1.6-0           
#> [133] Rtsne_0.16             Biobase_2.58.0         shiny_1.7.3
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].