mikemc / Speedyseq

Licence: agpl-3.0
Speedy versions of phyloseq functions

speedyseq

Speedyseq is an R package for microbiome data analysis that extends the popular phyloseq package. Speedyseq began with the limited goal of providing faster versions of phyloseq’s plotting and taxonomic merging functions, but it now contains a growing number of enhancements to phyloseq that I have found useful.

Installation

Install the current development version with the remotes package:

# install.packages("remotes")
remotes::install_github("mikemc/speedyseq")

Usage

Method 1: Call speedyseq functions explicitly (with the speedyseq:: prefix) when you want to use speedyseq’s version instead of phyloseq’s. This method ensures that you do not unintentionally call speedyseq’s version of a phyloseq function.

library(phyloseq)
data(GlobalPatterns)
system.time(
  # Calls phyloseq's psmelt
  df1 <- psmelt(GlobalPatterns) # slow
)
#>    user  system elapsed 
#>   6.623   0.076   6.711
system.time(
  df2 <- speedyseq::psmelt(GlobalPatterns) # fast
)
#>    user  system elapsed 
#>   0.339   0.000   0.216
dplyr::all_equal(df1, df2, ignore_row_order = TRUE)
#> [1] TRUE
detach(package:phyloseq)

Method 2: Load speedyseq, which also loads phyloseq and masks the overlapping function names, so that unqualified calls to those functions go to speedyseq’s versions by default.

library(speedyseq)
#> Loading required package: phyloseq
#> 
#> Attaching package: 'speedyseq'
#> The following objects are masked from 'package:phyloseq':
#> 
#>     filter_taxa, plot_bar, plot_heatmap, plot_tree, psmelt, tax_glom,
#>     tip_glom, transform_sample_counts
data(GlobalPatterns)
system.time(
  ps1 <- phyloseq::tax_glom(GlobalPatterns, "Genus") # slow
)
#>    user  system elapsed 
#>  35.266   0.143  35.538
system.time(
  # Calls speedyseq's tax_glom
  ps2 <- tax_glom(GlobalPatterns, "Genus") # fast
)
#>    user  system elapsed 
#>   0.259   0.000   0.247

Loading speedyseq also loads the magrittr pipe (%>%), which allows pipe chains with phyloseq objects:

gp.filt.prop <- GlobalPatterns %>%
  filter_taxa2(~ sum(. > 0) > 5) %>%
  transform_sample_counts(~ . / sum(.))
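To make the two formulas in this chain concrete, here is a minimal base-R sketch of the same arithmetic on a plain count matrix. It assumes the first formula is evaluated once per taxon (across samples) and the second once per sample (across taxa); the matrix and its names are made up for illustration, and this is not how speedyseq implements these functions.

```r
# Toy count matrix: taxa in rows, samples in columns (made-up numbers).
counts <- matrix(
  c(5, 0, 3, 2, 1, 4, 7,
    0, 0, 1, 0, 0, 0, 2,
    9, 8, 7, 6, 5, 4, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:3), paste0("sample", 1:7))
)

# Step 1: keep taxa observed (count > 0) in more than 5 samples,
# mirroring the formula ~ sum(. > 0) > 5 applied to each taxon.
keep <- apply(counts, 1, function(x) sum(x > 0) > 5)
filtered <- counts[keep, , drop = FALSE]

# Step 2: convert each sample's counts to proportions,
# mirroring the formula ~ . / sum(.) applied to each sample.
props <- sweep(filtered, 2, colSums(filtered), "/")

rownames(filtered)  # taxa that passed the filter
colSums(props)      # proportions within each sample
```

Because the proportions are computed after filtering, they are relative to the retained taxa only, not to the original sample totals.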

Features

Faster implementations of phyloseq functions

  • psmelt() and the plotting functions that use it: plot_bar(), plot_heatmap(), and plot_tree().
  • The taxonomic merging functions tax_glom() and tip_glom(). Speedyseq’s tip_glom() also has significantly lower memory usage.

These functions should generally work as drop-in replacements for phyloseq’s versions, with additional arguments that allow for modified behavior. The results can differ in row order (for psmelt()) and in taxon order (for tax_glom()); see the Changelog for details.
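Since row order can differ between the two versions of psmelt(), comparisons of their output should ignore it. The following is a rough base-R sketch of such a check on two small made-up data frames; the helper sort_rows() is hypothetical and not part of phyloseq or speedyseq.

```r
# Two made-up data frames with the same rows in a different order.
df_a <- data.frame(
  OTU = c("t1", "t2", "t3"),
  Sample = c("s1", "s1", "s2"),
  Abundance = c(3, 0, 5)
)
df_b <- df_a[c(3, 1, 2), ]

# Hypothetical helper: sort rows by every column in turn and reset row names,
# so that two data frames with the same rows end up in the same order.
sort_rows <- function(df) {
  sorted <- df[do.call(order, unname(as.list(df))), , drop = FALSE]
  rownames(sorted) <- NULL
  sorted
}

# TRUE when the two data frames contain the same rows, regardless of order.
same_rows <- isTRUE(all.equal(sort_rows(df_a), sort_rows(df_b)))
same_rows
```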

New taxonomic merging functions

  • A general-purpose merging function merge_taxa_vec() that provides a vectorized version of phyloseq’s merge_taxa() function.
  • A function tree_glom() that performs direct phylogenetic merging of taxa. This function provides an alternative to the indirect phylogenetic merging done by tip_glom() that is much faster and arguably more intuitive.

See the Changelog for details and examples.
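The general idea behind vectorized merging, where every taxon is assigned to a group and all groups are merged in a single step, can be illustrated in base R with rowsum(). This sketch works on a plain count matrix with a made-up grouping vector; it is only an illustration of the concept and does not reproduce the interface or implementation of merge_taxa_vec().

```r
# Toy count matrix: taxa in rows, samples in columns (made-up numbers).
counts <- matrix(
  c(10, 0, 3,
     2, 5, 1,
     0, 4, 4,
     7, 7, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:3))
)

# Hypothetical grouping: one label per taxon; taxa sharing a label are merged.
group <- c("A", "B", "A", "B")

# Sum the counts of all taxa within each group in one vectorized call,
# rather than merging one pair of taxa at a time.
merged <- rowsum(counts, group)
merged
```

Merging all groups at once avoids repeatedly rebuilding the table for each merge, which is the usual reason a vectorized approach is faster than a loop over groups.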

Enhancements and additions to other phyloseq functions

See the online documentation for an up-to-date list of these functions and their usage, and the Changelog for further details.