dnbaker / bonsai

Licence: MIT license
Bonsai: Fast, flexible taxonomic analysis and classification

Programming Languages: C++, C, Python

Projects that are alternatives to or similar to bonsai (shared label: metagenomics)

  • InSilicoSeq: A sequencing simulator
  • virnet: A deep attention model for viral reads identification
  • recentrifuge: Robust comparative analysis and contamination removal for metagenomics
  • Binning refiner: Improving genome bins through the combination of different binning programs
  • GraphBin: Refined binning of metagenomic contigs using assembly graphs
  • micca: MICrobial Community Analysis
  • MG-RAST: The MG-RAST backend (the API server)
  • Jovian: Metagenomics/viromics pipeline that focuses on automation, user-friendliness and a clear audit trail. Jovian aims to empower classical biologists and wet-lab personnel to do metagenomics/viromics analyses themselves, without bioinformatics expertise.
  • microbiomeMarker: R package for microbiome biomarker discovery
  • Maaslin2: Microbiome Multivariate Association with Linear Models
  • catch: A package for designing compact and comprehensive capture probe sets
  • matam: Mapping-Assisted Targeted-Assembly for Metagenomics
  • metacal: Metagenomics calibration R package
  • melonnpan: Model-based Genomically Informed High-dimensional Predictor of Microbial Community Metabolic Profiles
  • ORNA: Fast in-silico normalization algorithm for NGS data
  • AMBER: Assessment of Metagenome BinnERs
  • SemiBin: (no description provided)
  • DRAM: Distilled and Refined Annotation of Metabolism, a tool for the annotation and curation of function for microbial and viral genomes
  • traitar: From genomes to phenotypes: Traitar, the microbial trait analyzer
  • functree-ng: An interactive radial tree for functional hierarchies and omics data visualization

Bonsai: Flexible Taxonomic Analysis and Extension

Bonsai contains a variety of utilities for taxonomic analysis and classification using exact subsequence matches. These include:

  • A high-performance, generic taxonomic classifier
    • Efficient classification
      • 20x as fast as Kraken (single-threaded) in our benchmarks, with significantly better thread scaling.
    • Arbitrary, user-defined spaced-seed encoding.
      • Reference compression by windowing/minimization schemes.
      • Generic minimization including by taxonomic depth, lexicographic value, subsequence specificity, or Shannon entropy.
    • Parallelized pairwise Jaccard distance estimation using HyperLogLog sketches (this functionality has recently migrated to dashing).
  • An unsupervised method for taxonomic structure discovery and correction (metatree).
  • A thread-safe, SIMD-accelerated HyperLogLog implementation, which has migrated to hll.
  • Scripts for downloading reference genomes from new (post-2014) and old RefSeq.
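The HyperLogLog-based Jaccard estimation listed above (now maintained in dashing and hll) can be illustrated with a small self-contained sketch. This is not the interface of either library: `HyperLogLog`, `mix64` (a splitmix64 finalizer used as the hash), and `jaccard` are hypothetical names. The sketch obtains the union by taking the register-wise maximum of two sketches and the intersection by inclusion-exclusion, which is one common approach but not necessarily the one dashing uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// splitmix64 finalizer: spreads integer keys (e.g. 2-bit-encoded k-mers) over 64 bits.
inline std::uint64_t mix64(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Minimal HyperLogLog with 2^p one-byte registers (illustrative only).
class HyperLogLog {
public:
    explicit HyperLogLog(unsigned p) : p_(p), regs_(std::size_t(1) << p, 0) {
        assert(p >= 7 && p <= 18);  // the alpha formula below assumes m >= 128
    }

    void add(std::uint64_t key) {
        const std::uint64_t h = mix64(key);
        const std::size_t idx = static_cast<std::size_t>(h >> (64 - p_));  // top p bits pick the register
        std::uint64_t rest = h << p_;                                       // remaining 64 - p bits
        std::uint8_t rank = 1;                                              // 1 + number of leading zeros
        while (rank <= 64 - p_ && !(rest & (1ULL << 63))) { rest <<= 1; ++rank; }
        regs_[idx] = std::max(regs_[idx], rank);
    }

    double estimate() const {
        const double m = static_cast<double>(regs_.size());
        double sum = 0.0;
        std::size_t zeros = 0;
        for (std::uint8_t r : regs_) {
            sum += std::ldexp(1.0, -static_cast<int>(r));
            if (r == 0) ++zeros;
        }
        const double alpha = 0.7213 / (1.0 + 1.079 / m);
        double e = alpha * m * m / sum;
        // Small-range correction: fall back to linear counting on empty registers.
        if (e <= 2.5 * m && zeros != 0) e = m * std::log(m / static_cast<double>(zeros));
        return e;
    }

    // The register-wise maximum is the sketch of the set union.
    HyperLogLog merge(const HyperLogLog& other) const {
        assert(other.p_ == p_);
        HyperLogLog out(p_);
        for (std::size_t i = 0; i < regs_.size(); ++i) out.regs_[i] = std::max(regs_[i], other.regs_[i]);
        return out;
    }

private:
    unsigned p_;
    std::vector<std::uint8_t> regs_;
};

// Jaccard estimate via inclusion-exclusion: intersection ~= |A| + |B| - |union|.
inline double jaccard(const HyperLogLog& a, const HyperLogLog& b) {
    const double uni = a.merge(b).estimate();
    if (uni <= 0.0) return 0.0;
    const double inter = std::max(0.0, a.estimate() + b.estimate() - uni);
    return inter / uni;
}
```

With 2^p registers the relative standard error of a single cardinality estimate is about 1.04/sqrt(2^p), so p = 14 gives roughly 0.8%. Because the intersection is a difference of estimates, the Jaccard error grows when the overlap is small relative to the union.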

Tools are compiled against both zlib and zstd, so they can transparently read zlib-compressed, zstd-compressed, and uncompressed files.

All of these tools are experimental. Use at your own risk.

Build Instructions

cd bonsai && make bonsai

Unit Tests

We use the Catch testing framework. You can build and run the tests with:

cd bonsai && make unit && ./unit

Usage

Each executable prints usage instructions when run with no options or with the -h flag.

For classification, three commands are involved: bonsai prebuild, bonsai build, and bonsai classify. prebuild is required only for taxonomic or feature minimization strategies, in which case building the database requires double the memory. Unless you are very sure you know what you're doing, we recommend simply running bonsai build with either Entropy or Lexicographic minimization.
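Entropy and lexicographic minimization differ in how each k-mer is scored before one representative is chosen. The sketch below shows the two scores under an assumed 2-bit encoding (A=0, C=1, G=2, T=3); `base_code`, `encode_kmer`, and `kmer_entropy` are hypothetical names, and whether bonsai prefers low or high scores, or how it breaks ties, is not specified here.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <stdexcept>
#include <string>

// Assumed 2-bit code for one nucleotide: A=0, C=1, G=2, T=3.
inline unsigned base_code(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: throw std::invalid_argument("non-ACGT base");
    }
}

// Lexicographic score: the k-mer packed into an integer (k <= 32). For k-mers of
// equal length, comparing scores compares k-mers in A < C < G < T order.
inline std::uint64_t encode_kmer(const std::string& kmer) {
    assert(kmer.size() <= 32);
    std::uint64_t v = 0;
    for (char c : kmer) v = (v << 2) | base_code(c);
    return v;
}

// Entropy score: Shannon entropy (in bits) of the k-mer's base composition.
// It is 0 for homopolymers such as "AAAA" and 2 when all four bases are equally frequent.
inline double kmer_entropy(const std::string& kmer) {
    std::array<double, 4> counts{};
    for (char c : kmer) counts[base_code(c)] += 1.0;
    const double n = static_cast<double>(kmer.size());
    double h = 0.0;
    for (double cnt : counts) {
        if (cnt > 0.0) {
            const double p = cnt / n;
            h -= p * std::log2(p);
        }
    }
    return h;
}
```

For example, `encode_kmer("ACGT")` packs to 0b00011011 (27), while `kmer_entropy("AACG")` is 1.5 bits, below the 2 bits of `"ACGT"`.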

To build a database with k = 31 and window size = 50, minimized by entropy, from a taxonomy in ref/nodes.dmp and a nameidmap in ref/nameidmap.txt, and store it in bns.db:

bonsai build -e -w50 -k31 -p20 -T ref/nodes.dmp -M ref/nameidmap.txt bns.db `find ref/ -name '*.fna.gz'`
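The windowing in the command above keeps one representative k-mer per window, which is what compresses the reference. The sketch below selects window minimizers by lexicographic value. It assumes that a window of `w` bases contains `w - k + 1` consecutive k-mers and that ties go to the leftmost k-mer; both are assumptions rather than documented bonsai behavior, and `window_minimizers` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed 2-bit code for one nucleotide: A=0, C=1, G=2, T=3.
inline std::uint64_t base2bit(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: throw std::invalid_argument("non-ACGT base");
    }
}

// Returns the start positions of the k-mers chosen as window minimizers.
// ASSUMPTION: a window of size w spans w bases, i.e. w - k + 1 consecutive k-mers.
std::vector<std::size_t> window_minimizers(const std::string& seq, unsigned k, unsigned w) {
    assert(k >= 1 && k <= 32 && w >= k);
    std::vector<std::size_t> picked;
    if (seq.size() < w) return picked;

    // Rolling 2-bit encoding of every k-mer; the mask keeps only the last k bases.
    const std::uint64_t mask = (k == 32) ? ~0ULL : ((1ULL << (2 * k)) - 1);
    std::vector<std::uint64_t> kmers;
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        v = ((v << 2) | base2bit(seq[i])) & mask;
        if (i + 1 >= k) kmers.push_back(v);
    }

    const std::size_t per_window = w - k + 1;
    for (std::size_t start = 0; start + per_window <= kmers.size(); ++start) {
        std::size_t best = start;
        for (std::size_t j = start + 1; j < start + per_window; ++j)
            if (kmers[j] < kmers[best]) best = j;  // strict '<' keeps the leftmost minimum
        // Selected positions never move left, so comparing with the last pick deduplicates.
        if (picked.empty() || picked.back() != best) picked.push_back(best);
    }
    return picked;
}
```

For k = 31 and w = 50 this compares 20 k-mers per window. The inner scan costs O(w - k + 1) per window; a monotone deque would make it amortized O(1) if that mattered.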

To prepare the inputs above, use the script python/download_genomes.py. To download all available genomes (the default), run python python/download_genomes.py --threads 20 all. By default, this places the downloaded genomes in the paths used in the bonsai build command above. These paths can be changed; see python/download_genomes.py -h/--help for details.
