bwa-mem2 / Bwa Mem2

Licence: other
The next version of bwa-mem

Important Information

We are happy to announce that the index is now 8 times smaller on disk and 4 times smaller in memory. This comes from moving to a single type of FM-index (2bit.64 instead of both 2bit.64 and 8bit.32) and from 8x compression of the suffix array. For example, for the human genome, the index size on disk is down to ~10GB from ~80GB and the memory footprint is down to ~10GB from ~40GB. Index IO time drops substantially as a result, with hardly any impact on read-mapping performance. Because of this change in index structure (commit 4b59796, 10th October 2020), you will need to rebuild the index.
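
A quick way to decide whether an existing index needs rebuilding is to look at which index files sit next to the reference. The sketch below is only a heuristic: the file suffixes `.bwt.2bit.64` and `.bwt.8bit.32` are assumptions derived from the FM-index type names above, and `index_status` is a hypothetical helper, not part of bwa-mem2.

```shell
# Heuristic check of an index prefix (assumed file suffixes, see above).
# Prints "stale" if a file from the older two-index layout is present,
# "current" if only the 2bit.64 file is found, "missing" otherwise.
index_status() {
    prefix="$1"
    if [ -e "${prefix}.bwt.8bit.32" ]; then
        echo "stale"
    elif [ -e "${prefix}.bwt.2bit.64" ]; then
        echo "current"
    else
        echo "missing"
    fi
}

# Example use (not executed here):
#   [ "$(index_status ref.fa)" = "current" ] || ./bwa-mem2 index ref.fa
```

A leftover 8bit.32 file only suggests an old index; when in doubt, rebuilding with `./bwa-mem2 index` is the safe choice.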

The MC tag was added to the output SAM file in commit a591e22. Output should match the original bwa-mem version 0.7.17.

As of commit e0ac59e, the repository includes the git submodule safestringlib. To fetch it, pass --recursive when cloning, or run "git submodule init" and "git submodule update" in an already cloned repository (see below for more details).
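
To check whether the submodule was actually fetched in an existing clone, the output of `git submodule status` can be filtered: git prefixes a submodule that has not been initialised with `-`. The helper name `uninitialised_submodules` and the sample path in the comment are illustrative assumptions.

```shell
# Read `git submodule status` output on stdin and print the paths of
# submodules that are not initialised (lines that start with "-").
uninitialised_submodules() {
    awk '/^-/ { print $2 }'
}

# Example use inside a clone (not executed here):
#   git submodule status | uninitialised_submodules
# If this prints a path (for instance ext/safestringlib), run:
#   git submodule init && git submodule update
```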

Getting Started

# Use precompiled binaries (recommended)
curl -L https://github.com/bwa-mem2/bwa-mem2/releases/download/v2.0pre2/bwa-mem2-2.0pre2_x64-linux.tar.bz2 \
  | tar jxf -
bwa-mem2-2.0pre2_x64-linux/bwa-mem2 index ref.fa
bwa-mem2-2.0pre2_x64-linux/bwa-mem2 mem ref.fa read1.fq read2.fq > out.sam

# Compile from source (not recommended for general users)
# Get the source
git clone --recursive https://github.com/bwa-mem2/bwa-mem2
cd bwa-mem2
# Or
git clone https://github.com/bwa-mem2/bwa-mem2
cd bwa-mem2
git submodule init
git submodule update
# Compile and run
make
./bwa-mem2

Introduction

Bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignments identical to bwa and is ~1.3-3.1x faster, depending on the use case, the dataset and the machine it runs on.

The original bwa was developed by Heng Li (@lh3). Performance enhancement in bwa-mem2 was primarily done by Vasimuddin Md (@yuk12) and Sanchit Misra (@sanchit-misra) from Parallel Computing Lab, Intel. Bwa-mem2 is distributed under the MIT license.

Installation

For general users, it is recommended to use the precompiled binaries from the release page. These binaries were compiled with the Intel compiler and run faster than gcc-compiled binaries. The precompiled binaries also indirectly support CPU dispatch: the bwa-mem2 binary automatically chooses the most efficient implementation based on the SIMD instruction set available on the running machine. The precompiled binaries were generated on a CentOS7 machine using the following command line:

make CXX=icpc multi
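
To see which SIMD level a given Linux x86 machine offers, the CPU flag list can be inspected directly. The sketch below is independent of bwa-mem2's own dispatch logic, which the text above does not detail; `best_simd` is a hypothetical helper and the set of levels it checks is an assumption.

```shell
# Pick the widest SIMD level found in a space-separated CPU flag list
# (the format of the "flags" line in /proc/cpuinfo on Linux x86).
best_simd() {
    flags=" $1 "
    for level in avx512bw avx2 avx sse4_2 sse4_1; do
        case "$flags" in
            *" $level "*) echo "$level"; return 0 ;;
        esac
    done
    echo "none"
}

# Example use on Linux (not executed here):
#   best_simd "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```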

Usage

The usage is exactly the same as the original BWA MEM tool. Here is a brief synopsis. Run ./bwa-mem2 for available commands.

# Indexing the reference sequence (Requires 28N GB memory where N is the size of the reference sequence).
./bwa-mem2 index [-p prefix] <in.fasta>
Where 
<in.fasta> is the path to reference sequence fasta file and 
<prefix> is the prefix of the names of the files that store the resultant index. Default is in.fasta.

# Mapping 
# Run "./bwa-mem2 mem" to get all options
./bwa-mem2 mem -t <num_threads> <prefix> <reads.fq/fa> > out.sam
Where <prefix> is the prefix specified when creating the index or the path to the reference fasta file in case no prefix was provided.
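
The 28N rule for indexing memory can be turned into a quick estimate before starting a long indexing run. This sketch assumes N is the reference size expressed in GB, which the synopsis does not state explicitly; `index_mem_gb` is a hypothetical helper name.

```shell
# Estimate the memory needed for indexing, in GB, from the 28N rule.
# Assumption: N is the size of the reference sequence in GB.
index_mem_gb() {
    awk -v n="$1" 'BEGIN { printf "%.0f\n", 28 * n }'
}

# A reference of about 3 GB, roughly the size of a human genome:
index_mem_gb 3
```

Comparing the result with the memory available on the machine (for example from `free -g` on Linux) shows up front whether indexing is likely to fit.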

Performance

Datasets:
Reference Genome: human_g1k_v37.fasta

Alias   Dataset source     No. of reads   Read length
D1      Broad Institute    2 x 2.5M bp    151bp
D2      SRA: SRR7733443    2 x 2.5M bp    151bp
D3      SRA: SRR9932168    2 x 2.5M bp    151bp
D4      SRA: SRX6999918    2 x 2.5M bp    151bp

Machine details:
Processor: Intel(R) Xeon(R) 8280 CPU @ 2.70GHz
OS: CentOS Linux release 7.6.1810
Memory: 100GB

We followed the steps below to collect the performance results:
A. Data download steps:

  1. Download SRA toolkit from https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software#header-global
  2. tar xfzv sratoolkit.2.10.5-centos_linux64.tar.gz
  3. Download D2: sratoolkit.2.10.5-centos_linux64/bin/fastq-dump --split-files SRR7733443
  4. Download D3: sratoolkit.2.10.5-centos_linux64/bin/fastq-dump --split-files SRR9932168
  5. Download D4: sratoolkit.2.10.5-centos_linux64/bin/fastq-dump --split-files SRX6999918
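
The download steps can be scripted as a dry run that prints each command together with the paired FASTQ files it is expected to produce. The `_1.fastq`/`_2.fastq` naming follows the file names used in the alignment example in this document; `paired_fastq_names` is a hypothetical helper, and D4 is left out because it is listed with an SRX accession and its output file names are not shown here.

```shell
# Print the paired FASTQ names expected from `fastq-dump --split-files`
# for one accession: <acc>_1.fastq and <acc>_2.fastq.
paired_fastq_names() {
    acc="$1"
    echo "${acc}_1.fastq ${acc}_2.fastq"
}

# Dry run: print the command for D2 and D3 instead of executing it.
for acc in SRR7733443 SRR9932168; do
    echo "fastq-dump --split-files ${acc}   # expect: $(paired_fastq_names "$acc")"
done
```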

B. Alignment steps:

  1. git clone https://github.com/bwa-mem2/bwa-mem2.git
  2. cd bwa-mem2
  3. make CXX=icpc (using the Intel C/C++ compiler)
    or make (using the gcc compiler)
  4. ./bwa-mem2 index <ref.fa>
  5. ./bwa-mem2 mem [-t <#threads>] <ref.fa> <in_1.fastq> [<in_2.fastq>] > <output.sam>

For example, on our dual-socket (56 threads each), dual-NUMA compute node, we used the following command lines to index the human_g1k_v37.fasta reference genome and align D2 to it.

numactl -m 0 -C 0-27,56-83 ./bwa-mem2 index human_g1k_v37.fasta  
numactl -m 0 -C 0-27,56-83 ./bwa-mem2 mem -t 56 human_g1k_v37.fasta SRR7733443_1.fastq SRR7733443_2.fastq > d2_align.sam
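
The CPU list 0-27,56-83 pins the run to the 28 physical cores of socket 0 plus their hyper-thread siblings. A minimal sketch of how such a list can be derived for other machines follows; it assumes the common Linux numbering in which sibling threads are numbered after all physical cores (worth verifying with `lscpu`), and `socket_cpus` is a hypothetical helper.

```shell
# Build a CPU list for `numactl -C` covering one socket.
# Arguments: socket index, physical cores per socket, number of sockets.
# Assumes hyper-thread siblings are numbered after all physical cores.
socket_cpus() {
    socket="$1"; cores="$2"; sockets="$3"
    first=$(( socket * cores ))
    sibling=$(( sockets * cores + first ))
    echo "${first}-$(( first + cores - 1 )),${sibling}-$(( sibling + cores - 1 ))"
}

# Socket 0 of a 2-socket machine with 28 cores per socket:
socket_cpus 0 28 2
```

The result can be passed straight to numactl, for example `numactl -m 0 -C "$(socket_cpus 0 28 2)" ./bwa-mem2 mem -t 56 ...`.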

bwa-mem2 seeding speedup with Enumerated Radix Trees (Code in ert branch)

The ert branch of the bwa-mem2 repository contains the code for enumerated radix tree (ERT) based acceleration of bwa-mem2. The ert code is built on top of bwa-mem2 (thanks to the hard work by @arun-sub). The highlights of the ert-based bwa-mem2 tool are:

  1. Exact same output as bwa-mem(2)
  2. The tool has two additional flags that enable the ert solution (one for index creation and one for mapping); without them it runs in vanilla bwa-mem2 mode
  3. One additional flag creates the ert index (which differs from the bwa-mem2 index) and one additional flag uses that ert index (please see the readme of the ert branch)
  4. The ert solution is 10% - 30% faster than vanilla bwa-mem2 (tested on the machine configuration above); users are advised to use the option -K 1000000 to see the speedups
  5. The memory footprint of the ert index is ~60GB
  6. The code is present in ert branch: https://github.com/bwa-mem2/bwa-mem2/tree/ert
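
When comparing the ert branch against vanilla bwa-mem2 on your own data, both runs should use the same -K 1000000 setting and the same thread count. The sketch below only does the arithmetic on two measured wall-clock times; `speedup` is a hypothetical helper that reports the ratio of baseline time to new time, and the ert-specific flags are deliberately not shown because they are documented in the readme of the ert branch.

```shell
# Ratio of a baseline runtime to a new runtime, both in seconds.
# A value of 1.30 means the new run took 1/1.30 of the baseline time.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

# Example with made-up timings of 130 s (vanilla) and 100 s (ert):
speedup 130 100
```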

Citation

Vasimuddin Md, Sanchit Misra, Heng Li, Srinivas Aluru. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE Parallel and Distributed Processing Symposium (IPDPS), 2019.
