biod / Sambamba

Licence: gpl-2.0

Tools for working with SAM/BAM data

SAMBAMBA

Introduction
Binary installation
Getting help
Compiling Sambamba
Debugging and troubleshooting
License
Credit

Introduction

Sambamba is a fast, robust, and highly parallel tool (and library), written in the D programming language, for working with SAM and BAM files. Because of its efficiency, Sambamba is an important workhorse running in many sequencing centres around the world today. As of November 2020, Sambamba has been cited over 450 times and has been installed from Conda over 180K times. Sambamba is also distributed by Debian.

Current functionality is an important subset of samtools functionality, including view, index, sort, markdup, and depth. Most tools support piping: just specify /dev/stdin or /dev/stdout as filenames. When we started writing sambamba (in 2012), the main advantage over samtools was parallelized BAM reading and writing. In March 2017 samtools 1.4 was released, reaching parity at least on architecture. A recent performance comparison shows that sambamba still holds its ground and can even do better. For example, sambamba is 1.4x faster than samtools for flagstat, almost 6x faster for markdup, and 4x faster for view, while the two are similar for index. For sort, samtools is now generally faster, though sambamba is still up to 2x faster on large RAM machines (120GB+).
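As a minimal sketch of the piping described above, the following converts SAM read from standard input into BAM written to standard output. The file names input.sam and output.bam are hypothetical, and the view options used here (-S for SAM input, -f bam for the output format, -o for the output file) should be checked against the manual pages of your sambamba version. The script only prints the command when sambamba or the input file is missing.

```shell
#!/bin/sh
# Sketch: SAM -> BAM through a pipe, using /dev/stdin and /dev/stdout as file names.
# input.sam and output.bam are hypothetical file names.
in_sam=input.sam
out_bam=output.bam

# Keep the command in a variable so it can also be shown in a dry run.
pipe_cmd="sambamba view -S -f bam /dev/stdin -o /dev/stdout"

if command -v sambamba >/dev/null 2>&1 && [ -f "$in_sam" ]; then
    # Feed SAM on stdin, collect BAM from stdout.
    $pipe_cmd < "$in_sam" > "$out_bam"
else
    echo "dry run: $pipe_cmd < $in_sam > $out_bam"
fi
```

The same pattern lets sambamba sit in the middle of a longer pipeline, for example directly after an aligner that writes SAM to standard output.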

In addition sambamba has a few interesting features to offer, in particular

fast large machine sort, see performance
automatic index creation when writing any coordinate-sorted file
view -L <bed file> utilizes BAM index to skip unrelated chunks
depth measures base, sliding window, or region coverage
- Chanjo builds upon this and gets you to exon/gene levels of abstraction
markdup, a fast implementation of Picard algorithm
slice quickly extracts a region into a new file, tweaking only first/last chunks
and more (you'll have to try)
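The features listed above can be tried with something like the following sketch. All file names (sample.bam, targets.bed, and the output files) are hypothetical, the input is assumed to be coordinate sorted, and the option spellings should be verified with the --help output of each subcommand. The run helper is not part of sambamba; it prints the command instead of executing it when sambamba or the input file is not available.

```shell
#!/bin/sh
# Sketch of a few subcommands named in the feature list.
# sample.bam and targets.bed are hypothetical, coordinate-sorted inputs.
bam=sample.bam
bed=targets.bed

# Hypothetical helper: run the command if sambamba and the input exist,
# otherwise only print what would be run.
run() {
    if command -v sambamba >/dev/null 2>&1 && [ -f "$bam" ]; then
        "$@"
    else
        echo "dry run: $*"
    fi
}

run sambamba index "$bam"                                  # build the BAM index
run sambamba view -L "$bed" -f bam -o targets.bam "$bam"   # reads overlapping the BED regions
run sambamba depth region -L "$bed" "$bam"                 # coverage per BED region
run sambamba markdup "$bam" dedup.bam                      # mark duplicate reads
run sambamba slice -o chr1.bam "$bam" chr1                 # extract one region into a new file
```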

Even though Sambamba started out as a samtools clone, we are now adding new functionality, also in the BioD project. The D language is extremely suitable for high performance computing (HPC). At this point we think that the BAM format is here to stay for processing sequencing data, and we aim to make it easy to parse and process BAM files.

Sambamba is free and open source software, licensed under GPLv2+. See the online manual pages to learn more about what is available and how to use it.

For more information on Sambamba contact the mailing list (see Getting help).

No CRAM support

Important notice: support for CRAM was removed from Sambamba in version 0.8 (see the RELEASE NOTES).

To use CRAM you can still use one of the older (binary) releases of Sambamba.

Binary installation

Install stable release

For those not in the mood to learn or install new package managers, there are GitHub source and binary releases. Simply download the tarball, unpack it, and run it according to the accompanying release notes.

The package managers below (Conda, GNU Guix, Debian, and Homebrew) also provide recent binary installs for Linux. For MacOS you may use Conda or Homebrew.

Bioconda install

There should be binary downloads for Linux and MacOS.

With Conda use the bioconda channel.
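A minimal sketch of a Bioconda install follows. It assumes conda is already set up. The RUN_INSTALL variable is a hypothetical safety switch introduced here so that the script only prints the command unless you explicitly opt in.

```shell
#!/bin/sh
# Sketch: install sambamba from the bioconda channel.
# RUN_INSTALL is a hypothetical switch; set RUN_INSTALL=1 to really install.
install_cmd="conda install -y -c bioconda sambamba"

if [ "${RUN_INSTALL:-0}" = 1 ] && command -v conda >/dev/null 2>&1; then
    $install_cmd
else
    echo "would run: $install_cmd"
fi
```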

GNU Guix install

A reproducible GNU Guix package for sambamba is available. The development version is packaged here.

Debian GNU/Linux install

Homebrew install

Users of Homebrew can also use the formula from homebrew-bio.

brew install brewsci/bio/sambamba

It should work for Linux and MacOS.

Getting help

Sambamba has a mailing list for installation help and general discussion.

Reporting a sambamba bug or issue

Before posting an issue, search the issue tracker and mailing list first. It is likely that someone has encountered something similar. Also try running the latest version of sambamba to make sure the problem has not been fixed already. Support and installation questions should be aimed at the mailing list. The issue tracker is for development issues around the software itself. When reporting an issue, include the output of the program and the contents of the output directory.

Check list:

[X] I have found an issue with sambamba
[ ] I have searched for it on the issue tracker (also check closed issues)
[ ] I have searched for it on the mailing list
[ ] I have tried the latest release of sambamba
[ ] I have read and agreed to the code of conduct below
[ ] If it is a support/install question I have posted it to the mailing list
[ ] If it is software development related I have posted a new issue on the issue tracker or added to an existing one
[ ] In the message I have included the output of my sambamba run
[ ] In the message I have included the relevant files in the output directory
[ ] I have made available the data to reproduce the problem (optional)
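To make it easier to include the output of a run in a report, the following sketch records a command line together with everything it prints. The capture_run helper and the log file name are hypothetical and not part of sambamba; the helper works with any command.

```shell
#!/bin/sh
# Sketch: record a command and its output for a bug report.
# capture_run is a hypothetical helper, not part of sambamba.
capture_run() {
    log=$1
    shift
    {
        echo "# date: $(date)"
        echo "# system: $(uname -a)"
        echo "# command: $*"
    } > "$log"
    # Append stdout and stderr of the command and remember its exit status.
    "$@" >> "$log" 2>&1
    rc=$?
    echo "# exit status: $rc" >> "$log"
    return "$rc"
}

# Example with hypothetical file names:
#   capture_run sambamba-report.log sambamba markdup input.bam output.bam
```

The resulting log file can be attached to the message as is.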

To find bugs, the sambamba software developers may ask you to install a development version of the software. They may also ask you for your data and will treat it confidentially. Please always remember that sambamba is written and maintained by volunteers with good intentions. Our time is valuable too. By helping us as much as possible, you enable us to provide this tool for everyone to use.

Code of conduct

By using sambamba and communicating with its community you implicitly agree to abide by the code of conduct as published by the Software Carpentry initiative.

Compiling Sambamba

Note: in general there is no need to compile sambamba. You can use a recent binary install as listed above.

The preferred method for compiling Sambamba is with the LDC compiler which targets LLVM. LLVM version 6 is faster than earlier editions.

Compilation dependencies

See INSTALL.md.

Compiling for Linux

The LDC compiler's GitHub repository provides binary images. The current preferred release for sambamba is LDC, the LLVM D compiler (>= 1.6.1). Install LDC from https://github.com/ldc-developers/ldc/releases/ and then build sambamba, for example:

cd
# LDC release to download; 1.7.0 is the version used in this example
ver=1.7.0
wget https://github.com/ldc-developers/ldc/releases/download/v$ver/ldc2-$ver-linux-x86_64.tar.xz
tar xvJf ldc2-$ver-linux-x86_64.tar.xz
export PATH=$HOME/ldc2-$ver-linux-x86_64/bin:$PATH
export LIBRARY_PATH=$HOME/ldc2-$ver-linux-x86_64/lib

git clone --recursive https://github.com/biod/sambamba.git
cd sambamba
make

To build a development/debug version run

make clean && make debug

To run the tests, fetch shunit2 from https://github.com/kward/shunit2 and put it in the path so you can run

make check

GNU Guix

Our development and release environment is GNU Guix. To build sambamba the LDC compiler is also available in GNU Guix:

guix package -A ldc

For more instructions see INSTALL.md.

Compiling for MacOS

Sambamba builds on MacOS. We have a Travis integration test as an example. A build can look something like

    brew install ldc
    git clone --recursive https://github.com/biod/sambamba.git
    cd sambamba
    make

Development

Sambamba development and the issue tracker are on GitHub. Developer documentation can be found in the source code and the development documentation.

Debugging and troubleshooting

Segfaults on certain Intel Xeons

Important note: some older Xeon processors segfault under heavy hyperthreading, which Sambamba utilizes. Please read this when encountering seemingly random crashes. There is no real fix other than disabling hyperthreading. Also discussed here. Thank Intel for producing this bug.
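To see whether hyperthreading appears to be active on a Linux machine, a sketch like the following can help. It assumes the lscpu tool is installed and reports a "Thread(s) per core" line; threads_per_core is a hypothetical helper that parses that text. It only reports the current state and does not change any setting.

```shell
#!/bin/sh
# Sketch: report whether hyperthreading (SMT) looks enabled on Linux.
# threads_per_core is a hypothetical helper that parses lscpu-style text on stdin.
threads_per_core() {
    awk -F: '/^Thread\(s\) per core/ { gsub(/[ \t]/, "", $2); print $2 }'
}

if command -v lscpu >/dev/null 2>&1; then
    n=$(LC_ALL=C lscpu | threads_per_core)
    if [ "${n:-1}" -gt 1 ] 2>/dev/null; then
        echo "hyperthreading looks enabled ($n threads per core)"
    else
        echo "hyperthreading looks disabled or could not be detected"
    fi
else
    echo "lscpu is not available on this system"
fi
```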

Dump core

When it crashes, sambamba can dump a core file. To make this happen, set

ulimit -c unlimited

and run your command. Send us the core file so we can reproduce the state at time of segfault.
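As a small sketch, the limit can be raised and verified in the same shell session before running the failing command. The sambamba command line in the comments is a hypothetical example, and where the core file is written depends on the system configuration.

```shell
#!/bin/sh
# Sketch: enable core dumps in the current shell and show the resulting limit.
if ulimit -c unlimited 2>/dev/null; then
    echo "core file size limit: $(ulimit -c)"
else
    echo "could not raise the core file size limit (hard limit: $(ulimit -H -c))"
fi

# Then, in the same shell session, run the failing command, for example:
#   ./build/sambamba sort input.bam
# On Linux the location and name of the core file are controlled by:
#   cat /proc/sys/kernel/core_pattern
```

The limit applies only to the shell in which it was set and to the processes started from it.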

Use catchsegv

Another option is to use catchsegv

catchsegv ./build/sambamba command

This will show the state on stdout, which can be sent to us.

Using gdb

In case of crashes it's helpful to have GDB stacktraces (bt command). A full stacktrace for all threads:

thread apply all backtrace full

Note that the D garbage collector emits SIGUSR signals, so GDB needs to be told to ignore them with

handle SIGUSR1 SIGUSR2 nostop noprint

A binary relocatable install of sambamba with debug information and all dependencies can be fetched from the binary link above. Unpack the tarball and run the contained install.sh script with the TARGET directory as its argument

./install.sh ~/sambamba-test

Run sambamba in gdb with

gdb -ex 'handle SIGUSR1 SIGUSR2 nostop noprint' \
  --args ~/sambamba-test/sambamba-*/bin/sambamba view --throw-error
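For a report it can be convenient to collect the backtraces without an interactive session. The sketch below combines the gdb commands shown in this section with gdb's batch mode. The install location follows the install.sh example, while input.bam, gdb-backtrace.log, and the gdb_backtrace helper are hypothetical names.

```shell
#!/bin/sh
# Sketch: collect backtraces for all threads non-interactively.
# input.bam and gdb-backtrace.log are hypothetical names.
sambamba_bin=$(ls "$HOME"/sambamba-test/sambamba-*/bin/sambamba 2>/dev/null | head -n 1)

# Hypothetical helper: $1 is the binary, the remaining arguments are its command line.
gdb_backtrace() {
    gdb -batch \
        -ex 'handle SIGUSR1 SIGUSR2 nostop noprint' \
        -ex 'run' \
        -ex 'thread apply all backtrace full' \
        --args "$@"
}

if command -v gdb >/dev/null 2>&1 && [ -x "$sambamba_bin" ]; then
    gdb_backtrace "$sambamba_bin" view input.bam > gdb-backtrace.log 2>&1
else
    echo "gdb or the sambamba debug binary was not found; nothing was run"
fi
```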

License

Sambamba is generously distributed under the GNU General Public License v2+.

Credit

Citations are the bread and butter of Science. If you are using Sambamba in your research and want to support our future work on Sambamba, please cite the following publication:

A. Tarasov, A. J. Vilella, E. Cuppen, I. J. Nijman, and P. Prins. Sambamba: fast processing of NGS alignment formats. Bioinformatics, 2015.

Bibtex reference


@article{doi:10.1093/bioinformatics/btv098,
  author = {Tarasov, Artem and Vilella, Albert J. and Cuppen, Edwin and Nijman, Isaac J. and Prins, Pjotr},
  title = {Sambamba: fast processing of NGS alignment formats},
  journal = {Bioinformatics},
  volume = {31},
  number = {12},
  pages = {2032-2034},
  year = {2015},
  doi = {10.1093/bioinformatics/btv098},
  URL = {http://dx.doi.org/10.1093/bioinformatics/btv098}
}
