biocore / Sortmerna

Licence: gpl-3.0
SortMeRNA: next-generation sequence filtering and alignment tool



sortmerna

SortMeRNA is a local sequence alignment tool for filtering, mapping and clustering.

The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or more rRNA database files, and sorts aligned and rejected reads into two separate files. Additional applications include clustering and taxonomy assignment, available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

Getting Started

SortMeRNA 4 can be built and run on Linux and Windows (Mac support is coming soon).

Using Conda package

conda config --add channels bioconda

conda search sortmerna
  Loading channels: done
  # Name                       Version           Build  Channel
  sortmerna                        2.0               0  bioconda
  ...
  sortmerna                       2.1b               0  bioconda
  ...
  sortmerna                      4.2.0               0  bioconda

conda install sortmerna
which sortmerna
  /home/biocodz/miniconda3/bin/sortmerna

# test the installation
sortmerna --version
  SortMeRNA version 4.2.0
  Build Date: Mar 12 2020
  sortmerna_build_git_sha:@<commit sha>@
  sortmerna_build_git_date:@2020/03/12 12:34:<ss>@

# view help
sortmerna -h

Using GitHub release binaries on Linux

Visit Sortmerna GitHub Releases

The Linux distribution is a shell script with an embedded installation archive.

Issue the following bash commands:

pushd ~

# get the distro
wget https://github.com/biocore/sortmerna/releases/download/v4.2.0/sortmerna-4.2.0-Linux.sh

# view the installer usage
bash sortmerna-4.2.0-Linux.sh --help
    Options: [defaults in brackets after descriptions]
      --help            print this message
      --version         print cmake installer version
      --prefix=dir      directory in which to install
      --include-subdir  include the sortmerna-4.2.0-Linux subdirectory
      --exclude-subdir  exclude the sortmerna-4.2.0-Linux subdirectory
      --skip-license    accept license

# run the installer
bash sortmerna-4.2.0-Linux.sh --skip-license
  sortmerna Installer Version: 4.2.0, Copyright (c) Clarity Genomics
  This is a self-extracting archive.
  The archive will be extracted to: $HOME/sortmerna
  
  Using target directory: /home/biocodz/sortmerna
  Extracting, please wait...
  
  Unpacking finished successfully

# check the installed binaries
ls -lrt /home/biocodz/sortmerna/bin/
  sortmerna

# set PATH
export PATH=$HOME/sortmerna/bin:$PATH

# test the installation
sortmerna --version
  SortMeRNA version 4.2.0
  Build Date: Mar 12 2020
  sortmerna_build_git_sha:@<commit sha>@
  sortmerna_build_git_date:@2020/03/12 12:34:<ss>@

# view help
sortmerna -h
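
The installation check above can also be scripted. The following is a minimal sketch, assuming the `--version` output keeps the `SortMeRNA version X.Y.Z` line shown in the transcripts; the helper names `parse_version` and `installed_version` are hypothetical and not part of SortMeRNA.

```python
import re
import shutil
import subprocess


def parse_version(version_output):
    """Extract the version string from `sortmerna --version` style output.

    Assumes a line of the form 'SortMeRNA version 4.2.0', as in the
    transcripts above. Returns None if no such line is found.
    """
    match = re.search(r"SortMeRNA version\s+(\S+)", version_output)
    return match.group(1) if match else None


def installed_version(executable="sortmerna"):
    """Return the version of the executable found on PATH, or None if absent."""
    path = shutil.which(executable)
    if path is None:
        return None  # not installed, or PATH not set as described above
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Search both streams, since it is not assumed which one carries the text.
    return parse_version(result.stdout + result.stderr)
```

Calling `installed_version()` after setting PATH should return a string such as `4.2.0` for the release installed above, or `None` when the binary cannot be found.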

Running

  • The only required options are --ref and --reads
  • Any option can be specified using a single dash, e.g. -ref and -reads
  • Both plain fasta/fastq and gzip-compressed fasta.gz/fastq.gz files are accepted
  • Relative paths are accepted

For example:

# single reference and single reads file
sortmerna --ref REF_PATH --reads READS_PATH

# for multiple references use multiple '--ref'
sortmerna --ref REF_PATH_1 --ref REF_PATH_2 --ref REF_PATH_3 --reads READS_PATH

# for paired reads use '--reads' twice
sortmerna --ref REF_PATH_1 --ref REF_PATH_2 --ref REF_PATH_3 --reads READS_PATH_1 --reads READS_PATH_2
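
When SortMeRNA is driven from a script, the same command lines can be assembled programmatically. This is a minimal sketch that uses only the `--ref` and `--reads` options shown above; the helpers `build_sortmerna_command` and `run_sortmerna` are hypothetical, and the limit of one or two reads files is an assumption based on the single and paired examples.

```python
import subprocess


def build_sortmerna_command(refs, reads, executable="sortmerna"):
    """Build an argument list with one --ref per reference and one --reads per reads file."""
    refs = [str(r) for r in refs]
    reads = [str(r) for r in reads]
    if not refs:
        raise ValueError("at least one reference file is required")
    # Assumption: one reads file (single) or two (paired), as in the examples above.
    if len(reads) not in (1, 2):
        raise ValueError("expected one reads file, or two for paired reads")
    cmd = [executable]
    for ref in refs:
        cmd += ["--ref", ref]
    for reads_path in reads:
        cmd += ["--reads", reads_path]
    return cmd


def run_sortmerna(refs, reads):
    """Run SortMeRNA and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_sortmerna_command(refs, reads), check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.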

More examples can be found in test.jinja.yaml and run.py

Refer to the Manual for usage details

Execution trace

Here is a sample execution trace.

IMPORTANT

  • A progressing execution trace, showing the number of reads processed so far, indicates a normally running program.
  • A non-progressing trace indicates a problem. Please kill the process (do not wait for two days) and file an issue here.
  • Please provide the execution trace when filing issues.
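
The advice above can be automated with a small watchdog. This is a minimal sketch, not part of SortMeRNA: it assumes the trace is written to standard output or standard error, and the function name `run_with_stall_watchdog` and the timeout value are hypothetical choices. The process is killed when no new trace line appears within the timeout, and the collected trace is returned so it can be attached to an issue.

```python
import queue
import subprocess
import threading


def run_with_stall_watchdog(cmd, stall_timeout_s):
    """Run cmd and kill it if it prints nothing for stall_timeout_s seconds.

    Returns (return_code, trace_lines, stalled).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines_queue = queue.Queue()

    def pump():
        # Forward each output line to the main thread as it arrives.
        for line in proc.stdout:
            lines_queue.put(line)
        lines_queue.put(None)  # sentinel: the output stream was closed

    threading.Thread(target=pump, daemon=True).start()

    trace = []
    stalled = False
    while True:
        try:
            item = lines_queue.get(timeout=stall_timeout_s)
        except queue.Empty:
            # No new trace line within the timeout: treat as non-progressing.
            stalled = True
            proc.kill()
            break
        if item is None:
            break  # the process finished and closed its output
        trace.append(item.rstrip("\r\n"))
    proc.wait()
    return proc.returncode, trace, stalled
```

A suitable timeout depends on the data size and hardware, so it should be chosen generously.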

Sample execution statistics are provided to give an idea of what the execution time might be.

Building from sources

Build instructions

User Manual

See Sortmerna Wiki - User Manual.

If you need a PDF, any modern browser can print web pages to PDF.

Taxonomies

The archive data/rRNA_databases/silva_ids_acc_tax.tar.gz contains SILVA taxonomy strings (extracted from the XML file generated by ARB) for each of the reference sequences in the representative databases. Each file has three tab-separated columns: the reference sequence ID, the accession number, and the taxonomy string.
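
The three-column format can be read with a few lines of Python. This is a minimal sketch, assuming the files inside the archive are UTF-8 text with no header row; the function names are hypothetical, and the file names inside the archive are not assumed.

```python
import io
import tarfile


def parse_taxonomy_lines(lines):
    """Map reference sequence ID -> (accession, taxonomy) from tab-separated lines."""
    records = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first two tabs only, so the taxonomy string stays whole.
        seq_id, accession, taxonomy = line.split("\t", 2)
        records[seq_id] = (accession, taxonomy)
    return records


def load_taxonomy_archive(path):
    """Read every regular file in a .tar.gz archive and merge the parsed records."""
    records = {}
    with tarfile.open(path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            handle = archive.extractfile(member)
            text = io.TextIOWrapper(handle, encoding="utf-8")
            records.update(parse_taxonomy_lines(text))
    return records
```

For example, `load_taxonomy_archive("data/rRNA_databases/silva_ids_acc_tax.tar.gz")` would return a dictionary keyed by reference sequence ID, which can be joined against the reference IDs reported in the alignment output.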

Citation

If you use SortMeRNA, please cite: Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.

Contributors

See AUTHORS for a list of contributors to this project.

Support

For questions and comments, please use the SortMeRNA forum.
