TimoLassmann / kalign

Licence: GPL-3.0 license
A fast multiple sequence alignment program.

Kalign

Kalign is a fast multiple sequence alignment program for biological sequences.

Installation

Release Tarball

Download the tarball from the releases page, then run:

tar -zxvf kalign-<version>.tar.gz
cd kalign-<version>
mkdir build 
cd build
cmake .. 
make 
make test 
make install

On macOS, install Homebrew, then:

brew install cmake 
git clone https://github.com/TimoLassmann/kalign.git
cd kalign
mkdir build
cd build 
cmake ..
make 
make test 
make install

Usage

The command line interface of Kalign accepts the following options:

Usage: kalign  -i <seq file> -o <out aln> 

Options:

   --format           : Output format. [Fasta]
   --type             : Alignment type (rna, dna, internal). [rna]
                        Options: protein, divergent (protein) 
                                 rna, dna, internal (nuc). 
   --gpo              : Gap open penalty. []
   --gpe              : Gap extension penalty. []
   --tgpe             : Terminal gap extension penalty. []
   -n/--nthreads      : Number of threads. [4]
   --version (-V/-v)  : Prints version. [NA]

Kalign expects the input to be a set of unaligned sequences in FASTA format, or aligned sequences in aligned FASTA, MSF or Clustal format. If the sequences are already aligned, Kalign removes all gap characters and re-aligns the sequences.
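The gap-removal step described above can be illustrated with a short Python sketch. This is not Kalign's own code: the helper names `read_fasta` and `strip_gaps` are hypothetical, and the set of gap characters (`-` and `.`) is an assumption.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Assumption: '-' and '.' are treated as gap characters.
GAP_CHARS = "-."


def strip_gaps(records, gap_chars=GAP_CHARS):
    """Return the records with every gap character removed."""
    table = str.maketrans("", "", gap_chars)
    return [(name, seq.translate(table)) for name, seq in records]


aligned = ">seq1\nAC-GT\n>seq2\nA--GT\n"
print(strip_gaps(read_fasta(aligned)))
# prints [('seq1', 'ACGT'), ('seq2', 'AGT')]
```

After this step the sequences are unaligned again and can be treated like any other FASTA input.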

By default, Kalign automatically detects whether the input sequences are protein or DNA and selects appropriate alignment parameters.

The --type option gives users more direct control over the alignment parameters. Currently there are five core options:

  • protein : uses the CorBLOSUM66_13plus substitution matrix (default for protein sequences)
  • divergent : uses the Gonnet 250 substitution matrix
  • dna : default DNA parameters
    • 5 match score
    • -4 mismatch score
    • -8 gap open penalty
    • -6 gap extension penalty
    • 0 terminal gap extension penalty
  • internal : same as dna, but with the terminal gap penalty set to 8 to encourage gaps within the sequences.
  • rna : parameters optimised for RNA alignments.

The --gpo, --gpe and --tgpe options can be used to further fine-tune the parameters.
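To make the effect of these parameters concrete, here is a minimal Python sketch that scores an existing pairwise DNA alignment with the default dna values listed above. It is an illustration only, not Kalign's implementation. The function `score_pairwise` is hypothetical, and it assumes a common affine-gap convention: the first column of an internal gap is charged the gap open value, each further column the gap extension value, and every terminal gap column only the terminal gap extension value. Kalign's exact convention may differ.

```python
# Default dna parameters as listed above.
MATCH, MISMATCH = 5, -4
GPO, GPE, TGPE = -8, -6, 0


def _core_span(seq):
    """Return (first, last) indices of the non-gap characters in seq."""
    idx = [i for i, c in enumerate(seq) if c != "-"]
    return (idx[0], idx[-1]) if idx else (len(seq), -1)


def score_pairwise(a, b, gpo=GPO, gpe=GPE, tgpe=TGPE):
    """Score two aligned rows of equal length (assumed gap convention)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    spans = (_core_span(a), _core_span(b))
    total = 0
    prev_gap = None  # row (0 or 1) that was gapped in the previous column
    for i, (x, y) in enumerate(zip(a.upper(), b.upper())):
        if x == "-" and y == "-":
            raise ValueError("column %d contains only gaps" % i)
        if x != "-" and y != "-":
            total += MATCH if x == y else MISMATCH
            prev_gap = None
            continue
        row = 0 if x == "-" else 1
        first, last = spans[row]
        if i < first or i > last:
            total += tgpe   # terminal gap column
        elif prev_gap == row:
            total += gpe    # continuing an internal gap
        else:
            total += gpo    # opening an internal gap
        prev_gap = row
    return total


print(score_pairwise("ACGT-ACGT", "ACGTTAC--"))           # prints 22
print(score_pairwise("ACGT-ACGT", "ACGTTAC--", tgpe=-6))  # prints 10
```

In the first call the two trailing gap columns are free because the terminal gap extension value is 0; making that value negative in the second call lowers the score of the same alignment, which is the kind of trade-off --tgpe controls.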

Examples

Passing sequences via stdin:

cat input.fa | kalign -f fasta > out.afa

Combining multiple input files:

kalign seqsA.fa seqsB.fa seqsC.fa -f fasta > combined.afa

Align sequences and output the alignment in MSF format:

kalign -i BB11001.tfa -f msf  -o out.msf

Align sequences and output the alignment in clustal format:

kalign -i BB11001.tfa -f clu -o out.clu

Re-align sequences in an existing alignment:

kalign -i BB11001.msf  -o out.afa

Reformat existing alignment:

kalign -i BB11001.msf -r afa -o out.afa
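When calling Kalign from a script, the command line can be assembled programmatically. The following Python sketch uses only options shown in the usage listing above (-i, -o, --format, --type, --gpo, --gpe, --tgpe, --nthreads). The helper names `build_kalign_command` and `run_kalign` are hypothetical, and the sketch assumes a `kalign` executable is on the PATH.

```python
import shutil
import subprocess


def build_kalign_command(infile, outfile, fmt=None, seq_type=None,
                         gpo=None, gpe=None, tgpe=None, nthreads=None):
    """Build the argument list for a kalign run; unset options are omitted."""
    cmd = ["kalign", "-i", str(infile), "-o", str(outfile)]
    optional = [
        ("--format", fmt),
        ("--type", seq_type),
        ("--gpo", gpo),
        ("--gpe", gpe),
        ("--tgpe", tgpe),
        ("--nthreads", nthreads),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_kalign(infile, outfile, **options):
    """Run kalign and raise if it is missing or exits with an error."""
    if shutil.which("kalign") is None:
        raise FileNotFoundError("kalign executable not found on PATH")
    cmd = build_kalign_command(infile, outfile, **options)
    subprocess.run(cmd, check=True)
    return cmd


# Only builds and prints the command; nothing is executed here.
print(" ".join(build_kalign_command("BB11001.tfa", "out.msf", fmt="msf")))
# prints kalign -i BB11001.tfa -o out.msf --format msf
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.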

Kalign library

To incorporate Kalign into your own projects you can link to the library like this:

find_package(kalign)
target_link_libraries(<target> kalign::kalign)

Alternatively, you can include the kalign code directly in your project and link with:

if (NOT TARGET kalign)
  add_subdirectory(<path_to_kalign>/kalign EXCLUDE_FROM_ALL)
endif ()
target_link_libraries(<target> kalign::kalign)

Benchmark results

Here are some benchmark results. The code to reproduce these figures can be found here.

Balibase

[Figure: Balibase scores]

Bralibase

[Figure: Bralibase scores]

Please cite:

  1. Lassmann, Timo. Kalign 3: multiple sequence alignment of large data sets. Bioinformatics (2019).

Other papers:

  1. Lassmann, Timo, Oliver Frings, and Erik LL Sonnhammer. Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Research 37.3 (2008): 858-865.
  2. Lassmann, Timo, and Erik LL Sonnhammer. Kalign: an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics 6.1 (2005): 298.