kloetzl / libdna

Licence: MIT license
♥ Essential Functions for DNA Manipulation

libdna

The aim of this project is to unify functionality commonly found in bioinformatics projects working on DNA. DNA sequences, as opposed to RNA or amino acid sequences, are very long strings. Even bacterial genomes are easily a few megabytes in size. Thus, for efficient analysis the length has to be taken into account in the design of an application. To this end, libdna contains SIMD routines highly optimised for DNA strings. For some functions the library even chooses the optimal implementation at runtime, depending on the CPU.

Installation

Libdna requires the Meson build system, which is commonly available via package managers. Once Meson is installed, execute the following steps to compile and install the latest version of libdna.

git clone https://github.com/kloetzl/libdna.git
cd libdna
meson setup builddir
cd builddir
meson compile
meson install

Contributors may also want to take a look at Makefile.Maintainer. It contains handy shortcuts to set up tests, benchmarks, and other release-related files.

How to use

Libdna is simple, efficient, and customizable. For instance, many bioinformatics tools need to compute the reverse complement of a DNA sequence. With libdna that is a single function call: dna4_revcomp. The prefix dna4 indicates that this function is optimised for strings containing only the four canonical nucleotides A, C, G and T. The first parameter is a pointer to the beginning of the string. The second parameter points to the first byte just past the string. So for a string starting at str of length len the arguments are str and str + len, representing the half-open range [str, str + len). The last parameter is a pointer to a location with enough space to hold the reverse complement.

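#include <stdio.h>
#include <kloetzl/dna.h>

int main(void)
{
	char buffer[] = "ACGT";
	char rev[5] = {0}; // zero-initialised, so the result is NUL-terminated
	dna4_revcomp(buffer, buffer + 4, rev); // input range is [buffer, buffer + 4)

	printf("%s\n", rev);
}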
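
To make the pointer-pair convention concrete, here is a minimal scalar sketch of what a reverse complement over a half-open range computes. It is not libdna's implementation, which uses SIMD. All sketch_ names are hypothetical, and both the return value (a pointer just past the last byte written) and the handling of non-ACGT bytes are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libdna: complement one canonical base. */
char sketch_complement(char c)
{
	switch (c) {
	case 'A': return 'T';
	case 'C': return 'G';
	case 'G': return 'C';
	case 'T': return 'A';
	default:  return c; /* assumption: leave any other byte unchanged */
	}
}

/* Hypothetical scalar stand-in using the argument order described above:
 * [begin, end) is the input and dest receives end - begin bytes.
 * Returns a pointer just past the last byte written (assumed here). */
char *sketch_revcomp(const char *begin, const char *end, char *dest)
{
	assert(begin <= end);
	size_t len = (size_t)(end - begin);
	for (size_t i = 0; i < len; i++) {
		/* Read from the back of the input, write from the front of dest. */
		dest[i] = sketch_complement(begin[len - 1 - i]);
	}
	return dest + len;
}

/* Wrapper for NUL-terminated input: passes str and str + len as the range. */
void sketch_revcomp_cstr(const char *str, char *dest)
{
	size_t len = strlen(str);
	char *out_end = sketch_revcomp(str, str + len, dest);
	*out_end = '\0'; /* the range itself carries no terminator */
}
```

Because the range is given by two pointers, the same call works on a substring of a larger genome without copying: pass genome + i and genome + j to process the bases in [i, j).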

In C++ things are even simpler thanks to a thin wrapper. Instead of raw pointers it uses std::string_view and std::string to make the API more convenient.

#include <iostream>
#include <kloetzl/dna.hpp>

int main()
{
	std::cout << dna4::revcomp("ACGT") << "\n";
}

As the wrapper relies on automatic memory management, which can incur a significant runtime overhead, the underlying C functions remain available for use. Don't forget to link with -ldna.

Bonus

  • libdna comes with man pages for IUPAC codes and the standard genetic code.
  • Where the dna4_ functions provide high performance at the expense of limited applicability, the dnax_ functions provide generality at the expense of speed.
  • Some functions pick the optimal SIMD instruction set at runtime.
  • To demonstrate efficiency, benchmarks against alternative implementations are included.
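
As an illustration of the runtime selection mentioned above, the following sketch shows one common way to pick an implementation per CPU: a function pointer resolved on first use. It is not taken from libdna, whose actual mechanism may differ. All sketch_ names are hypothetical, the mismatch-counting task is only an example workload, and the AVX2 path assumes GCC or Clang on x86 (other platforms fall back to the scalar loop).

```c
#include <assert.h>
#include <stddef.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
	(defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86 1
#include <immintrin.h>
#else
#define SKETCH_X86 0
#endif

/* Portable fallback: compare byte by byte. */
static size_t sketch_mismatches_scalar(const char *begin, const char *end,
                                       const char *other)
{
	size_t count = 0;
	size_t len = (size_t)(end - begin);
	for (size_t i = 0; i < len; i++)
		count += begin[i] != other[i];
	return count;
}

#if SKETCH_X86
/* AVX2 variant: compares 32 bytes per step, then finishes the tail. */
__attribute__((target("avx2")))
static size_t sketch_mismatches_avx2(const char *begin, const char *end,
                                     const char *other)
{
	size_t count = 0, i = 0;
	size_t len = (size_t)(end - begin);
	for (; i + 32 <= len; i += 32) {
		__m256i a = _mm256_loadu_si256((const __m256i *)(begin + i));
		__m256i b = _mm256_loadu_si256((const __m256i *)(other + i));
		/* One mask bit per byte, set where the bytes are equal. */
		unsigned eq = (unsigned)_mm256_movemask_epi8(_mm256_cmpeq_epi8(a, b));
		count += 32 - (size_t)__builtin_popcount(eq);
	}
	for (; i < len; i++)
		count += begin[i] != other[i];
	return count;
}
#endif

typedef size_t (*sketch_count_fn)(const char *, const char *, const char *);

/* Ask the CPU once which variant it can run. */
static sketch_count_fn sketch_pick_impl(void)
{
#if SKETCH_X86
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		return sketch_mismatches_avx2;
#endif
	return sketch_mismatches_scalar;
}

/* Public entry point: resolves the implementation on the first call.
 * Note: this lazy initialisation is not thread-safe; it is only a sketch. */
size_t sketch_count_mismatches(const char *begin, const char *end,
                               const char *other)
{
	static sketch_count_fn impl = NULL;
	assert(begin <= end);
	if (!impl)
		impl = sketch_pick_impl();
	return impl(begin, end, other);
}
```

Callers only ever see sketch_count_mismatches, so the same binary runs on older CPUs and still uses the wider instructions where they exist.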

License

Copyright © 2018 - 2022 Fabian Klötzl
MIT License