
KirillKryukov / naf

Licence: Zlib License
Nucleotide Archival Format - Compressed file format for DNA/RNA/protein sequences

Programming Languages

C, Perl, Makefile

Projects that are alternatives of or similar to naf

dnapacman
waka waka
Stars: ✭ 15 (-57.14%)
Mutual labels:  dna, protein, rna
FluentDNA
FluentDNA allows you to browse sequence data of any size using a zooming visualization similar to Google Maps. You can use FluentDNA as a standalone program or as a python module for your own bioinformatics projects.
Stars: ✭ 52 (+48.57%)
Mutual labels:  dna, protein, fasta
hlatyping
Precision HLA typing from next-generation sequencing data
Stars: ✭ 28 (-20%)
Mutual labels:  dna, rna
sequencework
programs and scripts, mainly python, for analyses related to nucleic or protein sequences
Stars: ✭ 22 (-37.14%)
Mutual labels:  dna, rna
mmtf
The specification of the MMTF format for biological structures
Stars: ✭ 40 (+14.29%)
Mutual labels:  compression, file-format
Biopython
Official git repository for Biopython (originally converted from CVS)
Stars: ✭ 2,936 (+8288.57%)
Mutual labels:  dna, protein
cljam
A DNA Sequence Alignment/Map (SAM) library for Clojure
Stars: ✭ 85 (+142.86%)
Mutual labels:  fasta, fastq
pydna
Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.
Stars: ✭ 109 (+211.43%)
Mutual labels:  dna, fasta
bioinf-commons
Bioinformatics library in Kotlin
Stars: ✭ 21 (-40%)
Mutual labels:  fasta, fastq
BuddySuite
Bioinformatics toolkits for manipulating sequence, alignment, and phylogenetic tree files
Stars: ✭ 106 (+202.86%)
Mutual labels:  dna, protein
orfipy
Fast and flexible ORF finder
Stars: ✭ 27 (-22.86%)
Mutual labels:  dna, protein
poly
A Go package for engineering organisms.
Stars: ✭ 270 (+671.43%)
Mutual labels:  dna, fasta
Pairfq
Sync paired-end FASTA/Q files and keep singleton reads
Stars: ✭ 18 (-48.57%)
Mutual labels:  fasta, fastq
lightdock
Protein-protein, protein-peptide and protein-DNA docking framework based on the GSO algorithm
Stars: ✭ 110 (+214.29%)
Mutual labels:  dna, protein
fuc
Frequently used commands in bioinformatics
Stars: ✭ 23 (-34.29%)
Mutual labels:  fasta, fastq
seqfold
minimalistic nucleic acid folding
Stars: ✭ 39 (+11.43%)
Mutual labels:  dna, rna
rust-huffman-compress
A Rust library for Huffman compression given a probability distribution over arbitrary symbols
Stars: ✭ 18 (-48.57%)
Mutual labels:  compression
QmapCompression
Official implementation of "Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform", ICCV 2021
Stars: ✭ 27 (-22.86%)
Mutual labels:  compression
deflate-rs
An implementation of a DEFLATE encoder in rust
Stars: ✭ 47 (+34.29%)
Mutual labels:  compression
zip-bucket
zips files in a Google Cloud Storage [tm] bucket
Stars: ✭ 32 (-8.57%)
Mutual labels:  compression

Nucleotide Archival Format (NAF)

NAF is a binary file format for biological sequence data. It is based on zstd and offers strong compression and fast decompression. It can store DNA, RNA, protein, or text sequences, with or without qualities. It supports FASTA- and FASTQ-formatted sequences, ambiguous IUPAC codes, and masked sequences, and places no limit on sequence length or on the number of sequences. It also supports Unix pipes, which makes integration into pipelines easy. See the NAF homepage for details.
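As a rough illustration of the pipe support, the sketch below counts the sequences in an archive without writing a decompressed file to disk. The function name count_naf_sequences is hypothetical, and the sketch assumes that unnaf is already installed and that, with no output file given, it prints FASTA to standard output for the archive in question.

```shell
# Count the sequences in a NAF archive by streaming it through a pipe.
# Assumption: unnaf is on PATH and writes FASTA to standard output here.
count_naf_sequences() {
    # Every FASTA record starts with a '>' header line, so count those.
    unnaf "$1" | grep -c '^>'
}
```

For example, count_naf_sequences file.naf prints a single number. The same pattern applies to any tool that reads FASTA from standard input.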

Example benchmark: SILVA 132 LSURef database (610 MB):
From the Sequence Compression Benchmark project; visit it for details and more benchmarks.

More examples:

Format specification

The NAF specification is in the public domain: NAFv2.pdf

Encoder and decoder

NAF encoder and decoder are called "ennaf" and "unnaf". After compressing your data with ennaf, you suddenly have enough space. However, if you decompress it back with unnaf, your space is again un-enough.

Installing

Installing with bioconda

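To install NAF with bioconda (this assumes the bioconda channel is already configured for conda; otherwise add -c bioconda to the command):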

conda install naf

See package page for details: naf at bioconda.

Building from source

Prerequisites: git, gcc, make, diff, perl (diff and perl are used only by the test suite). E.g., to install them on Ubuntu: sudo apt install git gcc make diffutils perl. On macOS you may have to install the Xcode Command Line Tools.

Building and installing:

git clone --recurse-submodules https://github.com/KirillKryukov/naf.git
cd naf && make && make test && sudo make install

To install to an alternative location, add "prefix=DIR" to the "make install" command. E.g., sudo make prefix=/usr/local/bio install

For a staged install, add "DESTDIR=DIR". E.g., make DESTDIR=/tmp/stage install
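One possible use of the prefix option is a per-user install that needs no sudo. The following is only a sketch: the NAF_PREFIX variable and both function names are hypothetical, and it assumes the binaries end up in the bin subdirectory of the prefix, which is the common Makefile convention but is not stated here.

```shell
# Hypothetical per-user prefix; any writable directory should work.
NAF_PREFIX="${NAF_PREFIX:-$HOME/.local}"

# Build and install without sudo (run from inside the cloned naf directory).
install_naf_user() {
    make && make prefix="$NAF_PREFIX" install
}

# Prepend a directory to PATH unless it is already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}
```

After install_naf_user succeeds, calling add_to_path "$NAF_PREFIX/bin" in the current shell (or in a shell startup file) should make ennaf and unnaf available, provided the assumption about the bin subdirectory holds.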

On Windows, NAF can be installed using Cygwin, and installing under WSL should also be possible. In Cygwin, drop sudo: cd naf && make && make test && make install

Building from latest unreleased source

For testing purposes only:

git clone --recurse-submodules --branch develop https://github.com/KirillKryukov/naf.git
cd naf && make && make test && sudo make install

Compressing

ennaf file.fa -o file.naf

See ennaf -h and Compression Manual for detailed usage.
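To compress several files separately, the command above can be wrapped in a loop. This is a minimal sketch: the function name compress_fasta_dir is hypothetical, the .fa extension is only this example's convention, and ennaf is assumed to be on PATH.

```shell
# Compress every *.fa file in a directory to a matching *.naf file.
# $1 = directory to scan (defaults to the current directory).
compress_fasta_dir() {
    dir="${1:-.}"
    for f in "$dir"/*.fa; do
        [ -e "$f" ] || continue              # no matches: the glob stays literal
        ennaf "$f" -o "${f%.fa}.naf" || return 1   # stop at the first failure
    done
}
```

Each input file is left in place, so the originals can be removed only after the archives have been checked.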

Decompressing

unnaf file.naf -o file.fa

See unnaf -h and Decompression Manual.
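Before deleting an original file, it can be worth confirming that it survives a round trip. The sketch below compresses a file into a temporary directory, decompresses it through a pipe, and compares the result with the input. The function name naf_roundtrip_ok is hypothetical, and ennaf and unnaf are assumed to be on PATH. A byte-for-byte match is only to be expected for well-formed input, so a mismatch is a reason to inspect the file rather than proof of data loss.

```shell
# Return 0 if the file is byte-identical after compression and decompression.
# $1 = FASTA file to check.
naf_roundtrip_ok() {
    tmp_dir=$(mktemp -d) || return 1
    # cmp -s compares the decompressed stream (stdin) with the original file.
    ennaf "$1" -o "$tmp_dir/check.naf" && unnaf "$tmp_dir/check.naf" | cmp -s - "$1"
    status=$?
    rm -rf "$tmp_dir"
    return "$status"
}
```

Typical use: naf_roundtrip_ok file.fa && echo "round trip OK".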

Compressing multiple files

Working with multiple files is possible using Multi-Multi-FASTA as an intermediate format. Example commands:

Compressing:
mumu.pl --dir 'Helicobacter' 'Helicobacter pylori*' | ennaf -22 --text -o Hp.nafnaf

Decompressing and unpacking:
unnaf Hp.nafnaf | mumu.pl --unpack --dir 'Helicobacter'

The filename of a single NAF-compressed file normally ends with ".naf". To avoid ambiguity, ".nafnaf" is the recommended suffix for multi-file NAF archives.
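The suffix convention makes it possible to pick the right decompression command automatically. The following sketch shows one way to do that; the function names naf_kind and naf_extract are hypothetical, and unnaf and mumu.pl are assumed to be on PATH.

```shell
# Classify an archive by its suffix, following the convention above.
naf_kind() {
    case "$1" in
        *.nafnaf) echo multi ;;
        *.naf)    echo single ;;
        *)        echo unknown ;;
    esac
}

# Decompress either kind of archive.
# $1 = archive, $2 = output file (single) or output directory (multi).
naf_extract() {
    case "$(naf_kind "$1")" in
        multi)  unnaf "$1" | mumu.pl --unpack --dir "$2" ;;
        single) unnaf "$1" -o "$2" ;;
        *)      echo "unrecognized suffix: $1" >&2; return 1 ;;
    esac
}
```

For example, naf_extract Hp.nafnaf Helicobacter unpacks a multi-file archive, while naf_extract file.naf file.fa restores a single file.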

Citation

If you use NAF, please cite:

For the compressor benchmark, please cite:

  • Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa, Tadashi Imanishi (2020) "Sequence Compression Benchmark (SCB) database — A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences" GigaScience, 9(7), giaa072, doi: 10.1093/gigascience/giaa072.