ratschlab / graph_annotation

Licence: GPL-3.0 license
Code accompanying the publication for compressed graph annotation

Hash-based colored de Bruijn graph with wavelet trie and Bloom filter color compression
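The sketch below is a toy illustration of what a hash-based colored de Bruijn graph stores. Each k-mer is a key in a hash table, and its value is a color (annotation) bit vector with one bit per input sequence. This is the uncompressed k-mer to color-set mapping that the wavelet trie and Bloom filter compressors aim to shrink.

The class name `ColoredDbg` and its methods are hypothetical. This is a minimal sketch assuming string k-mers over `ACGT` and an uncompressed `std::vector<bool>` row per k-mer, not the repository's data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch (not the repository's code): an uncompressed hash-based
// colored de Bruijn graph. Nodes are k-mers stored as hash-table keys; each
// node carries one annotation bit per color.
class ColoredDbg {
  public:
    ColoredDbg(std::size_t k, std::size_t num_colors) : k_(k), num_colors_(num_colors) {
        assert(k_ > 0);
    }

    // Insert every k-mer of `seq` and set bit `color` in its annotation row.
    void add_sequence(const std::string &seq, std::size_t color) {
        assert(color < num_colors_);
        if (seq.size() < k_)
            return;
        for (std::size_t i = 0; i + k_ <= seq.size(); ++i) {
            auto &row = table_[seq.substr(i, k_)];
            if (row.empty())
                row.assign(num_colors_, false);
            row[color] = true;
        }
    }

    // Color set (annotation row) of a k-mer, or an empty vector if it is absent.
    std::vector<bool> colors_of(const std::string &kmer) const {
        auto it = table_.find(kmer);
        return it == table_.end() ? std::vector<bool>{} : it->second;
    }

    // Edges are implicit: successors are present k-mers that overlap `kmer` by k-1 characters.
    std::vector<std::string> successors(const std::string &kmer) const {
        std::vector<std::string> out;
        for (char c : std::string("ACGT")) {
            std::string next = kmer.substr(1) + c;
            if (table_.count(next))
                out.push_back(next);
        }
        return out;
    }

    std::size_t num_kmers() const { return table_.size(); }

  private:
    std::size_t k_;
    std::size_t num_colors_;
    std::unordered_map<std::string, std::vector<bool>> table_;
};
```

Take k = 3, add `ACGTA` with color 0 and `CGTT` with color 1. This yields four k-mers, and only `CGT` carries both colors.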

Reference

This code implements the wavelet trie and corrected Bloom filter compressors proposed in our paper:

Dynamic compression schemes for graph coloring, Bioinformatics, 2018 by Harun Mustafa, Ingo Schilken, Mikhail Karasikov, Carsten Eickhoff, Gunnar Rätsch, and André Kahles.

Other methods for representing graph annotations, including Multi-BRWT, Rainbowfish, and row- and column-major sparse representations, are implemented with a more generic API and may be found here.
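The sketch below illustrates the wavelet trie idea on a toy scale. Each annotation row is treated as a fixed-width bit string. At every node the trie factors out the longest common prefix of its rows, records each row's next bit in a per-node bit vector, and routes the row to the 0- or 1-child. Identical rows end up in the same leaf, so repeated color sets are stored only once.

This is a minimal sketch under stated assumptions:

- rows are equal-width strings of '0'/'1';
- bit vectors are plain `std::vector<bool>`;
- `rank` is computed in linear time.

The class name `WaveletTrie` and its methods are hypothetical. A practical implementation would use succinct bit vectors with fast rank, and this is not the repository's code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of a static wavelet trie over equal-width bit strings
// (annotation rows written as '0'/'1' strings). Not the repository's implementation.
class WaveletTrie {
  public:
    explicit WaveletTrie(const std::vector<std::string> &rows) {
        assert(!rows.empty());
        for (const auto &r : rows)
            assert(r.size() == rows[0].size());  // equal-width rows are assumed
        root_ = build(rows);
    }

    // Reconstruct the i-th row by walking from the root to a leaf.
    std::string access(std::size_t i) const {
        std::string out;
        const Node *node = root_.get();
        while (true) {
            out += node->prefix;              // prefix shared by all rows in this subtree
            if (!node->left)                  // leaf: all rows here are identical
                return out;
            bool bit = node->bits[i];         // branching bit of row i
            out += bit ? '1' : '0';
            i = rank(node->bits, i, bit);     // position of row i inside the chosen child
            node = bit ? node->right.get() : node->left.get();
        }
    }

  private:
    struct Node {
        std::string prefix;                   // longest common prefix of the subtree's rows
        std::vector<bool> bits;               // one branching bit per row (internal nodes only)
        std::unique_ptr<Node> left, right;
    };

    // Number of positions j < i with bits[j] == bit (linear time in this sketch).
    static std::size_t rank(const std::vector<bool> &bits, std::size_t i, bool bit) {
        std::size_t r = 0;
        for (std::size_t j = 0; j < i; ++j)
            r += (bits[j] == bit);
        return r;
    }

    static std::unique_ptr<Node> build(const std::vector<std::string> &rows) {
        auto node = std::make_unique<Node>();
        std::size_t lcp = rows[0].size();
        for (const auto &r : rows) {
            std::size_t j = 0;
            while (j < lcp && r[j] == rows[0][j])
                ++j;
            lcp = j;
        }
        node->prefix = rows[0].substr(0, lcp);
        if (lcp == rows[0].size())            // every row equals rows[0]: leaf
            return node;
        std::vector<std::string> zeros, ones;
        for (const auto &r : rows) {
            bool bit = r[lcp] == '1';
            node->bits.push_back(bit);
            (bit ? ones : zeros).push_back(r.substr(lcp + 1));
        }
        node->left = build(zeros);            // both sides are non-empty because lcp < width
        node->right = build(ones);
        return node;
    }

    std::unique_ptr<Node> root_;
};
```

For the rows `0110`, `0100`, `0110`, `1001`, `0100`, the trie has only three leaves, one per distinct row. `access(i)` still recovers each original row.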

Install

Prerequisites

  • cmake 3.6.1
  • C++14
  • HTSlib
  • GNU GMP
  • boost
  • sdsl-lite

Steps

  1. Clone the repository with its submodules: git clone --recursive https://github.com/ratschlab/graph_annotation
  2. Build sdsl-lite: pushd external-libraries/sdsl-lite; ./install.sh $(pwd); popd
  3. Create and enter the build directory: mkdir -p build && cd build
  4. Compile and run the unit tests: cmake .. && make && ./unit_tests

Build options: cmake .. <arguments>, where the arguments are:

  • -DCMAKE_BUILD_TYPE=[Debug|Release|Profile] -- build mode (Debug by default)
  • -DBUILD_STATIC=ON -- link statically (OFF by default)

Typical workflow

  1. Generate the graph and the uncompressed annotations (.precise.dbg and, optionally, .wtr.dbg files)
    ./annograph build -o <OUTPREFIX> <FLAGS> <INPUTS>
  2. Compress the annotations with Bloom filters
    ./annograph build -i <OUTPREFIX> -o <BLOOMOUTPREFIX> <FLAGS> <INPUTS>
  3. Compress the annotations with wavelet tries (if not already done in step 1)
    ./annograph build -i <OUTPREFIX> -o <WTROUTPREFIX> --wavelet-trie <FLAGS> <INPUTS>
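The sketch below gives some intuition for the Bloom filter step. It is a plain Bloom filter over k-mers, sized for n expected insertions at a target false-positive probability p using the textbook formulas:

- m = -n ln p / (ln 2)^2 bits;
- h = (m / n) ln 2 hash functions.

Several parts of this sketch are assumptions, not taken from the repository:

- the class name `BloomFilter`;
- double hashing over `std::hash`;
- this sizing rule (how `--bloom-false-pos-prob` is actually used may differ).

The correction step of the paper's corrected Bloom filter is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch: a standard Bloom filter over k-mers, sized for n expected
// insertions at false-positive probability p. Not the repository's implementation.
class BloomFilter {
  public:
    BloomFilter(std::size_t n, double p) {
        assert(n > 0 && p > 0.0 && p < 1.0);
        const double ln2 = std::log(2.0);
        // Optimal number of bits: m = -n ln(p) / (ln 2)^2.
        num_bits_ = static_cast<std::size_t>(
            std::ceil(-static_cast<double>(n) * std::log(p) / (ln2 * ln2)));
        // Optimal number of hash functions: h = (m / n) ln 2, at least one.
        num_hashes_ = std::max<std::size_t>(
            1, static_cast<std::size_t>(std::round(static_cast<double>(num_bits_) / n * ln2)));
        bits_.assign(num_bits_, false);
    }

    void insert(const std::string &kmer) {
        for (std::size_t i = 0; i < num_hashes_; ++i)
            bits_[position(kmer, i)] = true;
    }

    // Never false for an inserted k-mer; may be true for others (false positive).
    bool maybe_contains(const std::string &kmer) const {
        for (std::size_t i = 0; i < num_hashes_; ++i)
            if (!bits_[position(kmer, i)])
                return false;
        return true;
    }

    std::size_t num_bits() const { return num_bits_; }
    std::size_t num_hashes() const { return num_hashes_; }

  private:
    // Double hashing: the i-th position is (h1 + i * h2) mod m, with h2 forced odd.
    std::size_t position(const std::string &kmer, std::size_t i) const {
        std::size_t h1 = std::hash<std::string>{}(kmer);
        std::size_t h2 = std::hash<std::string>{}(kmer + '#') | 1;
        return (h1 + i * h2) % num_bits_;
    }

    std::size_t num_bits_;
    std::size_t num_hashes_;
    std::vector<bool> bits_;
};
```

One possible layout keeps one such filter per color (annotation column). A k-mer is then reported with color c whenever filter c answers `maybe_contains`. The false positives this introduces are what a correction step would have to remove.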

Example

./annograph build -k 9 -o tiny_example ../tests/data/test_vcfparse.fa

./annograph build -i tiny_example --bloom-false-pos-prob 0.01 -o tiny_example ../tests/data/test_vcfparse.fa
./annograph map -i tiny_example TCGCGCGCTA TCGCGCGCTA TCGCGCGCTC TCGCGCGCTN TCGCGCGCTANA TCGCGCGCTC

./annograph build -i tiny_example --wavelet-trie -o tiny_example ../tests/data/test_vcfparse.fa
./annograph map --wavelet-trie -i tiny_example TCGCGCGCTA TCGCGCGCTA TCGCGCGCTC TCGCGCGCTN TCGCGCGCTANA TCGCGCGCTC
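With `-k 9`, each `map` query above covers only a few k-mers, because a sequence of length L contains L - k + 1 of them. The sketch below enumerates those k-mers.

As an assumption about how ambiguous bases might be treated, it skips any window containing a character outside `ACGT`, such as the `N` in `TCGCGCGCTN`. How `annograph map` actually handles such characters is not specified here, and `split_kmers` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: list the k-mers of `seq`, skipping any window that
// contains a character other than A, C, G, T (e.g. an ambiguous 'N').
std::vector<std::string> split_kmers(const std::string &seq, std::size_t k) {
    assert(k > 0);
    std::vector<std::string> kmers;
    std::size_t valid_run = 0;  // length of the current run of ACGT characters
    for (std::size_t i = 0; i < seq.size(); ++i) {
        char c = seq[i];
        bool valid = c == 'A' || c == 'C' || c == 'G' || c == 'T';
        valid_run = valid ? valid_run + 1 : 0;
        if (valid_run >= k)  // window seq[i-k+1 .. i] consists only of ACGT
            kmers.push_back(seq.substr(i + 1 - k, k));
    }
    return kmers;
}
```

For `TCGCGCGCTA` this yields `TCGCGCGCT` and `CGCGCGCTA`. For `TCGCGCGCTN` only the first window survives.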

Other use cases

Construct the wavelet trie in blocks (slower, but uses less RAM)
./annograph compress -i <OUTPREFIX> -o <WTROUTPREFIX>

Measure annotation compressor query time
./annograph query -i <OUTPREFIX>

Report wavelet trie statistics
./annograph stats -i <OUTPREFIX> --wavelet-trie

Compress wavelet tries with random column permutations
./annograph permutation -i <OUTPREFIX> --num-permutations <NUM_PERMS>
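The column order determines which bit strings a wavelet trie is built from. Permuting the columns therefore changes how much prefix sharing the trie can exploit, and with it the compressed size. The sketch below draws one random column permutation with `std::shuffle` and applies it to a set of rows written as '0'/'1' strings.

The helper names and the fixed seed are hypothetical. They say nothing about how `annograph permutation` draws or evaluates its `<NUM_PERMS>` permutations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: a uniformly random permutation of the column indices.
std::vector<std::size_t> random_column_order(std::size_t num_columns, unsigned seed) {
    std::vector<std::size_t> order(num_columns);
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, ..., num_columns - 1
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

// Hypothetical helper: rewrite every row so that new column j holds old column order[j].
std::vector<std::string> permute_columns(const std::vector<std::string> &rows,
                                         const std::vector<std::size_t> &order) {
    std::vector<std::string> out;
    out.reserve(rows.size());
    for (const auto &row : rows) {
        assert(row.size() == order.size());
        std::string permuted(row.size(), '0');
        for (std::size_t j = 0; j < order.size(); ++j)
            permuted[j] = row[order[j]];
        out.push_back(permuted);
    }
    return out;
}
```

Each permuted row set could then be passed to a wavelet trie construction, and the resulting sizes compared across permutations.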

Reproducing results from the paper

The input data for reproducing the results of the experiments in our paper is located here.
