GATB / Bcalm

Licence: mit

compacted de Bruijn graph construction in low memory

Labels

graph bioinformatics

BCALM 2

BCALM 2 is a bioinformatics tool for constructing the compacted de Bruijn graph from sequencing data.

This repository is the new, parallel version of the BCALM software. It uses a new algorithm and is implemented with the GATB library. The original, single-threaded code of BCALM (version 1) is still available at: https://github.com/Malfoy/bcalm

Usage

Read the instructions below to compile, then:

./bcalm -in [reads.fa] -kmer-size [kmer_size] -abundance-min [abundance_threshold]

e.g.

./bcalm -in reads.fastq -kmer-size 21 -abundance-min 2

Important parameters are:

-kmer-size [int]

The k-mer size, i.e. length of the nodes of the de Bruijn graph.

-abundance-min [int]

Sets a threshold X: k-mers that are seen strictly fewer than X times in the dataset are filtered out. Such low-abundance k-mers are typically sequencing errors.
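To make the effect of -abundance-min concrete, here is a minimal Python sketch of abundance filtering on toy data. It is not BCALM 2's implementation: the function name solid_kmers and the reads are made up for illustration, and for simplicity the sketch counts k-mers exactly as written, without merging a k-mer with its reverse complement.

```python
from collections import Counter


def solid_kmers(reads, k, abundance_min):
    """Return {k-mer: count} for k-mers seen at least `abundance_min` times."""
    counts = Counter()
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Drop k-mers whose count is strictly below the threshold.
    return {kmer: n for kmer, n in counts.items() if n >= abundance_min}


# Toy reads (hypothetical, for illustration only).
reads = ["ACGTA", "ACGTT", "CCGTA"]
kept = solid_kmers(reads, k=3, abundance_min=2)
print(kept)
```

With these reads, ACG, CGT and GTA are kept, while GTT and CCG each occur once and are dropped.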

Pre-requisites:

GCC >= 4.8 or a very recent C++11 capable compiler

Installation

Download the latest Linux/MacOS binaries, or compile from source as follows:

git clone --recursive https://github.com/GATB/bcalm 
cd bcalm
mkdir build;  cd build;  cmake ..;  make -j 8

You can also install bcalm from bioconda with conda:

conda install -c conda-forge -c bioconda bcalm

Input formats

Input files can be in FASTA or FASTQ format, gzipped or not. BCALM 2 ignores paired-end information: all given reads contribute k-mers to the graph (as long as those k-mers pass the abundance threshold).

To pass several files as input:

ls -1 *.fastq > list_reads
./bcalm -in list_reads [..]

Output

BCALM 2 outputs the set of unitigs of the de Bruijn graph. A unitig is the sequence of a non-branching path. Unitigs that are connected by an edge in the graph overlap by exactly (k-1) nucleotides. For a formal description of what BCALM2 outputs, see here
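The (k-1)-nucleotide overlap can be checked directly on two unitig sequences. The Python sketch below is an illustration only: the function name overlap_ok and the sequences are hypothetical, it assumes k is at least 2, and it only handles the case where both unitigs are read in the same (forward) orientation.

```python
def overlap_ok(u, v, k):
    """True if the last k-1 nucleotides of u equal the first k-1 of v."""
    if len(u) < k or len(v) < k:
        # A unitig is at least one k-mer long.
        return False
    return u[-(k - 1):] == v[:k - 1]


# Hypothetical unitigs for k = 4: they share the 3-mer "CGT".
print(overlap_ok("AACGT", "CGTTG", k=4))
print(overlap_ok("AACGT", "GTTGA", k=4))
```

The first call reports an overlap and the second does not, since "CGT" differs from "GTT".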

We have two output formats: FASTA and GFA.

GFA output: use scripts/convertToGFA.py to convert the output of BCALM 2 to GFA (contributed by Mayank Pahadia).

FASTA output header:

><id> LN:i:<length> KC:i:<abundance> km:f:<abundance> L:<+/->:<other id>:<+/-> [..]

Where:

LN field is the length of the unitig.
KC and km fields are the total abundance and mean abundance of the k-mers inside the unitig, respectively.
Edges between unitigs are reported as L:x:y:z entries in the FASTA header (1 entry per edge). A classic forward-forward outgoing edge is labeled L:+:[next node]:+. A forward-reverse edge is labeled L:+:[next node]:-. Incoming edges are encoded as outgoing edges of the reverse-complement node. E.g. L:-:[previous node]:+ means that if you reverse-complemented the current node, there would be an edge from the last k-mer of the current node to the first k-mer of the forward strand of [previous node].
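A header in this format can be split into its fields with a few lines of Python. This is a sketch based only on the layout described above; the function name parse_unitig_header and the example header are hypothetical, and unknown fields are simply ignored.

```python
def parse_unitig_header(header):
    """Parse a BCALM 2-style FASTA header line into a dict."""
    fields = header.lstrip(">").split()
    record = {"id": fields[0], "edges": []}
    for field in fields[1:]:
        parts = field.split(":")
        tag = parts[0]
        if tag == "L":
            # L:<strand of this unitig>:<other id>:<strand of other unitig>
            record["edges"].append((parts[1], parts[2], parts[3]))
        elif tag in ("LN", "KC"):
            # Integer-typed fields, e.g. LN:i:31
            record[tag] = int(parts[2])
        elif tag == "km":
            # Float-typed field, e.g. km:f:2.0
            record[tag] = float(parts[2])
    return record


# Hypothetical header, not taken from a real BCALM 2 run.
header = ">0 LN:i:31 KC:i:22 km:f:2.0 L:+:1:- L:-:2:+"
rec = parse_unitig_header(header)
print(rec["id"], rec["LN"], rec["KC"], rec["km"], rec["edges"])
```

Each edge is returned as a tuple (own strand, other unitig id, other strand), which keeps the orientation information needed to walk the graph.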

Reverse-complements and double-strandedness

BCALM 2 converts all k-mers into their canonical representation with respect to reverse-complements. In other words, a k-mer and its reverse complement are considered to be the same object, appearing only once in the output, either in forward or reverse orientation.
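The idea of a canonical representation can be sketched in Python as follows. One common convention, assumed here, is to pick the lexicographically smaller of a k-mer and its reverse complement; the ordering BCALM 2 and GATB use internally may differ, and the function names are hypothetical.

```python
# Translation table mapping each nucleotide to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every nucleotide, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement.

    Assumption: plain lexicographic order on the strings.
    """
    return min(kmer, reverse_complement(kmer))


# A k-mer and its reverse complement map to the same canonical form.
print(canonical("CGT"), canonical("ACG"))
```

Both calls return the same string, so the two strands of a k-mer are counted as a single object.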

Note: in the output of BCALM 2, each unitig may be returned in either forward or reverse orientation, with no guarantee that the orientation will stay the same across identical runs of the software.

For a formal description of how BCALM2 handles double-strandedness of DNA, see here

Larger k values

BCALM 2 supports arbitrarily large k-mer lengths, but you need to recompile it from source. For k up to, say, 320, type this in the build folder:

rm -Rf CMake* && cmake -DKSIZE_LIST="32 64 96 128 160 192 224 256 320" .. && make -j 8

For compilation, the list of k-mer sizes should contain only multiples of 32. Also, for technical reasons, keep 32 in the list. Of course, BCALM will run slower for higher values of k. Intermediate values create optimized code for smaller values of k. You could specify just KSIZE_LIST="32 320", but then any k value above 32 would be as slow as if k were equal to 320.

After that, BCALM 2 can be run with any k value up to the largest one specified during compilation.
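If you prefer to generate the KSIZE_LIST value rather than type it by hand, a small Python helper along these lines could be used. It is a sketch under the rules stated above (multiples of 32 only, 32 always included); the function name ksize_list is hypothetical, and it includes every intermediate multiple of 32, which is a choice made here rather than a requirement.

```python
def ksize_list(max_k, step=32):
    """Build a KSIZE_LIST string covering k values up to `max_k`.

    Returns the multiples of `step`, starting at `step` and ending at the
    first multiple that is greater than or equal to `max_k`.
    """
    if max_k < 1:
        raise ValueError("max_k must be a positive integer")
    # Round max_k up to the next multiple of step (ceiling division).
    top = -(-max_k // step) * step
    return " ".join(str(v) for v in range(step, top + 1, step))


# Example: a list suitable for k values up to 100.
print(ksize_list(100))
```

The printed string can then be passed to cmake as -DKSIZE_LIST="..." in the build folder.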

Intermediate files

BCALM 2 produces some intermediate files: a .h5 file (or a _gatb/ folder), which contains the k-mer counts, and "*glue*" files, which contain compacted sequences that need to be glued together (see the BCALM 2 paper). These files can be safely deleted after an execution, as the actual output is just the FASTA file containing the unitigs.

Acknowledgements

If using BCALM 2, please cite: Rayan Chikhi, Antoine Limasset and Paul Medvedev, Compacting de Bruijn graphs from sequencing data quickly and in low memory, Proceedings of ISMB 2016, Bioinformatics, 32 (12): i201-i208. (Bibtex)

This project has been supported in part by NSF awards DBI-1356529, CCF-1439057, IIS-1453527, and IIS-1421908.
