jasperlinthorst / reveal

Licence: MIT license
Graph based multi genome aligner

REVEAL

REVEAL (REcursiVe Exact-matching ALigner) can be used to (multi) align whole genomes.

INSTALL

REVEAL is written in Python and C. Building it requires Python 2.7 and a GCC compiler.

It uses libdivsufsort for suffix array construction and the probcons code for refinement of the graph.

Furthermore it uses the Python packages networkx (version 2), intervaltree, pysam and matplotlib.

A version of REVEAL can be installed through pip:

pip install reveal

Or by cloning this repository from GitHub and executing the following command:

python setup.py test install

To install without executing the unit tests:

python setup.py install

To install in your user directory:

python setup.py install --user

RUN

To validate whether everything is correctly installed you can run a test alignment from the tests directory, e.g. by executing the following command:

reveal align tests/1a.fa tests/1b.fa

This will output a shell script that outlines the typical steps to generate some graphs. If you're not interested in changing any parameters or the intermediate steps, you can immediately execute the script by piping it into your shell:

reveal align tests/1a.fa tests/1b.fa | sh

If everything ran correctly, various gfa files should have been produced. Most likely you will be interested in 'prg.unzipped.realigned.gfa'. This file contains a reference graph in GFA (Graphical Fragment Assembly) format.
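As a rough illustration of what such a file contains, the sketch below tallies the segment, link and path records in GFA text. It assumes the standard tab-separated GFA1 layout (S, L and P records). The helper name summarize_gfa and the toy graph are hypothetical and not part of REVEAL, and REVEAL's actual output may carry extra tags that this sketch simply ignores.

```python
def summarize_gfa(lines):
    """Count segment (S), link (L) and path (P) records in GFA1 text lines."""
    segments = {}  # segment name -> sequence
    links = []     # (from_segment, to_segment) pairs
    paths = []     # path names
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        record = fields[0]
        if record == "S" and len(fields) >= 3:
            segments[fields[1]] = fields[2]
        elif record == "L" and len(fields) >= 5:
            links.append((fields[1], fields[3]))
        elif record == "P" and len(fields) >= 3:
            paths.append(fields[1])
    return {
        "segments": len(segments),
        "links": len(links),
        "paths": len(paths),
        # "*" means the sequence is not stored in the file, so skip it.
        "total_bp": sum(len(s) for s in segments.values() if s != "*"),
    }


# Hypothetical toy graph: two genomes that differ by a single SNP.
EXAMPLE_GFA = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tA",
    "S\t3\tC",
    "S\t4\tGGTA",
    "L\t1\t+\t2\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M",
    "L\t3\t+\t4\t+\t0M",
    "P\tgenomeA\t1+,2+,4+\t*",
    "P\tgenomeB\t1+,3+,4+\t*",
]

summary = summarize_gfa(EXAMPLE_GFA)
```

To try it on a real file, pass an open file handle instead of the list, for example summarize_gfa(open('prg.unzipped.realigned.gfa')).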

By default reveal will try to anchor the alignment by aligning all genomes simultaneously. If this is unwanted (e.g. due to memory constraints), you can run the following command to generate a shell script that anchors the alignment hierarchically, in batches of for instance 5 genomes at a time:

reveal align tests/1a.fa tests/1b.fa --order=sequential --chunksize=5 | sh

All commands in the shell script that are in between the comment lines can be run in parallel in case you're running on a compute cluster.

There are other subcommands for reveal, some of which are used by the generated shell script:

To generate an anchor graph using the recursive exact matching approach for more than two sequences you can either call:

reveal rem tests/1a.fa tests/1b.fa tests/1c.fa

or align a sequence to an existing gfa graph:

reveal rem 1a_1b.gfa tests/1c.fa

or align two graphs:

reveal rem 1a_1b.gfa 1c_1d.gfa

Important parameters for reveal rem are -m and -n. See subcommand help.
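To give a feel for the general idea behind recursive exact matching, here is a toy sketch for two short strings. It anchors on the longest substring that occurs exactly once in each sequence and then repeats the search in the regions to the left and right of that anchor. This is a brute-force illustration only and not REVEAL's implementation, which builds on suffix arrays. The function names and the min_length parameter are hypothetical, and min_length is not claimed to correspond to -m or -n.

```python
def occurrences(text, word):
    """Count occurrences of word in text, including overlapping ones."""
    count = 0
    start = text.find(word)
    while start != -1:
        count += 1
        start = text.find(word, start + 1)
    return count


def longest_unique_match(a, b):
    """Return (start_a, start_b, length) of the longest substring occurring
    exactly once in a and exactly once in b, or None if there is none."""
    for length in range(min(len(a), len(b)), 0, -1):
        for i in range(len(a) - length + 1):
            word = a[i:i + length]
            if occurrences(a, word) == 1 and occurrences(b, word) == 1:
                return i, b.find(word), length
    return None


def recursive_anchors(a, b, min_length=2, offset_a=0, offset_b=0):
    """Anchor on the longest unique match, then recurse on both flanks.

    Returns a list of (start_a, start_b, length) in left-to-right order,
    with coordinates relative to the original sequences."""
    match = longest_unique_match(a, b)
    if match is None or match[2] < min_length:
        return []
    i, j, n = match
    left = recursive_anchors(a[:i], b[:j], min_length, offset_a, offset_b)
    right = recursive_anchors(a[i + n:], b[j + n:], min_length,
                              offset_a + i + n, offset_b + j + n)
    return left + [(offset_a + i, offset_b + j, n)] + right


# Two toy sequences that differ by a single substitution at position 4.
anchors = recursive_anchors("ACGTTGCAAGGCT", "ACGTAGCAAGGCT")
```

Because uniqueness is re-evaluated inside each flanking region, a match that is repeated elsewhere in the full sequences can still serve as an anchor once the recursion has narrowed the search space.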

REVEAL assumes a global alignment between chromosome-length assemblies. To address the issues that follow from draft assemblies, a 'finish' subcommand is supplied that orders and orients contigs/scaffolds with respect to a reference genome and produces pseudomolecules for the draft assembly.

reveal finish reference.fasta draft.fasta

To address large events (like translocations, inversions, but also misassemblies) that prevent a colinear alignment between two genomes, the following command can be used to transform a structurally rearranged (draft) genome such that it conforms to the layout of the reference sequence.

reveal finish --order=chains reference.fasta draft.fasta

Have a look at the various parameters, especially: --mineventsize, --minchainsum and -m.

To obtain a graph-based representation that encodes the original as well as the 'transformed' genome as separate paths through a graph, use:

reveal transform reference.fasta draft.fasta

Note that the resulting graphs may contain cycles, but can still be used in subsequent alignments using reveal rem, as only the transformed paths that correspond to the reference layout will be used for segmenting the graph. Paths in the graph prefixed with an asterisk (*) correspond to the original (non-transformed) input genomes. REVEAL ignores these paths during graph traversal; they mainly serve to record the structural events that are present in the graph.

To extract bubbles (a list of source/sink pairs and nodes within the bubble) from a graph run:

reveal bubbles 1a_1b.gfa
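For readers unfamiliar with the term, the sketch below shows what a source/sink pair looks like in the simplest possible case: a node whose outgoing branches are all single nodes that rejoin at one common node. It is a toy illustration on a plain edge list, not REVEAL's bubble-finding code. The function name simple_bubbles is hypothetical, and nested bubbles or indels (where the source links directly to the sink) are deliberately not handled.

```python
from collections import defaultdict


def simple_bubbles(edges):
    """Find simple bubbles in a directed graph given as (from, to) edges.

    Returns a list of (source, sink, inner_nodes) tuples."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    bubbles = []
    for source in sorted(succ):
        branches = succ[source]
        if len(branches) < 2:
            continue  # no divergence at this node
        sinks = set()
        for node in branches:
            # Each branch must be entered only from the source
            # and must leave through exactly one edge.
            if pred[node] != {source} or len(succ[node]) != 1:
                break
            sinks |= succ[node]
        else:
            if len(sinks) == 1:  # all branches rejoin at the same node
                bubbles.append((source, sinks.pop(), sorted(branches)))
    return bubbles


# Toy graph: node 1 diverges into 2 and 3, which rejoin at node 4.
toy_edges = [("1", "2"), ("1", "3"), ("2", "4"), ("3", "4"), ("4", "5")]
found = simple_bubbles(toy_edges)
```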

To do the same as bubbles, but print the actual varying sequence:

reveal variants 1a_1b.gfa

To output variants to a vcf file:

reveal variants 1a_1b.gfa --vcf

To output statistics with respect to the number of nodes, bubbles, variants, aligned sequence etc.:

reveal stats 1a_1b.gfa

To realign parts of the graph using a basepair resolution multiple sequence alignment method (instead of MUMs):

reveal refine <graph> <source-node> <sink-node>

To realign all bubbles:

reveal refine <graph> --all

Note that refinement won't work for bubbles larger than roughly 10000 bp, so have a look at the filtering options (e.g. --maxsize).

As the boundaries of Maximal Unique Matches are somewhat greedy, more accurate variant calls are obtained by first 'unzipping' bubbles before applying reveal refine. To unzip all bubbles by 10 bp, run:

reveal unzip <graph> -u10

To construct an interactive (-i, for zooming purposes) mumplot of two fasta files:

reveal plot genome1.fasta genome2.fasta -i

Or to visualise a graph in a mumplot:

reveal gplot 1a_1b.gfa -i

NOTE that you need to have matplotlib installed for these commands.

In case you want to inspect the graph with software like cytoscape or gephi, you can produce a graph in gml format by calling reveal as follows:

reveal convert prg.gfa

To extract a genome/path from the graph:

reveal extract <graph> <pathname>

To extract a subgraph of the graph, to for instance inspect a complex bubble structure:

reveal subgraph <graph> <node1> ... <nodeN>

To merge multiple gfa graphs into a single gfa graph, while maintaining node-id space:

reveal merge <graph1> <graph2> ... <graphN>

Or to do the opposite, split a graph by its connected components:

reveal split <graph>
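Splitting by connected components means that nodes end up in the same output graph only if some chain of links joins them, regardless of link direction. A minimal sketch of that grouping on a plain node and edge list is shown below. The function name connected_components is hypothetical and this is not REVEAL's code.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components, ignoring edge direction.

    Returns a list of sorted node lists, ordered by their smallest node."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Toy example: two unconnected groups of nodes plus one isolated node.
parts = connected_components(
    ["1", "2", "3", "4", "5", "6"],
    [("1", "2"), ("2", "3"), ("4", "5")],
)
```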

For everything else, most subcommands print a help message when you run reveal <subcommand> -h
