
SAMtoBAM / MUMandCo

Licence: GPL-3.0
MUM&Co is a simple bash script that uses Whole Genome Alignment information provided by MUMmer (only v4) to detect Structural Variation


Projects that are alternatives of or similar to MUMandCo

arriba
Fast and accurate gene fusion detection from RNA-Seq data
Stars: ✭ 162 (+350%)
Mutual labels:  structural-variation
dcHiC
dcHiC: Differential compartment analysis for Hi-C datasets
Stars: ✭ 28 (-22.22%)
Mutual labels:  genome
reveal
Graph based multi genome aligner
Stars: ✭ 39 (+8.33%)
Mutual labels:  genome
Circle-Map
A method for circular DNA detection based on probabilistic mapping of ultrashort reads
Stars: ✭ 45 (+25%)
Mutual labels:  structural-variation
MA
The Modular Aligner and The Modular SV Caller
Stars: ✭ 39 (+8.33%)
Mutual labels:  structural-variation
polio
Research on polio / protein folding.
Stars: ✭ 13 (-63.89%)
Mutual labels:  genome
DNA-Sequence-Machine-learning
Understand DNA structure and how machine learning can be used to work with DNA sequence data.
Stars: ✭ 25 (-30.56%)
Mutual labels:  genome
squid
SQUID detects both fusion-gene and non-fusion-gene structural variations from RNA-seq data
Stars: ✭ 37 (+2.78%)
Mutual labels:  structural-variation
SARS-CoV-2-Sequenzdaten aus Deutschland
The Robert Koch Institute provides systems for the nationwide molecular surveillance of the SARS-CoV-2 virus. Under the ordinance on molecular genetic surveillance of the coronavirus SARS-CoV-2, every laboratory in Germany that sequences SARS-CoV-2 is obliged to submit the sequence data and associated metadata to the Robert Koch Institute…
Stars: ✭ 66 (+83.33%)
Mutual labels:  genome
pufferfish
An efficient index for the colored, compacted, de Bruijn graph
Stars: ✭ 94 (+161.11%)
Mutual labels:  genome
pipeline-structural-variation
Pipeline for calling structural variations in whole genomes sequencing Oxford Nanopore data
Stars: ✭ 104 (+188.89%)
Mutual labels:  structural-variation
pyrodigal
Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!
Stars: ✭ 38 (+5.56%)
Mutual labels:  genome
semantic
SuperPhy for the semantic web
Stars: ✭ 17 (-52.78%)
Mutual labels:  genome
arv
A fast 23andMe DNA parser and inferrer for Python
Stars: ✭ 98 (+172.22%)
Mutual labels:  genome
svtools
Tools for processing and analyzing structural variants.
Stars: ✭ 118 (+227.78%)
Mutual labels:  structural-variation
valr
Genome Interval Arithmetic in R
Stars: ✭ 78 (+116.67%)
Mutual labels:  genome
svict
Structural Variation and fusion detection using targeted sequencing data from circulating cell free DNA
Stars: ✭ 21 (-41.67%)
Mutual labels:  structural-variation
Scaff10X
Pipeline for scaffolding and breaking a genome assembly using 10x genomics linked-reads
Stars: ✭ 21 (-41.67%)
Mutual labels:  genome
sedef
Identification of segmental duplications in the genome
Stars: ✭ 22 (-38.89%)
Mutual labels:  structural-variation
SynNet-Pipeline
Workflow for Building Microsynteny Networks
Stars: ✭ 32 (-11.11%)
Mutual labels:  genome


MUM&Co is a simple bash script that uses Whole Genome Alignment information provided by MUMmer (v4) to detect variants.
VERSION >= 3 UPDATE
Now uses only MUMmer4 and includes a thread count option
Produces a VCF output file; all calls are currently imprecise
Produces another output file containing the calls alongside the DNA sequence affected by each call
This new step requires samtools to be installed
Now also calls tandem contractions, the reverse of tandem duplications (>50bp)

MUM&Co is able to detect:
Deletions, insertions, tandem duplications and tandem contractions (>=50bp & <=150kb)
Inversions (>=1kb) and translocations (>=10kb)
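
The size limits listed above can be expressed as a small filter. This is only a sketch of the thresholds as stated here, not code taken from MUM&Co: the function name `sv_size_ok` and the type labels are hypothetical, and 1kb is assumed to mean 1000bp.

```shell
# Return 0 if an SV of the given type and length (in bp) falls inside the
# size range stated in this README, 1 if it does not, 2 for an unknown type.
# The type labels below are hypothetical; MUM&Co may name them differently.
sv_size_ok() {
    sv_type=$1
    sv_len=$2
    case "$sv_type" in
        deletion|insertion|duplication|contraction)
            # >=50bp and <=150kb
            [ "$sv_len" -ge 50 ] && [ "$sv_len" -le 150000 ] ;;
        inversion)
            # >=1kb
            [ "$sv_len" -ge 1000 ] ;;
        translocation)
            # >=10kb
            [ "$sv_len" -ge 10000 ] ;;
        *)
            return 2 ;;
    esac
}
```

For example, `sv_size_ok deletion 49` fails while `sv_size_ok deletion 50` succeeds.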

MUM&Co requires installation of MUMmer4 and samtools.
MUM&Co looks up the paths to the MUMmer toolkit and samtools using 'which xxxxx'.
An error message is printed and the script stops if these paths cannot be found.
These paths can be edited directly in the script if required.
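
A dependency check of the kind described above might look roughly like this. It is a sketch rather than the script's own code: it uses the POSIX `command -v` instead of `which`, the function name `require_tools` is hypothetical, and the tool names in the usage comment are assumptions about which executables need to be on the PATH.

```shell
# Report every tool that cannot be found on the PATH.
# Returns 0 if all tools are present, 1 otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "ERROR: '$tool' not found in PATH" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage before a run (tool names are assumptions):
#   require_tools nucmer samtools || exit 1
```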

To help with downstream analysis:
Rename and re-orient the query genome contigs so that they correspond to their reference counterparts.
Tools such as RaGOO and Ragout can do this alongside scaffolding of contigs (this is not currently recommended for short-read based assemblies).

Options:

     -r or --reference_genome          path to reference genome
     -q or --query_genome              path to query genome
     -g or --genome_size               size of genome
     -o or --output                    output prefix (default: mumandco)
     -t or --threads                   thread number (default: 1)
     -ml or --minlen                   minimum length of alignments (default: 50bp)
     -b or --blast                     adds the blast option to identify whether insertions or deletions look repetitive or novel
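
To illustrate how the options and their defaults fit together, here is a minimal parsing sketch. It is not MUM&Co's actual parser: the function name `parse_mumandco_args` and the variable names are hypothetical, and only the flags and defaults listed above are taken from this README.

```shell
# Parse the documented options into shell variables.
# Defaults follow the option list: output prefix "mumandco", 1 thread, 50bp.
parse_mumandco_args() {
    reference=""
    query=""
    genome_size=""
    output="mumandco"
    threads=1
    minlen=50
    blast="no"
    while [ "$#" -gt 0 ]; do
        # -b/--blast is the only flag that takes no value.
        case "$1" in
            -b|--blast) blast="yes"; shift; continue ;;
        esac
        # Every other option needs a value after it.
        [ "$#" -ge 2 ] || { echo "Missing value for $1" >&2; return 1; }
        case "$1" in
            -r|--reference_genome) reference=$2 ;;
            -q|--query_genome)     query=$2 ;;
            -g|--genome_size)      genome_size=$2 ;;
            -o|--output)           output=$2 ;;
            -t|--threads)          threads=$2 ;;
            -ml|--minlen)          minlen=$2 ;;
            *) echo "Unknown option: $1" >&2; return 1 ;;
        esac
        shift 2
    done
}
```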

Test run script:

     bash mumandco_v*.sh -r ./yeast.tidy.fa -q ./yeast_tidy_DEL100.fa -g 12500000 -o DEL100_test -t 2 -b
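
If the genome size is not known in advance, a value for `-g` can be estimated by counting the bases in a FASTA file. The sketch below assumes that `-g` expects a size in base pairs, as the example value 12500000 suggests; whether it should be the reference or the query size, and whether a rounded value is sufficient, is not stated here. The function name `genome_size` is hypothetical.

```shell
# Sum the lengths of all sequence lines in a FASTA file.
# Header lines (starting with '>') are skipped; spaces, tabs and
# carriage returns are stripped before counting.
genome_size() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }
         END   { print total + 0 }' "$1"
}
```

For example, `genome_size ./yeast.tidy.fa` prints the total number of bases in that file.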

OUTPUT FOLDER:
Folder with alignments used for SV detection
Txt file with summary of SVs detected
TSV file with all the detected SVs
TSV file with all detected SVs plus the DNA associated with the event (all from reference except insertions)
VCF file with all calls (currently all imprecise)

TSV NOTES:
The last column in the TSV file contains notes.
'complicated' : multiple calls within the same region; generally overlapping insertions and deletions
'double' : several calls at the same coordinates; generally tandem duplications or contractions with multiple copy changes
']chrX:xxxxxx]' : a VCF inspired notation for the association of the translocation fragments with the other fragments
e.g. for chr1 with its right border at 250000bp associated with chr2 at 100000bp;
the note would be as follows for chr1: ']chr2:100000]' and for chr2: '[chr1:250000['
As such, each translocation fragment, called as an event, is now a breakend-like call and is reported twice if both of its borders are involved in translocations
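
Because the notes sit in the last column, calls carrying a given note can be pulled out with awk. This sketch assumes only what is stated above (tab-separated fields, note in the final column); the function name `select_by_note` and the file name in the usage line are hypothetical.

```shell
# Print every line of a TSV file whose last tab-separated field
# is exactly the requested note.
select_by_note() {
    awk -F '\t' -v note="$2" '$NF == note' "$1"
}
```

For example, `select_by_note calls.tsv complicated` would list only the calls flagged as 'complicated'.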

VCF TRA EVENT:
The latter notation from the TSV file is currently being added to the ALT column in the VCF for 'TRA' events.
It is not currently called as a breakend site (it contains no nucleotide at the edge) but can be interpreted similarly
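
The bracket notation used for translocations, such as ']chr2:100000]' or '[chr1:250000[', can be split into its parts with plain shell parameter expansion. This is a sketch based only on the two examples given in the TSV notes; the function name `parse_tra_note` is hypothetical, and no meaning is assigned to the bracket direction beyond reporting it.

```shell
# Split a translocation note into partner chromosome, position and
# bracket character, printed as three tab-separated fields.
parse_tra_note() {
    note=$1
    # Accept only ']chrom:pos]' or '[chrom:pos['.
    case "$note" in
        ']'*:*']' | '['*:*'[') ;;
        *) echo "Unrecognised note: $note" >&2; return 1 ;;
    esac
    bracket=${note%"${note#?}"}   # first character of the note
    inner=${note#?}               # drop the leading bracket
    inner=${inner%?}              # drop the trailing bracket
    chrom=${inner%:*}             # text before the last ':'
    position=${inner##*:}         # text after the last ':'
    printf '%s\t%s\t%s\n' "$chrom" "$position" "$bracket"
}
```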

Note:
MUMmer4 is now required because the hard-wired thread option is not available during alignment with MUMmer3.
The blast option (-b/--blast) uses BLAST to search for insertion and deletion events in the reference/query in order to label them as either mobile or novel events.

Reference:
Samuel O’Donnell, Gilles Fischer, MUM&Co: accurate detection of all SV types through whole-genome alignment, Bioinformatics, Volume 36, Issue 10, 15 May 2020, Pages 3242–3243, https://doi.org/10.1093/bioinformatics/btaa115
