
malonge / Ragoo

Licence: mit
Fast Reference-Guided Scaffolding of Genome Assembly Contigs. RagTag, the successor to RaGOO, is now available here: https://github.com/malonge/RagTag

Projects that are alternatives of or similar to Ragoo

Hts Nim
nim wrapper for htslib for parsing genomics data files
Stars: ✭ 132 (-16.46%)
Mutual labels:  bioinformatics
Parallel Fastq Dump
parallel fastq-dump wrapper
Stars: ✭ 141 (-10.76%)
Mutual labels:  bioinformatics
Soapdenovo2
Next generation sequencing reads de novo assembler.
Stars: ✭ 150 (-5.06%)
Mutual labels:  bioinformatics
Mrbayes
MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. For documentation and downloading the program, please see the home page:
Stars: ✭ 131 (-17.09%)
Mutual labels:  bioinformatics
Artemis
Artemis is a free genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence, and its six-frame translation
Stars: ✭ 135 (-14.56%)
Mutual labels:  bioinformatics
Pysradb
Package for fetching metadata and downloading data from SRA/ENA/GEO
Stars: ✭ 146 (-7.59%)
Mutual labels:  bioinformatics
Splatter
Simple simulation of single-cell RNA sequencing data
Stars: ✭ 128 (-18.99%)
Mutual labels:  bioinformatics
Bioc Refcard
Bioconductor cheat sheet
Stars: ✭ 152 (-3.8%)
Mutual labels:  bioinformatics
Hgvs
Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
Stars: ✭ 138 (-12.66%)
Mutual labels:  bioinformatics
Mixcr
MiXCR is a universal software for fast and accurate extraction of T- and B- cell receptor repertoires from any type of sequencing data. Free for academic use only.
Stars: ✭ 148 (-6.33%)
Mutual labels:  bioinformatics
Octopus
Bayesian haplotype-based mutation calling
Stars: ✭ 131 (-17.09%)
Mutual labels:  bioinformatics
Hifiasm
Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
Stars: ✭ 134 (-15.19%)
Mutual labels:  bioinformatics
Kaiju
Fast taxonomic classification of metagenomic sequencing reads using a protein reference database
Stars: ✭ 146 (-7.59%)
Mutual labels:  bioinformatics
Biotite
A comprehensive library for computational molecular biology
Stars: ✭ 132 (-16.46%)
Mutual labels:  bioinformatics
Biograkn
BioGrakn Knowledge Graph
Stars: ✭ 152 (-3.8%)
Mutual labels:  bioinformatics
Readfq
Fast multi-line FASTA/Q reader in several programming languages
Stars: ✭ 128 (-18.99%)
Mutual labels:  bioinformatics
Awesome Bioinformatics Benchmarks
A curated list of bioinformatics bench-marking papers and resources.
Stars: ✭ 142 (-10.13%)
Mutual labels:  bioinformatics
Snpr
The sources of the openSNP website
Stars: ✭ 155 (-1.9%)
Mutual labels:  bioinformatics
Clairvoyante
Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing
Stars: ✭ 151 (-4.43%)
Mutual labels:  bioinformatics
Scde
R package for analyzing single-cell RNA-seq data
Stars: ✭ 147 (-6.96%)
Mutual labels:  bioinformatics

RaGOO

A tool to order and orient genome assembly contigs via Minimap2 alignments to a reference genome.

Announcements

RagTag, RaGOO's successor, is now available at https://github.com/malonge/RagTag. Please transition your work to RagTag, if possible, as RaGOO will eventually no longer be supported.

Description

Alonge, Michael, et al. "RaGOO: fast and accurate reference-guided scaffolding of draft genomes." Genome biology 20.1 (2019): 1-17.

RaGOO is a tool for coalescing genome assembly contigs into pseudochromosomes via minimap2 alignments to a closely related reference genome. The tool focuses on practicality and therefore offers the following features:

  1. Good performance. On a MacBook Pro using Arabidopsis data, pseudochromosome construction takes less than a minute and the whole pipeline with SV calling takes ~2 minutes.
  2. Intact ordering and orienting of contigs.
  3. Misassembly correction.
  4. GFF lift-over.
  5. Structural variant calling with an integrated version of Assemblytics.
  6. Confidence scores associated with the grouping, localization, and orientation for each contig.

Installation

Dependencies

RaGOO should install on OSX and most standard flavors of Linux. RaGOO depends on Python3 as well as the following packages:

  1. intervaltree
  2. numpy
  3. Minimap2

The first two packages will be installed automatically when installing RaGOO. Minimap2 is straightforward to install following the instructions on its website. Place the minimap2 executable in your path, or specify its location with the -m parameter (see below).
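
A quick pre-flight check can confirm that minimap2 is reachable before RaGOO is launched. The following is a minimal sketch using Python's standard shutil.which; the helper name find_minimap2 is hypothetical and is not part of RaGOO.

```python
import shutil
from typing import Optional


def find_minimap2(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a usable minimap2 path, or None if it cannot be found.

    explicit_path mirrors the value one would pass to RaGOO's -m option;
    when it is omitted, the executable is looked up on PATH.
    """
    # shutil.which accepts a bare command name or a full path and
    # returns None when no matching executable file exists.
    return shutil.which(explicit_path or "minimap2")


if __name__ == "__main__":
    path = find_minimap2()
    if path is None:
        print("minimap2 not found on PATH; pass its location with -m")
    else:
        print(f"minimap2 found at {path}")
```

If the function returns None, either add the minimap2 directory to PATH or pass the full path to RaGOO with -m.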

Installation

Currently, the only way to install RaGOO is from source. Set up a virtualenv if desired, but be sure to create a Python 3 environment. Then, enter the following command to install RaGOO:

$ python setup.py install

Usage

usage: ragoo.py [-h] [-e <exclude.txt>] [-gff <annotations.gff>] [-m PATH]
                [-b] [-R <reads.fasta>] [-T sr] [-t 3] [-g 100] [-s] [-i 0.2]
                [-j <skip.txt>] [-C]
                <contigs.fasta> <reference.fasta>

order and orient contigs according to minimap2 alignments to a reference
(v1.1)

positional arguments:
  <contigs.fasta>       fasta file with contigs to be ordered and oriented
  <reference.fasta>     reference fasta file

optional arguments:
  -h, --help            show this help message and exit
  -e <exclude.txt>      single column text file of reference headers to ignore
  -gff <annotations.gff>
                        lift-over gff features to chimera-broken contigs
  -m PATH               path to minimap2 executable
  -b                    Break chimeric contigs
  -R <reads.fasta>      Turns on misassembly correction. Align provided reads
                        to the contigs to aid misassembly correction. fastq or
                        fasta allowed. Gzipped files allowed. Turns off '-b'.
  -T sr                 Type of reads provided by '-R'. 'sr' and 'corr'
                        accepted for short reads and error corrected long
                        reads respectively.
  -t 3                  Number of threads when running minimap.
  -g 100                Gap size for padding in pseudomolecules.
  -s                    Call structural variants
  -i 0.2                Minimum grouping confidence score needed to be
                        localized.
  -j <skip.txt>         List of contigs to automatically put in chr0.
  -C                    Write unplaced contigs individually instead of making a chr0

RaGOO will try to be smart and not redo intermediate analysis already done in previous executions of the pipeline. For example, if the Minimap2 alignment files are already present from previous runs, RaGOO will not recreate them. However, RaGOO is not that smart, so be sure to remove any files that you want to replace. To be safe, one can just remove the entire output directory if a new analysis is desired (see "Output Files" below).
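
Because stale intermediate files are reused, a fresh analysis is easiest to guarantee by deleting the output directory first. The sketch below assumes the output directory is named "ragoo_output" and sits in the working directory; the helper name reset_output is hypothetical.

```python
import shutil
from pathlib import Path


def reset_output(workdir: str = ".", name: str = "ragoo_output") -> bool:
    """Delete the RaGOO output directory if it exists.

    Returns True if a directory was removed and False if there was
    nothing to remove.
    """
    out = Path(workdir) / name
    if out.is_dir():
        # Remove the directory and everything beneath it.
        shutil.rmtree(out)
        return True
    return False
```

Calling reset_output() before a run forces every step, including the Minimap2 alignments, to be recomputed. This deletes all previous results, so copy anything worth keeping elsewhere first.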

Example Run

Both the assembly and the reference must be in the current working directory, so please either copy them or create a symbolic link. For example:

$ cd /path/to/current/working/directory
$ ln -s /path/to/contigs.fasta
$ ln -s /path/to/reference.fasta
$ ragoo.py contigs.fasta reference.fasta
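
The same steps can be scripted. The following is a minimal sketch, assuming a POSIX system where symbolic links are permitted and that ragoo.py is on PATH; the helper names link_inputs and run_ragoo are hypothetical.

```python
import os
import subprocess
from pathlib import Path


def link_inputs(workdir, *sources):
    """Symlink each source file into workdir and return the link names."""
    workdir = Path(workdir)
    names = []
    for src in sources:
        src = Path(src).resolve()
        link = workdir / src.name
        # Skip files that are already present (including existing links).
        if not (link.is_symlink() or link.exists()):
            os.symlink(src, link)
        names.append(link.name)
    return names


def run_ragoo(workdir, contigs, reference, threads=3):
    """Run ragoo.py from inside workdir, passing bare file names.

    Only options shown in the usage message are used here
    (-t for the thread count).
    """
    cmd = ["ragoo.py", "-t", str(threads), contigs, reference]
    return subprocess.run(cmd, cwd=workdir, check=True)
```

A typical call would be names = link_inputs(workdir, "/path/to/contigs.fasta", "/path/to/reference.fasta") followed by run_ragoo(workdir, *names). Running with cwd=workdir keeps both inputs in the current working directory, as RaGOO requires.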

Output Files

All of the output will be in the "ragoo_output" directory. If breaking chimeric contigs and calling SVs, the contents of this output directory are as follows:

ragoo_output/
├── ctg_alignments
├── groupings
├── orderings
├── pm_alns
└── ragoo.fasta

ragoo.fasta

The final pseudomolecules. Any unlocalized contigs are concatenated and placed in "Chr0_RaGOO".

chimera_break

This directory contains the results from chimeric contig breaking. The most notable file here is the [prefix].intra.chimera.broken.fa, as this is the final corrected assembly used for downstream scaffolding. All of the downstream information, such as confidence scores, refers to this assembly, not the original assembly.

groupings

There is one file per chromosome listing the contigs assigned to that chromosome and their grouping confidence score. Please note that these contigs are not ordered. Also note that if chimeras were corrected, the headers in these files refer to the broken assembly in "chimera_break", and not the original assembly.
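
A grouping file can be loaded to review which contigs have low grouping confidence. The sketch below assumes each line holds a contig header followed by its score, separated by whitespace; that layout, the helper names, and the strict "less than" comparison against the -i threshold are assumptions, not documented behavior.

```python
from typing import Dict, List


def read_grouping(path: str) -> Dict[str, float]:
    """Map contig header -> grouping confidence score.

    Assumes two whitespace-separated columns per line:
    contig header, then score.
    """
    scores: Dict[str, float] = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            scores[fields[0]] = float(fields[1])
    return scores


def below_threshold(scores: Dict[str, float], min_conf: float = 0.2) -> List[str]:
    """Return contig headers whose score is below min_conf, sorted by name."""
    return sorted(name for name, score in scores.items() if score < min_conf)
```

The default of 0.2 matches the default shown for -i in the usage message.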

orderings

There is one file per chromosome showing the ordering, orientation (second column), location confidence scores (third column), and orientation confidence scores (fourth column).
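
The four columns described above map naturally onto a small record type. The following sketch assumes whitespace-separated columns with the contig header in the first column and that the line order in the file is the contig order; the names OrderedContig and read_ordering are hypothetical.

```python
from typing import List, NamedTuple


class OrderedContig(NamedTuple):
    name: str                # column 1: contig header (assumed)
    orientation: str         # column 2: orientation, kept as text
    location_conf: float     # column 3: location confidence score
    orientation_conf: float  # column 4: orientation confidence score


def read_ordering(path: str) -> List[OrderedContig]:
    """Read one orderings file, preserving the order of lines."""
    records: List[OrderedContig] = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip blank or malformed lines
            records.append(
                OrderedContig(fields[0], fields[1],
                              float(fields[2]), float(fields[3]))
            )
    return records
```

The orientation value is stored as a string because its exact encoding is not specified here.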

pm_alignments

This directory contains all of the structural variant calling results. The final structural variants can be found in assemblytics_out.Assemblytics_structural_variants.bed. This bed file can be converted to VCF using SURVIVOR, though the last two columns (overlap with gaps) must be removed first. The alignments used to generate these variant calls are also present in this directory in SAM and delta format (pm_contigs_against_ref.sam and pm_contigs_against_ref.sam.delta), and can be used as input for external tools.
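
Removing the last two columns before conversion can be done with a few lines of Python. This is a minimal sketch, assuming the bed file is tab-separated and that every line, including any header line, carries the same columns; the function name drop_last_two_columns is hypothetical.

```python
def drop_last_two_columns(in_path: str, out_path: str) -> None:
    """Copy a tab-separated file, omitting the final two columns of each row."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            line = line.rstrip("\n")
            fields = line.split("\t")
            if len(fields) < 3:
                # Too short to trim safely; copy the line unchanged.
                dst.write(line + "\n")
                continue
            dst.write("\t".join(fields[:-2]) + "\n")
```

The trimmed file, rather than the original, is then the one to pass to SURVIVOR for the VCF conversion.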

ctg_alignments

Contains the results from misassembly correction. It will contain the corrected contigs in fasta format, as well as an updated gff file if provided.
