
morispi / CONSENT

Licence: AGPL-3.0
Scalable long read self-correction and assembly polishing with multiple sequence alignment

Programming Languages

C++, shell, Makefile, Dockerfile

Projects that are alternatives of or similar to CONSENT

SVCollector
Method to optimally select samples for validation and resequencing
Stars: ✭ 20 (-57.45%)
Mutual labels:  ngs, long-reads
msabrowser
🧬 MSABrowser: dynamic and fast visualization of sequence alignments, variations, and annotations
Stars: ✭ 13 (-72.34%)
Mutual labels:  msa, multiple-sequence-alignment
TideHunter
TideHunter: efficient and sensitive tandem repeat detection from noisy long reads using seed-and-chain
Stars: ✭ 15 (-68.09%)
Mutual labels:  long-reads, multiple-sequence-alignment
react-msa-viewer
React rerelease of MSAViewer
Stars: ✭ 15 (-68.09%)
Mutual labels:  msa, multiple-sequence-alignment
gencore
Generate duplex/single consensus reads to reduce sequencing noises and remove duplications
Stars: ✭ 91 (+93.62%)
Mutual labels:  ngs, consensus
platon
Identification & characterization of bacterial plasmid-borne contigs from short-read draft assemblies.
Stars: ✭ 52 (+10.64%)
Mutual labels:  ngs, contigs
TideHunter
TideHunter: efficient and sensitive tandem repeat detection from noisy long reads using seed-and-chain
Stars: ✭ 21 (-55.32%)
Mutual labels:  long-reads, multiple-sequence-alignment
IARC-nf
List of IARC bioinformatics nextflow pipelines
Stars: ✭ 34 (-27.66%)
Mutual labels:  ngs
readfq
A simple tool to calculate reads number and total base count in FASTQ file
Stars: ✭ 19 (-59.57%)
Mutual labels:  ngs
ilus
A handy variant calling pipeline generator for whole genome re-sequencing (WGS) and whole exome sequencing (WES) data analysis. A simple and comprehensive WGS/WES analysis pipeline generator.
Stars: ✭ 64 (+36.17%)
Mutual labels:  ngs
fastq-and-furious
Efficient handling of FASTQ files from Python
Stars: ✭ 49 (+4.26%)
Mutual labels:  ngs
pufferfish
An efficient index for the colored, compacted, de Bruijn graph
Stars: ✭ 94 (+100%)
Mutual labels:  contigs
numerifides
A proposal for a system of decentralized trust, built on an open, public blockchain.
Stars: ✭ 14 (-70.21%)
Mutual labels:  consensus
nimbus
Upgradeable consensus framework for Substrate blockchains and parachains
Stars: ✭ 31 (-34.04%)
Mutual labels:  consensus
rctl
A set of command line tools based on R and JavaScript.
Stars: ✭ 15 (-68.09%)
Mutual labels:  ngs
cbc-casper-js
JS implementation of Vlad Zamfir's CBC Casper TFG
Stars: ✭ 21 (-55.32%)
Mutual labels:  consensus
MTBseq source
MTBseq is an automated pipeline for mapping, variant calling and detection of resistance mediating and phylogenetic variants from illumina whole genome sequence data of Mycobacterium tuberculosis complex isolates.
Stars: ✭ 26 (-44.68%)
Mutual labels:  ngs
NeuroSEED
Implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTorch (NeurIPS 2021)
Stars: ✭ 40 (-14.89%)
Mutual labels:  multiple-sequence-alignment
IsoQuant
Reference-based transcript discovery from long RNA read
Stars: ✭ 26 (-44.68%)
Mutual labels:  ngs
dentist
Close assembly gaps using long-reads at high accuracy.
Stars: ✭ 39 (-17.02%)
Mutual labels:  long-reads

CONSENT

CONSENT (Scalable long read self-correction and assembly polishing with multiple sequence alignment) is a self-correction method for long reads. It first computes overlaps between the long reads in order to define an alignment pile (i.e. a set of overlapping reads used for correction) for each read. Each read's alignment pile is then divided into smaller windows, which are corrected independently. For each window, a multiple alignment strategy is first used to compute a consensus. This consensus is then further polished with a local de Bruijn graph in order to remove the remaining errors. In addition to error correction, CONSENT can also perform assembly polishing.
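
To make the idea of "computing a consensus from a window of aligned reads" concrete, here is a toy sketch in Python. It is not CONSENT's implementation: it assumes the rows of the window are already aligned to equal length (with "-" as the gap symbol) and simply takes a per-column majority vote. The function name column_consensus and the sample window are made up for illustration.

```python
from collections import Counter


def column_consensus(aligned_rows, gap="-"):
    """Majority-vote consensus over equal-length, already-aligned rows.

    Toy illustration only; CONSENT's actual alignment and consensus
    strategy is described in the referenced paper.
    """
    if not aligned_rows:
        return ""
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*aligned_rows):
        # Pick the most frequent symbol in this column.
        symbol, _count = Counter(column).most_common(1)[0]
        # A column whose majority symbol is a gap is dropped.
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Hypothetical window: four aligned fragments with one insertion
# (row 2, column 5) and one substitution (row 3, column 8).
window = [
    "ACGT-ACGTA",
    "ACGTTACGTA",
    "ACGT-ACCTA",
    "ACGT-ACGTA",
]
print(column_consensus(window))
```

In this toy window the isolated insertion and substitution are both outvoted by the other rows, which is the intuition behind correcting a read from its alignment pile.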

Requirements

  • A Linux-based operating system.
  • Python3.
  • g++, minimum version 5.5.0.
  • CMake, minimum version 2.8.2.
  • Minimap2 available through your PATH.
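
A quick way to check that the required executables can be found before installing is sketched below. It only tests presence on the PATH, not the minimum versions listed above. The executable names (python3, g++, cmake, minimap2) are assumed to be the usual ones on a Linux system, and missing_tools is a hypothetical helper, not part of CONSENT.

```python
import shutil

# Assumed executable names for the requirements listed above.
REQUIRED_TOOLS = ("python3", "g++", "cmake", "minimap2")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required executables were found on PATH.")
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the default shutil.which is what you want.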

Installation

Clone the CONSENT repository along with its submodules:

git clone --recursive https://github.com/morispi/CONSENT

Then run the install.sh script:

./install.sh

If you do not already have minimap2 available through your path, you can then run:

export PATH=$PWD/minimap2:$PATH

CONSENT should then be able to run.

Getting started

An example dataset (10x coverage of simulated PacBio reads, a raw assembly, and a reference genome) is provided in the example folder.

Please run the following commands to try out CONSENT on this example.

Self-correction

To perform self-correction on the example dataset, run the following command:

./CONSENT-correct --in example/reads.fasta --out example/correctedReads.fasta --type PB

This should take about 2 min and use up to 750 MB of RAM, using 4 cores.

Polishing

To perform assembly polishing on the example dataset, run the following command:

./CONSENT-polish --contigs example/rawAssembly.fasta --reads example/reads.fasta --out example/polishedAssembly.fasta

This should take about 15 sec and use at most 150 MB of RAM, using 4 cores.

Running CONSENT

Self-correction

To run CONSENT for long reads self-correction, run the following command:

./CONSENT-correct --in longReads.fast[a|q] --out result.fasta --type readsTechnology

  • longReads.fast[a|q]: fasta or fastq file of long reads to correct.
  • result.fasta: fasta file to which the corrected long reads are written.
  • readsTechnology: Indicates whether the long reads are from PacBio (--type PB) or Oxford Nanopore (--type ONT).
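
If you drive CONSENT from a script, the command line above can be assembled programmatically. The following is a minimal sketch that uses only the flags documented here (--in, --out, --type). The helper name correct_command and the default executable path are assumptions for illustration.

```python
def correct_command(reads, out, technology, exe="./CONSENT-correct"):
    """Build the argument list for a CONSENT-correct run.

    Only the documented --in, --out and --type flags are used; the
    helper itself is hypothetical and not shipped with CONSENT.
    """
    if technology not in ("PB", "ONT"):
        raise ValueError("technology must be 'PB' or 'ONT'")
    return [exe, "--in", reads, "--out", out, "--type", technology]


cmd = correct_command("longReads.fasta", "result.fasta", "ONT")
print(" ".join(cmd))
# To launch it for real, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell quoting issues when file names contain spaces.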

Polishing

To run CONSENT for assembly polishing, run the following command:

./CONSENT-polish --contigs contigs.fast[a|q] --reads longReads.fast[a|q] --out result.fasta

  • contigs.fast[a|q]: fasta or fastq file of contigs to polish.
  • longReads.fast[a|q]: fasta or fastq file of long reads to use for polishing.
  • result.fasta: fasta file to which the polished contigs are written.
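
Since both inputs may be either fasta or fastq, a wrapper script may want to verify the format before launching a polishing run. The sketch below relies only on the standard convention that FASTA records start with ">" and FASTQ records start with "@". The helpers sniff_format and polish_command are hypothetical, and the command uses only the flags documented above.

```python
def sniff_format(first_line):
    """Guess the sequence format from the first line of a file.

    FASTA headers start with '>' and FASTQ headers start with '@'.
    """
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("@"):
        return "fastq"
    raise ValueError("not a FASTA or FASTQ header: %r" % first_line)


def polish_command(contigs, reads, out, exe="./CONSENT-polish"):
    """Build the argument list for a CONSENT-polish run (hypothetical helper)."""
    return [exe, "--contigs", contigs, "--reads", reads, "--out", out]


print(sniff_format(">contig_1 length=52000"))
print(sniff_format("@read_42"))
print(" ".join(polish_command("contigs.fasta", "longReads.fastq", "result.fasta")))
```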

Options

  --windowSize INT, -l INT:      Size of the windows to process. (default: 500)
  --minSupport INT, -s INT:      Minimum support to consider a window for correction. (default: 4)
  --maxSupport INT, -S INT:      Maximum number of overlaps to include in a pile. (default: 150)
  --maxMSA INT, -M INT:          Maximum number of sequences to include into the MSA. (default: 150)
  --merSize INT, -k INT:         k-mer size for chaining and polishing. (default: 9)
  --solid INT, -f INT:           Minimum number of occurrences to consider a k-mer as solid during polishing. (default: 4)
  --anchorSupport INT, -c INT:   Minimum number of sequences supporting (Ai) - (Ai+1) to keep the two anchors in the chaining. (default: 8)
  --minAnchors INT, -a INT:      Minimum number of anchors in a window to allow consensus computation. (default: 2)
  --windowOverlap INT, -o INT:   Overlap size between consecutive windows. (default: 50)
  --nproc INT, -j INT:           Number of processes to run in parallel. (default: number of cores)
  --minimapIndex INT, -m INT:    Split minimap2 index every INT input bases. (default: 500M)
  --tmpdir STRING, -t STRING:    Path where to store the temporary overlaps file. (default: working directory, as Alignments_dateTimeStamp.paf)
  --help, -h:                    Print this help message.
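
To give a feel for how --windowSize, --windowOverlap, --merSize and --solid interact, here is a small Python sketch. It is an illustration under stated assumptions, not CONSENT's code: it assumes consecutive windows start every (windowSize - windowOverlap) bases, and that a k-mer is "solid" when its total number of occurrences across the given sequences reaches the threshold. The function names windows and solid_kmers are made up.

```python
from collections import Counter


def windows(length, window_size=500, window_overlap=50):
    """Return (start, end) spans covering a sequence of the given length.

    Assumption: each window starts window_size - window_overlap bases
    after the previous one; the last window is truncated at the end.
    """
    if not 0 <= window_overlap < window_size:
        raise ValueError("need 0 <= window_overlap < window_size")
    step = window_size - window_overlap
    spans = []
    start = 0
    while start < length:
        end = min(start + window_size, length)
        spans.append((start, end))
        if end == length:
            break
        start += step
    return spans


def solid_kmers(sequences, k=9, solid=4):
    """Return the k-mers occurring at least `solid` times in total."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return {kmer for kmer, n in counts.items() if n >= solid}


# With the default sizes, a 1200-base read yields three windows.
print(windows(1200))

# Small k and threshold so the toy input stays readable.
print(solid_kmers(["ACGTAC", "ACGTAC", "ACGTTC"], k=4, solid=3))
```

Under these assumptions, a larger overlap produces more windows per read, and a higher solidity threshold keeps fewer k-mers for polishing.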

Notes

CONSENT has been developed and tested on x86-64 GNU/Linux.
Support for any other platform has not been tested.

Authors

Pierre Morisse, Camille Marchet, Antoine Limasset, Arnaud Lefebvre and Thierry Lecroq.

Reference

Morisse, P., Marchet, C., Limasset, A. et al. Scalable long read self-correction and assembly polishing with multiple sequence alignment. Sci Rep 11, 761 (2021). https://doi.org/10.1038/s41598-020-80757-5

Contact

You can report problems and bugs to pierre[dot]morisse[at]inria[dot]fr
