Getting Started

git clone https://github.com/ruanjue/wtdbg2
cd wtdbg2 && make
# quick start with wtdbg2.pl
./wtdbg2.pl -t 16 -x rs -g 4.6m -o dbg reads.fa.gz

# step-by-step command lines
# assemble long reads
./wtdbg2 -x rs -g 4.6m -i reads.fa.gz -t 16 -fo dbg

# derive consensus
./wtpoa-cns -t 16 -i dbg.ctg.lay.gz -fo dbg.raw.fa

# polish consensus, not necessary if you want to polish the assemblies using other tools
minimap2 -t16 -ax map-pb -r2k dbg.raw.fa reads.fa.gz | samtools sort -@4 >dbg.bam
samtools view -F0x900 dbg.bam | ./wtpoa-cns -t 16 -d dbg.raw.fa -i - -fo dbg.cns.fa

# additional polishing using short reads
bwa index dbg.cns.fa
bwa mem -t 16 dbg.cns.fa sr.1.fa sr.2.fa | samtools sort -O SAM | ./wtpoa-cns -t 16 -x sam-sr -d dbg.cns.fa -i - -fo dbg.srp.fa

Introduction

Wtdbg2 is a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT). It assembles raw reads without error correction and then builds the consensus from intermediate assembly output. Wtdbg2 is able to assemble the human and even the 32Gb Axolotl genome at a speed tens of times faster than CANU and FALCON while producing contigs of comparable base accuracy.

During assembly, wtdbg2 chops reads into 1024bp segments, merges similar segments into a vertex and connects vertices based on the segment adjacency on reads. The resulting graph is called a fuzzy Bruijn graph (FBG). It is akin to a de Bruijn graph but permits mismatches/gaps and keeps read paths when collapsing k-mers. The use of the FBG distinguishes wtdbg2 from the majority of long-read assemblers.
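
As a rough illustration of the chopping and adjacency steps described above, the following Python sketch cuts reads into fixed-length segments and links consecutive segments. It is a toy under stated assumptions, not wtdbg2's implementation: segments share a vertex only when they are identical (wtdbg2 merges similar segments, which requires alignment), trailing partial segments are dropped, and the names chop_read and build_segment_graph are hypothetical.

```python
from collections import defaultdict

def chop_read(seq, seg_len=1024):
    """Cut a read into consecutive seg_len-bp segments (trailing partial segment dropped)."""
    return [seq[i:i + seg_len] for i in range(0, len(seq) - seg_len + 1, seg_len)]

def build_segment_graph(reads, seg_len=1024):
    """Toy graph: one vertex per distinct segment, one edge per observed adjacency.

    Assumption: only identical segments share a vertex. Merging *similar*
    segments, as the real assembler does, is not modelled here.
    """
    vertex_ids = {}            # segment sequence -> vertex id
    edges = defaultdict(int)   # (from_vertex, to_vertex) -> times the adjacency was seen
    for seq in reads:
        path = [vertex_ids.setdefault(seg, len(vertex_ids))
                for seg in chop_read(seq, seg_len)]
        for u, v in zip(path, path[1:]):
            edges[(u, v)] += 1
    return vertex_ids, dict(edges)

# Tiny example with 4-bp segments so the result is easy to inspect.
reads = ["AAAACCCCGGGG", "CCCCGGGGTTTT"]
vertices, edges = build_segment_graph(reads, seg_len=4)
print(len(vertices), edges)
```

In this example the CCCC-to-GGGG adjacency is seen on both reads, so its edge count is 2; a per-edge count of this kind is what a minimum-coverage threshold on edges would be compared against.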

Installation

Wtdbg2 only works on 64-bit Linux. To compile, please type make in the source code directory. You can then copy wtdbg2 and wtpoa-cns to your PATH.

Wtdbg2 also comes with an approximate read mapper kbm, a faster but less accurate consensus tool wtdbg-cns and many auxiliary scripts in the scripts directory.

Usage

Wtdbg2 has two key components: an assembler wtdbg2 and a consenser wtpoa-cns. Executable wtdbg2 assembles raw reads and generates the contig layout and edge sequences in a file "prefix.ctg.lay.gz". Executable wtpoa-cns takes this file as input and produces the final consensus in FASTA. A typical workflow looks like this:

./wtdbg2 -x rs -g 4.6m -t 16 -i reads.fa.gz -fo prefix
./wtpoa-cns -t 16 -i prefix.ctg.lay.gz -fo prefix.ctg.fa

where -g is the estimated genome size and -x specifies the sequencing technology, which can take the value "rs" for PacBio RSII, "sq" for PacBio Sequel, "ccs" for PacBio CCS reads and "ont" for Oxford Nanopore. This option sets multiple parameters and should be given before other parameters. If you are unable to get a good assembly, you may need to tune other parameters as follows.
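
If the two steps are driven from a script, the command lines above can be assembled programmatically. The following Python sketch only builds the argument lists and prints them; build_commands is a hypothetical helper, and it uses only the options shown in the workflow above. It places -x first, in line with the advice to give this option before other parameters.

```python
PRESETS = {"rs", "sq", "ccs", "ont"}  # values of -x listed above

def build_commands(reads, prefix, preset, genome_size, threads=16):
    """Return argv lists for the assembly step and the consensus step."""
    if preset not in PRESETS:
        raise ValueError(f"unknown -x preset: {preset!r}")
    assemble = ["./wtdbg2", "-x", preset, "-g", genome_size,
                "-t", str(threads), "-i", reads, "-fo", prefix]
    consensus = ["./wtpoa-cns", "-t", str(threads),
                 "-i", f"{prefix}.ctg.lay.gz", "-fo", f"{prefix}.ctg.fa"]
    return assemble, consensus

asm, cns = build_commands("reads.fa.gz", "prefix", "rs", "4.6m")
print(" ".join(asm))
print(" ".join(cns))
# To actually run the steps (requires the compiled binaries in the current directory):
#   import subprocess
#   subprocess.run(asm, check=True)
#   subprocess.run(cns, check=True)
```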

Wtdbg2 combines normal k-mers and homopolymer-compressed (HPC) k-mers to find read overlaps. Option -k specifies the length of normal k-mers, while -p specifies the length of HPC k-mers. By default, wtdbg2 samples a fourth of all k-mers by their hashcodes. For data of relatively low coverage, you may increase this sampling rate by reducing -S, though this greatly increases peak memory. Option -e, which defaults to 3, specifies the minimum read coverage of an edge in the assembly graph. You may also adjust this option according to the overall sequencing depth. Option -A also helps with relatively low-coverage data at the cost of performance. For PacBio data, -L5000 often leads to better assemblies empirically and is therefore recommended. Please run wtdbg2 --help for a complete list of available options or consult README-ori.md for more help.
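
To make the two k-mer notions and the hash-based sampling concrete, here is a minimal Python sketch. It is an illustration only: the function names are hypothetical, CRC32 stands in for whatever hash wtdbg2 uses internally, and the integer parameter s plays the role that the text above assigns to -S (s = 4 keeps roughly a fourth of the k-mers, smaller values keep more).

```python
import zlib
from itertools import groupby

def hpc(seq):
    """Homopolymer-compress a sequence: collapse each run of identical bases to one base."""
    return "".join(base for base, _ in groupby(seq))

def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def sample_kmers(kmer_list, s=4):
    """Keep a k-mer when its hash is divisible by s (about 1/s of distinct k-mers).

    Assumption: CRC32 is a stand-in hash chosen because it is deterministic
    across runs; it is not the hash function used by wtdbg2.
    """
    return [km for km in kmer_list if zlib.crc32(km.encode()) % s == 0]

read = "AAACCGTTTTGACCA"
print(hpc(read))                         # -> ACGTGACA
normal = kmers(read, 5)                  # normal k-mers
compressed = kmers(hpc(read), 5)         # HPC k-mers
print(len(normal), len(compressed), len(sample_kmers(normal, s=4)))
```

Because the decision depends only on the k-mer's hash, the same k-mer is kept or dropped consistently in every read, which is what allows sampled k-mers to still reveal overlaps between reads.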

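The following table shows the assembly-step command-line options for several datasets, together with their resource usage: CPU time of the assembly (CPU asm) and consensus (CPU cns) steps, total wall-clock time (Real tot) and memory (RAM):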

Dataset	GSize	Cov	Asm options	CPU asm	CPU cns	Real tot	RAM
E. coli	4.6Mb	PB x20	-x rs -g4.6m -t16	53s	8m54s	42s	1.0G
C. elegans	100Mb	PB x80	-x rs -g100m -t32	1h07m	5h06m	13m42s	11.6G
D. melanogaster A4	144Mb	PB x120	-x rs -g144m -t32	2h06m	5h11m	26m17s	19.4G
D. melanogaster ISO1	144Mb	ONT x32	-xont -g144m -t32	5h12m	4h30m	25m59s	17.3G
A. thaliana	125Mb	PB x75	-x sq -g125m -t32	11h26m	4h57m	49m35s	25.7G
Human NA12878	3Gb	ONT x36	-x ont -g3g -t31	793h11m	97h46m	31h03m	221.8G
Human NA19240	3Gb	ONT x35	-x ont -g3g -t31	935h31m	89h17m	35h20m	215.0G
Human HG00733	3Gb	PB x93	-x sq -g3g -t47	2114h26m	152h24m	52h22m	338.1G
Human NA24385	3Gb	CCS x28	-x ccs -g3g -t31	231h25m	58h48m	10h14m	112.9G
Human CHM1	3Gb	PB x60	-x rs -g3g -t96	105h33m	139h24m	5h17m	225.1G
Axolotl	32Gb	PB x32	-x rs -g32g -t96	2806h40m	1456h13m	110h16m	1788.1G

The timing was obtained on three local servers with different hardware configurations. There are also run-to-run fluctuations. Exact timing on your machines may differ. The assembled contigs can be found at the following FTP:

ftp://ftp.dfci.harvard.edu/pub/hli/wtdbg/
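
When comparing rows of the resource table above, it can help to convert the time strings to seconds. The Python sketch below does that and computes the ratio of total CPU time to wall-clock time, a rough measure of how many cores were busy on average. It assumes that "Real tot" is the wall-clock time of the assembly and consensus steps combined; the helper names are hypothetical.

```python
import re

def parse_duration(text):
    """Convert strings such as '53s', '8m54s' or '1h07m' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not match or not text:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds

def cpu_to_wall_ratio(cpu_asm, cpu_cns, real_tot):
    """Total CPU time divided by wall-clock time."""
    return (parse_duration(cpu_asm) + parse_duration(cpu_cns)) / parse_duration(real_tot)

# E. coli row: 53s + 8m54s of CPU time over 42s of wall-clock time, run with -t16.
print(round(cpu_to_wall_ratio("53s", "8m54s", "42s"), 1))   # -> 14.0
```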

Limitations

For Nanopore data, wtdbg2 may produce an assembly smaller than the true genome.
When inputting multiple files in both FASTA and FASTQ formats, please put the FASTQ files first, then the FASTA files. Otherwise, the program cannot find '>' in the FASTQ files and appends all FASTQ content to one read.
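
The ordering rule above is easy to enforce in a wrapper script. A minimal Python sketch, assuming that the format can be recognised from the file extension (the suffix list and the name order_inputs are illustrative, not part of wtdbg2):

```python
FASTQ_SUFFIXES = (".fq", ".fastq", ".fq.gz", ".fastq.gz")  # assumed naming convention

def is_fastq(path):
    """Guess the format from the file name only; the file content is not inspected."""
    return path.lower().endswith(FASTQ_SUFFIXES)

def order_inputs(paths):
    """Return paths with FASTQ files first and all other files after, keeping relative order."""
    fastq = [p for p in paths if is_fastq(p)]
    others = [p for p in paths if not is_fastq(p)]
    return fastq + others

print(order_inputs(["a.fa", "b.fq.gz", "c.fasta", "d.fastq"]))
# -> ['b.fq.gz', 'd.fastq', 'a.fa', 'c.fasta']
```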

Citing wtdbg2

If you use wtdbg2, please cite:

Ruan, J. and Li, H. (2019) Fast and accurate long-read assembly with wtdbg2. Nat Methods doi:10.1038/s41592-019-0669-3

Ruan, J. and Li, H. (2019) Fast and accurate long-read assembly with wtdbg2. bioRxiv. doi:10.1101/530972

Getting Help

Please use the GitHub Issues page if you have questions. You may also contact Jue Ruan directly at [email protected].
