chanzuckerberg / Shasta

Licence: other
De novo assembly from Oxford Nanopore reads.

Shasta long read assembler


The complete user documentation is available here.

For quick start information see here.

See Shafin et al., Nature Biotechnology 2020, for an error analysis of the Shasta assembler and more. Reads from this paper are available here. The assembly results are here.

Here is a QUAST analysis of a Shasta assembly of CHM13 and comparison with other assemblers.

Requests for help: please file GitHub issues to report problems, request help or ask questions. Please keep each issue on a single topic when possible.


The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence from DNA reads generated by Oxford Nanopore flow cells.

Computational methods used by the Shasta assembler include:

  • Using a run-length representation of the read sequence. This makes the assembly process more resilient to errors in homopolymer repeat counts, which are the most common type of error in Oxford Nanopore reads.

  • Using, in some phases of the computation, a representation of the read sequence based on markers: occurrences of a fixed subset of short k-mers (k ≈ 10).
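
The two ideas above can be illustrated with a minimal Python sketch. This is not Shasta code. The function names (`run_length_encode`, `is_marker`, `find_markers`) are hypothetical, and the hash-based rule for choosing the k-mer subset is an assumption made for illustration only; this README does not describe how Shasta selects its marker k-mers.

```python
import hashlib
from itertools import groupby


def run_length_encode(seq):
    """Collapse each homopolymer run into one base plus a repeat count."""
    bases = []
    counts = []
    for base, run in groupby(seq):
        bases.append(base)
        counts.append(len(list(run)))
    return "".join(bases), counts


def is_marker(kmer, fraction=0.1):
    """Decide whether a k-mer belongs to the fixed marker subset.

    Assumption for illustration: hash the k-mer and keep it when the hash
    falls in the lowest `fraction` of the 64-bit range. The rule is
    deterministic, so the same k-mer is always either kept or dropped.
    """
    digest = hashlib.sha256(kmer.encode("ascii")).digest()
    value = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return value < int(fraction * 2**64)


def find_markers(seq, k=10, fraction=0.1):
    """Return (position, k-mer) pairs for every marker occurrence in seq."""
    return [
        (i, seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if is_marker(seq[i:i + k], fraction)
    ]


# Two reads that differ only in homopolymer lengths (TTT vs TT, CCC vs CCCC).
read_a = "GATTTACCCA"
read_b = "GATTACCCCA"
bases_a, counts_a = run_length_encode(read_a)
bases_b, counts_b = run_length_encode(read_b)

# The collapsed base sequences agree; only the repeat counts differ.
print(bases_a, counts_a)
print(bases_b, counts_b)
```

Because both reads collapse to the same base sequence, a comparison on that sequence is unaffected by the homopolymer length errors, which are confined to the repeat counts. A read can then be summarized by the output of `find_markers` instead of by every base.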

As currently implemented, Shasta can run an assembly of a human genome at coverage around 60x in about 3 hours using a single, large machine (AWS instance type x1.32xlarge, with 128 virtual processors and 1952 GB of memory). The compute cost of such an assembly is around $20 at AWS spot market or reserved prices.

Shasta assembly quality is comparable to or better than that achieved by other long read assemblers; see the paper cited above (Shafin et al., Nature Biotechnology 2020) for an extensive analysis. However, assembly parameters generally need to be adjusted to achieve optimal results. A set of sample configuration files is provided in the conf directory to assist with this process.

Acknowledgments

The Shasta software uses various external software packages. See here for more information.

Reporting Security Issues

Please note: If you believe you have found a security issue, please responsibly disclose by contacting [email protected].



