Licence: mit

parallel-fastq-dump

parallel fastq-dump wrapper

Why & How

NCBI fastq-dump can be very slow, even when you have the resources (network, IO, CPU) to go faster, and even when the sra file has already been downloaded (see the protips below). This tool speeds up the process by dividing the work into multiple threads.

This is possible because fastq-dump has options (-N and -X) to query specific ranges of the sra file. The tool splits the file into as many ranges as the requested number of threads, runs one fastq-dump per range in parallel, and concatenates the results back together, as if you had executed a single plain fastq-dump call.
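The range splitting described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function names ``split_spot_range`` and ``build_commands`` and the ``chunk_<i>`` output directories are hypothetical, and the total number of spots is assumed to be known already (for example from sra-stat).

```python
from typing import List, Sequence, Tuple


def split_spot_range(first: int, last: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Split the inclusive spot range [first, last] into contiguous pieces.

    Hypothetical helper: the sizes differ by at most one spot, and no more
    pieces are produced than there are spots.
    """
    total = last - first + 1
    if n_chunks < 1 or total < 1:
        raise ValueError("need at least one chunk and one spot")
    n_chunks = min(n_chunks, total)
    base, extra = divmod(total, n_chunks)
    ranges = []
    start = first
    for i in range(n_chunks):
        # The first `extra` chunks take one additional spot each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def build_commands(
    sra_id: str,
    ranges: Sequence[Tuple[int, int]],
    extra_args: Sequence[str] = (),
) -> List[List[str]]:
    """Build one fastq-dump argument list per range.

    -N and -X are fastq-dump's minimum and maximum spot id options and -O is
    its output directory; the chunk_<i> directory names are an assumption
    made for this sketch.
    """
    return [
        ["fastq-dump", "-N", str(lo), "-X", str(hi), "-O", f"chunk_{i}", *extra_args, sra_id]
        for i, (lo, hi) in enumerate(ranges)
    ]
```

Each argument list could then be run as a separate process (for example with ``subprocess.Popen``), and the per-chunk output files concatenated in chunk order once all processes have finished.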

Protips

  • Downloading with fastq-dump is slow, even with multiple threads. It is recommended to use prefetch to download the target sra file first, so that fastq-dump only needs to do the dumping.
  • All extra arguments are passed directly to fastq-dump; --gzip, --split-files and filters work as expected.
  • This tool is not a replacement for fastq-dump: you still need fastq-dump and sra-stat on your PATH for it to work properly.
  • Speed improvements are larger with bigger files; aim for at least 200k reads/pairs per thread.
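Two of the tips above lend themselves to a quick pre-flight check. The sketch below is an assumption-laden illustration rather than part of the tool: ``missing_tools`` and ``suggest_threads`` are hypothetical helpers, and the 200k figure is treated as a rule of thumb, not a hard limit.

```python
import shutil

# Executables the tips say must be on PATH.
REQUIRED_TOOLS = ("fastq-dump", "sra-stat")

# Rule of thumb from the tips: at least 200k reads/pairs per thread.
MIN_READS_PER_THREAD = 200_000


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def suggest_threads(n_reads, max_threads):
    """Cap the thread count so each thread gets at least MIN_READS_PER_THREAD reads."""
    return max(1, min(max_threads, n_reads // MIN_READS_PER_THREAD))


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    print("Suggested threads for 1M reads on 8 cores:", suggest_threads(1_000_000, 8))
```

The ``which`` parameter exists only so the lookup can be swapped out in tests; by default it is ``shutil.which``.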

Install

The preferred way to install is using `Bioconda <http://bioconda.github.io/>`_:

conda install parallel-fastq-dump

This will install the sra-tools dependency as well.

Examples

$ parallel-fastq-dump --sra-id SRR1219899 --threads 4 --outdir out/ --split-files --gzip

Micro Benchmark

.. figure:: https://cloud.githubusercontent.com/assets/6310472/23962085/bdefef44-098b-11e7-825f-1da53d6568d6.png
