Licence: MIT license
Simple FASTQ quality assessment using Python

fastqp

Simple FASTQ, SAM and BAM read quality assessment and plotting using Python.

Features

  • Requires only Python with Numpy, Scipy, and Matplotlib libraries
  • Works with (gzipped) FASTQ, SAM, and BAM formatted reads
  • Tidy, tabular output statistics so you can create your own graphs
  • A useful set of default graphics rivaling comparable QC packages
  • Counts all IUPAC ambiguous nucleotide codes (NMWSKRYVHDB) if present in sequences
  • Downsamples input files to around 2,000,000 reads (user adjustable)
  • Allows a 5′ and 3′ (left and right) cycle limit for graphics generation
  • Tracks kmers and sequence duplication for the entire input file
  • Plots base call reference mismatches for aligned reads
  • Optional sequence duplication calculation using Bloom filters (beta)
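
As a rough illustration of two of the statistics listed above (per-cycle nucleotide counts, including IUPAC ambiguity codes, and kmer counts), here is a minimal sketch using only the standard library. It is not fastqp's implementation: the function names `read_fastq`, `open_reads`, and `summarize` are hypothetical, and the parser assumes simple four-line FASTQ records.

```python
import gzip
import io
from collections import Counter, defaultdict

# Standard bases plus the IUPAC ambiguity codes named in the feature list.
IUPAC_CODES = set("ACGT" + "NMWSKRYVHDB")


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def open_reads(path):
    """Open a plain or gzipped FASTQ file in text mode (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def summarize(records, k=5):
    """Count bases per cycle (1-based read position) and all kmers of length k."""
    cycle_counts = defaultdict(Counter)
    kmer_counts = Counter()
    for _, seq, _ in records:
        seq = seq.upper()
        for cycle, base in enumerate(seq, start=1):
            if base in IUPAC_CODES:
                cycle_counts[cycle][base] += 1
        for i in range(len(seq) - k + 1):
            kmer_counts[seq[i:i + k]] += 1
    return cycle_counts, kmer_counts


# Two tiny in-memory reads stand in for a real input file.
example = io.StringIO("@r1\nACGTN\n+\nIIIII\n@r2\nACGTA\n+\nIIIII\n")
cycles, kmers = summarize(read_fastq(example), k=2)
```

For a file on disk, `summarize(read_fastq(open_reads("reads.fastq.gz")))` would follow the same pattern; the file name is only a placeholder.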

Requirements

Tested on Python 2.7 and 3.4

Tested on Mac OS 10.10 and Linux 2.6.18

Installation

pip install [--user] fastqp

Note: BAM file support requires samtools
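
Before processing BAM input from a script, it can help to confirm that samtools is reachable. The sketch below assumes the requirement means a `samtools` executable on the PATH, which this README does not spell out; `needs_samtools` and `samtools_available` are hypothetical helper names.

```python
import shutil


def needs_samtools(path):
    """Return True for inputs that look like BAM files, judged by extension only."""
    return path.lower().endswith(".bam")


def samtools_available(executable="samtools"):
    """Return True if the named executable can be found on the PATH."""
    return shutil.which(executable) is not None


if needs_samtools("sample.bam") and not samtools_available():
    print("samtools was not found on PATH; BAM input may not work")
```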

Usage

usage: fastqp [-h] [-q] [-s BINSIZE] [-a NAME] [-n NREADS] [-p BASE_PROBS] [-k {2,3,4,5,6,7}] [-o OUTPUT]
              [-ll LEFTLIMIT] [-rl RIGHTLIMIT] [-mq MEDIAN_QUAL] [--aligned-only | --unaligned-only] [-d]
              input

simple NGS read quality assessment using Python

positional arguments:
  input                 input file (one of .sam, .bam, .fq, or .fastq(.gz) or stdin (-))

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           do not print any messages (default: False)
  -s BINSIZE, --binsize BINSIZE
                        number of reads to bin for sampling (default: auto)
  -a NAME, --name NAME  sample name identifier for text and graphics output (default: input file name)
  -n NREADS, --nreads NREADS
                        number of reads sample from input (default: 2000000)
  -p BASE_PROBS, --base-probs BASE_PROBS
                        probabilites for observing A,T,C,G,N in reads (default: 0.25,0.25,0.25,0.25,0.1)
  -k {2,3,4,5,6,7}, --kmer {2,3,4,5,6,7}
                        length of kmer for over-repesented kmer counts (default: 5)
  -o OUTPUT, --output OUTPUT
                        base name for output files (default: fastqp_figures)
  -ll LEFTLIMIT, --leftlimit LEFTLIMIT
                        leftmost cycle limit (default: 1)
  -rl RIGHTLIMIT, --rightlimit RIGHTLIMIT
                        rightmost cycle limit (-1 for none) (default: -1)
  -mq MEDIAN_QUAL, --median-qual MEDIAN_QUAL
                        median quality threshold for failing QC (default: 30)
  --aligned-only        only aligned reads (default: False)
  --unaligned-only      only unaligned reads (default: False)
  -d, --count-duplicates
                        calculate sequence duplication rate (default: False)
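
When fastqp is driven from a Python pipeline, the options above can be assembled into an argument list. The following is a minimal sketch that uses only flags shown in the help text; the function name `build_fastqp_command` and the input file name are hypothetical, and the command is printed rather than executed.

```python
import shlex


def build_fastqp_command(input_path, output="fastqp_figures", nreads=2000000,
                         kmer=5, median_qual=30, count_duplicates=False):
    """Build an argument list for the fastqp CLI from the documented options."""
    if kmer not in range(2, 8):
        # The help text restricts -k to the choices 2 through 7.
        raise ValueError("kmer must be between 2 and 7")
    cmd = [
        "fastqp",
        "-o", output,
        "-n", str(nreads),
        "-k", str(kmer),
        "-mq", str(median_qual),
    ]
    if count_duplicates:
        cmd.append("-d")
    cmd.append(input_path)  # positional input goes last
    return cmd


cmd = build_fastqp_command("sample.fastq.gz", kmer=7, count_duplicates=True)
print(shlex.join(cmd))
# prints: fastqp -o fastqp_figures -n 2000000 -k 7 -mq 30 -d sample.fastq.gz
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once fastqp is installed.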

Changes

See releases page for details.

Examples

Example plots:

  • Quality heatmap
  • GC plot
  • GC distribution
  • Nucleotide plot
  • Nucleotide mismatch plot
  • Kmer distribution
  • Depth plot
  • Quality percentiles
  • Quality distribution
  • Adapter kmer distribution

Acknowledgements

This project is freely licensed by the author, Matthew Shirley, and was completed under the mentorship and financial support of Drs. Sarah Wheelan and Vasan Yegnasubramanian at the Sidney Kimmel Comprehensive Cancer Center in the Department of Oncology.