
slowkow / picardmetrics

Licence: MIT License
🚦 Run Picard on BAM files and collate 90 metrics into one file.


picardmetrics

Run Picard tools and collate multiple metrics files. Check the quality of your sequencing data.


Summary

Run picardmetrics like this:

for bam in data/project1/sample?/sample?.bam
do
  # -k keeps the BAM file with marked duplicate reads
  # -r runs RNA-seq Picard metrics
  # -o specifies where to put the output files
  picardmetrics run -k -r -o out/rnaseq "$bam"
done

# The final output file will be called "project1-all-metrics.tsv"
picardmetrics collate project1 out/rnaseq

picardmetrics runs up to 12 Picard tools on each BAM file and collates all of the output files into a single table with up to 90 different metrics. It also automatically creates the .refFlat and .rRNA.list files required for CollectRnaSeqMetrics.
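The collation step can be pictured with a small shell sketch. This is not picardmetrics' own implementation: `picard_metrics_row` and `collate_metrics` are hypothetical helpers. The sketch assumes each Picard output file contains a `## METRICS CLASS` line followed by a tab-delimited header line and data rows. It keeps only the first data row, because tools such as CollectAlignmentSummaryMetrics can write several rows per file and a real collator has to choose among them. It also takes the sample name from the file name up to its first dot. The sketch stacks one metrics type across samples and does not join different tools side by side into one wide table.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the header line and the first data row of the
# metrics table in one Picard metrics file. Picard writes a
# "## METRICS CLASS" line, then a tab-delimited header, then data rows.
picard_metrics_row() {
  awk -F'\t' '
    /^## METRICS CLASS/ { found = 1; next }  # table starts on the next line
    found == 1 { print; found = 2; next }    # column header line
    found == 2 { print; exit }               # first data row only
  ' "$1"
}

# Hypothetical helper: stack one metrics type from several files into a
# single TSV with a leading SAMPLE column. The sample name is the file's
# basename up to its first dot (an assumption about file naming).
collate_metrics() {
  local first=1 file sample
  for file in "$@"; do
    sample=$(basename "$file")
    sample=${sample%%.*}
    picard_metrics_row "$file" |
      awk -F'\t' -v OFS='\t' -v s="$sample" -v first="$first" '
        NR == 1 && first == 1 { print "SAMPLE", $0 }  # header once
        NR == 2 { print s, $0 }                       # one row per sample
      '
    first=0
  done
}
```

For example, `collate_metrics out/rnaseq/*.alignment_summary_metrics` would print one row per sample, assuming (hypothetically) that the per-sample files end in `.alignment_summary_metrics`.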

See the picardmetrics manual for more details.
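For the `.rRNA.list` file mentioned above, CollectRnaSeqMetrics expects a Picard interval list: a SAM-style header carrying the sequence dictionary, then one tab-delimited line per interval (sequence, start, end, strand, name). The sketch below builds one from a GTF. The helper name `make_rrna_list` is hypothetical, and this is not necessarily how picardmetrics does it. It assumes a GENCODE-style GTF in which rRNA genes carry `gene_type "rRNA"` (Ensembl GTFs use `gene_biotype` instead), and it matches only that exact type, so `Mt_rRNA` and `rRNA_pseudogene` genes are skipped.

```shell
#!/usr/bin/env bash

# Hypothetical helper: write a Picard-style rRNA interval list to stdout.
#   $1 = GTF annotation (assumed GENCODE-style, with gene_type attributes)
#   $2 = SAM header text containing @SQ lines (e.g. from samtools view -H)
make_rrna_list() {
  local gtf=$1 sam_header=$2
  # Copy the @HD and @SQ lines so the intervals carry a sequence dictionary.
  grep -E '^@(HD|SQ)' "$sam_header"
  # GTF coordinates and interval lists are both 1-based and inclusive,
  # so start and end columns can be copied unchanged.
  awk -F'\t' -v OFS='\t' '
    $3 == "gene" && $9 ~ /gene_type "rRNA"/ {
      # Extract the value of gene_id "..." from the attribute column.
      match($9, /gene_id "[^"]+"/)
      id = substr($9, RSTART + 9, RLENGTH - 10)
      print $1, $4, $5, $7, id
    }
  ' "$gtf"
}
```

A possible invocation, with placeholder file names and bash process substitution: `make_rrna_list genes.gtf <(samtools view -H sample.bam) > genes.rRNA.list`.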

Next, plot and explore the metrics in R:

library(ggplot2)

dat <- read.delim("project1-all-metrics.tsv", stringsAsFactors = FALSE)

ggplot(dat) +
  geom_point(aes(PF_READS, PF_ALIGNED_BASES))

See two example BAM files in the data/ folder. The test/test.sh script illustrates the usage of picardmetrics and tests that it works correctly. See the outputs in the out/ folder. You can also download the reference files used to test picardmetrics.

Example

Figure: Genes detected vs. Mean MAPQ (left); Percent of bases vs. Sample (right).

Use Picard to assess the quality of your sequencing data. This example shows RNA-seq data from hundreds of glioblastoma cells and gliomasphere cell lines.

On the left, each point represents an RNA-seq sample. We see that samples with high mean mapping quality have the most detected genes. The color shows that the percent of reads assigned to exons varies between samples.

On the right, each bar represents an RNA-seq sample. Each sample is broken down into the percent of sequenced bases coming from different genomic regions. We see that many samples have few sequenced bases coming from coding regions relative to intergenic regions.
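To list samples like those in the right-hand panel without plotting, a short awk filter can compare two region columns in the collated table. This is a hypothetical helper, not part of picardmetrics. It assumes the table has a header row, that column 1 identifies the sample, and that the CollectRnaSeqMetrics columns `PCT_CODING_BASES` and `PCT_INTERGENIC_BASES` are kept under their Picard names.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the first-column identifier of every sample
# whose PCT_CODING_BASES is lower than its PCT_INTERGENIC_BASES.
flag_low_coding() {
  awk -F'\t' '
    NR == 1 {
      # Locate the two columns by name in the header row.
      for (i = 1; i <= NF; i++) {
        if ($i == "PCT_CODING_BASES") coding = i
        if ($i == "PCT_INTERGENIC_BASES") intergenic = i
      }
      if (!coding || !intergenic) {
        print "missing PCT_CODING_BASES or PCT_INTERGENIC_BASES" > "/dev/stderr"
        exit 1
      }
      next
    }
    # "+ 0" forces a numeric comparison of the two fractions.
    $coding + 0 < $intergenic + 0 { print $1 }
  ' "$1"
}
```

Under those assumptions, `flag_low_coding project1-all-metrics.tsv` prints one sample identifier per line.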

Installation

# Download the code.
git clone https://github.com/slowkow/picardmetrics

cd picardmetrics

# Download and install the dependencies.
make get-deps PREFIX=~/.local

# Install picardmetrics and the man page.
make install PREFIX=~/.local

# Edit the configuration file for your project.
vim ~/picardmetrics.conf

If you wish, you can manually install the dependencies:

Contributing

Please submit an issue to report bugs or ask questions.

Please contribute bug fixes or new features with a pull request to this repository.

Related work

RNA-SeQC

RNA-SeQC is a Java program that computes a series of quality control metrics for RNA-seq data. The input can be one or more BAM files. The output consists of HTML reports and tab-delimited files of metrics data. This program can be valuable for comparing sequencing quality across different samples or experiments to evaluate different experimental parameters. It can also be run on individual samples as a means of quality control before continuing with downstream analysis.

RSeQC

The RSeQC package provides a number of useful modules that can comprehensively evaluate high-throughput sequencing data, especially RNA-seq data. Some basic modules quickly inspect sequence quality, nucleotide composition bias, PCR bias, and GC bias, while RNA-seq-specific modules evaluate sequencing saturation, mapped read distribution, coverage uniformity, strand specificity, and more.

QoRTs

The QoRTs software package is a fast, efficient, and portable multifunction toolkit designed to assist in the analysis, quality control, and data management of RNA-Seq datasets. Its primary function is to aid in the detection and identification of errors, biases, and artifacts produced by paired-end high-throughput RNA-Seq technology. In addition, it can produce count data designed for use with differential expression and differential exon usage tools, as well as individual-sample and/or group-summary genome track files suitable for use with the UCSC genome browser (or any compatible browser).
