DarwinAwardWinner / CD4-csaw

Licence: other
Reproducible reanalysis of a combined ChIP-Seq & RNA-Seq data set

Re-analysis of a combined ChIP-Seq & RNA-Seq data set

This is the code for a re-analysis of a GEO dataset that I originally analyzed for this paper. The re-analysis uses statistical methods that were not yet available at the time, such as the csaw Bioconductor package, which provides a principled way to normalize windowed counts of ChIP-Seq reads and test them for differential binding. The original paper analyzed binding only within pre-defined promoter regions. The RNA-seq analysis has also been improved using newer features of limma, such as quality weights.

This workflow downloads the sequence data and sample metadata from the public GEO/SRA release, so anyone can download and run this code to reproduce the full analysis.

Workflow

Rule Graph

Completed components

  • ChIP-seq
    • Mapping with bowtie2
    • Peak calling with MACS2 and Epic
    • Fetching of blacklists from UCSC
    • Generation of greylists from ChIP-Seq input samples
    • IDR analysis of blacklist-filtered peak calls
    • Computation of cross-correlation function for ChIP-Seq samples, excluding blacklisted regions
    • Counting in windows across the genome
  • RNA-seq
    • Mapping with STAR & HISAT2
    • Counting reads aligned to genes
    • Alignment-free bias-corrected transcript quantification using Salmon & Kallisto
    • Differential gene expression
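
To make the "counting in windows" and "excluding blacklisted regions" steps above more concrete, here is a minimal pure-Python sketch of the general idea. It is not the code used in this workflow and not how csaw counts reads: it assumes fixed-width, non-overlapping windows, counts only read start positions, and drops any window that overlaps a blacklisted interval. The function names, the toy coordinates, and the blacklist are all hypothetical.

```python
from collections import Counter


def overlaps_any(start, end, intervals):
    """Return True if the half-open range [start, end) overlaps any (s, e) interval."""
    return any(start < e and s < end for s, e in intervals)


def count_reads_in_windows(read_starts, chrom_length, width, blacklist=()):
    """Count read start positions in fixed-width, non-overlapping windows.

    Windows that overlap a blacklisted interval are dropped entirely.
    Returns a dict mapping (window_start, window_end) to a read count.
    """
    # Bin each read by the index of the window that contains its start position.
    binned = Counter(
        pos // width for pos in read_starts if 0 <= pos < chrom_length
    )
    n_windows = (chrom_length + width - 1) // width  # ceiling division
    result = {}
    for idx in range(n_windows):
        start = idx * width
        end = min(start + width, chrom_length)
        if overlaps_any(start, end, blacklist):
            continue  # skip windows touching a blacklisted region
        result[(start, end)] = binned.get(idx, 0)
    return result


# Hypothetical toy data: one 100 bp "chromosome" with a blacklisted region at 40-60.
read_starts = [5, 12, 18, 47, 52, 55, 58, 91]
windows = count_reads_in_windows(
    read_starts, chrom_length=100, width=25, blacklist=[(40, 60)]
)
print(windows)
```

In this toy example the two middle windows both touch the blacklisted interval, so only the first and last windows are reported. A real analysis would typically count whole fragments rather than read starts and may use overlapping windows.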

Possible TODO components

TODO Code cleanup

  • Remove unnecessary library() calls
  • Put spaces around equals signs

TODO Other

  • Document how to run the pipeline
  • Provide install script for R & Python packages.

Dependencies

Command-line tools

Programming languages and packages

  • R, Bioconductor, and the following R packages:
    • From CRAN: assertthat, doParallel, dplyr, future, getopt, GGally, ggforce, ggfortify, ggplot2, ks, lazyeval, lubridate, magrittr, MASS, Matrix, openxlsx, optparse, parallel, purrr, RColorBrewer, readr, reshape2, rex, scales, stringi, stringr
    • From Bioconductor: annotate, Biobase, BiocParallel, BSgenome.Hsapiens.UCSC.hg19, BSgenome.Hsapiens.UCSC.hg38, ChIPQC, csaw, edgeR, GenomicFeatures, GenomicRanges, GEOquery, limma, org.Hs.eg.db, Rsamtools, Rsubread, rtracklayer, S4Vectors, SRAdb, SummarizedExperiment, TxDb.Hsapiens.UCSC.hg19.knownGene, tximport
    • Installed manually: sleuth, wasabi
  • Python 3 and the following Python packages: biopython, atomicwrites, numpy, pandas, plac, pysam, rpy2, snakemake
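
Since an install script is still on the TODO list, a quick way to see which of the Python packages above are missing from the current environment might look like the sketch below. It uses only the standard library. The helper name `missing_packages` is hypothetical, and the mapping assumes that each package is imported under its listed name, except biopython, which is imported as `Bio`.

```python
import importlib.util

# Listed package name -> top-level module name used in `import` statements.
# Assumption: only biopython differs from its listed name (it imports as "Bio").
PYTHON_PACKAGES = {
    "biopython": "Bio",
    "atomicwrites": "atomicwrites",
    "numpy": "numpy",
    "pandas": "pandas",
    "plac": "plac",
    "pysam": "pysam",
    "rpy2": "rpy2",
    "snakemake": "snakemake",
}


def missing_packages(packages):
    """Return the sorted names of packages whose module cannot be found."""
    return sorted(
        name
        for name, module in packages.items()
        if importlib.util.find_spec(module) is None
    )


missing = missing_packages(PYTHON_PACKAGES)
if missing:
    print("Missing Python packages:", ", ".join(missing))
else:
    print("All listed Python packages are importable.")
```

This only checks that each module can be located; it does not verify versions or cover the R and Bioconductor dependencies.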