tobiasrausch / ATACseq

License: BSD-3-Clause
Analysis Workflow for Assay for Transposase-Accessible Chromatin using sequencing (ATAC-Seq)

ATAC-Seq Pipeline Installation

git clone https://github.com/tobiasrausch/ATACseq.git

cd ATACseq

make all

If any of the above commands fails, your operating system probably lacks some build essentials. These are usually pre-installed; if they are missing, install them first. On Ubuntu, for instance:

apt-get install build-essential g++ git wget unzip
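Before running `make all`, it can help to check which of these tools are already on the `PATH`. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper, and the tool list is simply taken from the `apt-get` line above (with `make` standing in for `build-essential`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every given command that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tool names assumed from the apt-get line above.
missing=$(missing_tools git make g++ wget unzip)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are printed on one line.
  echo "Missing build tools:" $missing >&2
else
  echo "All build tools found."
fi
```

Anything reported as missing can then be installed with the package manager of your distribution.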

Building promoter regions for QC and downloading motifs

This repository includes simple scripts that build the promoter regions used to estimate TSS enrichment and download the motif databases used for motif annotation.

cd bed/ && Rscript promoter.R && cd ..

cd motif/ && ./downloadMotifs.sh && cd ..

Running the ATAC-Seq analysis pipeline for a single sample

./src/atac.sh <hg38|hg19|mm10> <read1.fq.gz> <read2.fq.gz> <genome.fa> <output prefix>
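For more than one sample, the same call can be generated in a loop. The sketch below is a dry run that only prints the commands. It assumes a hypothetical naming scheme in which mates are called `<sample>_R1.fq.gz` and `<sample>_R2.fq.gz`, and it uses the sample name as the output prefix; neither convention is prescribed by the pipeline, and `print_atac_commands` is not part of the repository.

```shell
#!/usr/bin/env bash
# Dry run: print one atac.sh call per paired-end sample found in a directory.
# Assumption: mates are named <sample>_R1.fq.gz and <sample>_R2.fq.gz.
print_atac_commands() {
  local build=$1 genome=$2 fqdir=$3 r1 r2 sample
  for r1 in "$fqdir"/*_R1.fq.gz; do
    [ -e "$r1" ] || continue          # no match: the glob stays literal
    r2=${r1%_R1.fq.gz}_R2.fq.gz       # derive the mate file name
    if [ ! -e "$r2" ]; then
      echo "skipping $r1: no mate found" >&2
      continue
    fi
    sample=$(basename "$r1" _R1.fq.gz)
    echo ./src/atac.sh "$build" "$r1" "$r2" "$genome" "$sample"
  done
}
```

For example, `print_atac_commands hg38 genome.fa fastq` lists the calls for every complete pair in a `fastq` directory. Once the printed commands look right, they can be piped to `bash`.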

Plotting the key ATAC-Seq Quality Control metrics

At various steps, the pipeline produces JSON QC files (*.json.gz). You can upload and interactively browse these files at https://gear.embl.de/alfred/. In addition, the pipeline produces a succinct QC file for each sample. If you have multiple output folders (one for each ATAC-Seq sample), you can simply concatenate the QC metrics of each sample:

head -n 1 ./*/*.key.metrics | grep "TssEnrichment" | uniq > summary.tsv

cat ./*/*.key.metrics | grep -v "TssEnrichment" >> summary.tsv
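The two commands above can also be written as a single `awk` pass. This is a sketch under the assumption, consistent with the `head -n 1` call above, that the header is the first line of every `*.key.metrics` file; `merge_metrics` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Keep the header line of the first file only, then print all data rows.
# Assumption: each input file starts with exactly one header line.
merge_metrics() {
  awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}
```

It would be called as `merge_metrics ./*/*.key.metrics > summary.tsv`. Unlike the `grep -v` variant, it does not depend on the header containing a particular column name.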

To plot the distribution of all QC parameters:

Rscript R/metrics.R summary.tsv

ATAC-Seq pipeline output files

The ATAC-Seq pipeline produces the following output files:

  • Bowtie BAM alignment files filtered for duplicates and mitochondrial reads.
  • Quality control output files from Alfred, samtools and FastQC, plus cutadapt adapter-filter metrics.
  • MACS peak calling files and IDR-filtered peak lists.
  • Succinct browser tracks in bedGraph format and IGV's tdf format.
  • Footprint track of nucleosome positions and/or transcription-factor-bound DNA.
  • HOMER motif finding results.

Differential peak calling

Merge peaks across samples and create a raw count matrix.

ls ./Sample1/Sample1.peaks ./Sample2/Sample2.peaks ./SampleN/SampleN.peaks > peaks.lst

ls ./Sample1/Sample1.bam ./Sample2/Sample2.bam ./SampleN/SampleN.bam > bams.lst
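With many samples, the two lists can be written by a small helper that keeps them in the same sample order and skips folders where one of the files is missing. This is only a sketch: `write_sample_lists` is hypothetical, it assumes the `<folder>/<folder>.peaks` and `<folder>/<folder>.bam` layout used in the example above, and whether `count.sh` depends on matching order is not stated here.

```shell
#!/usr/bin/env bash
# Write peaks.lst and bams.lst with one entry per complete sample folder.
# Assumption: each folder <name>/ holds <name>.peaks and <name>.bam.
write_sample_lists() {
  local dir name
  : > peaks.lst                       # start with empty list files
  : > bams.lst
  for dir in "$@"; do
    name=$(basename "$dir")
    if [ -e "$dir/$name.peaks" ] && [ -e "$dir/$name.bam" ]; then
      echo "$dir/$name.peaks" >> peaks.lst
      echo "$dir/$name.bam" >> bams.lst
    else
      echo "skipping $dir: peaks or BAM file missing" >&2
    fi
  done
}
```

For instance, `write_sample_lists ./Sample1 ./Sample2 ./SampleN` produces both lists in the current directory.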

./src/count.sh hg19 peaks.lst bams.lst <output prefix>

To call differential peaks with DESeq2 on the count matrix for TSS peaks (counts.tss.gz), first create a file with sample-level information (sample.info). For instance, if you have 2 replicates per condition:

echo -e "name\tcondition" > sample.info

zcat counts.tss.gz | head -n 1 | cut -f 5- | tr '\t' '\n' | sed 's/\.final$//' | awk '{print $0"\t"int((NR-1)/2);}' >> sample.info

Rscript R/dpeaks.R counts.tss.gz sample.info
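The `awk` step above assigns the condition by position: with `int((NR-1)/2)`, columns 1 and 2 get condition 0, columns 3 and 4 get condition 1, and so on. This only gives a meaningful grouping if replicates of the same condition are adjacent in the count matrix. The sketch below generalizes the idea to `n` replicates per condition; `label_conditions` and the sample names are hypothetical.

```shell
#!/usr/bin/env bash
# Read sample names from stdin and append a condition index,
# assuming n consecutive samples belong to the same condition (default 2).
label_conditions() {
  awk -v n="${1:-2}" '{ print $0 "\t" int((NR - 1) / n) }'
}

# Example with made-up sample names and 2 replicates per condition.
printf '%s\n' ctrl_1 ctrl_2 treat_1 treat_2 | label_conditions 2
```

Here the two `ctrl` samples receive condition 0 and the two `treat` samples condition 1. If your columns are not ordered by condition, it is safer to write sample.info by hand.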

Intersecting peaks with annotation tracks

Peaks can be intersected with annotation tracks such as enhancers or conserved elements, e.g.:

cd tracks/ && ./downloadTracks.sh && cd ..

bedtools intersect -a ./Sample2/Sample2.peaks -b tracks/conserved.bed
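To summarize such an intersection as a single number, one can count the peaks that overlap the track at least once. The sketch below uses the `-u` option of `bedtools intersect`, which reports each `-a` entry at most once. It assumes the peak file has no header line; `percent` and `percent_overlapping` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print hits as a percentage of total with one decimal, or NA if total is 0.
percent() {
  awk -v h="$1" -v t="$2" 'BEGIN { if (t > 0) printf "%.1f\n", 100 * h / t; else print "NA" }'
}

# Percentage of peaks in file $1 overlapping at least one interval in file $2.
# Requires bedtools on PATH; assumes one peak per non-empty line, no header.
percent_overlapping() {
  local total hits
  total=$(grep -c . "$1")
  hits=$(bedtools intersect -u -a "$1" -b "$2" | grep -c .)
  percent "$hits" "$total"
}
```

A call such as `percent_overlapping ./Sample2/Sample2.peaks tracks/conserved.bed` then prints the share of peaks that fall on conserved elements.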

Plotting peak density along all chromosomes

A basic R script is available for plotting peak densities.

Rscript R/karyoplot.R input.peaks

Citation

Tobias Rausch, Markus Hsi-Yang Fritz, Jan O Korbel, Vladimir Benes.
Alfred: Interactive multi-sample BAM alignment statistics, feature counting and feature annotation for long- and short-read sequencing.
Bioinformatics. 2018 Dec 6.

B Erarslan, JB Kunz, T Rausch, P Richter-Pechanska et al.
Chromatin accessibility landscape of pediatric T‐lymphoblastic leukemia and human T‐cell precursors
EMBO Mol Med (2020)

License

This ATAC-Seq pipeline is distributed under the BSD 3-Clause license.
