slowkow / snakefiles

Licence: MIT license

🐍 Snakefiles for common RNA-seq data analysis workflows.

Programming Languages

python

shell

Projects that are alternatives to or similar to snakefiles

rna-seq-kallisto-sleuth

A Snakemake workflow for differential expression analysis of RNA-seq data with Kallisto and Sleuth.

Stars: ✭ 56 (-28.21%)

Mutual labels: rna-seq, snakemake

tailseeker

Software for measuring poly(A) tail length and 3′-end modifications using a high-throughput sequencer

Stars: ✭ 17 (-78.21%)

Mutual labels: rna-seq, snakemake

plinycompute

A system for development of high-performance, data-intensive, distributed computing applications, tools, and libraries.

Stars: ✭ 27 (-65.38%)

Mutual labels: high-performance-computing

dee2

Digital Expression Explorer 2 (DEE2): a repository of uniformly processed RNA-seq data

Stars: ✭ 32 (-58.97%)

Mutual labels: rna-seq

pychopper

A tool to identify, orient, trim and rescue full length cDNA reads

Stars: ✭ 74 (-5.13%)

Mutual labels: rna-seq

Whippet.jl

Lightweight and Fast; RNA-seq quantification at the event-level

Stars: ✭ 85 (+8.97%)

Mutual labels: rna-seq

snakemake-mode

[MIRROR] Emacs support for Snakemake

Stars: ✭ 30 (-61.54%)

Mutual labels: snakemake

kallistobustools

kallisto | bustools workflow for pre-processing single-cell RNA-seq data

Stars: ✭ 79 (+1.28%)

Mutual labels: rna-seq

TransPi

TransPi – a comprehensive TRanscriptome ANalysiS PIpeline for de novo transcriptome assembly

Stars: ✭ 18 (-76.92%)

Mutual labels: rna-seq

hicma

HiCMA: Hierarchical Computations on Manycore Architectures

Stars: ✭ 21 (-73.08%)

Mutual labels: high-performance-computing

tourmaline

Amplicon sequence processing workflow using QIIME 2 and Snakemake

Stars: ✭ 23 (-70.51%)

Mutual labels: snakemake

sunbeam

A robust, extensible metagenomics pipeline

Stars: ✭ 143 (+83.33%)

Mutual labels: snakemake

MDBenchmark

Quickly generate, start and analyze benchmarks for molecular dynamics simulations.

Stars: ✭ 64 (-17.95%)

Mutual labels: high-performance-computing

GGR-cwl

CWL tools and workflows for GGR

Stars: ✭ 20 (-74.36%)

Mutual labels: rna-seq

GREIN

GREIN: GEO RNA-seq Experiments Interactive Navigator

Stars: ✭ 40 (-48.72%)

Mutual labels: rna-seq

PSyclone

Domain-specific compiler for Finite Difference/Volume/Element Earth-system models in Fortran

Stars: ✭ 67 (-14.1%)

Mutual labels: high-performance-computing

AutoOptimize.jl

Automatic optimization and parallelization for Scientific Machine Learning (SciML)

Stars: ✭ 77 (-1.28%)

Mutual labels: high-performance-computing

scCATCH

Automatic Annotation on Cell Types of Clusters from Single-Cell RNA Sequencing Data

Stars: ✭ 137 (+75.64%)

Mutual labels: rna-seq

targets-minimal

A minimal example data analysis project with the targets R package

Stars: ✭ 50 (-35.9%)

Mutual labels: high-performance-computing

stantargets

Reproducible Bayesian data analysis pipelines with targets and cmdstanr

Stars: ✭ 31 (-60.26%)

Mutual labels: high-performance-computing

snakefiles

This repository has Snakefiles for common RNA-seq data analysis workflows. Please feel free to copy them and modify them to suit your needs.

Getting started

If you are new to Snakemake, you might like to start by walking through my tutorial for beginners. Next, have a look at Johannes Köster's introductory slides, tutorial, documentation, and FAQ.

Quick start:

# Copy the files
git clone https://github.com/slowkow/snakefiles.git

# Go to the kallisto directory
cd snakefiles/kallisto

# Run snakemake (recent versions require the number of cores to be set)
snakemake --cores 1

Data

This repository includes 6 FASTQ files in data/fastq/ to illustrate the usage of each of the RNA-seq workflows.

Sample1
- Sample1.R1.fastq.gz has the first mates of sequenced fragments.
- Sample1.R2.fastq.gz has the second mates of sequenced fragments.
Sample2
- Sample2.L1.R1.fastq.gz
- Sample2.L2.R1.fastq.gz
  - The first mate reads (R1), split across two files (L1 and L2). Some software such as STAR requires these reads to be merged into one file.
- Sample2.L1.R2.fastq.gz
- Sample2.L2.R2.fastq.gz
  - Likewise, the second mate reads (R2) are also split across two files (L1 and L2). To make matters worse, Sample2.L2.R2.fastq.gz has only 2000 reads, whereas Sample2.L2.R1.fastq.gz has 2500 reads. The Snakefiles in this repository can handle this without any problems.
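The naming convention above (sample name, an optional lane tag such as L1, then R1 or R2) is what lets a workflow collect each sample's files per mate. The sketch below is a hypothetical illustration of that grouping step, not the repository's own make_samples.py; the regular expression and the output layout ({sample: {"R1": [...], "R2": [...]}}) are assumptions.

```python
import json
import re
from collections import defaultdict

# Hypothetical pattern: <sample>[.L<lane>].R<1|2>.fastq.gz
FASTQ_RE = re.compile(
    r"^(?P<sample>[^.]+)(?:\.L(?P<lane>\d+))?\.(?P<mate>R[12])\.fastq\.gz$"
)


def group_fastqs(filenames):
    """Group FASTQ file names by sample and mate, with lanes in numeric order."""
    groups = defaultdict(lambda: {"R1": [], "R2": []})
    for name in filenames:
        match = FASTQ_RE.match(name)
        if match is None:
            continue  # skip files that do not follow the naming convention
        lane = int(match.group("lane") or 0)  # files without a lane tag sort first
        groups[match.group("sample")][match.group("mate")].append((lane, name))
    # Sort each mate's files by lane (L1 before L2), then drop the lane number
    return {
        sample: {mate: [name for _, name in sorted(files)] for mate, files in mates.items()}
        for sample, mates in sorted(groups.items())
    }


if __name__ == "__main__":
    files = [
        "Sample1.R1.fastq.gz", "Sample1.R2.fastq.gz",
        "Sample2.L1.R1.fastq.gz", "Sample2.L2.R1.fastq.gz",
        "Sample2.L1.R2.fastq.gz", "Sample2.L2.R2.fastq.gz",
    ]
    print(json.dumps(group_fastqs(files), indent=2))
```

Keeping lanes in the same order for R1 and R2 matters: if the lane files for each mate are later concatenated, the mates should line up lane by lane.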

Scripts

make_samples.py creates the samples.json file.
bsub.py receives job scripts from Snakemake and automatically submits them to an appropriate LSF queue based on job requirements.
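To illustrate the kind of decision bsub.py makes, here is a hypothetical sketch that maps a job's thread count and memory request to an LSF queue and builds a bsub command line. The queue names ("short", "normal", "big_mem") and their limits are invented for illustration; a real script would read the requirements from the job script Snakemake hands it and use the queues configured on your cluster. The bsub options used (-q, -n, -R, -o) are standard LSF flags.

```python
import shlex

# Hypothetical queues, most preferred first: (name, max_threads, max_mem_mb)
QUEUES = [
    ("short", 4, 8_000),
    ("normal", 16, 64_000),
    ("big_mem", 32, 512_000),
]


def pick_queue(threads, mem_mb):
    """Return the first queue whose limits can accommodate the job."""
    for name, max_threads, max_mem_mb in QUEUES:
        if threads <= max_threads and mem_mb <= max_mem_mb:
            return name
    raise ValueError(f"no queue fits threads={threads}, mem_mb={mem_mb}")


def build_bsub_command(jobscript, threads=1, mem_mb=4_000, log="lsf.%J.out"):
    """Build the argument list for submitting `jobscript` with LSF's bsub."""
    return [
        "bsub",
        "-q", pick_queue(threads, mem_mb),
        "-n", str(threads),
        # Units for rusage[mem=...] depend on the cluster's LSF configuration
        "-R", f"rusage[mem={mem_mb}]",
        "-o", log,  # %J is replaced by the LSF job ID
        jobscript,  # the job script is run as the submitted command
    ]


if __name__ == "__main__":
    cmd = build_bsub_command("snakejob.sh", threads=8, mem_mb=16_000)
    print(shlex.join(cmd))
```

Trying queues from most to least preferred means small jobs land in the fastest-turnaround queue, and only jobs that exceed its limits fall through to larger ones.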

RNA-seq workflows

kallisto/

Quantify gene isoform expression in transcripts per million (TPM) with kallisto and collate outputs from multiple samples into one file.
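The collation step can be pictured as joining each sample's kallisto abundance.tsv (columns target_id, length, eff_length, est_counts, tpm) on target_id and keeping the tpm column. The sketch below is an assumption about how such a merge could work, not the Snakefile's actual rule; the function names are hypothetical, and the demo tables are made up.

```python
import csv
import io
import sys


def collate_tpm(abundances):
    """Merge per-sample kallisto abundance.tsv tables into one TPM table.

    `abundances` maps a sample name to a text stream in abundance.tsv layout
    (target_id, length, eff_length, est_counts, tpm). Returns (header, rows),
    where each row is [target_id, tpm in sample 1, tpm in sample 2, ...].
    """
    samples = sorted(abundances)
    tpm = {}  # target_id -> {sample: tpm string}
    for sample in samples:
        for record in csv.DictReader(abundances[sample], delimiter="\t"):
            tpm.setdefault(record["target_id"], {})[sample] = record["tpm"]
    header = ["target_id"] + samples
    # A transcript missing from one sample's table is written as "NA"
    rows = [
        [target] + [tpm[target].get(sample, "NA") for sample in samples]
        for target in sorted(tpm)
    ]
    return header, rows


def write_tsv(header, rows, out):
    """Write the collated table as tab-separated text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up two-transcript tables in abundance.tsv layout, for illustration only
    columns = "target_id\tlength\teff_length\test_counts\ttpm\n"
    sample1 = io.StringIO(columns + "tx1\t1000\t800\t10\t50000\ntx2\t500\t300\t95\t950000\n")
    sample2 = io.StringIO(columns + "tx1\t1000\t800\t3\t12000\ntx2\t500\t300\t99\t988000\n")
    header, rows = collate_tpm({"Sample1": sample1, "Sample2": sample2})
    write_tsv(header, rows, sys.stdout)
```

With real outputs, each stream would come from something like open(path, newline="") on a sample's abundance.tsv; the per-sample path layout depends on how the workflow names its output directories.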

star_express/

Execute a multi-sample 2-pass STAR alignment, sharing the splice junctions across samples. Count fragments per gene and fragments per splice site. Also produce a BAM file with coordinates relative to transcripts. Quantify transcripts in TPM with eXpress. Collate outputs from multiple samples.
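One way to share splice junctions across samples is to run a first STAR pass per sample, then give every sample's second pass all of the first-pass SJ.out.tab files through STAR's --sjdbFileChrStartEnd option. The sketch below only builds that second-pass command line; the helper name and the directory layout (star/pass1/<sample>/SJ.out.tab) are hypothetical, and it is not necessarily how the Snakefile is written. Here --quantMode TranscriptomeSAM GeneCounts asks STAR for a BAM in transcript coordinates plus per-gene counts, and r1/r2 are assumed to be single (already merged) files per mate.

```python
import shlex


def second_pass_command(sample, r1, r2, samples, genome_dir, threads=4,
                        pass1_dir="star/pass1", out_dir="star/pass2"):
    """Build a STAR second-pass command that uses junctions from all samples."""
    # Hypothetical layout: one first-pass output directory per sample
    sj_files = [f"{pass1_dir}/{s}/SJ.out.tab" for s in samples]
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", r1, r2,
        "--readFilesCommand", "zcat",  # the FASTQ files are gzipped
        # Every sample's second pass sees the junctions found in all samples
        "--sjdbFileChrStartEnd", *sj_files,
        # Transcript-coordinate BAM plus per-gene counts (ReadsPerGene.out.tab)
        "--quantMode", "TranscriptomeSAM", "GeneCounts",
        "--outSAMtype", "BAM", "Unsorted",
        "--outFileNamePrefix", f"{out_dir}/{sample}/",
    ]


if __name__ == "__main__":
    cmd = second_pass_command(
        "Sample1", "Sample1.R1.fastq.gz", "Sample1.R2.fastq.gz",
        samples=["Sample1", "Sample2"], genome_dir="star_index",
    )
    print(shlex.join(cmd))
```

Because every sample's second pass receives the same junction files, a junction supported in only one sample can still guide alignment in the others, which is the point of sharing junctions across samples.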

Contributing

Please submit an issue to report bugs or ask questions.

Please contribute bug fixes or new features with a pull request to this repository.