urmi-21 / pyrpipe

License: MIT
Reproducible bioinformatics pipelines in python. Import any Unix tool/command in python.

pyrpipe: python rna-seq pipeliner

Introduction

pyrpipe (pronounced "pyre-pipe") is a Python package for developing bioinformatics or other computational pipelines in pure Python. It provides an easy-to-use framework for importing any UNIX command into Python, and comes with specialized classes and functions for coding RNA-Seq processing workflows. Pipelines in pyrpipe can be created and extended by integrating third-party tools, executable scripts, or Python libraries in an object-oriented manner.

Read the paper here

Read the docs here

NOTE: Because of changes in the API design, pyrpipe versions 0.0.5 and above are not compatible with earlier versions. All tutorials and documentation have been updated to reflect v0.0.5.

What it does

Allows fast and easy development of bioinformatics pipelines in Python by providing:

  • a high-level API to popular RNA-Seq processing tools for downloading, trimming, alignment, quantification and assembly
  • optimization of program parameters based on the data
  • a general framework to execute any Linux command from Python
  • comprehensive logging of all commands, their output and their return status
  • report generation for easy sharing, reproducing, benchmarking and debugging

Key Features (version 0.0.5)

  • Import any UNIX executable command/tool in python
  • Dry-run feature to check dependencies and commands before execution
  • Flexible and robust handling of options and arguments (both Linux and Java style options)
  • Auto load command options from .yaml files
  • Easily override threads and memory options using global values
  • Extensive logging for all the commands
  • Automatically verify Integrity of output targets
  • Resume feature to restart pipelines/jobs from where interrupted
  • Create reports, MultiQC reports for bioinformatic pipelines
  • Easily integrated into workflow managers like Snakemake and NextFlow (to schedule jobs, scale jobs, identify parallel steps in pipelines)
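
To illustrate what handling "both Linux and Java style options" can mean in practice, here is a minimal sketch that turns a dictionary of options into a command-line argument list. The `build_args` helper and its `java_style` parameter are hypothetical names made up for this example; this is not pyrpipe's actual implementation.

```python
def build_args(options, java_style=False):
    """Turn an options mapping into a flat argument list.

    Hypothetical helper for illustration only; not part of pyrpipe.
    """
    args = []
    for key, value in options.items():
        if java_style:
            # Java-style tools (e.g. BBDuk) take key=value pairs.
            args.append(f"{key}={value}")
        elif value is True:
            # A bare flag such as --paired carries no value.
            args.append(key)
        elif value is False or value is None:
            # Disabled or unset flags are omitted.
            continue
        else:
            # Linux-style option followed by its value.
            args.extend([key, str(value)])
    return args


linux_args = build_args({"--threads": 8, "--paired": True, "-o": "out"})
java_args = build_args({"in": "reads.fq", "out": "clean.fq", "k": 23}, java_style=True)
print(linux_args)
print(java_args)
```

Keeping options in a dictionary like this is also what makes it straightforward to load them from a .yaml file or to override a single value such as the thread count.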

What it CAN NOT do by itself

  • Schedule jobs
  • Scale jobs on HPC/cloud
  • Identify parallel steps in pipelines

Prerequisites

  • python 3.6 or higher
  • OS: Linux, Mac

APIs are provided for the following RNA-Seq tools:

Tool                     Purpose
SRA Tools (v. 2.10.9)    SRA access
Trimgalore (v. 0.6.0)    Trimming
BBDuk (v. 38.76)         Trimming
Hisat2 (v. 2.2.1)        Alignment
STAR (v. 2.7.7a)         Alignment
Bowtie2 (v. 2.3.5.1)     Alignment
Kallisto (v. 0.46.2)     Quantification
Salmon (v. 0.14.1)       Quantification
Stringtie (v. 2.1.4)     Transcript Assembly
Cufflinks (v. 2.2.1)     Transcript Assembly
Samtools (v. 1.9)        Tools

Examples

Get started with the basic tutorial. Read the documentation here. Several examples are provided here.

Download, trim and align RNA-Seq data

The following Python code downloads data from SRA, uses Trim Galore to trim the fastq files and STAR to align the reads. More detailed examples are provided here.

from pyrpipe.sra import SRA
from pyrpipe.qc import Trimgalore
from pyrpipe.mapping import Star

# Create the tool objects once and reuse them for every run
trimgalore = Trimgalore(threads=8)
star = Star(index='data/index', threads=4)

# Download, trim and align each SRA run in turn
for srr in ['SRR976159', 'SRR978411', 'SRR971778']:
    SRA(srr).trim(trimgalore).align(star)

Import a Unix command

This simple example imports and runs the Unix grep command. See this for more examples.

>>> from pyrpipe.runnable import Runnable
>>> grep=Runnable(command='grep')
>>> grep.run('query1','file1.txt',verbose=True)
>>> grep.run('query2','file2.txt',verbose=True)
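
To give a rough idea of what wrapping a Unix command in a Python object involves, here is a minimal sketch built on the standard library's subprocess module. The `Command` class, its `dry_run` parameter and the logger name are hypothetical and exist only for this example; pyrpipe's own Runnable class is not implemented this way as far as this README states, and it offers far more (option handling, target verification, resuming).

```python
import logging
import shlex
import subprocess

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("command_sketch")


class Command:
    """Toy stand-in for a command wrapper; hypothetical, not pyrpipe's Runnable."""

    def __init__(self, command, dry_run=False):
        self.command = command
        self.dry_run = dry_run

    def run(self, *args):
        cmd = [self.command] + [str(a) for a in args]
        # Log the exact command line so the run can be reproduced later.
        log.info("command: %s", " ".join(shlex.quote(part) for part in cmd))
        if self.dry_run:
            # In dry-run mode, report the command without executing it.
            return 0, ""
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        log.info("return status: %d", result.returncode)
        return result.returncode, result.stdout


echo = Command("echo")
status, output = echo.run("hello", "pyrpipe")
print(status, output.strip())
```

Passing the command as a list rather than a single shell string avoids quoting problems with file names that contain spaces.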

Installation

Please follow these instructions:

To create a new Conda environment (recommended):

NOTE: You need to install the third-party tools to work with pyrpipe. We recommend installing these through bioconda where possible. An example of setting up the environment using conda is provided below. It is best to share your conda environment files along with your pyrpipe scripts to ensure reproducibility.

  1. Download and install Conda
  2. conda create -n pyrpipe python=3.8
  3. conda activate pyrpipe
  4. conda install -c bioconda pyrpipe star=2.7.7a sra-tools=2.10.9 stringtie=2.1.4 trim-galore=0.6.6

The above command will install pyrpipe and the required tools inside a conda environment. Alternatively, use the conda environment.yaml file provided in this repository and build the conda environment by running

conda env create -f pyrpipe_environment.yaml
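
After the environment is built, it can be useful to confirm that the third-party executables are actually on your PATH. The following is a small sketch using only the Python standard library. The list of executable names is an assumption based on how these tools are commonly installed, so adjust it to your setup; `find_missing` is a hypothetical helper and is not part of pyrpipe.

```python
import shutil

# Assumed executable names for the tools installed above; adjust as needed.
REQUIRED_TOOLS = ["STAR", "prefetch", "fasterq-dump", "stringtie", "trim_galore"]


def find_missing(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing(REQUIRED_TOOLS)
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```

Run this inside the activated conda environment, since tools installed there are only visible on PATH while the environment is active.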

Install latest stable version

Through conda

conda install -c bioconda pyrpipe 

Through PIP

pip install pyrpipe --upgrade

If the above command fails due to dependency issues, try:

  1. Download the requirements.txt
  2. pip install -r requirements.txt
  3. pip install pyrpipe

To run tests:

  1. Download the test set (direct link)
  2. pip install pytest
  3. Build the test environment (please READ THIS first)
  4. From the pyrpipe root directory, run pytest tests/test_*

Install dev version

git clone https://github.com/urmi-21/pyrpipe.git
pip install -r pyrpipe/requirements.txt
pip install -e path_to/pyrpipe

# Running tests: start from the pyrpipe root directory
# Build the test environment (this will download tools):
cd tests ; . ./build_test_env.sh
# Then, in the same terminal:
py.test tests/test_*

Setting up NCBI SRA-Tools

If you face problems with downloading data from SRA, try configuring the SRA-Tools. Use vdb-config -i to configure SRA Toolkit. Make sure that:

  • Under the TOOLS tab, prefetch downloads to is set to public user-repository
  • Under the CACHE tab, location of public user-repository is not empty

Use the following pyrpipe_diagnostic command to test whether SRA-Tools is set up properly:

pyrpipe_diagnostic test

Contributing

Please see CONTRIBUTING.md

Funding

This work is funded in part by the National Science Foundation award IOS 1546858, "Orphan Genes: An Untapped Genetic Reservoir of Novel Traits".