kevchn / Quagmir

License: MIT
A python-based isomiR quantification and analysis pipeline

Projects that are alternatives of or similar to Quagmir

Ugene
UGENE is free open-source cross-platform bioinformatics software
Stars: ✭ 112 (+1144.44%)
Mutual labels:  pipeline, sequencing
Galaxy
Data intensive science for everyone.
Stars: ✭ 812 (+8922.22%)
Mutual labels:  pipeline, sequencing
Rnaseq Workflow
A repository for setting up a RNAseq workflow
Stars: ✭ 170 (+1788.89%)
Mutual labels:  pipeline, sequencing
Sns
Analysis pipelines for sequencing data
Stars: ✭ 43 (+377.78%)
Mutual labels:  pipeline, sequencing
RNASeq
RNASeq pipeline
Stars: ✭ 30 (+233.33%)
Mutual labels:  pipeline, sequencing
Pdpipe
Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (+6455.56%)
Mutual labels:  pipeline
Pipeline
A cloud-native Pipeline resource.
Stars: ✭ 6,751 (+74911.11%)
Mutual labels:  pipeline
Ttyplot
a realtime plotting utility for terminal/console with data input from stdin
Stars: ✭ 532 (+5811.11%)
Mutual labels:  pipeline
Gaia
Build powerful pipelines in any programming language.
Stars: ✭ 4,534 (+50277.78%)
Mutual labels:  pipeline
Vector
A reliable, high-performance tool for building observability data pipelines.
Stars: ✭ 8,736 (+96966.67%)
Mutual labels:  pipeline
Cookiecutter
DEPRECATED! Please use nf-core/tools instead
Stars: ✭ 18 (+100%)
Mutual labels:  pipeline
Syntax sugar python
A library adding some anti-Pythonic syntactic sugar to Python
Stars: ✭ 721 (+7911.11%)
Mutual labels:  pipeline
Go Streams
A lightweight stream processing library for Go
Stars: ✭ 615 (+6733.33%)
Mutual labels:  pipeline
Proposal Pipeline Operator
A proposal for adding a useful pipe operator to JavaScript.
Stars: ✭ 5,899 (+65444.44%)
Mutual labels:  pipeline
Helmsman
highly-efficient & lightweight mutation signature matrix aggregation
Stars: ✭ 19 (+111.11%)
Mutual labels:  sequencing
Ok
Elegant error/exception handling in Elixir, with result monads.
Stars: ✭ 517 (+5644.44%)
Mutual labels:  pipeline
Aws Boilerplate
Opinionated full stack web app's boilerplate, ready to be deployed to AWS platform.
Stars: ✭ 682 (+7477.78%)
Mutual labels:  pipeline
Phila Airflow
Stars: ✭ 16 (+77.78%)
Mutual labels:  pipeline
Lambdacd
a library to define a continuous delivery pipeline in code
Stars: ✭ 655 (+7177.78%)
Mutual labels:  pipeline
Bk Sops
BlueKing standard operations and maintenance (SOPS)
Stars: ✭ 632 (+6922.22%)
Mutual labels:  pipeline

QuagmiR

A python-based miRNA sequencing pipeline for isomiR quantification and analysis.

Please see https://github.com/Gu-Lab-RBL-NCI/QuagmiR for the newest version of this tool.

Dependencies

  • Make sure that you have Python 3.4+ installed (type python --version in the console)
  • Make sure you have the latest version of pip: pip3 install -U pip
  • Make sure you have Miniconda installed

Installation

  1. Download repository: git clone https://github.com/kevchn/quagmir
  2. Go into local quagmir folder: cd quagmir
  3. Install Python dependencies: conda env create -f environment.yml

Quickstart

  1. Add your .fastq samples into the data folder (a sample has been provided for testing):
  ├── LICENSE
  ├── README.md
  ├── config.yaml
  ├── Snakefile
  ├── environment.yml
  ├── data/
  │   ├── sample.fastq_ready
  │   ├── YOUR_FILE_HERE.fastq_ready
  │   └── collapsed/
  ├── motif-consensus.fa
  └── results/
      └── tabular/
  
  2. Edit the motif-consensus.fa file to insert your miRNA information in the following format:
>miRNA_name miRNA_motif
miRNA_consensus_sequence

>passenger-shRNA-mir21-ORF59-5p-1 ACACCCTGGCCGGGT
CCGACACCCTGGCCGGGTTGT
  3. Edit the config.yaml file to change configuration options if needed (the default values are fine for most use cases):
# DISPLAY
min_ratio: .001
min_read: 9

display_summary: True
display_sequence_info: True
display_nucleotide_dist: True

# FUNCTION
destructive_motif_pull: False

# INPUT
motif_consensus_file: 'motif-consensus.fa'
  4. Run the pipeline: bash run.sh, or activate the conda environment and run snakemake
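
The motif-consensus.fa format from step 2 can be checked before a run with a few lines of Python. This is a minimal sketch, not QuagmiR's own parser: the function name parse_motif_consensus and the motif_offset field are hypothetical, and it assumes each record is exactly one header line followed by one sequence line.

```python
def parse_motif_consensus(text):
    """Parse '>miRNA_name miRNA_motif' headers, each followed by a consensus line."""
    records = []
    name = motif = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # Header: name and motif are separated by whitespace.
            parts = line[1:].split()
            if len(parts) != 2:
                raise ValueError(f"expected '>name motif', got: {line!r}")
            name, motif = parts
        else:
            if name is None:
                raise ValueError(f"sequence line without a header: {line!r}")
            records.append({
                "name": name,
                "motif": motif,
                "consensus": line,
                # Position of the motif in the consensus, or -1 if it is absent.
                "motif_offset": line.find(motif),
            })
            name = motif = None
    return records


example = """>passenger-shRNA-mir21-ORF59-5p-1 ACACCCTGGCCGGGT
CCGACACCCTGGCCGGGTTGT
"""
records = parse_motif_consensus(example)
for rec in records:
    print(rec["name"], rec["motif_offset"])
```

A motif_offset of -1 would flag an entry whose motif does not occur in its consensus sequence, which is probably a typo worth fixing before running the pipeline.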

Update

Run git reset --hard followed by git pull. Note that git reset --hard discards any uncommitted local changes to tracked files.

Additional information

Quagmir skips any reads that it considers not to be an isomir of the miRNA, even if the reads are pulled in by the motif of the miRNA. These skipped reads are reported in the run log in the logs/ folder.

Currently, Quagmir by default skips reads that:

(1) Have non-templating trimming and tailing combinations on the 5' end

Biologically, 5' modifications are very rare or non-existent, so this filter removes reads that appear to carry them while keeping isomiRs that result from imprecise cleavage. Known false positive: it also keeps potentially 'fake' isomiRs whose non-templating modifications extend beyond the provided consensus miRNA.

(2) Have a mutation 3 nt from the consensus end (the rest of the end matches the consensus)

These reads are more likely to be sequencing errors than trimming/tailing events: if 3 nt or more are tailed, the probability that they match the canonical sequence by chance is only 1/4^3 ≈ 1.6%.
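
The 1/4^3 figure can be reproduced for other tail lengths with a short calculation. This sketch assumes that tailed nucleotides are independent and uniformly drawn from the four bases, which is a simplification; the function name is hypothetical.

```python
def chance_match_probability(tail_length, alphabet_size=4):
    """Probability that a random tail matches the canonical sequence at every position."""
    return (1 / alphabet_size) ** tail_length


for k in (1, 2, 3, 4):
    print(f"{k} nt tail: {chance_match_probability(k):.4%}")
```

For a 3 nt tail this gives 1/64, so a tail that matches the canonical sequence apart from one internal mismatch is more plausibly explained by a single sequencing error.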

Quagmir also has an optional flag (off by default) to skip reads that:

(3) Were reported to be an isomir of a previously reported miRNA

Biologically, a read can only belong to one miRNA.
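
One plausible reading of rule (3) is sketched below: reads are pulled by motif in the order the miRNAs are listed, and in the optional mode a read already claimed by an earlier miRNA is skipped for later ones. This is an illustration only. The function pull_by_motif and its destructive parameter are hypothetical, and whether this matches the destructive_motif_pull option in config.yaml is an assumption.

```python
def pull_by_motif(reads, motifs, destructive=False):
    """Group reads by the motifs they contain.

    reads: iterable of unique read sequences.
    motifs: dict mapping miRNA name -> motif, processed in insertion order.
    destructive: if True, a read claimed by one miRNA is not offered to later ones.
    """
    pulled = {name: [] for name in motifs}
    claimed = set()
    for name, motif in motifs.items():
        for read in reads:
            if destructive and read in claimed:
                continue  # already assigned to a previously processed miRNA
            if motif in read:
                pulled[name].append(read)
                claimed.add(read)
    return pulled


# Toy data: the first read contains both motifs, the second only the first motif.
reads = ["AAACGTTT", "GGACGTAA"]
motifs = {"mirA": "ACGT", "mirB": "CGTT"}

shared = pull_by_motif(reads, motifs, destructive=False)
exclusive = pull_by_motif(reads, motifs, destructive=True)
print(shared)
print(exclusive)
```

In the non-destructive mode the first read is reported under both miRNAs; in the destructive mode it is reported only under the first one that claims it, so the listing order of the miRNAs matters.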

Notes

Collapsing the sample files is the slowest step. Once the samples are collapsed, re-running the pipeline automatically starts from the collapsed files and takes far less time.
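
Collapsing here means reducing a sample to its unique sequences with a read count for each. The sketch below shows the general idea and is not QuagmiR's implementation or its collapsed file format; it assumes plain, unwrapped FASTQ with four lines per record, and the function name is hypothetical.

```python
from collections import Counter


def collapse_fastq(lines):
    """Count identical read sequences in 4-line FASTQ records."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            seq = line.strip()
            if seq:
                counts[seq] += 1
    return counts


fastq_lines = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "IIII",
    "@read3", "TTGA", "+", "IIII",
]
collapsed = collapse_fastq(fastq_lines)
for seq, n in collapsed.most_common():
    print(seq, n)
```

Because every later step only needs each distinct sequence once together with its count, caching this result is what makes re-runs much faster.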

To save time, consider skipping the distance metric (weighted Levenshtein distance), which is calculated on the grouped results.
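
For readers unfamiliar with the metric, a weighted Levenshtein distance is an edit distance in which insertions, deletions, and substitutions can carry different costs. The sketch below is a generic dynamic-programming version. The weights QuagmiR actually uses are not stated here, so the cost parameters and their defaults are assumptions.

```python
def weighted_levenshtein(a, b, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Minimum total cost of edits that turn string a into string b."""
    # prev[j] holds the cost of turning the processed prefix of a into b[:j].
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * del_cost]  # turning a[:i] into the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                           # delete ca
                curr[j - 1] + ins_cost,                       # insert cb
                prev[j - 1] + (0.0 if ca == cb else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


print(weighted_levenshtein("ACGTT", "ACGT"))            # one deletion
print(weighted_levenshtein("A", "G", sub_cost=0.5))     # one cheap substitution
```

The cost grows with the product of the two sequence lengths for every pair compared, which is why skipping the metric saves time on large grouped results.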

The output is a sample.fastq.results.txt file for each sample in the results/ folder, plus grouped results if multiple files are provided. All output files are in tabular (TSV) format.
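
Since the results are tab-separated, they can be loaded with Python's standard csv module. This is a minimal sketch that assumes the file starts with a single header row; the column names in the in-memory example (SEQUENCE, READS) are made up for illustration and are not QuagmiR's real columns.

```python
import csv
import io


def read_results(handle):
    """Read a tab-separated results file into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


# In-memory stand-in for a results file; the column names are hypothetical.
sample = io.StringIO("SEQUENCE\tREADS\nACGT\t12\nTTGA\t3\n")
rows = read_results(sample)
for row in rows:
    print(row)
```

For a real run, open the file instead, for example with open("results/sample.fastq.results.txt", newline="") as handle, and pass the handle to read_results.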

Planned features

  • Provide a primary miRNA transcript for each miRNA, either in motif-consensus and gen-uniq-subseq or pre-provided in local files. Then implement features to smartly filter out reads that are not isomiRs of a miRNA because they show statistically improbable behavior (e.g. 5p non-templating addition) or sequencing errors.