yyoshiaki / ikra

Licence: other
RNAseq pipeline centered on Salmon

ikra v2.0.1 -RNAseq pipeline centered on Salmon-

ikra is an RNA-seq pipeline centered on Salmon. It automatically builds a gene expression table (gene × sample) from an experiment matrix. The output can be used directly as input for iDEP.

Japanese documentation is available here.

Note that sra-tools must be installed locally. This requirement is due to an upgrade of NCBI's tools. Please install sra-tools (>=2.10.7).
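A pre-flight check along the following lines can confirm the local sra-tools version before a run. This is a minimal sketch, not part of ikra: the helper name version_ge is made up for illustration, it assumes a sort that supports -V (as in GNU coreutils), and it assumes that fasterq-dump --version prints a dotted version number somewhere in its output.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not shipped with ikra).

# version_ge A B: succeed when version A >= version B.
# Relies on `sort -V` (version sort), as provided by GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED="2.10.7"   # minimum version stated in this README

if command -v fasterq-dump >/dev/null 2>&1; then
    # Assumption: the version is the first x.y.z token in the output.
    installed="$(fasterq-dump --version 2>&1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
    if [ -n "$installed" ] && version_ge "$installed" "$REQUIRED"; then
        echo "sra-tools $installed is recent enough"
    else
        echo "sra-tools ${installed:-unknown} is older than $REQUIRED; please upgrade" >&2
    fi
else
    echo "fasterq-dump not found; install sra-tools >= $REQUIRED" >&2
fi
```

A plain string comparison would rank 2.9.6 above 2.10.7, which is why the sketch uses version sort.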

Usage

Usage: ikra.sh experiment_table.csv species \
        [--test, --fastq, --help, --without-docker, --udocker --protein-coding] \
        [--threads [VALUE]][--output [VALUE]]\
        [--suffix_PE_1 [VALUE]][--suffix_PE_2 [VALUE]]
  args
    1.experiment matrix(csv)
    2.reference(human or mouse)

Options:
  --test  test mode(MAX_SPOT_ID=100000).(default : False)
  --fastq use fastq files instead of SRRid. The extension must be foo.fastq.gz (default : False)
  -u, --udocker
  -w, --without-docker
  -pc, --protein-coding use protein coding transcripts instead of comprehensive transcripts. (default : True)
  -ct, --comprehensive-transcripts use comprehensive transcripts instead of protein coding transcripts. (default : False) 
  -t, --threads
  -o, --output  output file. (default : output.tsv)
  -l, --log  log file. (default : ikra.log)
  -a, --align carry out mapping onto a reference genome. hisat2 or star (default : None)
  -g, --gencode specify the version of gencode. (default : Mouse=26, Human=37)
  -s1, --suffix_PE_1    suffix for PE fastq files. (default : _1.fastq.gz)
  -s2, --suffix_PE_2    suffix for PE fastq files. (default : _2.fastq.gz)
  -h, --help    Show usage.
  -v, --version Show version.
  -r, --remove-intermediates Remove intermediate files
  • test option limits the number of reads to 100,000 in each sample.
  • udocker mode is for server environments where only user privileges are available. For more information, see https://github.com/indigo-dc/udocker.
  • without-docker mode requires all tools to be installed locally. Not recommended.
  • protein-coding mode restricts genes to protein coding genes only.
  • threads sets the number of threads to use.
  • output is output.tsv by default.
  • align mode generates genome-mapped bam and bigwig files. Note that Salmon still runs in quasi-mapping mode, just as it does without the align option. The generated bw files can be visualized in IGV or other genome browsers.

The experiment matrix must be comma-separated (csv format).
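To make the options concrete, here is a sketch of how a command line could be assembled and inspected before launching a real run. The file names experiment_table.csv and counts.tsv and the thread count are placeholders. Only flags listed in the usage text above are used, and the script prints the command rather than executing it.

```shell
#!/usr/bin/env bash
# Assemble an ikra command as an array so each argument stays intact.
cmd=(bash ikra.sh experiment_table.csv mouse)   # positional args: matrix, species
cmd+=(--threads 8)                              # placeholder thread count
cmd+=(--output counts.tsv)                      # placeholder output file name
cmd+=(--align star)                             # also map reads to the genome with STAR

# Dry run: print the command instead of executing it.
printed="$(printf '%s ' "${cmd[@]}")"
echo "${printed% }"
# To launch for real, run: "${cmd[@]}"
```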

SRR mode

name        SRR          Layout   condition1 (optional)   ...
Treg_LN_1   SRR5385247   SE       Treg                    ...
Treg_LN_2   SRR5385248   SE       Treg                    ...
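As a concrete illustration, the SRR-mode table above could be written to a CSV file as follows. This is a sketch: the file name experiment_table.csv comes from the usage line, the header spelling follows the table above, and the validation step is an addition for illustration, not something ikra is known to run.

```shell
#!/usr/bin/env bash
# Write the SRR-mode experiment matrix shown above as a comma-separated file.
csv="experiment_table.csv"
cat > "$csv" <<'EOF'
name,SRR,Layout,condition1
Treg_LN_1,SRR5385247,SE,Treg
Treg_LN_2,SRR5385248,SE,Treg
EOF

# Illustrative sanity check: every data row needs the three required
# columns, and Layout is expected to be SE or PE.
bad_rows="$(awk -F',' 'NR > 1 && (NF < 3 || ($3 != "SE" && $3 != "PE"))' "$csv" | wc -l)"
echo "rows failing the check: $((bad_rows))"
```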

fastq mode

name        fastq(PREFIX)     Layout   condition1 (optional)   ...
Treg_LN_1   hoge/SRR5385247   SE       Treg                    ...
Treg_LN_2   hoge/SRR5385248   SE       Treg                    ...

  • Build each name by joining the condition and the replicate with underscores. See iDEP's naming convention for details.
  • The first three columns are required.
  • If you want to use your own fastq files, add the --fastq option. ikra supports only .fq, .fq.gz, .fastq and .fastq.gz.
  • In the fastq column, give the file path without the trailing .fastq.gz (SE) or _1.fastq.gz and _2.fastq.gz (PE). For example, hoge/SRR5385247.fastq.gz is written as hoge/SRR5385247.
  • If the suffixes are not _1.fastq.gz and _2.fastq.gz, specify them with the -s1 and -s2 options.
  • Docker cannot access paths above the execution directory, such as ../fq/**.fastq.gz. This can be worked around by creating a symbolic link (see the bonohu blog).
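The prefix rule and the symbolic-link workaround can be sketched as follows. Everything here is illustrative: the directories are created in a temporary location, and the empty placeholder files stand in for real reads. Whether a link resolves inside the container depends on how directories are mounted, so treat the ln -s step as the idea from the note above rather than a tested recipe.

```shell
#!/usr/bin/env bash
# Illustrative only: empty placeholder files stand in for real fastq data.
work="$(mktemp -d)"
mkdir -p "$work/fq" "$work/run/hoge"
touch "$work/fq/SRR5385247.fastq.gz" "$work/fq/SRR5385248.fastq.gz"

cd "$work/run" || exit 1

# Link files that live above the execution directory into ./hoge.
for f in ../fq/*.fastq.gz; do
    ln -s "$work/fq/$(basename "$f")" "hoge/$(basename "$f")"
done

# Strip the .fastq.gz suffix to get the value for the fastq(PREFIX) column.
prefixes=""
for f in hoge/*.fastq.gz; do
    prefix="${f%.fastq.gz}"
    prefixes="$prefixes$prefix "
    echo "$prefix"
done
```

For paired-end files the same parameter expansion applies with the _1.fastq.gz suffix, for example "${f%_1.fastq.gz}".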

Output

  • output.tsv (scaledTPM)
  • multiqc_report.html: includes the FastQC reports and Salmon's mapping rate (the mapping rate to transcripts)

output sample

                Treg_LN_1   Treg_LN_2
0610005C13Rik   0           0
0610006L08Rik   0           1
0610009B22Rik   4           10
...
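A quick way to sanity-check a result file is to total each sample column. The sketch below recreates the sample above as a small tab-separated file and sums it with awk. It assumes that the real output.tsv is tab-separated and that its header row holds only the sample names (one field fewer than the data rows), as the sample suggests. Neither point is confirmed here.

```shell
#!/usr/bin/env bash
# Recreate the sample above as a tab-separated file (illustrative data only).
tsv="$(mktemp)"
printf 'Treg_LN_1\tTreg_LN_2\n'  >  "$tsv"
printf '0610005C13Rik\t0\t0\n'   >> "$tsv"
printf '0610006L08Rik\t0\t1\n'   >> "$tsv"
printf '0610009B22Rik\t4\t10\n'  >> "$tsv"

# Per-sample totals. Assumption: the header holds only sample names,
# so data field i+1 belongs to header field i.
totals="$(awk -F'\t' '
    NR == 1 { n = NF; for (i = 1; i <= n; i++) name[i] = $i; next }
    { for (i = 1; i <= n; i++) sum[i] += $(i + 1) }
    END { for (i = 1; i <= n; i++) printf "%s=%d\n", name[i], sum[i] }
' "$tsv")"
echo "$totals"
```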

Specification

Major bugs that have been fixed

tximport_R.R 2019/04/30

A serious bug in tximport_R.R was reported and fixed. In older versions, Salmon's output and the MultiQC reports were correct, but output.tsv was sometimes corrupted. If you are using an old version (<1.1.1), please update ikra to the latest version and re-run it. We apologize for the inconvenience.

fasterq-dump error 2019/09/21

A bug was reported in which processing stops with the following error from sra-tools: docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"fasterq-dump\": executable file not found in $PATH": unknown. This has been fixed in the latest version, so if you encounter this error, please update.

Install

All you need to do is git clone ikra and install docker or udocker (v1.1.3). There is no need to install lots of software! If you don't want to use docker (or udocker), you must install all the tools yourself and use the --without-docker option.

$ git clone https://github.com/yyoshiaki/ikra.git

If you use SRR mode, install sra-toolkit locally.

Upgrade

$ git pull origin master

Version

 $ bash ikra.sh --version
 ...
 ikra v2.0.1 -RNAseq pipeline centered on Salmon-
 ...

Version of tools

  • sra-tools : 2.10.9
  • FastQC : 0.11.9
  • MultiQC : 0.10.1
  • Trim Galore! : 0.6.7
  • Salmon : 1.4.0
  • tximport : 1.6.0
  • STAR : 2.7.8a
  • Hisat2 : 2.2.1
  • sambamba : 0.8.0
  • deeptools : 3.5.1

Version of reference genome (when using alignment option)

  • mouse:mm10 (GRCm38)
  • human:hg38 (GRCh38)

Test

SE

SRR mode

$ cd test/Illumina_SE && bash ../../ikra.sh Illumina_SE_SRR.csv mouse --test -t 10

fastq mode

Run this only after running SRR mode, because the fastq files do not exist until then.

$ cd test/Illumina_SE && bash ../../ikra.sh Illumina_SE_fastq.csv mouse --fastq -t 10

PE

SRR mode

$ cd test/Illumina_PE && bash ../../ikra.sh Illumina_PE_SRR.csv mouse --test -t 10

fastq mode

Run this only after running SRR mode, because the fastq files do not exist until then.

$ cd test/Illumina_PE && bash ../../ikra.sh Illumina_PE_fastq.csv mouse --fastq -t 10

test all (for developers)

cd test && bash test.sh && bash test.full.sh

For Mac Users

Dr. Ota (DBCLS) resolved the problem of salmon not working on Mac. The cause is that Docker is allocated only 2 GB of memory by default on Mac. To fix it, allocate sufficient memory (>=8 GB) to Docker, apply the change, and restart Docker.
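The memory currently visible to Docker can be checked from a terminal. The following is a rough sketch, not part of ikra: bytes_to_gib is a made-up helper, and the value is rounded to the nearest GiB because the Docker virtual machine may report slightly less than the configured amount.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a byte count to GiB, rounded to the nearest integer.
bytes_to_gib() {
    echo $(( ($1 + 536870912) / 1073741824 ))
}

if command -v docker >/dev/null 2>&1; then
    # `docker info` reports the memory visible to the Docker engine in bytes.
    mem_bytes="$(docker info --format '{{.MemTotal}}' 2>/dev/null)"
    case "$mem_bytes" in
        ''|*[!0-9]*)
            echo "could not read Docker memory (is Docker running?)" >&2
            ;;
        *)
            gib="$(bytes_to_gib "$mem_bytes")"
            if [ "$gib" -ge 8 ]; then
                echo "Docker memory: about ${gib} GiB"
            else
                echo "Docker memory: about ${gib} GiB; consider raising it to 8 GB or more" >&2
            fi
            ;;
    esac
fi
```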

ikra pipeline

Tips

You can find SRR data quickly at http://sra.dbcls.jp/

Q&A

  • When loading output.tsv into iDEP, which data type should I select?

When iDEP reads output.tsv, select "Read counts data".

Issue

Please refer to the issues page.

Releases

Please refer to Releases

  • add support for udocker
  • add setting of species
  • gtf and transcript file from GENCODE
  • salmon
  • trimmomatic(legacy)
  • trim_galore!
  • tximport
  • fastxtools(for Ion)
  • judging fastq or SRR(manual)
  • introduce "salmon gcbias correction"
  • salmon validateMappings
  • pigz(multithread version of gzip)
  • fasterq-dump
  • cwl development is in progress
  • rename to "ikra"
  • protein coding option

Legacy

The workflow that uses trimmomatic has been moved to ./legacy

Reference

Development of cwl ver.

2019/03/22 https://youtu.be/weJrq5QNt1M We started developing a CWL version on the occasion of Mr. Michael's visit to Japan. So far, trim_galore and salmon in PE mode have been converted to CWL.

cd test/cwl_PE && bash test.sh

Source and reference: cwl_tools

Citation

Hiraoka Yu, Yamada Kohki, Ryuichiro Yamsasaki, Yusuke Kawasaki, Kitabatake Ryoko, Matsumoto Yasunari, Ishikawa Kaito, Umezu Yuto, Hirose Haruka, & Yoshiaki Yasumizu. (2021). yyoshiaki/ikra: ikra v2.0.1 (v2.0.1). Zenodo. https://doi.org/10.5281/zenodo.5541399

Licence (Updated in Ver. 2.0)

This software is freely available for academic users. Use for commercial purposes is not allowed. Please refer to the LICENCE page. If you are not an academic user, please contact the author.

Contact

Yoshiaki Yasumizu, M.D. [email protected]
