OpenGene / Fusiondirect.jl

Licence: other
(No maintenance) Detect gene fusion directly from raw fastq files

FusionDirect

Detect gene fusions directly from FASTQ files. Written in Julia.

Features

  • no alignment needed: it reads the FASTQ files of paired-end sequencing directly
  • outputs the fusion pattern (genes and positions), along with the reads that support the fusion
  • highly sensitive compared with delly, factera, and other tools
  • the output file is a standard FASTA file, which can be used to verify fusions with BLAST or other tools
  • well suited to detecting fusions in cancer targeted sequencing data (exome or panel sequencing)

Julia

Julia is a young programming language with C/C++-like performance and Python-like ease of use.
On Ubuntu, you can install Julia with sudo apt-get install julia, then type julia to open the interactive prompt.

Install FusionDirect

# from Julia REPL
Pkg.add("FusionDirect")

Use FusionDirect as a package

using FusionDirect

# the reference folder, which contains chr1.fa, chr2.fa, ...
# download http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz and extract it
ref = "/opt/ref/hg19chr"
# a gene list with coordinate intervals; see the example bed files in the data folder
bed = Pkg.dir("FusionDirect") * "/data/test_panel.bed"
read1 = "R1.fq.gz"
read2 = "R2.fq.gz"
detect(ref, bed, read1, read2)

Use FusionDirect as a standalone script from the command line

Copy src/fusion.jl anywhere you like, then run:

julia fusion.jl -f <REF_FILE_OR_FOLDER> -b <BED_FILE> -l <READ1_FILE> -r <READ2_FILE> > output.fa
# for example
# (hg19chr is downloaded and extracted from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz )
julia fusion.jl -f ~/hg19chr -b ~/.julia/v0.5/FusionDirect/data/lung_cancer_hg19.bed -l R1.fq -r R2.fq > output.fa

Get the reference

The hg19 reference can be downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
Extract the archive (for example with tar -xzf chromFa.tar.gz), then pass the folder that contains the .fa files to -f <REF>
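
Before a long run, it can help to confirm that the reference folder really contains the per-chromosome .fa files. The following is only a minimal sketch: chrom_fastas is a hypothetical helper, not part of FusionDirect, and the temporary folder stands in for the extracted hg19 files.

```julia
# Hypothetical helper (not part of FusionDirect): list the *.fa files
# found directly inside a reference folder, sorted by name.
function chrom_fastas(ref_dir::AbstractString)
    isdir(ref_dir) || error("not a directory: $ref_dir")
    return sort(filter(f -> endswith(f, ".fa"), readdir(ref_dir)))
end

# Demonstration on a throwaway folder standing in for the extracted archive.
demo_dir = mktempdir()
for name in ("chr1.fa", "chr2.fa", "README.txt")
    touch(joinpath(demo_dir, name))
end

# Only the .fa files are reported; README.txt is ignored.
println(chrom_fastas(demo_dir))
```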

Prepare the bed

A bed file gives the gene list (chr, start, end, genename). It usually includes the gene panel of your targeted sequencing plus any other genes of interest (such as EML4). If you are not sure how to make one, you can use data/lung_cancer_hg19.bed.
For example:

chr9    133588266   133763062   ABL1
chr14   105235686   105262088   AKT1
chr19   40736224    40791443    AKT2
chr2    29415640    30144432    ALK
chrX    66764465    66950461    AR
chr11   108093211   108239829   ATM
chr3    142168077   142297668   ATR
chr2    111876955   111926024   BCL2L11
chr7    140419127   140624564   BRAF
chr17   41196312    41277500    BRCA1
chr2    42396490    42559688    EML4
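
To make the four-column layout concrete, here is a minimal sketch that reads such lines into records. GeneInterval and parse_bed are hypothetical names used for illustration; this is not how FusionDirect itself loads the bed file.

```julia
# Hypothetical record type for one bed line; not part of FusionDirect.
struct GeneInterval
    chrom::String
    start::Int
    stop::Int
    name::String
end

# Parse bed text (chr, start, end, genename) into GeneInterval records.
# Fields may be separated by tabs or runs of spaces; blank lines and
# lines starting with '#' are skipped.
function parse_bed(text::AbstractString)
    genes = GeneInterval[]
    for raw in split(text, '\n')
        line = strip(raw)
        (isempty(line) || startswith(line, "#")) && continue
        fields = split(line)
        length(fields) >= 4 || error("expected 4 columns, got: $line")
        push!(genes, GeneInterval(fields[1], parse(Int, fields[2]),
                                  parse(Int, fields[3]), fields[4]))
    end
    return genes
end

# Two lines taken from the example above.
bed_text = """
chr2    29415640    30144432    ALK
chr2    42396490    42559688    EML4
"""

genes = parse_bed(bed_text)
for g in genes
    println(g.name, " spans ", g.stop - g.start, " bp on ", g.chrom)
end
```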

Understand the output

  • fasta: the output is a standard FASTA file, which can be used directly to double-check the fusions with BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome)
  • duplication number: the first number after > is the number of duplicated reads (including the displayed read), so it is at least 1.
  • fusion_site: the next word is merged, read1, read2, or crosspair. It indicates whether the fusion was detected on the merged sequence, on read1, on read2, or because read1 and read2 are not on the same contig.
  • conjunct_pos: the number after fusion_site, giving the base at which the fusion occurs. If fusion_site is merged, the position refers to the merged sequence. If fusion_site is crosspair, the value is set to 0.
  • fusion_genes: after conjunct_pos come the two fusion genes, each with its intron/exon number and global fusion coordinate. + or - indicates the forward or reverse strand. Because a fusion involves double-stranded DNA, both + and - can appear in the same fusion.
  • original_reads: the original reads are given for read1 and read2, marked by /1 or /2 at the end of the read name.
  • merged_sequence: if the read pair can be merged automatically, fusion detection is done on the merged sequence. In that case the merged sequence is given with /merged at the end of its read name.

Example output:

#Fusion:ALK-EML4 (total: 3, unique: 2)
>2_merged_120_ALK:intron:19|+chr2:29446598_EML4:exon:21|-chr2:42553364/1
AATTGAACCTGTGTATTTATCCTCCTTAAGCTAGATTTCCATCATACTTAGAAATACTAATAAAATGATTAAAGAAGGTGTGTCTTTAATTGAAGCATGATTTAAAGTAAATGCAAAGCTATGTCGTCCAATCAATGTCCTTACAATC
>2_merged_120_ALK:intron:19|+chr2:29446598_EML4:exon:21|-chr2:42553364/2
GCTGCAAACTAATCAGGAATCGATCGGATTGTAAGGACATTGATTGGACGACATAGCTTTGCATTTACTTAAAATCATGCTTCAATTAAAGACACACCTTCTTTAATCATTTTATTAGTATTTCTAAGTATGATGGAAATCTATCTTAA
>2_merged_120_ALK:intron:19|+chr2:29446598_EML4:exon:21|-chr2:42553364/merged
AATTGAACCTGTGTATTTATCCTCCTTAAGCTAGATTTCCATCATACTTAGAAATACTAATAAAATGATTAAAGAAGGTGTGTCTTTAATTGAAGCATGATTTAAAGTAAATGCAAAGCTATGTCGTCCAATCAATGTCCTTACAATCCGATCGATTCCTGATTAGTTTGCAGC
>1_merged_60_ALK:intron:19|+chr2:29446598_EML4:exon:21|-chr2:42553364/1
TAAAATGATTAAAGAAGGTGTGTCTTTAATTGAAGCATGATTTAAAGTAAATGCAAAGCTATGTCGTCCAATCAATGTCCTTACAATCCGATCGATTCCTGATTAGTTTGCAGCCATTTGGAATGTCCCCTTTAAATTTAGAAACAG
>1_merged_60_ALK:intron:19|+chr2:29446598_EML4:exon:21|-chr2:42553364/2
GTAAAAGTGGCTAGTTTGAATCAAGATGCACTTTCAAATACATTTGTACACAAGCACTATGATTATACTTCCTGTTTCTAAATTTAAAGGGGACATTCCAAATGGCTGCAAACTAATCAGGAATCGATCGGATTGTAAGGACATTGATT
>1_merged_60_ALK:intron:19|+chr2:29446598_EML4:exon:21|-chr2:42553364/merged
TAAAATGATTAAAGAAGGTGTGTCTTTAATTGAAGCATGATTTAAAGTAAATGCAAAGCTATGTCGTCCAATCAATGTCCTTACAATCCGATCGATTCCTGATTAGTTTGCAGCCATTTGGAATGTCCCCTTTAAATTTAGAAACAGGAAGTATAATCATAGTGCTTGTGTACAAATGTATTTGAAAGTGCATCTTGATTCAAACTAGCCACTTTTAC
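
The record names above can be split back into the fields described in this section. The sketch below is one possible way to do that, inferred only from the example output. The type names, function names, and regular expressions are assumptions for illustration and are not part of FusionDirect. It also assumes that gene and contig names contain no underscores and that the region is either intron or exon.

```julia
# Hypothetical types for illustration; FusionDirect does not export them.
struct FusionSide
    gene::String
    region::String   # "intron" or "exon", as seen in the example output
    number::Int      # intron/exon number
    strand::Char     # '+' or '-'
    contig::String
    pos::Int         # global fusion coordinate
end

struct FusionHeader
    duplicates::Int      # duplication number
    fusion_site::String  # merged, read1, read2 or crosspair
    conjunct_pos::Int
    left::FusionSide
    right::FusionSide
    read_tag::String     # "1", "2" or "merged"
end

# Patterns inferred from the example output (an assumption, not a spec).
const HEADER_RE = r"^>(\d+)_(merged|read1|read2|crosspair)_(\d+)_([^_]+)_([^_/]+)/(\w+)$"
const SIDE_RE = r"^([^:|]+):(intron|exon):(\d+)\|([+-])([^:]+):(\d+)$"

# Parse one gene field such as "ALK:intron:19|+chr2:29446598".
function parse_side(s::AbstractString)
    m = match(SIDE_RE, s)
    m === nothing && error("unrecognized gene field: $s")
    return FusionSide(m[1], m[2], parse(Int, m[3]), m[4][1], m[5], parse(Int, m[6]))
end

# Parse a full record name (the line starting with '>').
function parse_header(line::AbstractString)
    m = match(HEADER_RE, strip(line))
    m === nothing && error("unrecognized header: $line")
    return FusionHeader(parse(Int, m[1]), m[2], parse(Int, m[3]),
                        parse_side(m[4]), parse_side(m[5]), m[6])
end

h = parse_header(">2_merged_120_ALK:intron:19|+chr2:29446598_EML4:exon:21|-chr2:42553364/1")
println(h.left.gene, "-", h.right.gene, " at base ", h.conjunct_pos,
        " of the ", h.fusion_site, " sequence, duplication number ", h.duplicates)
```

In the example output, the total of 3 on the #Fusion line equals the sum of the two duplication numbers (2 + 1), and the unique count of 2 equals the number of distinct read pairs. That reading is inferred from this single example rather than from the FusionDirect source.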