OpenGene / OpenGene.jl

Licence: other
(No maintenance) OpenGene, core libraries for NGS data analysis and bioinformatics in Julia


OpenGene

The OpenGene.jl project aims to provide basic functions and rich utilities for analyzing sequencing data in Julia.

If you want to be an author of OpenGene, please open an issue, or make a pull request.

If you are looking for BAM/SAM read/write support, see OpenGene/HTSLIB.
For bug reports and feature requests, please file an issue.

Julia

Julia is a fresh programming language with C/C++-like performance and Python-like ease of use.
On Ubuntu, you can install Julia with sudo apt-get install julia, then type julia to open the Julia interactive prompt. For details on installing Julia, see the platform-specific instructions.

Add OpenGene

# run on Julia REPL
Pkg.add("OpenGene")

To get the latest development version of OpenGene (not recommended for beginners):

Pkg.checkout("OpenGene")

This project is under active development; remember to update it to get the newest features:

Pkg.update()

Examples

sequence operation

julia> using OpenGene

julia> seq = dna("AAATTTCCCGGGATCGATCGATCG")
dna:AAATTTCCCGGGATCGATCGATCG
# reverse complement operator
julia> ~seq
dna:CGATCGATCGATCCCGGGAAATTT
# transcription: note that seq is treated as the coding sequence, not the template sequence,
# so this operation only changes T to U
julia> transcribe(seq)
rna:AAAUUUCCCGGGAUCGAUCGAUCG
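
To make the two operations above concrete, here is a minimal plain-Julia sketch that works on ordinary strings. The helper names revcomp and to_rna are hypothetical and are not part of OpenGene; the sketch assumes uppercase A/C/G/T/N input and targets current Julia (1.x) syntax.

```julia
# Hypothetical helpers for illustration only; this is not OpenGene's implementation.
const DNA_COMPLEMENT = Dict('A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C', 'N' => 'N')

# Complement every base, then reverse the result.
# Throws a KeyError for characters outside the table (e.g. lowercase bases).
revcomp(s::AbstractString) = reverse(map(c -> DNA_COMPLEMENT[c], s))

# Coding-strand transcription: every T becomes U, nothing else changes.
to_rna(s::AbstractString) = replace(s, 'T' => 'U')

seq = "AAATTTCCCGGGATCGATCGATCG"
println(revcomp(seq))
println(to_rna(seq))
```

Treating the input as the coding strand is what makes transcription a simple T to U substitution; transcribing from the template strand would additionally require the reverse complement.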

read/write a single fastq/fasta file

using OpenGene

istream = fastq_open("input.fastq.gz")
ostream = fastq_open("output.fastq.gz","w")

# fastq_read returns a FastqRead object {name, sequence, strand, quality}, or false at the end of the file
# fastq_write writes a FastqRead to an output stream
while (fq = fastq_read(istream))!=false
    fastq_write(ostream, fq)
end

close(ostream)

FASTA is supported similarly, with fasta_open, fasta_read and fasta_write.
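
For readers unfamiliar with the format, the read/write loop above can be pictured as copying four-line records. The following is a rough plain-Julia sketch under stated assumptions: SimpleFastq, read_record, write_record and copy_records are hypothetical names, the input is uncompressed text with exactly four lines per record, and nothing here reflects how OpenGene parses files internally.

```julia
# Hypothetical record type; OpenGene's FastqRead may be laid out differently.
struct SimpleFastq
    name::String
    sequence::String
    strand::String
    quality::String
end

# Read one four-line record, or return nothing at the end of the input.
function read_record(io::IO)
    eof(io) && return nothing
    name = readline(io)
    sequence = readline(io)
    strand = readline(io)
    quality = readline(io)
    return SimpleFastq(name, sequence, strand, quality)
end

# Write a record back in the same four-line layout.
function write_record(io::IO, r::SimpleFastq)
    println(io, r.name)
    println(io, r.sequence)
    println(io, r.strand)
    println(io, r.quality)
end

# Copy every record from input to output and return the record count.
function copy_records(input::IO, output::IO)
    n = 0
    while (r = read_record(input)) !== nothing
        write_record(output, r)
        n += 1
    end
    return n
end

input = IOBuffer("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nFFFF\n")
output = IOBuffer()
n_records = copy_records(input, output)
copied = String(take!(output))
```

The sketch uses nothing as the end-of-input sentinel, which is the usual convention in current Julia, whereas the OpenGene loop above compares against false.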

read/write a pair of fastq files

using OpenGene

istream = fastq_open_pair("R1.fastq.gz", "R2.fastq.gz")
ostream = fastq_open_pair("Out.R1.fastq.gz","Out.R2.fastq.gz","w")

# fastq_read_pair will return a pair of FastqRead {read1, read2}
# fastq_write_pair can write this pair to two files
while (pair = fastq_read_pair(istream))!=false
    fastq_write_pair(ostream, pair)
end

close(ostream)

read/write a bed file

using OpenGene

# read all records, return an array of Intervals(chrom, chromstart, chromend)
intervals = bed_read_intervals("in.bed")
# write all records
bed_write_intervals("out.bed",intervals)
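
As an illustration of what a three-field interval record looks like, here is a small plain-Julia sketch that parses and writes tab-separated BED lines. SimpleInterval, parse_bed and write_bed are hypothetical names, only the first three columns are kept, and this is not OpenGene's own reader.

```julia
# Hypothetical interval type mirroring the three fields named above.
struct SimpleInterval
    chrom::String
    chromstart::Int
    chromend::Int
end

# Parse tab-separated BED lines, keeping only the first three columns.
function parse_bed(io::IO)
    intervals = SimpleInterval[]
    for line in eachline(io)
        # Skip blank lines and common header/comment lines.
        if isempty(strip(line)) || startswith(line, "#") ||
           startswith(line, "track") || startswith(line, "browser")
            continue
        end
        fields = split(line, '\t')
        length(fields) >= 3 || continue
        push!(intervals, SimpleInterval(String(fields[1]),
                                        parse(Int, fields[2]),
                                        parse(Int, fields[3])))
    end
    return intervals
end

# Write intervals back as three tab-separated columns.
function write_bed(io::IO, intervals)
    for iv in intervals
        println(io, iv.chrom, '\t', iv.chromstart, '\t', iv.chromend)
    end
end

intervals = parse_bed(IOBuffer("# comment\nchr1\t100\t200\nchr2\t5\t50\n"))
out = IOBuffer()
write_bed(out, intervals)
bed_text = String(take!(out))
```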

read/write a VCF

using OpenGene

# load the entire VCF data into a vcf object, which has a .header field and a .data field
vcfobj = vcf_read("in.vcf")
# write the vcf object into a file
vcf_write("out.vcf", vcfobj)

VCF Operations

using OpenGene

v1 = vcf_read("v1.vcf")
v2 = vcf_read("v2.vcf")

# merge by positions
v_merge = v1 + v2

# intersect by positions
v_intersect = v1 * v2

# remove v2 records from v1, by positions
v_minus = v1 - v2
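
The three operators behave like set operations keyed on position. The sketch below shows the idea on plain (chrom, pos) tuples using Julia's built-in union, intersect and setdiff; it is only an analogy, since real VCF records carry more columns and the source does not say how OpenGene resolves records that share a position but differ in alleles.

```julia
# Each record is reduced to a (chrom, pos) key for illustration.
a = [("chr1", 100), ("chr1", 200), ("chr2", 50)]
b = [("chr1", 200), ("chr2", 75)]

merged = union(a, b)       # positions present in either input
common = intersect(a, b)   # positions present in both inputs
only_a = setdiff(a, b)     # positions in a that are absent from b

println(merged)
println(common)
println(only_a)
```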

read/write a GTF

using OpenGene

# load the gtf header and data
gtfobj = gtf_read("in.gtf")

# write the gtf object into a file
gtf_write("out.gtf", gtfobj)

# if the file is too big to load at once, load only the header and then read the rows one by one
gtfobj, stream = gtf_read("in.gtf", loaddata = false)
while (row = gtf_read_row(stream)) != false
    # do something with row ...
end

locate the gene/exon/intron

using OpenGene, OpenGene.Reference

# load the GENCODE dataset; the file is downloaded from the GENCODE website if it has not been downloaded before
# once loaded, it is cached so future loads will be fast
index = gencode_load("GRCh37")

# locate which gene chr:pos is in
gencode_locate(index, "chr5", 149526621)
# it will return
# 1-element Array{Any,1}:
#  Dict{ASCIIString,Any}("gene"=>"PDGFRB","number"=>1,"transcript"=>"ENST00000261799.4","type"=>"intron")
genes = gencode_genes(index, "TP53")
# return an array with only one record
genes[1].name, genes[1].chr, genes[1].start_pos, genes[1].end_pos
# ("TP53","chr17",7565097,7590856)
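
Conceptually, locating a position means finding the annotated spans that contain it. The following plain-Julia sketch does this with a linear scan over a hypothetical GeneSpan type, reusing the TP53 coordinates shown above; a real index such as the one gencode_load builds is presumably organized for faster lookup, which this sketch does not attempt.

```julia
# Hypothetical gene record with the fields used in the example above.
struct GeneSpan
    name::String
    chr::String
    start_pos::Int
    end_pos::Int
end

# Return the names of all genes whose span contains chr:pos (inclusive bounds).
locate(genes, chr, pos) =
    [g.name for g in genes if g.chr == chr && g.start_pos <= pos <= g.end_pos]

genes = [GeneSpan("TP53", "chr17", 7565097, 7590856)]

inside = locate(genes, "chr17", 7570000)
outside = locate(genes, "chr17", 1)
```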

access assembly (hg19/hg38)

julia> using OpenGene

julia> using OpenGene.Reference

julia> hg19 = load_assembly("hg19")
# Dict{ASCIIString,OpenGene.FastaRead} with 93 entries:

julia> hg19["chr17"]
# >chr17
# dna:AAGCTTCTCACCCTGTTCCTGCATAGATAATTGCATGACA......agggtgtgggtgtgggtgtgggtgtgggtgtggtgtgtgggtgtgggtgtgGT

julia> hg19["chr17"].sequence[1:100]
# dna:AAGCTTCTCACCCTGTTCCTGCATAGATAATTGCATGACAATTGCCTTGTCCCTGCTGAATGTGCTCTGGGGTCTCTGGGGTCTCACCCACGACCAACTC

merge a pair of reads from paired-end sequencing

julia> using OpenGene, OpenGene.Algorithm

julia> r1=dna("TTTAGGCCTGTCACTGTGAACGCTATCAGCAAGCCTTTGCATGATTTTTCTCTTTCCCACTCCTACATTCTCGGTGATGACAACAACTGTAGCCTGATCCAGATATTTCGAAGTGCAACAAATCGTATTCAATATAGAGTAAGG")
dna:TTTAGGCCTGTCACTGTGAACGCTATCAGCAAGCCTTTGCATGATTTTTCTCTTTCCCACTCCTACATTCTCGGTGATGACAACAACTGTAGCCTGATCCAGATATTTCGAAGTGCAACAAATCGTATTCAATATAGAGTAAGG

julia> r2=dna("GTTAGCTATTACTGTAATCACCGCGAGACAAGTTAATGAGAGAGTTATTCATAAAACTTACTCTATATTGAATACGATTTGTAGCACATCGAAATATCTGGATCAGGCTACAGTTGTAGTCATCACCGAGAATGTAGGAGTGG")
dna:GTTAGCTATTACTGTAATCACCGCGAGACAAGTTAATGAGAGAGTTATTCATAAAACTTACTCTATATTGAATACGATTTGTAGCACATCGAAATATCTGGATCAGGCTACAGTTGTAGTCATCACCGAGAATGTAGGAGTGG

julia> offset, overlap_len, distance = overlap(r1, r2)
(56,88,4)

julia> merged = simple_merge(r1, r2, overlap_len)
dna:TTTAGGCCTGTCACTGTGAACGCTATCAGCAAGCCTTTGCATGATTTTTCTCTTTCCCACTCCTACATTCTCGGTGATGACAACAACTGTAGCCTGATCCAGATATTTCGAAGTGCAACAAATCGTATTCAATATAGAGTAAGGTTTATGAATAACTCTCTCATTAACTTGTCTCGCGGTGATTACAGTAATAGCTAAC
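
The idea behind the overlap and merge steps above can be sketched in plain Julia as follows. This is an assumption-laden illustration, not OpenGene's algorithm: find_overlap and merge_reads are hypothetical names, the search simply slides the reverse complement of read 2 along read 1 and accepts the first offset whose Hamming distance is within a limit, and it assumes read 2 never extends before the start of read 1.

```julia
const COMP = Dict('A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C', 'N' => 'N')
revcomp(s::AbstractString) = reverse(map(c -> COMP[c], s))

# Number of mismatching positions between two equal-length strings.
hamming(a, b) = count(((x, y),) -> x != y, zip(a, b))

# Return (offset, overlap_len, distance) for the first acceptable overlap,
# or nothing if no offset satisfies the limits.
function find_overlap(r1::String, r2::String; min_overlap::Int = 4, max_distance::Int = 1)
    rc2 = revcomp(r2)
    for offset in 0:(length(r1) - min_overlap)
        overlap_len = min(length(r1) - offset, length(rc2))
        overlap_len >= min_overlap || continue
        d = hamming(r1[offset+1:offset+overlap_len], rc2[1:overlap_len])
        d <= max_distance && return (offset, overlap_len, d)
    end
    return nothing
end

# Keep read 1 and append the part of the reverse-complemented read 2
# that lies beyond the overlap.
merge_reads(r1::String, r2::String, overlap_len::Int) =
    r1 * revcomp(r2)[overlap_len+1:end]

# Toy fragment: read 1 covers bases 1-12, read 2 is the reverse complement of bases 5-16.
fragment = "ACGTTGCAAGGCTTAC"
r1 = fragment[1:12]
r2 = revcomp(fragment[5:16])

offset, overlap_len, distance = find_overlap(r1, r2)
merged = merge_reads(r1, r2, overlap_len)
```

In the toy example the merged string reproduces the original fragment. Production mergers typically also use base qualities to resolve mismatches inside the overlap, which this sketch ignores.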