urmi-21 / orfipy

Licence: MIT License
Fast and flexible ORF finder

Introduction

orfipy is a tool written in Python/Cython to extract ORFs in an extremely fast and flexible manner. Other popular ORF-finding tools are OrfM and getorf; compared with them, orfipy provides the most options for fine-tuning ORF searches. orfipy uses multiple CPU cores and is particularly fast on data containing many small Fasta sequences, such as de novo transcriptome assemblies. For details, please read the paper cited below.

Please cite as: Urminder Singh, Eve Syrkin Wurtele, orfipy: a fast and flexible tool for extracting ORFs, Bioinformatics, 2021, btab090, https://doi.org/10.1093/bioinformatics/btab090

Installation

Install the latest stable version

pip install orfipy

Or install via conda

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

conda create -n orfipy -c bioconda orfipy

Install the development version from source

git clone https://github.com/urmi-21/orfipy.git
cd orfipy
pip install .

Or install the development version directly from GitHub with pip

pip install git+https://github.com/urmi-21/orfipy.git

Examples

Details of the orfipy algorithm are in the paper. See the supplementary information (SI) for the differences between orfipy and other ORF finders, and for how to set orfipy parameters to match the output of other tools.

Below are some usage examples for orfipy.

To see the full list of options, use the command:

orfipy -h

Input

orfipy version 0.0.3 and above supports sequences in Fasta/Fastq format (orfipy uses pyfastx). Input files can be gzip-compressed (.gz).

Extract ORF sequences and write them to the file orfs.fa

orfipy input.fasta --dna orfs.fa --min 10 --max 10000 --procs 4 --table 1 --outdir orfs_out

Use the standard codon table, but with ATG as the only start codon

orfipy input.fa.gz --dna orfs.fa --start ATG

Note: Users can also provide their own translation table to orfipy as a .json file using the --table option. An example .json file containing a valid translation table is provided in the orfipy repository.

See available codon tables

orfipy --show-table

Extract ORFs to a BED file

orfipy input.fasta --bed orfs.bed --min 50 --procs 4
or
orfipy input.fasta --min 50 --procs 4 > orfs.bed 
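
The BED file can be post-processed with a few lines of Python. The sketch below assumes the standard six-column BED layout (sequence name, start, end, name, score, strand) and that the name column carries the same semicolon-separated description printed by the API. The helper parse_bed_line and the sample line are hypothetical and only illustrate the idea; check them against your own orfs.bed before relying on them.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive (BED convention)
    name: str
    strand: str

    @property
    def length(self) -> int:
        """Length of the interval in nucleotides."""
        return self.end - self.start


def parse_bed_line(line: str) -> BedRecord:
    """Parse one tab-separated BED line with at least six columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 BED columns, got {len(fields)}")
    chrom, start, end, name, _score, strand = fields[:6]
    return BedRecord(chrom, int(start), int(end), name, strand)


# Hypothetical sample line (assumption): real orfipy output may differ in detail.
sample = (
    "Seq\t0\t9\t"
    "ID=Seq_ORF.1;ORF_type=complete;ORF_len=9;ORF_frame=1;Start:ATG;Stop:TAG"
    "\t0\t+"
)
record = parse_bed_line(sample)
print(record.chrom, record.start, record.end, record.strand, record.length)
```

To process a whole file, call parse_bed_line on each non-empty line of orfs.bed and filter the resulting records, for example by length or strand.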

Extract ORFs to a BED12 file

Note: Add --include-stop to make orfipy output consistent with the .bed file produced by TransDecoder.Predict.

orfipy testseq.fa --min 100 --bed12 of.bed --partial-5 --partial-3 --include-stop

Extract ORF peptide sequences using the default translation table

orfipy input.fasta --pep orfs_peptides.fa --min 50 --procs 4
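
To make the translation step concrete, here is a minimal pure-Python sketch that translates a nucleotide ORF with the standard genetic code (NCBI table 1). It is not orfipy's implementation; the names CODON_TABLE and translate are illustrative, and mapping unknown codons to "X" is an assumption of this sketch.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    # Codons containing ambiguous bases (e.g. N) become 'X' (assumption).
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


print(translate("ATGCATGACTAG"))  # MHD*
```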

API

Users can import the ORF search algorithm, which is written in Cython, directly into their own Python code.

>>> import orfipy_core 
>>> seq='ATGCATGACTAGCATCAGCATCAGCAT'
>>> for start,stop,strand,description in orfipy_core.orfs(seq,minlen=3,maxlen=1000):
...     print(start,stop,strand,description)
... 
0 9 + ID=Seq_ORF.1;ORF_type=complete;ORF_len=9;ORF_frame=1;Start:ATG;Stop:TAG
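
The description string mixes key=value and key:value fields separated by semicolons. The sketch below splits such a string into a dictionary; parse_description is a hypothetical helper, not part of orfipy, and it assumes the field layout shown in the output above.

```python
import re


def parse_description(description: str) -> dict:
    """Split 'ID=...;ORF_type=...;Start:ATG' style strings into a dict."""
    attributes = {}
    for field in filter(None, description.split(";")):
        # Each field is assumed to be 'key=value' or 'key:value'.
        match = re.match(r"([^=:]+)[=:](.*)", field)
        if match is None:
            raise ValueError(f"unrecognised field: {field!r}")
        key, value = match.groups()
        attributes[key] = value
    return attributes


description = "ID=Seq_ORF.1;ORF_type=complete;ORF_len=9;ORF_frame=1;Start:ATG;Stop:TAG"
info = parse_description(description)
print(info["ORF_type"], int(info["ORF_len"]), info["Start"], info["Stop"])
```

All values are returned as strings, so numeric fields such as ORF_len need an explicit int() conversion.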

The orfipy_core.orfs function accepts the following arguments (default values in square brackets):

  • seq: Required input sequence (str)
  • name ['Seq'] Name (str)
  • minlen [0] min length (int)
  • maxlen [1000000] max length (int)
  • strand ['b'] Strand to use, (b)oth, (f)wd or (r)ev (char)
  • starts ['TTG','CTG','ATG'] Start codons to use (list)
  • stops ['TAA','TAG','TGA'] Stop codons to use (list)
  • include_stop [False] Include stop codon in ORF (bool)
  • partial3 [False] Report ORFs without a stop (bool)
  • partial5 [False] Report ORFs without a start (bool)
  • between_stops [False] Report ORFs defined as between stops (bool)

Comparison with getorf and OrfM

orfipy's features and performance were compared with those of getorf and OrfM. The tools were run on different datasets, and ORFs were written to both nucleotide and peptide Fasta files (fasta), to peptide Fasta only (peptide), or to BED (bed). For details, see the publication and SI.

  • orfipy is the most flexible and is particularly fast on data containing many small Fasta sequences, such as de novo transcriptome assemblies or collections of microbial genomes.
  • OrfM is fast (faster for Fastq input) and uses less memory, but its ORF search options are limited.
  • getorf is memory efficient but slower and has no Fastq support. It provides some flexibility in ORF searches.

Funding

This work is funded in part by the National Science Foundation award IOS 1546858, "Orphan Genes: An Untapped Genetic Reservoir of Novel Traits". This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562 (Bridges HPC environment through allocations TG-MCB190098 and TG-MCB200123 awarded from XSEDE and HPC Consortium).
