GO enrichment with python -- pandas meets networkx

goenrich

.. image:: https://badges.gitter.im/Join%20Chat.svg
   :target: https://gitter.im/jdrudolph/goenrich?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

.. image:: https://readthedocs.org/projects/goenrich/badge/?version=latest
   :target: https://goenrich.readthedocs.org/en/latest

.. image:: https://travis-ci.org/jdrudolph/goenrich.svg?branch=master
   :target: https://travis-ci.org/jdrudolph/goenrich

Convenient GO enrichment from Python, for use in Python projects.

#. Builds the GO-ontology graph
#. Propagates GO-annotations up the graph
#. Performs enrichment test for all categories
#. Performs multiple testing correction
#. Allows for export to pandas for processing and graphviz for visualization
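
The enrichment test and the multiple testing correction from the list above can be pictured with a small standalone sketch. It assumes a one-sided hypergeometric test and Benjamini-Hochberg correction, which are common choices for GO enrichment; the statistics goenrich actually uses may differ. The helper names ``hypergeom_sf`` and ``benjamini_hochberg`` and the toy gene sets are hypothetical and not part of the goenrich API.

.. code:: python

    from math import comb


    def hypergeom_sf(x, M, n, N):
        """P(X >= x) when drawing N genes from M, of which n are annotated."""
        upper = min(n, N)
        total = comb(M, N)
        hits = sum(comb(n, k) * comb(M - n, N - k) for k in range(x, upper + 1))
        return hits / total


    def benjamini_hochberg(pvalues):
        """Return Benjamini-Hochberg adjusted q-values in the original order."""
        m = len(pvalues)
        order = sorted(range(m), key=lambda i: pvalues[i])
        qvalues = [0.0] * m
        running_min = 1.0
        # Walk from the largest p-value down so q-values stay monotone.
        for rank in range(m, 0, -1):
            i = order[rank - 1]
            running_min = min(running_min, pvalues[i] * m / rank)
            qvalues[i] = running_min
        return qvalues


    # Hypothetical toy data: 20 background genes and a 5-gene query.
    background = {f"g{i}" for i in range(20)}
    categories = {
        "GO:A": {"g0", "g1", "g2", "g3"},
        "GO:B": {"g0", "g10", "g11", "g12", "g13", "g14", "g15", "g16"},
    }
    query = {"g0", "g1", "g2", "g3", "g10"}

    M, N = len(background), len(query)
    terms = sorted(categories)
    pvals = [
        hypergeom_sf(len(categories[t] & query), M, len(categories[t]), N)
        for t in terms
    ]
    qvals = benjamini_hochberg(pvals)
    for term, p, q in zip(terms, pvals, qvals):
        print(term, round(p, 4), round(q, 4))

In this toy setup the query contains every gene of ``GO:A``, so that category gets a much smaller q-value than ``GO:B``, which overlaps the query only about as much as chance would suggest.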

Installation

| Install the package from PyPI and download the ontology and the annotations you need.

.. code:: shell

    pip install goenrich
    mkdir db
    # Ontology
    wget http://purl.obolibrary.org/obo/go/go-basic.obo -O db/go-basic.obo
    # UniprotACC
    wget http://geneontology.org/gene-associations/goa_human.gaf.gz -O db/gene_association.goa_human.gaf.gz
    # Yeast SGD
    wget http://downloads.yeastgenome.org/curation/literature/gene_association.sgd.gz -O db/gene_association.sgd.gz
    # Entrez GeneID
    wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz -O db/gene2go.gz

Run GO enrichment

.. code:: python

    import subprocess

    import goenrich

    # build the ontology
    O = goenrich.obo.ontology('db/go-basic.obo')

    # use all entrez geneid associations from gene2go as background
    # use annot = goenrich.read.goa('db/gene_association.goa_human.gaf.gz') for uniprot
    # use annot = goenrich.read.sgd('db/gene_association.sgd.gz') for yeast
    gene2go = goenrich.read.gene2go('db/gene2go.gz')
    # use values = {k: set(v) for k,v in annot.groupby('go_id')['db_object_symbol']} for uniprot/yeast
    values = {k: set(v) for k,v in gene2go.groupby('GO_ID')['GeneID']}

    # propagate the background through the ontology
    background_attribute = 'gene2go'
    goenrich.enrich.propagate(O, values, background_attribute)

    # extract some list of entries as example query
    # use query = annot['db_object_symbol'].unique()[:20]
    query = gene2go['GeneID'].unique()[:20]

    # for additional export to graphviz just specify the gvfile argument
    # the show argument keeps the graph reasonably small
    df = goenrich.enrich.analyze(O, query, background_attribute, gvfile='test.dot')

    # generate html
    df.dropna().head().to_html('example.html')

    # call to graphviz
    subprocess.check_call(['dot', '-Tpng', 'test.dot', '-o', 'test.png'])
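
To illustrate what propagating the background through the ontology means, here is a minimal pure-Python sketch on a toy hierarchy. It is not goenrich's implementation, which works on the ontology graph built above and may differ in detail. The function ``propagate``, the term names and the gene names below are all hypothetical.

.. code:: python

    def propagate(parents, direct):
        """Give every term its own genes plus the genes of all its descendants."""
        propagated = {term: set() for term in parents}
        for term, genes in direct.items():
            # Walk from the annotated term up to the root(s),
            # adding its genes to every ancestor on the way.
            stack, seen = [term], set()
            while stack:
                current = stack.pop()
                if current in seen:
                    continue
                seen.add(current)
                propagated[current] |= genes
                stack.extend(parents[current])
        return propagated


    # Hypothetical toy ontology: each term maps to its list of parent terms.
    parents = {
        "root": [],
        "metabolism": ["root"],
        "lipid metabolism": ["metabolism"],
        "signaling": ["root"],
    }
    # Genes annotated directly to each term.
    direct = {
        "lipid metabolism": {"geneA", "geneB"},
        "metabolism": {"geneC"},
        "signaling": {"geneD"},
    }

    background = propagate(parents, direct)
    print(sorted(background["metabolism"]))
    print(sorted(background["root"]))

After propagation a gene annotated to a specific term also counts towards every more general term above it, which is why the enrichment test can be run on all categories rather than only on those with direct annotations.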

Generate png image using graphviz:

.. code:: shell

    dot -Tpng test.dot > test.png

or directly from python:

.. code:: python

    import subprocess

    subprocess.check_call(['dot', '-Tpng', 'test.dot', '-o', 'test.png'])

.. image:: https://cloud.githubusercontent.com/assets/2606663/8525018/cad3a288-23fe-11e5-813c-bd205a47eed8.png

Check the `documentation <https://goenrich.readthedocs.org/en/latest>`_ for all available parameters.

Licence & Contributors

This work is licensed under the MIT licence.

Contributions are welcome!

Special thanks

- `@lukauskas <https://github.com/lukauskas/>`_ for implementing i/o support for file-like objects.
- `@zfrenchee <https://github.com/zfrenchee/>`_ for fixing a bug in the calculation of the test statistic.
- `@pommy1 <https://github.com/pommy1/>`_ for implementing support for networkx >= 2.0.0.

Building the documentation

.. code:: shell

    sphinx-apidoc -f -o docs goenrich goenrich/tests
