enormandeau / go_enrichment

Licence: GPL-3.0 license
Transcripts annotation and GO enrichment Fisher tests

go_enrichment

Overview

go_enrichment annotates transcript sequences and performs GO enrichment Fisher tests. The transcript sequences are blasted against the swissprot protein database and the uniprot information corresponding to the hit is retrieved from the uniprot website. Fisher tests are performed with the goatools Python module.

Prerequisites

To use go_enrichment, you will need a UNIX system (Linux or Mac OSX) and the following dependencies installed on your computer (see Installation section for more details about installing these prerequisites):

  • wget
  • gnu parallel
  • blastplus 2.7.1+, the NCBI suite of blast tools
  • swissprot blast database ftp://ftp.ncbi.nlm.nih.gov/blast/db/swissprot.tar.gz
  • GO database (see GO database section below)
  • goatools

Installation

If you do not have administrator rights on the computer you will be using, or have little experience compiling and installing programs and adding them to your PATH environment variable, you may need to ask an administrator to install the following programs and databases.

Wget

Your UNIX system should already have wget installed. Test this by running:

wget

If you get a message saying there is a missing URL, wget is installed. Otherwise, if you are using a computer with OSX, search the web for "install wget OSX" and follow the installation instructions. For Debian or Ubuntu derived Linux distributions, install wget with:

sudo apt-get install wget

Gnu Parallel

We will use wget to download gnu parallel:

wget http://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2
tar -xjvf parallel-latest.tar.bz2
cd parallel-*
./configure && make && sudo make install

Blast tools 2.7.1+

The blast executables (pre-compiled for different architectures) can be found here: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/. Download the right ones for your computer, uncompress the archive and copy all the files that are in the bin folder so that they are accessible through the $PATH variable on your system.

For example, if you have administrator rights on the system, you could do:

sudo cp /path_to_blastplus/bin/* /usr/local/bin

Test the installation by launching blastn's help:

blastn -h

Swissprot database

We will use wget to download the swissprot databases.

# Create a temporary bash session
bash

# Create folder to contain the databases
mkdir ~/blastplus_databases
cd ~/blastplus_databases

# Downloading the database
wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/swissprot.*

# Confirming the integrity of the downloaded files
cat *.md5 | md5sum -c

# Decompressing
for file in swissprot.*.gz; do tar -xzf "$file"; done

# Exit temporary bash session
exit

GO database

Installing the GO database will be faster:

# Create a temporary bash session
bash

# Moving to the GO database folder
cd 02_go_database

# Downloading the GO databases
wget http://geneontology.org/ontology/go-basic.obo

# Exit temporary bash session
exit

goatools

NEW WAY TO INSTALL (2021-06-30)

conda create -n goatools -c bioconda goatools=1.1.7
conda activate goatools

Old way

goatools is a Python module that depends on several other Python modules. To make the installation easier, we will use the anaconda Python data analysis platform, which makes it easy to install most of the module dependencies and does not require administrator rights. To get the anaconda install file, go to https://www.continuum.io/downloads and choose the appropriate platform and python 2.7, then launch the installation and follow the instructions. When asked if you want anaconda to add itself to your $PATH variable, say yes. You can then update with:

conda update conda

Then go to the goatools GitHub page https://github.com/tanghaibao/goatools and follow the installation instructions.

Workflow

This is a brief description of the steps as well as the input and output formats expected by go_enrichment.

Step 1 - Blast against swissprot

Put your sequences of interest in the 03_sequences folder in a file named transcriptome.fasta. If you use another name, you will need to modify the SEQUENCE_FILE variable in the script.

Point the script to your locally installed swissprot blast database by modifying the SWISSPROT_DB variable.

Then run:

./01_scripts/01_blast_against_swissprot.sh

Step 2 - Get annotation information from uniprot

This step will use the blast results to download the information of the genes to which the transcript sequences correspond.

Run:

./01_scripts/02_get_uniprot_info.sh
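
To make the link between steps 1 and 2 more concrete, the sketch below picks the best swissprot hit for each transcript from tabular blast results. It assumes the blast output uses the standard 12-column tabular layout (-outfmt 6), which may not match what 01_blast_against_swissprot.sh actually writes. The function name and the example lines are hypothetical.

```python
def best_hits(blast_lines):
    """Return {query_id: subject_id}, keeping the highest bit score per query.

    Assumes tab-separated lines in the 12-column '-outfmt 6' layout:
    qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Made-up example lines, not real blast results
example = [
    "transcript_1\tP00001.1\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t180",
    "transcript_1\tP00002.1\t95.0\t100\t5\t0\t1\t300\t1\t100\t1e-60\t200",
    "transcript_2\tP00003.1\t80.0\t50\t10\t0\t1\t150\t1\t50\t1e-10\t60.5",
]
print(best_hits(example))
```

The subject identifiers kept this way are the ones whose uniprot records would then be looked up.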

Step 3 - Annotate the transcripts

Use this step to create a .csv file containing the transcript names as well as some annotation information (Name, Accession, Fullname, Altnames, GO).

Run:

./01_scripts/03_annotate_genes.py 03_sequences/transcriptome.fasta 05_annotations/ sequence_annotation.txt

Step 4 - Extract genes

Before we can perform the Fisher tests, we need to generate two text files, each listing one transcript name per line:

  • The names of all the analyzed transcripts, 'all_ids.txt'
  • The names of the significant transcripts, 'significant_ids.txt'
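
The list of all analyzed transcripts can be derived from the fasta file used in step 1. The sketch below is one possible way to do it, assuming the transcript name is the first whitespace-delimited word of each fasta header. The helper names are hypothetical, and the list of significant transcripts still has to come from your own analysis.

```python
def fasta_ids(lines):
    """Yield the first word of each FASTA header line (without the '>')."""
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                yield words[0]


def write_ids(ids, path):
    """Write one transcript name per line to the given file."""
    with open(path, "w") as handle:
        for name in ids:
            handle.write(name + "\n")


def missing_from_all(significant_ids, all_ids):
    """Return the significant names absent from the full list (should be empty)."""
    known = set(all_ids)
    return [name for name in significant_ids if name not in known]


# Made-up in-memory example instead of a real fasta file
example = [">transcript_1 len=300", "ATGC", ">transcript_2", "GGCC"]
all_ids = list(fasta_ids(example))
print(all_ids)
print(missing_from_all(["transcript_2", "transcript_9"], all_ids))
```

With a real file, passing an open handle on 03_sequences/transcriptome.fasta to fasta_ids and the result to write_ids would produce all_ids.txt. Checking that every significant name also appears in the full list catches naming mismatches before the tests are run.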

Step 5 - Run goatools

WARNING! This is currently broken. Follow the next steps to use goatools:

Install goatools

See the Installation section, including the part about getting the GO database, and the goatools page: https://github.com/tanghaibao/goatools

Run goatools

python2 scripts/find_enrichment.py --pval=0.05 --indent ../wanted_transcripts.ids ../all_ids.txt ../all_go_annotations.csv > ../go_annotation.tsv

This script will launch goatools and perform the Fisher tests. Note: edit the script to point to your own installation of find_enrichment.py

TODO: put this command back in the following script:

./01_scripts/04_goatools.sh
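
The third file passed to find_enrichment.py links each transcript to its GO terms. The sketch below builds such lines under the assumption that goatools accepts a plain-text association file with one transcript per line, followed by a tab and its GO terms separated by semicolons. Check this against the documentation of your goatools version. The function name and the example annotations are hypothetical.

```python
def association_lines(annotations):
    """Yield 'transcript<TAB>GO:...;GO:...' lines.

    Transcripts without any GO term are skipped; duplicate terms are removed.
    """
    for transcript, go_terms in annotations.items():
        terms = sorted({t.strip() for t in go_terms if t.strip().startswith("GO:")})
        if terms:
            yield transcript + "\t" + ";".join(terms)


# Made-up annotations, not real results
example = {
    "transcript_1": ["GO:0008150", "GO:0003674"],
    "transcript_2": [],
    "transcript_3": ["GO:0005575", "GO:0005575"],
}
for line in association_lines(example):
    print(line)
```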

Step 6 - Filter goatools results

We can now reformat the results of goatools to make them more useful.

./01_scripts/05_filter_goatools.py enrichment.csv 02_go_database/go-basic.obo filtered.csv
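
Reformatting enrichment results often involves looking up GO term names in go-basic.obo. The sketch below is not the code of 05_filter_goatools.py. It only shows one way to read term ids, names and namespaces from the [Term] stanzas of an OBO file, with a hypothetical function name and a shortened inline example.

```python
def parse_obo_terms(lines):
    """Return {go_id: {"name": ..., "namespace": ...}} from OBO [Term] stanzas."""
    terms = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest
            current = {} if line == "[Term]" else None
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            if key == "id":
                terms[value] = current
            elif key in ("name", "namespace"):
                current[key] = value
    return terms


# Shortened example in OBO layout
example = """\
format-version: 1.2

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Typedef]
id: part_of
name: part of
""".splitlines()
print(parse_obo_terms(example))
```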

Licence

go_enrichment is licensed under the GPL3 license. See the LICENCE file for more details.
