All Projects → ding-lab → hotspot3d

ding-lab / hotspot3d

Licence: GPL-3.0 license
3D hotspot mutation proximity analysis tool

Programming Languages

perl
6916 projects
r
7636 projects
shell
77523 projects

Projects that are alternatives of or similar to hotspot3d

mousetrap
Process and Analyze Mouse-Tracking Data
Stars: ✭ 33 (-23.26%)
Mutual labels:  analysis, clustering
RNAseq titration results
Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously
Stars: ✭ 22 (-48.84%)
Mutual labels:  analysis, cancer
MolArt
MOLeculAR structure annoTator
Stars: ✭ 25 (-41.86%)
Mutual labels:  protein-structure, uniprot
Stringlifier
Stringlifier is on Opensource ML Library for detecting random strings in raw text. It can be used in sanitising logs, detecting accidentally exposed credentials and as a pre-processing step in unsupervised ML-based analysis of application text data.
Stars: ✭ 85 (+97.67%)
Mutual labels:  analysis, clustering
Msmbuilder
🏗 Statistical models for biomolecular dynamics 🏗
Stars: ✭ 118 (+174.42%)
Mutual labels:  analysis, clustering
Awesome Single Cell
Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
Stars: ✭ 1,937 (+4404.65%)
Mutual labels:  analysis, clustering
enspara
Modeling molecular ensembles with scalable data structures and parallel computing
Stars: ✭ 28 (-34.88%)
Mutual labels:  clustering, protein-structure
PyPHLAWD
Python version of PHLAWD
Stars: ✭ 16 (-62.79%)
Mutual labels:  clustering
fornalder
Visualize long-term trends in collections of Git repositories.
Stars: ✭ 80 (+86.05%)
Mutual labels:  analysis
pingnoo
An open-source cross-platform traceroute/ping analyser.
Stars: ✭ 149 (+246.51%)
Mutual labels:  analysis
TwitterSearch2Gephi
This windows CLI app lets you collect data from twitter via REST API and convert it into a CSV data set that can be used with Gephi. Other social networks (Reddit, Youtube, WWW) are also supported.
Stars: ✭ 21 (-51.16%)
Mutual labels:  analysis
ggshakeR
An analysis and visualization R package that works with publicly available soccer data
Stars: ✭ 69 (+60.47%)
Mutual labels:  analysis
SvelteScaling
Does Svelte scale?
Stars: ✭ 21 (-51.16%)
Mutual labels:  analysis
tika-similarity
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
Stars: ✭ 92 (+113.95%)
Mutual labels:  clustering
should-i-play-f6
Chess project to analyze the statistical effect of playing f3 (as white) or f6 (as black) on the outcome of the game.
Stars: ✭ 15 (-65.12%)
Mutual labels:  analysis
PHAT
Pathogen-Host Analysis Tool - A modern Next-Generation Sequencing (NGS) analysis platform
Stars: ✭ 17 (-60.47%)
Mutual labels:  analysis
liground
A free, open-source and modern Chess Variant Analysis GUI for the 21st century
Stars: ✭ 41 (-4.65%)
Mutual labels:  analysis
DigitalCellSorter
Digital Cell Sorter (DCS): single cell RNA-seq analysis toolkit. Documentation:
Stars: ✭ 19 (-55.81%)
Mutual labels:  clustering
cis
Home of the Community Intercomparison Suite.
Stars: ✭ 30 (-30.23%)
Mutual labels:  analysis
root pandas
A Python module for conveniently loading/saving ROOT files as pandas DataFrames
Stars: ✭ 108 (+151.16%)
Mutual labels:  analysis

HotSpot3D

This 3D proximity tool can be used to identify mutation hotspots from linear protein sequence and correlate the hotspots with known or potentially interacting domains, mutations, or drugs. Mutation-mutation and mutation-drug clusters can also be identified and viewed.

Usage

    Program:     HotSpot3D - 3D mutation proximity analysis program.

     Stable:     v0.6.0 

       Beta:     up to v1.8.2
    
     Author:     Beifang Niu, John Wallis, Adam D Scott, Sohini Sengupta, & Amila Weerasinghe

Usage: hotspot3d [options]

       Preprocessing
         drugport  --  0) Parse drugport database (OPTIONAL)
         uppro     --  1) Update proximity files
         prep      --  2) Run preprocessing steps 2a-2f
             calroi    --  2a) Generate region of interest (ROI) information
             statis    --  2b) Calculate p_values for pairs of mutations
             anno      --  2c) Add region of interest (ROI) annotation
             trans     --  2d) Add transcript annotation
             cosmic    --  2e) Add COSMIC annotation to proximity file
             prior     --  2f) Prioritization

       Analysis
	     main      --  Run analysis steps a-f (beta)
             search    --  a) 3D mutation proximity searching
             cluster   --  b) Determine mutation-mutation and mutation-drug clusters
             sigclus   --  c) Determine significance of clusters (BETA/OPTIONAL)
             summary   --  d) Summarize clusters (OPTIONAL)
             visual    --  e) Visulization of 3D proximity (OPTIONAL)

Support

For user support please email [email protected] & [email protected]

Update

To reinstall code of the same version (in some cases, may need --sudo):

cpanm --reinstall HotSpot3D-#.tar.gz

Install (Ubuntu 14.04.01)

Make sure that you have cpanm:

cpan App::cpanminus

For configuration, we recommend using local::lib:

cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)

Dependencies include the modules: LWP::Simple, Test::Most, List::Util, List::MoreUtils, Parallel::ForkManager

cpanm LWP::Simple

cpanm Test::Most

cpanm List::Util

cpanm List::MoreUtils

cpanm Parallel::ForkManager

Install HotSpot3D package:

git clone https://github.com/ding-lab/hotspot3d

cd hotspot3d

For the latest stable version:

git checkout v0.6.0

cpanm HotSpot3D-0.6.0.tar.gz

For the latest beta version:

git pull origin v1.8.2

cpanm HotSpot3D-1.8.2.tar.gz

Final note: Installations under some organizations may use an internal perl version. To make use of the /usr/ perl, edit the first line of ~/perl5/bin/hotspot3d.

from: #!/org/bin/perl

to: #!/usr/bin/perl

We have developed webservice for HotSpot3D, and you can submit your mutation file to do analysis online here: http://niulab.scgrid.cn/HotSpot3D/ , once you don't want to install HotSpot3D locally.

Configure Environment

It is helpful to add your perl5 lib directory, and to add your perl5 bin directory.

You can add the following lines to your ~/.bash_profile. Then run 'source ~/.bash_profile'.

	export PERL5LIB=~/perl5/lib/perl5/:${PERL5LIB}

	export PERL5BIN=~/perl5/bin/:${PERL5BIN}

	export PATH=~/perl5/bin/:${PATH}

Add cosmic v67 information to 3D proximity results :

	mkdir preprocessing_dir/cosmic

	cp COSMIC/cosmic_67_for_HotSpot3D_missense_only.tsv.bz2 ./preprocessing_dir/cosmic/

	cd ./preprocessing_dir/cosmic/ 

	bzip2 -d cosmic_67_for_HotSpot3D_missense_only.tsv.bz2

Example - Preprocessing

Download from Synapse

  1. Go to https://www.synapse.org/#!Synapse:syn8699796, and check out the wiki for any updates/details.

  2. Select the Files tab, then go into the AverageResidueDistance data directory (syn8717211).

  3. The DrugPort processing results are located here (syn9704835) for those interested.

  4. Select the reference/assembly version of interest (GRCh37 with Ensembl version 74 (syn9701918), or GRCh38 with Ensembl version 87 (syn9704851)).

  5. You will need to download the hugo.uniprot.pdb.transcript.csv (syn9704852).

  6. Two download options are available, prioritization.tar.gz (syn9704853) contains all human proteins that have been preprocessed. This is a large file that can take an hour or more depending on internet speeds. Alternatively, you can download the prioritization/ (syn9705109) or any specific protein proximity files within. The proximity files are compressed for faster/more targeted downloading.

NOTE: Proximity data only contains pairs within 20Angstroms between mutations. This should be sufficient for many HotSpot3D applications.

Generate on your own

  1. (Optional) Run drugport module to parse Drugport data and generate a drugport parsing results flat file :

     hotspot3d drugport --pdb-file-dir=pdb_files_dir
    
  2. Run 3D proximity calculation that also updates any existing preprocessed data (default launches LSF jobs) :

     hotspot3d uppro --output-dir=preprocessing_dir --pdb-file-dir=pdb_files_dir --drugport-file=drugport_parsing_results_file 1>hotspot3d.uppro.err 2>hotspot3d.uppro.out
    
  3. Run automated preprocessing for other measurments and annotations (can alternatively run steps 2a-2f individually) :

     hotspot3d prep --output-dir=preprocessing_dir
    

Example - Analysis

3D proximity searching based on prioritization results and visualization

  1. Proximity searching (acquire proximity information for input mutations):

     hotspot3d search --maf-file=your.maf --prep-dir=preprocessing_dir
    
  2. Cluster pairwise data:

     hotspot3d cluster --pairwise-file=3D_Proximity.pairwise --maf-file=your.maf
    
  3. Cluster significance calculation:

     hotspot3d sigclus --prep-dir=preprocessing_dir --pairwise-file=3D_Proximity.pairwise --clusters-file=3D_Proximity.pairwise.singleprotein.collapsed.clusters
    
  4. Clustering Summary:

     hotspot3d summary --clusters-file=3D_Proximity.pairwise.singleprotein.collapsed.clusters
    
  5. Visualization (works with PyMol):

     hotspot3d visual --pairwise-file=3D_Proximity.pairwise --clusters-file=3D_Proximity.pairwise.singleprotein.collapsed.clusters --pdb=3XSR
    

Annotations

Check out scripts/ for various annotation scripts to add more details to the .clusters file.

HGNC download can be found here: http://www.genenames.org/cgi-bin/genefamilies/.

Information on the Ensembl .gtf can be found here: http://useast.ensembl.org/info/website/upload/gff.html, and downloads can be found at the Ensembl ftp site, ftp://ftp.ensembl.org/pub/.

See the scripts/README.annotations for more details.

Tips

Mutation file - Standard .maf with custom coding transcript and protein annotations (ENST00000275493 and p.L858R)

There are only a handful of columns necessary from .maf files. They are:

	Hugo_Symbol
	
	Chromosome
	
	Start_Position
	
	End_Position
	
	Variant_Classification
	
	Reference_Allele
	
	Tumor_Seq_Allele1
	
	Tumor_Seq_Allele2
	
	Tumor_Sample_Barcode

And two non-standard columns:

	a transcript ID column
	
	a protein peptide change column (HGVS p. single letter abbreviations, ie p.T790M)

Current Annotation Support:

	Transcript ID - Ensembl coding transcript ID's (ENST)

	Gene name - HUGO symbol

Clustering with different pairs data:

	For monomers, you need to include the option '--meric-type monomer'

	For homomers, you need to include the option '--meric-type homomer'

	For heteromers, you need to include the option '--meric-type heteromer'

	For both homomers & heteromers simultaneously, you need to include the option '--meric-type multimer'

	For no regard to *mer status, you can include the option 
	'--meric-type unspecified', although this is run by default without the option

	For DrugPort only, do not input the .pairwise file; input only DrugPort pairs file.

	For *mer+DrugPort include the .pairwise file with the DrugPort pairs file, 
	and include the appropriate --meric-type as described above.

Clustering based on different distance measures:

    There are some pairs found on multiple structures. 
	In HotSpot3D versions v0.6.2 and earlier, 
	clustering only used the shortest distance among different structures 
	(shortest structure distance, SSD). 
	In HotSpot3D versions v0.6.3 and later, 
	clustering can be done using the average distance among different structures 
	(average structure distance, ASD), and this is now default.

Citation

If you use HotSpot3D in your research, please cite:

  • Protein-structure-guided discovery of functional mutations across 19 cancer types; Niu B, Scott AD, Sengupta S, Bailey MH, Batra P, Ning J, Wyczalkowski MA, Liang WW, Zhang Q, McLellan MD, Sun SQ, Tripathi P, Lou C, Ye K, Mashl RJ, Wallis J, Wendl MC, Chen F, Ding L; Nat Genet 2016 Aug;48(8):827-37
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].