fhalab / Embeddings_reproduction

Licence: other

Code to reproduce the paper Learned Protein Embeddings for Machine Learning.

Installation

embeddings_reproduction can be installed from the command line with pip:

$ pip install git+https://github.com/fhalab/embeddings_reproduction.git

It can also be installed in editable mode (-e) from the source with:

$ git clone https://github.com/fhalab/embeddings_reproduction.git
$ cd embeddings_reproduction
$ pip install -e .

The second option may be necessary depending on how your computer handles Git-LFS: some of the files are large, so the connection can time out during a direct pip install.

Computing Environment

This was originally developed using Anaconda Python 3.5 and the following packages and versions:

gensim==1.0.1
numpy==1.13.1
pandas==0.20.3
scipy==0.19.1
sklearn==0.19.0
matplotlib==2.0.2
seaborn==0.8.1
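
To see how far a current environment has drifted from these pins, a short check along the following lines may help. It is only a sketch: `parse_pins` and `compare_with_installed` are hypothetical helper names, it relies on `importlib.metadata` (Python 3.8 or later, so newer than the Python 3.5 used originally), and it assumes the `sklearn` entry refers to the distribution published as `scikit-learn`.

```python
from importlib import metadata

# Pins copied verbatim from the list above.
PINNED = """
gensim==1.0.1
numpy==1.13.1
pandas==0.20.3
scipy==0.19.1
sklearn==0.19.0
matplotlib==2.0.2
seaborn==0.8.1
"""

# Assumption: "sklearn" in the list means the distribution named "scikit-learn".
DISTRIBUTION_ALIASES = {"sklearn": "scikit-learn"}


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def compare_with_installed(pins):
    """Map each name to (pinned_version, installed_version_or_None)."""
    report = {}
    for name, wanted in pins.items():
        dist = DISTRIBUTION_ALIASES.get(name, name)
        try:
            found = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found = None  # not installed in this environment
        report[name] = (wanted, found)
    return report


if __name__ == "__main__":
    for name, (wanted, found) in compare_with_installed(parse_pins(PINNED)).items():
        status = "ok" if found == wanted else "differs"
        print(f"{name}: pinned {wanted}, installed {found} ({status})")
```

A mismatch does not necessarily mean the notebooks will fail; it only flags where behaviour might differ from the original environment.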

File structure

The repository is divided into code, inputs, and outputs. Inputs contains all the unlabeled sequences used to build docvec models, the labeled sequences used to build Gaussian process regression models, and the AAIndex, ProFET, and one-hot encodings of the labeled sequences. Code contains Python implementations of Gaussian process regression and the mismatch string kernel, in addition to Jupyter notebooks that reproduce the analyses in the paper. Outputs contains all the embeddings produced during the analysis, along with CSV files storing the results of the cross-validation over embedding hyperparameters, the negative controls, and the results of varying the embedding dimension or the number of unlabeled sequences. Note that while code to train docvec models is provided, the actual docvec models produced by gensim are not included in the repository because they are too large. They are freely available at http://cheme.caltech.edu/~kkyang/.

Inferring embeddings using a pretrained model

To infer embeddings, you need a model and all its associated files, and an iterable of sequences. For example, to infer embeddings using original_5_7 (no randomization, k=5, w=7):

  1. Download original_5_7.pkl, original_5_7.pkl.docvecs.doctag_syn0.npy, original_5_7.pkl.syn1neg.npy, and original_5_7.pkl.wv.syn0.npy. Make sure they are all in the same directory.
  2. After installing the embeddings_reproduction package, and assuming we're in the same directory as the models:
from embeddings_reproduction import embedding_tools

# seqs must already be defined: an iterable of the sequences to embed
embeds = embedding_tools.get_embeddings_new('original_5_7.pkl', seqs, k=5, overlap=False)
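
Because step 1 requires all four files to sit in the same directory, a quick pre-flight check before loading can save a confusing error later. The following is a minimal sketch, not part of the package: `missing_model_files` is a hypothetical helper, and it assumes other models follow the same naming pattern as the four original_5_7 files listed above.

```python
from pathlib import Path

# Companion suffixes taken from the file names listed in step 1.
COMPANION_SUFFIXES = (
    ".docvecs.doctag_syn0.npy",
    ".syn1neg.npy",
    ".wv.syn0.npy",
)


def missing_model_files(model_path):
    """Return the names of expected model files that are not on disk."""
    model_path = Path(model_path)
    expected = [model_path] + [
        model_path.with_name(model_path.name + suffix)
        for suffix in COMPANION_SUFFIXES
    ]
    return [p.name for p in expected if not p.is_file()]


if __name__ == "__main__":
    missing = missing_model_files("original_5_7.pkl")
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```

An empty list means the .pkl file and its three companion arrays were all found next to each other.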

The choice of pretrained model should be treated as a hyperparameter and chosen using validation.
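
One way to picture this selection is a loop that scores each candidate embedding by cross-validation and keeps the best one. The sketch below is illustrative only: all function names are hypothetical, the data is a toy example, and a 1-nearest-neighbor predictor stands in for the Gaussian process regression used in the repository so that the example needs nothing beyond the standard library. In practice each entry of `candidates` would hold the embeddings inferred from one pretrained model.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and deal it into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_neighbor_predict(train_X, train_y, x):
    """Predict with the label of the closest training point (squared Euclidean)."""
    closest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[closest]


def cv_mse(X, y, k=5, seed=0):
    """Mean squared error over k-fold cross-validation."""
    errors = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            pred = nearest_neighbor_predict(train_X, train_y, X[i])
            errors.append((pred - y[i]) ** 2)
    return mean(errors)


def select_embedding(candidates, y, k=5):
    """Return (best_name, scores) where lower cross-validated MSE wins."""
    scores = {name: cv_mse(X, y, k) for name, X in candidates.items()}
    return min(scores, key=scores.get), scores


# Toy data: 30 labels and two made-up "embeddings" of the same 30 items.
y = [i / 10 for i in range(30)]
candidates = {
    "informative": [[value] for value in y],           # tracks the label
    "scrambled": [[(7 * i) % 30] for i in range(30)],  # unrelated to the label
}
best, scores = select_embedding(candidates, y)
print(best, scores)
```

Here the embedding that tracks the label gets the lower validation error and is selected. When reporting final performance, the data used for this selection should be kept separate from the test set.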
