
tbepler / Protein Sequence Embedding Iclr2019

License: other
Source code for "Learning protein sequence embeddings using information from structure" - ICLR 2019

Programming Languages

python
139,335 projects - #7 most used programming language

Projects that are alternatives to or similar to Protein Sequence Embedding Iclr2019

Attention Mechanisms
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
Stars: ✭ 203 (+4.64%)
Mutual labels:  recurrent-neural-networks, language-model
Ctcwordbeamsearch
Connectionist Temporal Classification (CTC) decoder with dictionary and language model for TensorFlow.
Stars: ✭ 398 (+105.15%)
Mutual labels:  recurrent-neural-networks, language-model
Relational Rnn Pytorch
An implementation of DeepMind's Relational Recurrent Neural Networks in PyTorch.
Stars: ✭ 236 (+21.65%)
Mutual labels:  recurrent-neural-networks, language-model
Ctcdecoder
Connectionist Temporal Classification (CTC) decoding algorithms: best path, prefix search, beam search and token passing. Implemented in Python.
Stars: ✭ 529 (+172.68%)
Mutual labels:  recurrent-neural-networks, language-model
Mead Baseline
Deep-Learning Model Exploration and Development for NLP
Stars: ✭ 238 (+22.68%)
Mutual labels:  recurrent-neural-networks, language-model
Bit Rnn
Quantize weights and activations in Recurrent Neural Networks.
Stars: ✭ 86 (-55.67%)
Mutual labels:  recurrent-neural-networks, language-model
Gpt Neo
An implementation of model-parallel GPT-2- and GPT-3-like models, with the ability to scale up to full GPT-3 sizes (and possibly more!), using the mesh-tensorflow library.
Stars: ✭ 1,252 (+545.36%)
Mutual labels:  language-model
Bert Sklearn
A scikit-learn wrapper for Google's BERT model
Stars: ✭ 182 (-6.19%)
Mutual labels:  language-model
Mss pytorch
Singing Voice Separation via Recurrent Inference and Skip-Filtering Connections - PyTorch Implementation.
Stars: ✭ 165 (-14.95%)
Mutual labels:  recurrent-neural-networks
Sru
SRU is a recurrent unit that can run over 10 times faster than a cuDNN LSTM, with no loss of accuracy on the many tasks tested.
Stars: ✭ 2,009 (+935.57%)
Mutual labels:  recurrent-neural-networks
Keras English Resume Parser And Analyzer
Keras project that parses and analyzes English resumes
Stars: ✭ 192 (-1.03%)
Mutual labels:  recurrent-neural-networks
Automatic Speech Recognition
🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)
Stars: ✭ 192 (-1.03%)
Mutual labels:  language-model
Optimus
Optimus: the first large-scale pre-trained VAE language model
Stars: ✭ 180 (-7.22%)
Mutual labels:  language-model
Deep News Summarization
News summarization using sequence to sequence model with attention in TensorFlow.
Stars: ✭ 167 (-13.92%)
Mutual labels:  recurrent-neural-networks
Bert As Language Model
BERT as a language model; forked from https://github.com/google-research/bert
Stars: ✭ 185 (-4.64%)
Mutual labels:  language-model
Indic Bert
BERT-based Multilingual Model for Indian Languages
Stars: ✭ 160 (-17.53%)
Mutual labels:  language-model
Char Rnn Chinese
Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch. Based on the code of https://github.com/karpathy/char-rnn, with support for Chinese and other additions.
Stars: ✭ 192 (-1.03%)
Mutual labels:  language-model
Xlnet Gen
XLNet for generating language.
Stars: ✭ 164 (-15.46%)
Mutual labels:  language-model
Lstm anomaly thesis
Anomaly detection for temporal data using LSTMs
Stars: ✭ 178 (-8.25%)
Mutual labels:  recurrent-neural-networks
Hdltex
HDLTex: Hierarchical Deep Learning for Text Classification
Stars: ✭ 191 (-1.55%)
Mutual labels:  recurrent-neural-networks

Learning protein sequence embeddings using information from structure

This repository contains the source code and links to the data and pretrained embedding models accompanying the ICLR 2019 paper "Learning protein sequence embeddings using information from structure".

@inproceedings{bepler2018learning,
  title={Learning protein sequence embeddings using information from structure},
  author={Tristan Bepler and Bonnie Berger},
  booktitle={International Conference on Learning Representations},
  year={2019},
}

Setup and dependencies

Dependencies (an example install command follows the list):

  • python 3
  • pytorch >= 0.4
  • numpy
  • scipy
  • pandas
  • sklearn
  • cython
  • h5py (for embedding script)
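
These can be installed with pip. A minimal sketch (the package names are assumptions mapped from the list above; pytorch is distributed on PyPI as torch, and sklearn as scikit-learn):

pip install torch numpy scipy pandas scikit-learn cython h5py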

Run setup.py to compile the Cython files:

python setup.py build_ext --inplace

Data sets

The data sets with train/dev/test splits are provided as .tar.gz files from the links below.

The training and evaluation scripts assume that these data sets have been extracted into a directory called 'data'.
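
Each archive can then be unpacked into that directory, for example (the archive name here is a placeholder for whichever data set was downloaded):

mkdir -p data
tar -xzf dataset.tar.gz -C data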

Pretrained models

Our trained versions of the structure-based embedding models and the bidirectional language model can be downloaded here.
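
As a rough illustration of applying a downloaded model, the sketch below loads it with torch.load and embeds a single sequence. The file name, the amino-acid encoding, and the model's forward signature are assumptions rather than this repository's exact API; the repository ships its own embedding script for this purpose.

import torch

# Load a downloaded embedding model (the path is a placeholder).
model = torch.load('pretrained_embedding_model.sav', map_location='cpu')
model.eval()

# Encode a protein sequence as integer tokens. This 21-letter alphabet is an
# illustrative stand-in for the repository's own alphabet utilities.
alphabet = 'ARNDCQEGHILKMFPSTWYVX'
seq = 'MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ'
x = torch.tensor([[alphabet.index(a) for a in seq]], dtype=torch.long)

with torch.no_grad():
    z = model(x)  # assumed to return per-residue embedding vectors

print(z.shape)  # e.g. (1, len(seq), embedding_dim)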

Author

Tristan Bepler ([email protected])

Cite

Please cite the above paper if you use this code or pretrained models in your work.

License

The source code and trained models are provided free for non-commercial use under the terms of the CC BY-NC 4.0 license. See LICENSE file and/or https://creativecommons.org/licenses/by-nc/4.0/legalcode for more information.

Contact

If you have any questions, comments, or would like to report a bug, please file a GitHub issue or contact me at [email protected].
