
THUDM / ProteinLM

License: Apache-2.0
Protein Language Model

Programming Languages

Python
139335 projects - #7 most used programming language
C++
36643 projects - #6 most used programming language
Shell
77523 projects
Cuda
1817 projects
TeX
3793 projects
C
50402 projects - #5 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to ProteinLM

finetuner
Finetuning any DNN for better embedding on neural search tasks
Stars: ✭ 442 (+481.58%)
Mutual labels:  transfer-learning, pretrained-models
sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
Stars: ✭ 264 (+247.37%)
Mutual labels:  transfer-learning, pretrained-models
super-gradients
Easily train or fine-tune SOTA computer vision models with one open source training library
Stars: ✭ 429 (+464.47%)
Mutual labels:  transfer-learning, pretrained-models
ObjectNet
PyTorch implementation of "Pyramid Scene Parsing Network".
Stars: ✭ 15 (-80.26%)
Mutual labels:  transfer-learning, pretrained-models
Bert Keras
Keras implementation of BERT with pre-trained weights
Stars: ✭ 820 (+978.95%)
Mutual labels:  transfer-learning, pretrained-models
Imagenet
Pytorch Imagenet Models Example + Transfer Learning (and fine-tuning)
Stars: ✭ 134 (+76.32%)
Mutual labels:  transfer-learning, pretrained-models
Farm
🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
Stars: ✭ 1,140 (+1400%)
Mutual labels:  transfer-learning, pretrained-models
Open-Source-Models
Address book for computer vision models.
Stars: ✭ 30 (-60.53%)
Mutual labels:  transfer-learning, pretrained-models
gap-text2sql
GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training
Stars: ✭ 83 (+9.21%)
Mutual labels:  pretrained-models
transfer-learning-text-tf
Tensorflow implementation of Semi-supervised Sequence Learning (https://arxiv.org/abs/1511.01432)
Stars: ✭ 82 (+7.89%)
Mutual labels:  transfer-learning
ulm-basenet
Implementation of ULMFit algorithm for text classification via transfer learning
Stars: ✭ 94 (+23.68%)
Mutual labels:  transfer-learning
pytorch cnn trainer
A Simple but Powerful CNN Trainer For PyTorch
Stars: ✭ 26 (-65.79%)
Mutual labels:  transfer-learning
Land-Cover-Classification-using-Sentinel-2-Dataset
Application of deep learning to satellite imagery from the Sentinel-2 satellite, which has orbited the Earth since June 2015. These image patches can be trained and classified using transfer-learning techniques.
Stars: ✭ 36 (-52.63%)
Mutual labels:  transfer-learning
brand-sentiment-analysis
Scripts utilizing Heartex platform to build brand sentiment analysis from the news
Stars: ✭ 21 (-72.37%)
Mutual labels:  transfer-learning
image-background-remove-tool
✂️ Automated high-quality background removal framework for an image using neural networks. ✂️
Stars: ✭ 767 (+909.21%)
Mutual labels:  transfer-learning
Skin Lesions Classification DCNNs
Transfer Learning with DCNNs (DenseNet, Inception V3, Inception-ResNet V2, VGG16) for skin lesions classification
Stars: ✭ 47 (-38.16%)
Mutual labels:  transfer-learning
Warehouse Robot Path Planning
A multi-agent path-planning solution for a warehouse scenario using Q-learning and transfer learning. 🤖️
Stars: ✭ 59 (-22.37%)
Mutual labels:  transfer-learning
neuro-evolution
A project on improving neural network performance using genetic algorithms.
Stars: ✭ 25 (-67.11%)
Mutual labels:  transfer-learning
SimPLE
Code for the paper: "SimPLE: Similar Pseudo Label Exploitation for Semi-Supervised Classification"
Stars: ✭ 50 (-34.21%)
Mutual labels:  transfer-learning
cups-rl
Customisable Unified Physical Simulations (CUPS) for Reinforcement Learning. Experiments run on the ai2thor environment (http://ai2thor.allenai.org/) e.g. using A3C, RainbowDQN and A3C_GA (Gated Attention multi-modal fusion) for Task-Oriented Language Grounding (tasks specified by natural language instructions) e.g. "Pick up the Cup or else"
Stars: ✭ 38 (-50%)
Mutual labels:  transfer-learning

ProteinLM

We pretrain a protein language model based on the Megatron-LM framework and evaluate it on TAPE (Tasks Assessing Protein Embeddings), a benchmark of five biologically relevant semi-supervised learning tasks. Our pretrained model achieves good performance on these tasks.

Overview

Pre-trained models such as BERT have greatly advanced natural language processing by improving the performance of language models. Inspired by the similarity between amino acid sequences and text sequences, we apply language model pre-training to biological data.
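
To make the analogy concrete, below is a minimal, illustrative sketch of per-residue tokenization with a BERT-style masked-token objective. This is not the repository's actual tokenizer; the 20-residue vocabulary, the special tokens, and the 15% mask rate are assumptions borrowed from common BERT-style setups.

```python
# Illustrative sketch only (not ProteinLM's code): treat a protein sequence as
# "text" by tokenizing each amino acid as one token and masking a fraction of
# positions for a masked-language-model objective.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                      # 20 standard residues (assumption)
SPECIALS = ["[PAD]", "[CLS]", "[SEP]", "[MASK]"]
VOCAB = {tok: i for i, tok in enumerate(SPECIALS + list(AMINO_ACIDS))}

def mask_sequence(seq: str, mask_prob: float = 0.15):
    """Tokenize per residue, then mask positions for MLM-style training."""
    tokens = ["[CLS]"] + list(seq) + ["[SEP]"]
    input_ids = [VOCAB[t] for t in tokens]
    labels = [-100] * len(input_ids)                       # -100 = position ignored by the loss
    for pos in range(1, len(tokens) - 1):                  # never mask [CLS] / [SEP]
        if random.random() < mask_prob:
            labels[pos] = input_ids[pos]                   # target: the original residue
            input_ids[pos] = VOCAB["[MASK]"]
    return input_ids, labels

ids, labels = mask_sequence("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(ids[:10], labels[:10])
```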

Guidance

We provide pretraining and finetuning code in two separate folders. If you use the pretrained model we provide, simply download the checkpoint and follow the finetune guide. If you want to pretrain your own model, refer to the pretrain guide.

Download ProteinLM

ProteinLM (200M)

For the pretrained model with 200 million parameters, you can download the model checkpoint via GoogleDrive or TsinghuaCloud.

ProteinLM (3B)

For the pretrained model with 3 billion parameters, you can download the model checkpoint from here.

Project Structure

.
├── pretrain                (protein language model pretrain)
│   ├── megatron            (model folder)
│   ├── pretrain_tools      (multi-node pretrain)
│   └── protein_tools       (data preprocessing scripts)
└── tape
    ├── conda_env           (conda env in yaml format)
    ├── converter           (converter script and model config files)
    ├── scripts             (model generator, finetune)
    └── tape                (tape model)

Usage

As the structure above shows, the workflow has two stages:

  • Pretrain
    • Prepare dataset (PFAM)
    • Preprocess data
    • Pretrain
  • Finetune
    • Convert the pretrained protein model checkpoint
    • Finetune on downstream tasks

Detailed explanations are given in each folder's README; a rough end-to-end sketch is also shown below.
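
The sketch below strings the two stages together for orientation only. The script paths and flags are hypothetical placeholders; the actual entry points and arguments are documented in the pretrain and tape folder READMEs.

```python
# Hypothetical orchestration sketch -- the file names and flags below are
# placeholders, not the repository's real entry points.
import subprocess

steps = [
    # Pretrain stage: preprocess the PFAM corpus, then launch Megatron-LM pretraining.
    ["python", "pretrain/protein_tools/preprocess_data.py", "--input", "pfam.fasta"],
    ["bash", "pretrain/pretrain_tools/run_pretrain.sh"],
    # Finetune stage: convert the Megatron checkpoint to TAPE format, then finetune.
    ["python", "tape/converter/convert_checkpoint.py", "--checkpoint", "ckpt/"],
    ["bash", "tape/scripts/finetune.sh", "--task", "secondary_structure"],
]

for cmd in steps:
    subprocess.run(cmd, check=True)   # stop the pipeline if any step fails
```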

Downstream Tasks Performance

| Task                | Metric             | TAPE | ProteinLM (200M) | ProteinLM (3B) |
|---------------------|--------------------|------|------------------|----------------|
| Contact prediction  | P@L/5              | 0.36 | 0.52             | 0.75           |
| Remote homology     | Top-1 accuracy     | 0.21 | 0.26             | 0.30           |
| Secondary structure | Accuracy (3-class) | 0.73 | 0.75             | 0.79           |
| Fluorescence        | Spearman's rho     | 0.68 | 0.68             | 0.68           |
| Stability           | Spearman's rho     | 0.73 | 0.77             | 0.79           |
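
For reference, P@L/5 (the contact-prediction metric above) is the precision of the top L/5 predicted residue pairs for a protein of length L. A minimal sketch follows; the minimum sequence-separation filter is an assumption in the spirit of the TAPE evaluation, not necessarily the exact setting used here.

```python
# Illustrative sketch (not from the ProteinLM repo): precision at L/5 for contact
# prediction. `probs` is an LxL matrix of predicted contact probabilities and
# `contacts` is an LxL 0/1 matrix of true contacts.
import numpy as np

def precision_at_l5(probs: np.ndarray, contacts: np.ndarray, min_sep: int = 6) -> float:
    L = probs.shape[0]
    i, j = np.triu_indices(L, k=min_sep)          # candidate pairs with j - i >= min_sep
    order = np.argsort(probs[i, j])[::-1]         # rank pairs by predicted probability
    top_k = order[: max(1, L // 5)]               # keep the top L/5 predictions
    return float(contacts[i[top_k], j[top_k]].mean())

# Toy example: random scores against a random symmetric contact map.
rng = np.random.default_rng(0)
L = 100
probs = rng.random((L, L))
upper = (rng.random((L, L)) < 0.05).astype(int)
contacts = np.triu(upper, 1) + np.triu(upper, 1).T
print(precision_at_l5(probs, contacts))
```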

Citation

Please cite our paper if you find our work useful for your research. Our paper can be accessed here.

@article{DBLP:journals/corr/abs-2108-07435,
  author    = {Yijia Xiao and
               Jiezhong Qiu and
               Ziang Li and
               Chang{-}Yu Hsieh and
               Jie Tang},
  title     = {Modeling Protein Using Large-scale Pretrain Language Model},
  journal   = {CoRR},
  volume    = {abs/2108.07435},
  year      = {2021},
  url       = {https://arxiv.org/abs/2108.07435},
  eprinttype = {arXiv},
  eprint    = {2108.07435},
  timestamp = {Fri, 20 Aug 2021 13:55:54 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2108-07435.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Contact

If you have any problems using ProteinLM, feel free to contact us at [email protected].

Reference

Our work builds on the following papers, and part of the code is based on Megatron-LM and TAPE.

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

@article{DBLP:journals/corr/abs-1909-08053,
  author    = {Mohammad Shoeybi and
               Mostofa Patwary and
               Raul Puri and
               Patrick LeGresley and
               Jared Casper and
               Bryan Catanzaro},
  title     = {Megatron-LM: Training Multi-Billion Parameter Language Models Using
               Model Parallelism},
  journal   = {CoRR},
  volume    = {abs/1909.08053},
  year      = {2019},
  url       = {http://arxiv.org/abs/1909.08053},
  archivePrefix = {arXiv},
  eprint    = {1909.08053},
  timestamp = {Tue, 24 Sep 2019 11:33:51 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1909-08053.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Evaluating Protein Transfer Learning with TAPE

@article{DBLP:journals/corr/abs-1906-08230,
  author    = {Roshan Rao and
               Nicholas Bhattacharya and
               Neil Thomas and
               Yan Duan and
               Xi Chen and
               John F. Canny and
               Pieter Abbeel and
               Yun S. Song},
  title     = {Evaluating Protein Transfer Learning with {TAPE}},
  journal   = {CoRR},
  volume    = {abs/1906.08230},
  year      = {2019},
  url       = {http://arxiv.org/abs/1906.08230},
  archivePrefix = {arXiv},
  eprint    = {1906.08230},
  timestamp = {Sat, 23 Jan 2021 01:20:25 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1906-08230.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}