
huggingface / Pytorch Openai Transformer Lm

Licence: MIT
🐥 A PyTorch implementation of OpenAI's finetuned transformer language model, with a script to import the weights pre-trained by OpenAI

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Pytorch Openai Transformer Lm

Awesome Bert Nlp
A curated list of NLP resources focused on BERT, attention mechanism, Transformer networks, and transfer learning.
Stars: ✭ 567 (-55.28%)
Mutual labels:  neural-networks, language-model, transformer
FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (-83.91%)
Mutual labels:  transformer, language-model
Highway-Transformer
[ACL‘20] Highway Transformer: A Gated Transformer.
Stars: ✭ 26 (-97.95%)
Mutual labels:  transformer, language-model
Bert Pytorch
Google AI 2018 BERT pytorch implementation
Stars: ✭ 4,642 (+266.09%)
Mutual labels:  language-model, transformer
Relational Rnn Pytorch
An implementation of DeepMind's Relational Recurrent Neural Networks in PyTorch.
Stars: ✭ 236 (-81.39%)
Mutual labels:  language-model, transformer
Automatic Speech Recognition
🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)
Stars: ✭ 192 (-84.86%)
Mutual labels:  neural-networks, language-model
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (-67.82%)
Mutual labels:  language-model, transformer
MinTL
MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems
Stars: ✭ 61 (-95.19%)
Mutual labels:  transformer, language-model
Gpt2 French
GPT-2 French demo | Démo française de GPT-2
Stars: ✭ 47 (-96.29%)
Mutual labels:  language-model, transformer
Nlp Paper
NLP Paper
Stars: ✭ 484 (-61.83%)
Mutual labels:  language-model, transformer
Gpt Scrolls
A collaborative collection of open-source safe GPT-3 prompts that work well
Stars: ✭ 195 (-84.62%)
Mutual labels:  language-model, transformer
Gpt2
PyTorch Implementation of OpenAI GPT-2
Stars: ✭ 64 (-94.95%)
Mutual labels:  language-model, transformer
Tupe
Transformer with Untied Positional Encoding (TUPE). Code of paper "Rethinking Positional Encoding in Language Pre-training". Improve existing models like BERT.
Stars: ✭ 143 (-88.72%)
Mutual labels:  language-model, transformer
Nn
🧑‍🏫 50! Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
Stars: ✭ 5,720 (+351.1%)
Mutual labels:  neural-networks, transformer
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+4296.06%)
Mutual labels:  language-model, transformer
Seq2seqchatbots
A wrapper around tensor2tensor to flexibly train, interact, and generate data for neural chatbots.
Stars: ✭ 466 (-63.25%)
Mutual labels:  neural-networks, transformer
Vietnamese Electra
Electra pre-trained model using Vietnamese corpus
Stars: ✭ 55 (-95.66%)
Mutual labels:  language-model, transformer
Indonesian Language Models
Indonesian Language Models and its Usage
Stars: ✭ 64 (-94.95%)
Mutual labels:  language-model, transformer
Transformers without tears
Transformers without Tears: Improving the Normalization of Self-Attention
Stars: ✭ 80 (-93.69%)
Mutual labels:  transformer
Sonnet
TensorFlow-based neural network library
Stars: ✭ 9,129 (+619.95%)
Mutual labels:  neural-networks

PyTorch implementation of OpenAI's Finetuned Transformer Language Model

This is a PyTorch implementation of the TensorFlow code provided with OpenAI's paper "Improving Language Understanding by Generative Pre-Training" by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.

This implementation includes a script to load the weights pre-trained by the authors with the TensorFlow implementation into the PyTorch model.

Transformer Language Model

The model classes and loading script are located in model_pytorch.py.

The names of the modules in the PyTorch model follow the names of the Variables in the TensorFlow implementation. This implementation stays as close as possible to the original code in order to minimize discrepancies.

This implementation therefore also comprises a modified Adam optimization algorithm, as used in OpenAI's paper, with:

  • fixed weight decay, decoupled from the gradient update, and
  • a scheduled learning rate (warm-up followed by decay), as commonly used for Transformers.
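
For illustration only, here is a minimal sketch of such a schedule using stock PyTorch. The repository ships its own optimizer, and the learning rate, warm-up length, total step count and weight-decay value below are placeholder assumptions rather than the values used for the pre-trained model (note also that plain Adam applies classic L2 regularization, not decoupled weight decay).

import torch

# Sketch only: linear warm-up followed by linear decay of the learning rate.
def lr_lambda(step, warmup=2000, total=100000):
    if step < warmup:
        return step / max(1, warmup)
    return max(0.0, (total - step) / max(1, total - warmup))

params = torch.nn.Linear(768, 768).parameters()  # stand-in for the transformer parameters
optimizer = torch.optim.Adam(params, lr=6.25e-5, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# Call scheduler.step() after each optimizer.step() during training.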

Requirements

To use the model itself by importing model_pytorch.py, you only need:

  • PyTorch (version >=0.4)

To run the classifier training script in train.py, you will additionally need:

  • tqdm
  • sklearn
  • spacy
  • ftfy
  • pandas

You can download the weights of the OpenAI pre-trained version by cloning Alec Radford's repo and placing the model folder containing the pre-trained weights in the present repo.

Using the pre-trained model as a Transformer Language Model

The model can be used as a transformer language model with OpenAI's pre-trained weights as follows:

from model_pytorch import TransformerModel, load_openai_pretrained_model, DEFAULT_CONFIG

args = DEFAULT_CONFIG
model = TransformerModel(args)
load_openai_pretrained_model(model)

This model generates the Transformer's hidden states. You can use the LMHead class in model_pytorch.py to add a decoder whose weights are tied to those of the encoder and obtain a full language model. You can also use the ClfHead class in model_pytorch.py to add a classifier on top of the transformer, as described in OpenAI's publication (see an example of both in the __main__ function of train.py).
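
As a rough sketch of what this looks like (the exact constructor arguments and input layout are defined in model_pytorch.py and train.py; the signature assumed for LMHead below may differ slightly):

from model_pytorch import TransformerModel, LMHead, load_openai_pretrained_model, DEFAULT_CONFIG

args = DEFAULT_CONFIG
model = TransformerModel(args)
load_openai_pretrained_model(model)

# Assumed signature: LMHead(transformer, config), with the decoder weights
# tied to the transformer's token embeddings.
lm_head = LMHead(model, args)

# hidden = model(x)            # x: encoded token/position indices, prepared as in train.py
# lm_logits = lm_head(hidden)  # logits over the BPE vocabulary for next-token prediction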

To use the positional encoder of the transformer, you should encode your dataset using the encode_dataset() function of utils.py. Please refer to the beginning of the __main__ function in train.py to see how to properly define the vocabulary and encode your dataset.
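
A hypothetical sketch of that encoding step follows; the module name text_utils, the vocabulary file paths and the call signature of encode_dataset() are assumptions based on the defaults in train.py, so check your local copy before relying on them.

from text_utils import TextEncoder
from utils import encode_dataset

# Assumed default paths of the BPE vocabulary shipped with the pre-trained weights.
text_encoder = TextEncoder('model/encoder_bpe_40000.json', 'model/vocab_40000.bpe')

# Each split is a tuple of lists of raw strings (here a toy single-field split).
train_split = (["The quick brown fox jumps over the lazy dog ."],)
(encoded_train,) = encode_dataset(train_split, encoder=text_encoder)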

Fine-tuning the pre-trained model on a classification task

This model can also be integrated into a classifier, as detailed in OpenAI's paper. An example of fine-tuning on the ROCStories Cloze task is included with the training code in train.py.

The ROCStories dataset can be downloaded from the associated website.

As with the TensorFlow code, this code implements the ROCStories Cloze Test result reported in the paper, which can be reproduced by running:

python -m spacy download en
python train.py --dataset rocstories --desc rocstories --submit --analysis --data_dir [path to data here]

First experiments on the ROCStories test set

Fine-tuning the PyTorch model for 3 epochs on ROCStories takes 10 minutes on a single NVIDIA K80.

The single-run test accuracy of this PyTorch version is 85.84%, while the authors report a median accuracy of 85.8% with the TensorFlow code, and the paper reports a best single-run accuracy of 86.5%.

The authors' implementation uses 8 GPUs and can thus accommodate a batch of 64 samples, while the present implementation is single-GPU and is consequently limited to 20 instances on a K80 for memory reasons. In our test, increasing the batch size from 8 to 20 samples increased the test accuracy by 2.5 points. A better accuracy may be obtained with a multi-GPU setting (not tried yet); a sketch of such a setup is shown below.
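
For reference, here is a minimal sketch of one possible multi-GPU setting using PyTorch's built-in data parallelism (untested here, and not part of the training script):

import torch
from model_pytorch import TransformerModel, DEFAULT_CONFIG

model = TransformerModel(DEFAULT_CONFIG)

# Sketch only: replicate the model across all visible GPUs so that a larger
# batch (e.g. the authors' batch of 64 samples) can fit in memory.
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)
model.to('cuda')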

The previous SOTA on the ROCStories dataset is 77.6% (the "Hidden Coherence Model" of Chaturvedi et al., published in "Story Comprehension for Predicting What Happens Next", EMNLP 2017, which is a very nice paper too!).
