
Keras Language Models

Keras implementations of three language models: character-level RNN, word-level RNN and Sentence VAE (Bowman, Vilnis et al 2016).

Each model is implemented and tested and should run out of the box. The default parameters will give a reasonable result relatively quickly. You can get better results by using bigger datasets, training for more epochs, or tweaking the batch size and learning rate.

If you are new to RNNs or language modeling, I recommend checking out the following resources:

installation

The three models are provided as standalone scripts. Just download or clone the repository and run any of the following:

python chars.py
python words.py
python sentences.py

Add -h to see the parameters you can change. Make sure you have Python 3 and the required packages installed (see the Packages section below).

Model 1: Character-level RNN language model

This is the language model discussed in Andrej Karpathy's popular blog post The Unreasonable Effectiveness of Recurrent Neural Networks. Here is an image from that post explaining the basic principle.

training

We train the model by asking it to predict the next character in the sentence, given only the preceding characters. This is easy to do on an unrolled RNN by shifting the input forward by one token (for instance by prepending a start-of-sentence character) and training it to predict the non-shifted sequence.
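
In code, the shift amounts to something like the following toy sketch (the variable names here are made up for illustration; they're not taken from chars.py):

```python
import numpy as np

# Hypothetical toy example: encode a string as integer indices, then build
# the shifted input/target pair used for next-character prediction.
text = "hello world"
chars = sorted(set(text))
char_to_ix = {c: i + 1 for i, c in enumerate(chars)}  # index 0 is reserved for a start-of-sequence token

encoded = [char_to_ix[c] for c in text]

# Input: the start token (0) followed by all characters except the last.
# Target: the original, unshifted sequence.
x = np.array([[0] + encoded[:-1]])
y = np.array([encoded])

# At every position t, the model sees x[:, :t+1] and is trained to predict y[:, t],
# i.e. the character that follows the input it has seen so far.
```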

generating

To generate sequences, we start from a seed: a sequence of a few characters taken from the corpus. We feed these to the RNN and ask it to predict the next character, in the form of a probability distribution over all characters. We sample a character from this distribution, add it to the seed, and repeat.
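
A minimal version of that sampling loop might look like this (a sketch: `model` is assumed to output a softmax distribution over the next character at every position, and `ix_to_char` is a hypothetical index-to-character mapping):

```python
import numpy as np

def sample_sequence(model, seed_ix, ix_to_char, length=200):
    """Generate `length` characters, starting from a list of seed character indices."""
    sequence = list(seed_ix)
    for _ in range(length):
        probs = model.predict(np.array([sequence]))[0, -1]  # distribution over the next character
        probs = probs / probs.sum()                         # guard against rounding error
        next_ix = np.random.choice(len(probs), p=probs)     # sample a character
        sequence.append(next_ix)                            # add it to the seed and repeat
    return ''.join(ix_to_char[i] for i in sequence)
```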

Further notes

  • The default corpus is Alice in Wonderland. This takes about 4 minutes per epoch on my laptop with no GPU. The collected works of Shakespeare (`-t shakespeare`) take a little over one hour on a CPU.
  • With a high-end (TitanX) GPU, alice takes about 30 seconds per epoch and shakespeare takes about 30 minutes with default settings.
  • If you have the memory, increase the sequence length with, for instance, -m 1000, which will reduce the training time per epoch to a little over 10 minutes for shakespeare and about 30 seconds for alice.
  • Training a good character-level model can take a long time. For a big corpus, you should expect a couple of days of training time, even with a GPU.

With the standard settings, I get the following samples after <> epochs:


Model 2: Word-level RNN language model

This is basically the same as the previous model, but instead of treating language as a sequence of characters, we treat it as a sequence of words. This means we can use a much simpler RNN (one layer will be enough), but it also means that the number of distinct input tokens is much bigger. Previously, we had about 100 possible input tokens, and we could simply model text as a sequence of one-hot vectors. Since we will have about 10,000 different words, it pays to pass them through an embedding layer first. This layer embeds the words into a low-dimensional space (300 dimensions in our example), where similar words can end up close to each other. We learn this embedding together with the weights of the RNN.
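
For reference, a bare-bones Keras version of such a model could look like this (the sizes are illustrative, not the defaults used by words.py):

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense

vocab_size = 10000   # number of distinct words (illustrative)
embedding_dim = 300  # size of the learned word embeddings
hidden_dim = 256     # LSTM state size (illustrative)

model = Sequential()
# Map each word index to a dense 300-dimensional vector, learned jointly with the RNN.
model.add(Embedding(vocab_size, embedding_dim))
# One recurrent layer is enough at the word level.
model.add(LSTM(hidden_dim, return_sequences=True))
# Per-position softmax over the vocabulary: predict the next word at every step.
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))

# Targets are the next-word indices, shaped (batch, time, 1) for the sparse loss.
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
```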

notes

  • Note that the -m switch here will actually remove sentences from your corpus (unlike the previous model, where it just controlled how the corpus was cut into chunks).

Model 3: Sentence VAE

RNN language models are pretty spectacular, but they have trouble maintaining long-term structure. For instance: you can train the character-level model to produce Shakespeare, with realistic-looking character names, but over time you will not see the same characters recurring. Similarly, the word-level language model produces grammatical text, but often meanders, starting the next sentence before the previous one has finished (like a Donald Trump speech).

To provide long-term structure, we can use a Sentence VAE. This is a model first introduced in Bowman, Vilnis et al. (2016). It combines three ideas (a sketch of how they fit together in Keras follows the list below):

  • Sequence-to-sequence autoencoders. Autoencoders that use an RNN to encode a sequence into a latent representation z, and then use another RNN to decode it. The model is trained on reconstruction error.
  • RNN language modeling. The decoder decodes the original sentence from z, but is also provided with the sentence as input, as in the previous models. This is also known as teacher forcing. It gives us the best of both worlds: we use a language model to learn the low-level word-to-word structure and an autoencoder to learn the high-level structure, encoded in z.
  • Variational autoencoders. The first two ingredients suffice to create a decent language model for high-level structure, but we want one more thing: if we sample a random z, we want it to decode into a grammatical sentence. Similarly, if we encode two sentences into latent representations z1 and z2, we want the z halfway between them to decode into a grammatical sentence whose meaning is a mixture of the first two. Using a variational instead of a regular autoencoder helps us to achieve this structure in the latent space.
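
Here is a rough functional-API sketch of how the three pieces can be wired together in Keras. It is not a transcription of sentences.py: the sizes are illustrative, and conditioning the decoder on z through its initial state is just one common way to do it.

```python
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dense, Lambda, TimeDistributed
from keras import backend as K

vocab, emb_dim, hidden, latent = 10000, 300, 256, 32  # illustrative sizes

# (1) Encoder: read the sentence, produce the parameters of q(z|x).
enc_in   = Input(shape=(None,))
enc_emb  = Embedding(vocab, emb_dim)(enc_in)
enc_h    = LSTM(hidden)(enc_emb)
z_mean   = Dense(latent)(enc_h)
z_logvar = Dense(latent)(enc_h)

# (3) Reparameterization: sample z = mean + sigma * epsilon, so gradients can flow through.
def sample_z(args):
    mu, logvar = args
    eps = K.random_normal(shape=K.shape(mu))
    return mu + K.exp(0.5 * logvar) * eps

z = Lambda(sample_z)([z_mean, z_logvar])

# (2) Decoder: an RNN language model whose initial state is derived from z,
#     and which also receives the shifted sentence as input (teacher forcing).
dec_in  = Input(shape=(None,))
dec_emb = Embedding(vocab, emb_dim)(dec_in)
state_h = Dense(hidden, activation='tanh')(z)
state_c = Dense(hidden, activation='tanh')(z)
dec_h   = LSTM(hidden, return_sequences=True)(dec_emb, initial_state=[state_h, state_c])
out     = TimeDistributed(Dense(vocab, activation='softmax'))(dec_h)

vae = Model([enc_in, dec_in], out)

# Per-sentence loss: *summed* reconstruction error plus the KL term
# (see the implementation tips below on why summing matters here).
# Targets are the word indices of the unshifted sentence, shaped (batch, time, 1).
def vae_loss(y_true, y_pred):
    rec = K.sum(K.sparse_categorical_crossentropy(y_true, y_pred), axis=-1)
    kl = -0.5 * K.sum(1 + z_logvar - K.square(z_mean) - K.exp(z_logvar), axis=-1)
    return rec + kl

vae.compile(optimizer='adam', loss=vae_loss)
```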

Variational autoencoders are powerful, but complicated models. To learn more, check out the following resources:

notes

  • If you have the memory, -b 1024 -l 1e-2 is a decent way to train quickly (about 10 seconds per epoch on a TitanX).

Implementation tips

For each implementation I got stuck for a long time on several bugs. If you're porting or adapting any of these models to another platform, make sure to check the following.

  • Check your loss curves. Since inference is so different from training, it's quite possible that the model is training perfectly well, and there's just a stupid bug in the generating code.
  • For most models, it doesn't matter much whether you sum or average the loss per word/character in your loss function. For the VAE it matters hugely, since you have another loss term (the KL loss) which needs to be balanced with the reconstruction loss; the numbers below illustrate why.
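
To make the second point concrete with some hypothetical numbers: averaging the reconstruction loss over tokens shrinks it by roughly a factor of the sequence length, while the KL term keeps its size, so the KL term suddenly carries far more relative weight and the model is pushed towards ignoring z.

```python
# Hypothetical numbers, only to illustrate the relative scales.
seq_length    = 50    # tokens in a sentence
rec_per_token = 3.0   # average cross-entropy per token
kl            = 8.0   # KL term for the whole sentence

# Summed reconstruction loss: the KL term acts as a modest regularizer.
loss_summed = seq_length * rec_per_token + kl  # 150 + 8

# Averaged reconstruction loss: the KL term dominates,
# which pushes the model towards posterior collapse (ignoring z).
loss_averaged = rec_per_token + kl             # 3 + 8
```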

Packages

To install all required packages, run the following pip command (let me know if I've forgotten anything):

pip install numpy keras matplotlib nltk tensorboardx scipy

You'll also need tensorflow. If you don't have a GPU, run pip install tensorflow. If you do, run pip install tensorflow-gpu and make sure your drivers are installed correctly.

I've tried to use Keras calls only, and to avoid tensorflow-specific things, but I haven't tested whether this works with backends other than tensorflow. If you're testing with another backend, let me know how you get on.
