
FlorianPfisterer / 2D-LSTM-Seq2Seq

License: MIT License
PyTorch implementation of a 2D-LSTM Seq2Seq Model for NMT.

Programming Languages

Python

Projects that are alternatives to or similar to 2D-LSTM-Seq2Seq

Nspm
🤖 Neural SPARQL Machines for Knowledge Graph Question Answering.
Stars: ✭ 156 (+524%)
Mutual labels:  lstm, seq2seq
Screenshot To Code
A neural network that transforms a design mock-up into a static website.
Stars: ✭ 13,561 (+54144%)
Mutual labels:  lstm, seq2seq
Poetry Seq2seq
Chinese Poetry Generation
Stars: ✭ 159 (+536%)
Mutual labels:  lstm, seq2seq
Pointer Networks Experiments
Sorting numbers with pointer networks
Stars: ✭ 53 (+112%)
Mutual labels:  lstm, seq2seq
fiction generator
Fiction generator with TensorFlow that imitates the writing style of Wang Xiaobo.
Stars: ✭ 27 (+8%)
Mutual labels:  lstm, seq2seq
Chinese Chatbot
A Chinese chatbot trained on 100,000 dialogue pairs using an attention mechanism; it generates a meaningful reply to most general questions. The trained model has been uploaded and can be run directly.
Stars: ✭ 124 (+396%)
Mutual labels:  lstm, seq2seq
Deep Time Series Prediction
Seq2Seq, Bert, Transformer, WaveNet for time series prediction.
Stars: ✭ 183 (+632%)
Mutual labels:  lstm, seq2seq
Tensorflow seq2seq chatbot
Stars: ✭ 81 (+224%)
Mutual labels:  lstm, seq2seq
Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+13572%)
Mutual labels:  lstm, seq2seq
Tensorflow novelist
Imitates Shakespeare to write plays, and can even write Jin Yong-style wuxia novels. Star it; it is kept up to date!
Stars: ✭ 244 (+876%)
Mutual labels:  lstm, seq2seq
Stock Prediction Models
Gathers machine learning and deep learning models for stock forecasting, including trading bots and simulations
Stars: ✭ 4,660 (+18540%)
Mutual labels:  lstm, seq2seq
Base-On-Relation-Method-Extract-News-DA-RNN-Model-For-Stock-Prediction--Pytorch
A dual-stage attention mechanism model for stock prediction, based on a relational news extraction method.
Stars: ✭ 33 (+32%)
Mutual labels:  lstm, seq2seq
Deep News Summarization
News summarization using sequence to sequence model with attention in TensorFlow.
Stars: ✭ 167 (+568%)
Mutual labels:  lstm, seq2seq
Natural Language Processing With Tensorflow
Natural Language Processing with TensorFlow, published by Packt
Stars: ✭ 222 (+788%)
Mutual labels:  lstm, seq2seq
lstm-math
Neural network that solves math equations on the character level
Stars: ✭ 26 (+4%)
Mutual labels:  lstm, seq2seq
dts
A Keras library for multi-step time-series forecasting.
Stars: ✭ 130 (+420%)
Mutual labels:  lstm, seq2seq
HTR-ctc
Pytorch implementation of HTR on IAM dataset (word or line level + CTC loss)
Stars: ✭ 15 (-40%)
Mutual labels:  lstm
NeuralTextSimplification
Exploring Neural Text Simplification
Stars: ✭ 64 (+156%)
Mutual labels:  seq2seq
autonomio
Core functionality for the Autonomio augmented intelligence workbench.
Stars: ✭ 27 (+8%)
Mutual labels:  lstm
battery-rul-estimation
Remaining Useful Life (RUL) estimation of Lithium-ion batteries using deep LSTMs
Stars: ✭ 25 (+0%)
Mutual labels:  lstm

2D-LSTM Seq2Seq Model

This repository contains a PyTorch implementation of a 2D-LSTM model for sequence-to-sequence learning.

In addition, it contains code to apply the 2D-LSTM to neural machine translation (NMT) based on the paper "Towards two-dimensional sequence to sequence model in neural machine translation" by Parnia Bahar, Christopher Brix and Hermann Ney.

Getting Started

Prerequisites

Clone the project and make sure to install the dependencies listed in requirements.txt.
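
For example, from the repository root (the exact pip/Python invocation may differ on your system):

pip install -r requirements.txt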

If you use the included dataset helper functions for the small IWSLT14 deu-eng NMT dataset (taken from harvardnlp/var-attn/data), it will automatically preprocess the data into .csv files before the first run.

I've successfully run all tests using:

Running Scripts

With the dependencies installed, you can run the scripts in the main/ folder. For example, to run the IWSLT14 training script, run

python -m main.train_iwslt14_small

The available command line arguments for this script can be found below.

The 2D-LSTM Seq2Seq Model

2D recurrent neural networks are widely used in applications that manipulate 2D objects such as images. For instance, 2D-LSTMs have become the state of the art in handwritten text recognition [1].

The method described in [2] is an approach to apply such 2D-LSTMs to sequence-to-sequence tasks such as neural machine translation.

General Architecture

A source sentence is read by a standard (i.e. 1D) bidirectional LSTM encoder using embedding vectors that are trained end-to-end. Its hidden states (the forward and backward states concatenated) are then used as the inputs along the horizontal dimension of the 2D-LSTM.

Vertically, the (embedded) tokens generated in the respective previous row are fed to the 2D cells. In training mode, teacher forcing is used (i.e. the correct target tokens are fed instead).

The hidden state of the cell in the last column is then fed into a fully-connected softmax layer which forms the prediction for the next output token.

The basic idea is that the 2D-LSTM re-reads the input sentence for each new output token, conditioned on the previously generated token.
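
The following is a shape-level sketch of this data flow, using a plain PyTorch bidirectional LSTM as the encoder; all module names and dimensions are illustrative and do not mirror the repository's actual API:

import torch
import torch.nn as nn

batch, src_len, vocab_size, embed_dim, enc_dim = 2, 7, 1000, 128, 64

src_embedding = nn.Embedding(vocab_size, embed_dim)
encoder = nn.LSTM(embed_dim, enc_dim, bidirectional=True, batch_first=True)

src = torch.randint(0, vocab_size, (batch, src_len))
enc_states, _ = encoder(src_embedding(src))
print(enc_states.shape)  # (2, 7, 128) = (batch, src_len, 2 * enc_dim)

# Horizontal axis of the 2D grid: one column per source position i, fed with
# the concatenated forward/backward encoder state enc_states[:, i, :].
# Vertical axis: one row per target position j, fed with the embedding of the
# token generated in row j - 1 (the gold token when teacher forcing is used).
# The hidden state of the last column in row j is passed through a
# fully-connected softmax layer to predict target token j.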

The 2D-LSTM Cell

The 2D-LSTM cell at horizontal step i and vertical step j consumes the encoder hidden state concatenated with the (embedded) token, as well as the hidden and cell states from the previous vertical and the previous horizontal step.

See lstm2d_cell.py or the paper for details.
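
For reference, a simplified cell along these lines might look as follows; the class name, gate layout, and weight organization are illustrative (loosely following the lambda-gate formulation in [2]) and may differ from the actual lstm2d_cell.py:

import torch
import torch.nn as nn

class LSTM2dCellSketch(nn.Module):
    # Illustrative 2D-LSTM cell: one joint affine map produces the input,
    # forget, output and lambda gates plus the cell proposal.
    def __init__(self, input_dim, state_dim):
        super().__init__()
        self.gates = nn.Linear(input_dim + 2 * state_dim, 5 * state_dim)

    def forward(self, x, s_hor, c_hor, s_ver, c_ver):
        # x: encoder hidden state h_i concatenated with the embedded token of
        #    the previous row, shape (batch, input_dim)
        # (s_hor, c_hor): states from cell (i - 1, j)
        # (s_ver, c_ver): states from cell (i, j - 1)
        z = self.gates(torch.cat([x, s_hor, s_ver], dim=-1))
        i, f, o, lam, g = z.chunk(5, dim=-1)
        i, f, o, lam = map(torch.sigmoid, (i, f, o, lam))
        g = torch.tanh(g)
        # The lambda gate interpolates between the horizontal and vertical
        # previous cell states before the usual LSTM update is applied.
        c = f * (lam * c_hor + (1 - lam) * c_ver) + i * g
        s = o * torch.tanh(c)
        return s, c

# Example with illustrative sizes: encoder states of dim 2 * 64, embeddings of dim 128.
cell = LSTM2dCellSketch(input_dim=2 * 64 + 128, state_dim=128)
x = torch.randn(4, 2 * 64 + 128)
zero = torch.zeros(4, 128)
s, c = cell(x, zero, zero, zero, zero)
print(s.shape, c.shape)  # torch.Size([4, 128]) torch.Size([4, 128])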

Training vs. Inference

In inference mode, the target tokens are not known in advance. Thus, only the naive implementation that processes the rows one after another is feasible.

In training mode however, the target tokens are known in advance (and we use teacher forcing). Thus, we can traverse the 2D-grid in an efficient diagonal-wise fashion ([1], [2]).

To enable both training and inference yet make use of the possible parallelization in training mode, the 2D-LSTM code contains two different implementations of the forward propagation: one for training (forward(x, x_lengths, y)) and another one for inference (predict(x, x_lengths)).
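
The idea behind the diagonal traversal is that all cells on the same anti-diagonal (constant i + j) are independent of each other, so they can be processed in one batched step. A small, self-contained illustration of that traversal order (grid sizes chosen arbitrarily):

# Cell (i, j) only depends on cells (i - 1, j) and (i, j - 1), so every cell on
# one anti-diagonal d = i + j can be computed in parallel once diagonal d - 1 is done.
I, J = 5, 3  # e.g. source length x target length
for d in range(I + J - 1):
    diagonal = [(i, d - i) for i in range(max(0, d - J + 1), min(I, d + 1))]
    print(f"diagonal {d}: {diagonal}")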

Running Training

The train_iwslt14_small.py script contains code to train a 2D-LSTM model on the small IWSLT14 deu-eng NMT dataset (taken from harvardnlp/var-attn/data).

The following command line arguments are supported, with the given default values:

  • --batch_size=32: The batch size to use for training and inference.
  • --epochs=20: The number of epochs to train.
  • --shuffle=True: Whether or not to shuffle the training examples.
  • --lr=0.0005: The learning rate to use.
  • --embed_dim=128: The dimension of the embedding vectors for both the source and target language.
  • --encoder_state_dim=64: The dimension of the bidirectional encoder LSTM states.
  • --state_2d_dim=128: The dimension of the 2D-LSTM hidden & cell states.
  • --disable_cuda=False: Disable CUDA (i.e. use the CPU for all computations).
  • --dropout_p=0.2: The dropout probability, used after the embeddings and before the final softmax layer.
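
For example, a training run with a larger batch size and a smaller learning rate (values chosen purely for illustration) would be started with:

python -m main.train_iwslt14_small --batch_size=64 --lr=0.0003 --epochs=10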

Tests

This repository contains test cases in the test/ folder that make sure the 2D-LSTM model does what it should do. After installing the additional dependencies listed in test-requirements.txt, you can run all of them using

python -m unittest 

2D-LSTM Cell Tests

These tests make sure the input and output dimensions of a single 2D-LSTM cell are as expected and the outputs are the same for each example in the batch, if the same is true for the inputs. They can be found in test_lstm2d_cell.py.
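
To run only this group of tests (assuming the test/ folder is importable as a package named test), something like the following should work:

python -m unittest test.test_lstm2d_cell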

2D-LSTM Model Tests

These tests have varied purposes:

  • The tests in test_lstm2d_training.py and test_lstm2d_inference.py make sure the input and output dimensions are as expected in training and inference mode, respectively.
  • The tests in test_lstm2d_train_vs_inference.py validate the training and inference forward propagation code by comparing the predictions in both modes to each other when the same target tokens are used. This includes the handling of padding for batches that contain sequences of different lengths.
  • The tests in test_lstm2d_fit.py make sure the 2D-LSTM can fit a small synthetic dataset in a few iterations (to sanity-check it).

Future Work

This model currently does not use any attention mechanism. Future research might try removing the 2D-recurrence in favor of a Transformer-like self-attention mechanism [3].

Contributing

If you have ideas on how to improve or extend this code or you have spotted a problem, feel free to open a PR or contact me (see below).

Author

I'm Florian Pfisterer. Email me or reach out on Twitter @FlorianPfi.

License

This project is licensed under the MIT License - see LICENSE.md for details.

Acknowledgments

I would like to thank:

  • Ngoc Quan Pham for his advice and support throughout this project.
  • Parnia Bahar for her thorough response to my email questions about the details of her paper.
  • Timo Denk for our inspiring paper discussions around the topic, his ideas and last but not least his awesome TeX2Img API used in this README!

References

[1] Voigtlaender et al., 2016, "Handwriting Recognition with Large Multidimensional Long Short-Term Memory Recurrent Neural Networks", https://www.vision.rwth-aachen.de/media/papers/MDLSTM_final.pdf

[2] Bahar et al., 2018, "Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation", https://arxiv.org/abs/1810.03975

[3] Vaswani et al., 2017, "Attention Is All You Need", https://arxiv.org/abs/1706.03762
