
philipperemy / Tensorflow Multi Dimensional Lstm

License: Apache-2.0
Multi-dimensional LSTM as described in Alex Graves' paper https://arxiv.org/pdf/0705.2011.pdf

Projects that are alternatives to or similar to Tensorflow Multi Dimensional Lstm

Nlp Models Tensorflow
Gathers machine learning and Tensorflow deep learning models for NLP problems, 1.13 < Tensorflow < 2.0
Stars: ✭ 1,603 (+940.91%)
Mutual labels:  jupyter-notebook, lstm
Chinese Chatbot
A Chinese chatbot trained on 100,000 dialogue pairs, using an attention mechanism; it generates a meaningful reply to most ordinary questions. The trained model has been uploaded and can be run directly (if it doesn't run, I'll livestream myself eating my keyboard).
Stars: ✭ 124 (-19.48%)
Mutual labels:  jupyter-notebook, lstm
Reinforcementlearning Atarigame
PyTorch LSTM RNN for reinforcement learning to play Atari games from OpenAI Universe. We also use Google DeepMind's Asynchronous Advantage Actor-Critic (A3C) algorithm, which is far more efficient than DQN and supersedes it. Can play many games.
Stars: ✭ 118 (-23.38%)
Mutual labels:  jupyter-notebook, lstm
Ml Ai Experiments
All my experiments with AI and ML
Stars: ✭ 107 (-30.52%)
Mutual labels:  jupyter-notebook, lstm
Deeplearningfornlpinpytorch
An IPython Notebook tutorial on deep learning for natural language processing, including structure prediction.
Stars: ✭ 1,744 (+1032.47%)
Mutual labels:  jupyter-notebook, lstm
Deeplearning tutorials
Deep learning algorithms implemented with TensorFlow
Stars: ✭ 1,580 (+925.97%)
Mutual labels:  jupyter-notebook, lstm
Multilstm
keras attentional bi-LSTM-CRF for Joint NLU (slot-filling and intent detection) with ATIS
Stars: ✭ 122 (-20.78%)
Mutual labels:  jupyter-notebook, lstm
Lstm chem
Implementation of the paper - Generative Recurrent Networks for De Novo Drug Design.
Stars: ✭ 87 (-43.51%)
Mutual labels:  jupyter-notebook, lstm
Handwriting Synthesis
Implementation of "Generating Sequences With Recurrent Neural Networks" https://arxiv.org/abs/1308.0850
Stars: ✭ 135 (-12.34%)
Mutual labels:  jupyter-notebook, lstm
Deep Learning With Python
Example projects I completed to understand Deep Learning techniques with Tensorflow. Please note that I no longer maintain this repository.
Stars: ✭ 134 (-12.99%)
Mutual labels:  jupyter-notebook, lstm
Pytorch Learners Tutorial
PyTorch tutorial for learners
Stars: ✭ 97 (-37.01%)
Mutual labels:  jupyter-notebook, lstm
Image Caption Generator
[DEPRECATED] A Neural Network based generative model for captioning images using Tensorflow
Stars: ✭ 141 (-8.44%)
Mutual labels:  jupyter-notebook, lstm
Pytorch Pos Tagging
A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.
Stars: ✭ 96 (-37.66%)
Mutual labels:  jupyter-notebook, lstm
Lstm Gru Pytorch
LSTM and GRU in PyTorch
Stars: ✭ 109 (-29.22%)
Mutual labels:  jupyter-notebook, lstm
End To End Sequence Labeling Via Bi Directional Lstm Cnns Crf Tutorial
Tutorial for End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
Stars: ✭ 87 (-43.51%)
Mutual labels:  jupyter-notebook, lstm
Linear Attention Recurrent Neural Network
A recurrent attention module consisting of an LSTM cell which can query its own past cell states by means of windowed multi-head attention. The formulas are derived from the BN-LSTM and the Transformer Network. The LARNN cell with attention can be easily used inside a loop on the cell state, just like any other RNN. (LARNN)
Stars: ✭ 119 (-22.73%)
Mutual labels:  jupyter-notebook, lstm
Machine Learning
My Attempt(s) In The World Of ML/DL....
Stars: ✭ 78 (-49.35%)
Mutual labels:  jupyter-notebook, lstm
Language Translation
Neural machine translator for English2German translation.
Stars: ✭ 82 (-46.75%)
Mutual labels:  jupyter-notebook, lstm
Abstractive Summarization
Implementation of abstractive summarization using LSTM in the encoder-decoder architecture with local attention.
Stars: ✭ 128 (-16.88%)
Mutual labels:  jupyter-notebook, lstm
Ethnicolr
Predict Race and Ethnicity Based on the Sequence of Characters in a Name
Stars: ✭ 137 (-11.04%)
Mutual labels:  jupyter-notebook, lstm

Multi Dimensional Recurrent Networks

TensorFlow implementation of the model described in Alex Graves' paper https://arxiv.org/pdf/0705.2011.pdf.


Example: 2D LSTM Architecture

What is MD LSTM?

In short, an LSTM whose recurrence runs along several dimensions instead of a single time axis, so it can operate, for example, on a 2D grid. Here's a figure describing the way it works:


Example: 2D LSTM Architecture
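
In the 2D case, the cell at grid position (i, j) receives the hidden and cell states of its two already-visited neighbours, (i-1, j) and (i, j-1), and keeps one forget gate per incoming direction. The NumPy sketch below illustrates a single forward sweep from the top-left corner; the gate layout, tensor shapes and names are simplified assumptions and do not match the repository's TensorFlow code exactly.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def md_lstm_2d(x, W, U_left, U_top, b, hidden):
    # Sketch of a 2D MD-LSTM sweep starting at the top-left corner.
    # x: (height, width, depth) input grid.
    # W: (depth, 5*hidden), U_left/U_top: (hidden, 5*hidden), b: (5*hidden,).
    # Gate layout: input, forget_left, forget_top, output, candidate.
    height, width, _ = x.shape
    h = np.zeros((height + 1, width + 1, hidden))  # padded so (i-1, j) and (i, j-1) always exist
    c = np.zeros((height + 1, width + 1, hidden))
    for i in range(1, height + 1):
        for j in range(1, width + 1):
            z = (x[i - 1, j - 1] @ W
                 + h[i, j - 1] @ U_left   # left neighbour
                 + h[i - 1, j] @ U_top    # top neighbour
                 + b)
            i_g, f_left, f_top, o_g, g = np.split(z, 5)
            i_g, f_left, f_top, o_g = map(sigmoid, (i_g, f_left, f_top, o_g))
            g = np.tanh(g)
            # one forget gate per incoming direction
            c[i, j] = f_left * c[i, j - 1] + f_top * c[i - 1, j] + i_g * g
            h[i, j] = o_g * np.tanh(c[i, j])
    return h[1:, 1:], c[1:, 1:]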

How to get started?

git clone git@github.com:philipperemy/tensorflow-multi-dimensional-lstm.git
cd tensorflow-multi-dimensional-lstm

# create a new virtual python environment
virtualenv -p python3 venv
source venv/bin/activate
pip install -r requirements.txt

# usage: trainer.py [-h] --model_type {MD_LSTM,HORIZONTAL_SD_LSTM,SNAKE_SD_LSTM}
python trainer.py --model_type MD_LSTM
python trainer.py --model_type HORIZONTAL_SD_LSTM
python trainer.py --model_type SNAKE_SD_LSTM

Random Diagonal Task

The random diagonal task consists of initializing a matrix with values very close to 0, except for two entries which are set to 1. Those two entries lie on a straight line parallel to the diagonal of the matrix. The task is to predict where those two values are. Here are some examples:

____________
|          |
|x         |
| x        |
|          |
|__________|


____________
|          |
|          |
|     x    |
|      x   |
|__________|

____________
|          |
| x        |
|  x       |
|          |
|__________|

A model is considered successful on this task if it can correctly predict the second x (it is impossible to predict the first x).
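
For illustration, here is a minimal sketch of how such an input/target pair could be generated. The function name, the noise level and the exact placement rule (the second x one step further along the diagonal) are assumptions, not necessarily the repository's generator.

import numpy as np

def next_diagonal_example(size=8, eps=1e-4):
    # Grid of near-zero noise with two 1s on a line parallel to the diagonal.
    # Only the second 1 is actually predictable from the first.
    x = np.random.uniform(0.0, eps, (size, size))
    i = np.random.randint(0, size - 1)
    j = np.random.randint(0, size - 1)
    x[i, j] = 1.0
    x[i + 1, j + 1] = 1.0  # second point, one step further along the diagonal direction
    return x, x.copy()     # input and target are the same grid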

  • A simple recurrent model scanning the grid only vertically or horizontally cannot predict either location of x. This model is called HORIZONTAL_SD_LSTM and should perform the worst.
  • If the matrix is flattened into a single vector, the first location of x still cannot be predicted. However, a recurrent model should learn that the second x always comes width+1 steps after the first x (this model is SNAKE_SD_LSTM).
  • When predicting the second location of x, an MD recurrent model has a full view of the top-left rectangle above and to the left of the current cell. It should therefore learn that when the first x sits in the bottom-right corner of that window, the second x comes next along the diagonal. Of course, the first x still cannot be predicted at all, even with the MD model. A rough sketch of the three scan orders follows this list.
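
To make the difference between the three scan orders concrete, here is a rough illustration of the information each model type can see; the actual implementations in the repository differ in their details.

def horizontal_scan(x):
    # HORIZONTAL_SD_LSTM (rough idea): each row is read as an independent
    # left-to-right sequence, so nothing seen before the first x helps.
    return [list(row) for row in x]

def snake_scan(x):
    # SNAKE_SD_LSTM: the grid is flattened row by row into one long sequence;
    # the second x appears exactly width + 1 steps after the first one.
    return [value for row in x for value in row]

def md_context(x, i, j):
    # MD_LSTM: when predicting cell (i, j), the recurrence has already seen
    # the whole top-left rectangle above and to the left of (i, j).
    return [list(row[: j + 1]) for row in x[: i + 1]]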

After training on this task for 8x8 matrices, the losses look like this:

Overall loss of the random diagonal task (loss applied on all the elements of the inputs)

Overall loss of the random diagonal task (loss applied only on the location of the second x)

It is no surprise that the MD LSTM performs best here: only two recurrent steps separate the grid cell containing the first x from the one containing the second x. The snake LSTM has width+1 = 9 steps between the two x's. As expected, the single-direction LSTM (HORIZONTAL_SD_LSTM) does not learn anything beyond outputting values very close to 0.


MD LSTM predictions (left) and ground truth (right) before training (predictions are all random).


MD LSTM predictions (left) and ground truth (right) after training. As expected, the MD LSTM can only predict the second x and not the first one, which means the task has been learned correctly.

Limitations

  • I could test it successfully with 32x32 matrices, but the implementation is far from being well optimized.
  • This implementation can become numerically unstable quite easily.
  • I've noticed that inputs should be non-zero, otherwise some gradients become NaN. Consider adding a small epsilon to the inputs (inputs += eps) if that happens (see the sketch after this list).
  • It's hard to use with Keras; this implementation is written in pure TensorFlow.
  • It runs on a GPU, but the code is not optimized at all, so CPU and GPU speeds are roughly the same.
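
As a workaround for the zero-input issue above, shifting the inputs by a tiny constant before feeding them to the network is usually enough; the value below is an arbitrary choice.

eps = 1e-8               # arbitrary small constant; any tiny positive value should do
inputs = inputs + eps    # keep every input strictly non-zero to avoid NaN gradients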

Contributions

Welcome!

Special Thanks

  • A big thank you to Mosnoi Ion who provided the first skeleton of this MD LSTM.