
guillaume-chevalier / Linear Attention Recurrent Neural Network

Licence: mit
A recurrent attention module consisting of an LSTM cell which can query its own past cell states by means of windowed multi-head attention. The formulas are derived from the BN-LSTM and the Transformer Network. The LARNN cell with attention can easily be used inside a loop on the cell state, just like any other RNN. (LARNN)

Projects that are alternatives of or similar to Linear Attention Recurrent Neural Network

Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (+5.88%)
Mutual labels:  lstm, recurrent-neural-networks, attention-mechanism, attention-model
Pytorch Learners Tutorial
PyTorch tutorial for learners
Stars: ✭ 97 (-18.49%)
Mutual labels:  jupyter-notebook, lstm, rnn, recurrent-neural-networks
Pytorch Pos Tagging
A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.
Stars: ✭ 96 (-19.33%)
Mutual labels:  jupyter-notebook, lstm, rnn, recurrent-neural-networks
Poetry Seq2seq
Chinese Poetry Generation
Stars: ✭ 159 (+33.61%)
Mutual labels:  jupyter-notebook, lstm, rnn, attention-mechanism
Bitcoin Price Prediction Using Lstm
Bitcoin price Prediction ( Time Series ) using LSTM Recurrent neural network
Stars: ✭ 67 (-43.7%)
Mutual labels:  jupyter-notebook, lstm, rnn, recurrent-neural-networks
automatic-personality-prediction
[AAAI 2020] Modeling Personality with Attentive Networks and Contextual Embeddings
Stars: ✭ 43 (-63.87%)
Mutual labels:  recurrent-neural-networks, lstm, rnn, attention-mechanism
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+2596.64%)
Mutual labels:  jupyter-notebook, lstm, rnn, recurrent-neural-networks
Lstm Human Activity Recognition
Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN. Classifying the type of movement amongst six activity categories - Guillaume Chevalier
Stars: ✭ 2,943 (+2373.11%)
Mutual labels:  jupyter-notebook, lstm, rnn, recurrent-neural-networks
Lstm Sentiment Analysis
Sentiment Analysis with LSTMs in Tensorflow
Stars: ✭ 886 (+644.54%)
Mutual labels:  jupyter-notebook, lstm, rnn
Telemanom
A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions.
Stars: ✭ 589 (+394.96%)
Mutual labels:  jupyter-notebook, lstm, rnn
Lstmvis
Visualization Toolbox for Long Short Term Memory networks (LSTMs)
Stars: ✭ 959 (+705.88%)
Mutual labels:  jupyter-notebook, lstm, recurrent-neural-networks
Deep Learning Time Series
List of papers, code and experiments using deep learning for time series forecasting
Stars: ✭ 796 (+568.91%)
Mutual labels:  jupyter-notebook, lstm, recurrent-neural-networks
Stockpriceprediction
Stock Price Prediction using Machine Learning Techniques
Stars: ✭ 700 (+488.24%)
Mutual labels:  jupyter-notebook, lstm, rnn
Ml Ai Experiments
All my experiments with AI and ML
Stars: ✭ 107 (-10.08%)
Mutual labels:  jupyter-notebook, lstm, rnn
Video Classification
Tutorial for video classification/ action recognition using 3D CNN/ CNN+RNN on UCF101
Stars: ✭ 543 (+356.3%)
Mutual labels:  jupyter-notebook, lstm, rnn
Neural Networks
All about Neural Networks!
Stars: ✭ 34 (-71.43%)
Mutual labels:  jupyter-notebook, lstm, rnn
Rnn Notebooks
RNN(SimpleRNN, LSTM, GRU) Tensorflow2.0 & Keras Notebooks (Workshop materials)
Stars: ✭ 48 (-59.66%)
Mutual labels:  jupyter-notebook, lstm, rnn
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+731.93%)
Mutual labels:  attention-mechanism, attention-model, attention-is-all-you-need
Sentiment Analysis Nltk Ml Lstm
Sentiment Analysis on the First Republic Party debate in 2016 based on Python,NLTK and ML.
Stars: ✭ 61 (-48.74%)
Mutual labels:  jupyter-notebook, lstm, recurrent-neural-networks
Gdax Orderbook Ml
Application of machine learning to the Coinbase (GDAX) orderbook
Stars: ✭ 60 (-49.58%)
Mutual labels:  jupyter-notebook, lstm, recurrent-neural-networks

LARNN: Linear Attention Recurrent Neural Network

A fixed-size, go-back-k recurrent attention module on an RNN that gives it linear short-term memory by means of attention. The LARNN model can easily be used inside a loop on the cell state, just like any other RNN. The cell state keeps the k most recent states for its multi-head attention mechanism.
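The go-back-k memory can be pictured as a fixed-size queue of past cell states that slides through time. The following is a minimal sketch, assuming plain Python lists stand in for cell-state tensors; `CellStateWindow` is a hypothetical name, not a class from the repository:

```python
from collections import deque
from typing import Deque, List


class CellStateWindow:
    """Keeps only the k most recent cell states (a go-back-k queue)."""

    def __init__(self, k: int) -> None:
        # deque(maxlen=k) silently drops the oldest entry once k items are stored
        self._states: Deque[List[float]] = deque(maxlen=k)

    def push(self, cell_state: List[float]) -> None:
        # Called once per time step with the newly computed cell state
        self._states.append(cell_state)

    def keys_and_values(self) -> List[List[float]]:
        # Oldest first: these are the past states the attention query looks at
        return list(self._states)


window = CellStateWindow(k=3)
for t in range(5):
    window.push([float(t)])  # toy 1-d "cell state" at time step t
print(window.keys_and_values())  # [[2.0], [3.0], [4.0]]
```

Using a bounded deque keeps both memory and per-step lookup cost capped at k, however long the sequence gets.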

The LARNN is derived from the Long Short-Term Memory (LSTM) cell. It introduces attention on the state's past values up to a certain range, limited by a time window k, to keep the forward processing linear in time with respect to sequence length (number of time steps).
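Why the window keeps processing linear can be checked by counting query-key dot products. This is a rough sketch, assuming one dot product per past state visible at each step and ignoring heads and projections; k = 38 is taken from the best `larnn_window_size` reported further down:

```python
def windowed_attention_scores(seq_len: int, k: int) -> int:
    """Query-key dot products when each step attends to at most k past states."""
    # At step t (1-indexed), min(t, k) states are available in the window
    return sum(min(t, k) for t in range(1, seq_len + 1))


def full_attention_scores(seq_len: int) -> int:
    """Dot products if each step attended to every previous state (no window)."""
    return sum(range(1, seq_len + 1))  # = seq_len * (seq_len + 1) / 2


for n in (128, 256, 512):
    print(n, windowed_attention_scores(n, k=38), full_attention_scores(n))
# 128 4161 8256
# 256 9025 32896
# 512 18753 131328
```

Once the sequence is longer than k, every extra time step adds exactly k dot products, so the windowed count roughly doubles when n doubles, while the unwindowed count roughly quadruples.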

Therefore, multi-head attention with positional encoding is applied to the most recent past values of the inner cell state to enable a better mid-term memory: at each new time step, the cell looks back at its own previous cell-state values with an attention query.
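The look-back step can be sketched as scaled dot-product attention (as in Attention Is All You Need) with a single query per time step, split across heads. This is a simplified illustration, assuming the keys and values are the raw windowed cell states with no learned projections, ELU, or batch normalization; the function names are hypothetical:

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def single_query_multihead_attention(query: Vector, window: List[Vector], heads: int) -> Vector:
    """One query attends over the windowed past states, split into `heads` heads.

    Keys and values are the past states themselves (no learned projections),
    which is a simplification of the real cell.
    """
    dim = len(query)
    assert dim % heads == 0, "state size must be divisible by the number of heads"
    dk = dim // heads
    output: Vector = []
    for h in range(heads):
        lo, hi = h * dk, (h + 1) * dk
        q = query[lo:hi]
        # Scaled dot-product score of this head's query slice against each past state
        scores = [sum(qi * ki for qi, ki in zip(q, state[lo:hi])) / math.sqrt(dk) for state in window]
        weights = softmax(scores)
        # Weighted sum of this head's value slices
        head_out = [sum(w * state[lo + j] for w, state in zip(weights, window)) for j in range(dk)]
        output.extend(head_out)
    return output  # concatenation of all heads, same size as the query


window = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]  # two past cell states
print(single_query_multihead_attention([0.0, 0.0, 0.0, 0.0], window, heads=2))  # [0.5, 0.5, 0.5, 0.5]
```

A zero query gives uniform weights, so the result is the average of the window; a query aligned with one past state makes that state dominate.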

The LARNN Cell

Note that the positional encoding is concatenated rather than added. The ELU activation is also used in the cell, and there is batch normalization in many places (not drawn). The multi-head attention mechanism applies an ELU activation to the keys, values, and query rather than using unactivated linear layers. There is only one query here rather than many.
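The difference between concatenating and adding a positional encoding, and the ELU activation itself, can be illustrated in a few lines. This sketch assumes the sinusoidal encoding of Attention Is All You Need and positions counted as the index within the window (oldest first); the real cell may index positions differently, and the helper names are hypothetical:

```python
import math
from typing import List


def elu(x: float, alpha: float = 1.0) -> float:
    # ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def sinusoidal_encoding(position: int, size: int) -> List[float]:
    """Sinusoidal positional encoding from 'Attention Is All You Need'."""
    enc = []
    for i in range(size):
        angle = position / (10000 ** (2 * (i // 2) / size))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def concat_positional_encoding(window: List[List[float]], pe_size: int) -> List[List[float]]:
    # Concatenation keeps the state values untouched and widens each vector by pe_size,
    # whereas adding would require pe_size == state size and mix the two signals.
    return [state + sinusoidal_encoding(pos, pe_size) for pos, state in enumerate(window)]


window = [[0.5, -2.0], [1.0, 0.0]]
encoded = concat_positional_encoding(window, pe_size=2)
print(encoded[0])  # [0.5, -2.0, 0.0, 1.0]: position 0 encodes as sin(0), cos(0)
print([round(elu(v), 4) for v in window[0]])  # [0.5, -0.8647]
```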

Yes, it LARNNs.

Downloading the dataset

cd data
python3 download_dataset.py
cd ..

Meta-optimize the LARNN

This will launch a round of meta-optimization, which will save the results under a new ./results/ folder.

python3 hyperopt_optimize.py --dataset UCIHAR --device cuda

So far, two meta-optimization rounds have been run, and their results were renamed to the folders ./results_round_1/ and ./results_round_2/.

Visualize the results

You can visually inspect the effect of every hyperparameter on the accuracy, and their correlated effects, by opening:

You can also copy one of those files and run it on new results by changing the results folder in the Jupyter notebook so that it points to your new folder.

The hyperparameters space searched with meta-optimization

Here are the hyperparameters and their respective value ranges, which have been explored:

from hyperopt import hp

HYPERPARAMETERS_SPACE = {
    ### Optimization parameters
    # This loguniform scale will multiply the learning rate, so as to make
    # it vary exponentially, in a multiplicative fashion rather than in
    # a linear fashion, to handle its exponentially varying nature:
    'learning_rate': 0.005 * hp.loguniform('learning_rate_mult', -0.5, 0.5),
    # How many epochs before the learning_rate is multiplied by 0.75
    'decay_each_N_epoch': hp.quniform('decay_each_N_epoch', 3 - 0.499, 10 + 0.499, 1),
    # L2 weight decay:
    'l2_weight_reg': 0.005 * hp.loguniform('l2_weight_reg_mult', -1.3, 1.3),
    # Number of loops on the whole train dataset
    'training_epochs': 25,
    # Number of examples fed per training step
    'batch_size': 256,

    ### LSTM/RNN parameters
    # The dropout on the hidden unit on top of each LARNN cell
    'dropout_drop_proba': hp.uniform('dropout_drop_proba', 0.05, 0.5),
    # Let's multiply the "default" number of hidden units:
    'hidden_size': 64 * hp.loguniform('hidden_size_mult', -0.6, 0.6),
    # The number 'h' of attention heads: from 6 to 36 attention heads.
    'attention_heads': hp.quniform('attention_heads', 6 - 0.499, 36 + 0.499, 1),

    ### LARNN (Linear Attention RNN) parameters
    # How far back in time steps (across the sequence) the attention can look
    'larnn_window_size': hp.uniform('larnn_window_size', 10, 50),
    # How the new attention is placed in the LSTM
    'larnn_mode': hp.choice('larnn_mode', [
        'residual',  # Attention will be added to Wx and Wh as `Wx*x + Wh*h + Wa*a + b`.
        'layer'  # Attention will be post-processed like `Wa*(concat(x, h, a)) + bs`
        # Note:
        #     `a(K, Q, V) = MultiHeadSoftmax(Q*K'/sqrt(dk))*V` like in Attention Is All You Need (AIAYN).
        #     `Q = Wxh*concat(x, h) + bxh`
        #     `V = K = Wk*(a "larnn_window_size" number of most recent cells)`
    ]),
    # Whether or not to use Positional Encoding similar to the one used in https://arxiv.org/abs/1706.03762
    'use_positional_encoding': hp.choice('use_positional_encoding', [False, True]),
    # Whether or not to use BN(ELU(.)) in the Linear() layers of the keys and values in the multi-head attention.
    'activation_on_keys_and_values': hp.choice('activation_on_keys_and_values', [False, True]),

    # Number of layers, either stacked or residually stacked:
    'num_layers': hp.choice('num_layers', [2, 3]),
    # Use residual connections for the 2nd (stacked) layer?
    'is_stacked_residual': hp.choice('is_stacked_residual', [False, True])
}
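Because `hp.loguniform(label, low, high)` draws `exp(uniform(low, high))` and `hp.quniform(label, low, high, q)` draws `round(uniform(low, high) / q) * q`, the ranges above can be checked by hand. The stdlib sketch below mirrors those two distributions so it runs without hyperopt installed; it illustrates the sampling arithmetic and is not the project's code:

```python
import math
import random


def loguniform(low: float, high: float, rng: random.Random) -> float:
    # Mirrors hyperopt's hp.loguniform: exp(uniform(low, high))
    return math.exp(rng.uniform(low, high))


def quniform(low: float, high: float, q: float, rng: random.Random) -> float:
    # Mirrors hyperopt's hp.quniform: round(uniform(low, high) / q) * q
    return round(rng.uniform(low, high) / q) * q


# Bounds of the learning rate: 0.005 * exp(-0.5) and 0.005 * exp(+0.5)
lr_min, lr_max = 0.005 * math.exp(-0.5), 0.005 * math.exp(0.5)
print(f"learning_rate in [{lr_min:.5f}, {lr_max:.5f}]")  # [0.00303, 0.00824]

rng = random.Random(0)
decays = {int(quniform(3 - 0.499, 10 + 0.499, 1, rng)) for _ in range(10_000)}
print(sorted(decays))  # [3, 4, 5, 6, 7, 8, 9, 10]
```

The -0.499/+0.499 margins make the end values 3 and 10 roughly as likely as the values in between, while never rounding to 2 or 11.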

The best results, a test accuracy of 91.924%, were found with these hyperparameters:

{
    "activation_on_keys_and_values": true,
    "attention_heads": 27,
    "batch_size": 256,
    "decay_each_N_epoch": 26,
    "dropout_drop_proba": 0.08885391813337816,
    "hidden_size": 81,
    "is_stacked_residual": true,
    "l2_weight_reg": 0.0006495900377590891,
    "larnn_mode": "residual",
    "larnn_window_size": 38,
    "learning_rate": 0.006026504115228934,
    "num_layers": 3,
    "training_epochs": 100,
    "use_positional_encoding": false
}
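To see what `decay_each_N_epoch` and `learning_rate` mean together over `training_epochs`, here is a sketch of a step schedule. It assumes the learning rate is multiplied by 0.75 once every `decay_each_N_epoch` epochs, counting epochs from 0; the exact schedule in the training code may differ, and `learning_rate_at` is a hypothetical helper:

```python
def learning_rate_at(epoch: int, base_lr: float, decay_each_n_epoch: int, factor: float = 0.75) -> float:
    # Assumed step schedule: multiply by `factor` once every `decay_each_n_epoch` epochs
    return base_lr * factor ** (epoch // decay_each_n_epoch)


base_lr = 0.006026504115228934  # "learning_rate" from the best run above
for epoch in (0, 25, 26, 52, 78, 99):
    print(epoch, f"{learning_rate_at(epoch, base_lr, 26):.6f}")
# 0 0.006027
# 25 0.006027
# 26 0.004520
# 52 0.003390
# 78 0.002542
# 99 0.002542
```

Under this assumption, a 100-epoch run with `decay_each_N_epoch` = 26 decays the learning rate three times, at epochs 26, 52 and 78.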

Retrain on best hyperparameters found by meta-optimization

You can re-train on the best hyperparameters found with this command:

python3 train.py --dataset UCIHAR --device cuda

Note: before running this command, you need .json files from training results under the path ./results/UCIHAR/. The best results are currently in ./results_round_2/UCIHAR/; renaming ./results_round_2/ to ./results/ makes this command work.

Debug the LARNN model

This command is practical if you want to edit the model and potentially print-debug its dimensions:

python3 larnn.py

Some thoughts and self-assessment

Although the LARNN cell obtains better results than the LSTM cell as explored here, it is more complicated, so the LSTM cell remains very interesting and is probably of greater value.

However, the LARNN would still have to be compared to a deeply stacked setup such as the one done here, which obtains better results but uses many more cells. This means the current project could still perform better with more cells and proper regularization.

The positional encoding tried here does not seem to help learning.

So overall, despite the LARNN not bringing huge improvements in accuracy, the most interesting things about this project are:

  • The code, which is reusable and neat, making it easy to adapt for automatic hyperparameter optimization on other datasets and networks.
  • The discovery that adding an activation on the multi-head self-attention mechanism's keys, queries and values performed well here, better than using no activation.
  • To the best of my knowledge, a new neural attention data structure is created by using a queue for an attention mechanism, sliding through time; this data structure could potentially be very interesting in many other applications where attention is required.
  • The figures are reusable, published under CC-BY in the subfolder, while the code is published under the MIT License and also reusable.

The current dataset is solvable with good accuracy without any attention mechanism, so the current project was more about coding something interesting than genuinely trying to improve the accuracy on a small dataset. I coded this in one week, so I couldn't use a very complicated dataset and rebuild a complete data pipeline; I had to reuse old code of mine that I already knew.

References

The current project contains code derived from those other projects:

More information on which pieces of code come from where can be found in the header of each Python file. All of those references are licensed under permissive open-source licenses, such as the MIT License and the Apache 2.0 License.

Paper (Citing)

For more information, see the paper's page on arXiv.

License

My project is freely available under the terms of the MIT License.

Copyright (c) 2018 Guillaume Chevalier

Note: my drawings are specifically available under the CC-BY license rather than under the MIT License.

