philipperemy / Keras Attention Mechanism

License: Apache-2.0
Attention mechanism implementation for Keras.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Keras Attention Mechanism

Structured Self Attention
A Structured Self-attentive Sentence Embedding
Stars: ✭ 459 (-81.67%)
Mutual labels:  attention-mechanism, attention-model
attention-mechanism-keras
attention mechanism in keras, like Dense and RNN...
Stars: ✭ 19 (-99.24%)
Mutual labels:  attention-mechanism, attention-model
Attentionalpoolingaction
Code/Model release for NIPS 2017 paper "Attentional Pooling for Action Recognition"
Stars: ✭ 248 (-90.1%)
Mutual labels:  attention-mechanism, attention-model
Linear Attention Recurrent Neural Network
A recurrent attention module consisting of an LSTM cell that can query its own past cell states by means of windowed multi-head attention. The formulas are derived from the BN-LSTM and the Transformer Network. The LARNN cell with attention can easily be used inside a loop on the cell state, just like any other RNN. (LARNN)
Stars: ✭ 119 (-95.25%)
Mutual labels:  attention-mechanism, attention-model
Deepattention
Deep Visual Attention Prediction (TIP18)
Stars: ✭ 65 (-97.4%)
Mutual labels:  attention-mechanism, attention-model
Nmt Keras
Neural Machine Translation with Keras
Stars: ✭ 501 (-79.99%)
Mutual labels:  attention-mechanism, attention-model
Compact-Global-Descriptor
Pytorch implementation of "Compact Global Descriptor for Neural Networks" (CGD).
Stars: ✭ 22 (-99.12%)
Mutual labels:  attention-mechanism, attention-model
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (-60.46%)
Mutual labels:  attention-mechanism, attention-model
Pytorch Attention Guided Cyclegan
Pytorch implementation of Unsupervised Attention-guided Image-to-Image Translation.
Stars: ✭ 67 (-97.32%)
Mutual labels:  attention-mechanism, attention-model
Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (-94.97%)
Mutual labels:  attention-mechanism, attention-model
Pytorch Acnn Model
code of Relation Classification via Multi-Level Attention CNNs
Stars: ✭ 170 (-93.21%)
Mutual labels:  attention-model
Multimodal Sentiment Analysis
Attention-based multimodal fusion for sentiment analysis
Stars: ✭ 172 (-93.13%)
Mutual labels:  attention-mechanism
Sparse Structured Attention
Sparse and structured neural attention mechanisms
Stars: ✭ 198 (-92.09%)
Mutual labels:  attention-mechanism
Speech emotion recognition blstm
Bidirectional LSTM network for speech emotion recognition.
Stars: ✭ 203 (-91.89%)
Mutual labels:  attention-model
Lstm attention
attention-based LSTM/Dense implemented by Keras
Stars: ✭ 168 (-93.29%)
Mutual labels:  attention-mechanism
Hnatt
Train and visualize Hierarchical Attention Networks
Stars: ✭ 192 (-92.33%)
Mutual labels:  attention-mechanism
Eeg Dl
A Deep Learning library for EEG Tasks (Signals) Classification, based on TensorFlow.
Stars: ✭ 165 (-93.41%)
Mutual labels:  attention-mechanism
Slot Attention
Implementation of Slot Attention from GoogleAI
Stars: ✭ 168 (-93.29%)
Mutual labels:  attention-mechanism
Gat
Graph Attention Networks (https://arxiv.org/abs/1710.10903)
Stars: ✭ 2,229 (-10.98%)
Mutual labels:  attention-mechanism
Guided Attention Inference Network
Contains an implementation of the Guided Attention Inference Network (GAIN) presented in "Tell Me Where to Look" (CVPR 2018). This repository aims to apply GAIN to the fcn8 architecture used for segmentation.
Stars: ✭ 204 (-91.85%)
Mutual labels:  attention-mechanism

Keras Attention Mechanism

Many-to-one attention mechanism for Keras.

Installation

pip install attention

Example

import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import load_model, Model

from attention import Attention


def main():
    # Dummy data. There is nothing to learn in this example.
    num_samples, time_steps, input_dim, output_dim = 100, 10, 1, 1
    data_x = np.random.uniform(size=(num_samples, time_steps, input_dim))
    data_y = np.random.uniform(size=(num_samples, output_dim))

    # Define/compile the model.
    model_input = Input(shape=(time_steps, input_dim))
    x = LSTM(64, return_sequences=True)(model_input)
    x = Attention(32)(x)
    x = Dense(1)(x)
    model = Model(model_input, x)
    model.compile(loss='mae', optimizer='adam')
    model.summary()

    # Train on the dummy data.
    model.fit(data_x, data_y, epochs=10)

    # Test saving and reloading the model.
    pred1 = model.predict(data_x)
    model.save('test_model.h5')
    model_h5 = load_model('test_model.h5')
    pred2 = model_h5.predict(data_x)
    np.testing.assert_almost_equal(pred1, pred2)
    print('Success.')


if __name__ == '__main__':
    main()

Other Examples

Browse examples.

Install the requirements before running the examples: pip install -r examples/examples-requirements.txt.

IMDB Dataset

In this experiment, we demonstrate that adding attention yields higher accuracy on the IMDB dataset. We consider two LSTM networks: one with this attention layer and the other with a fully connected layer. Both have the same number of parameters (250K) for a fair comparison.
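
Below is a minimal sketch of how such a comparison could be set up, assuming tensorflow.keras and the Attention layer from this package. The vocabulary size, sequence length, and layer widths are illustrative, not the exact values used to match the two models at 250K parameters; see the examples folder for the real script.

from tensorflow.keras import Input
from tensorflow.keras.datasets import imdb
from tensorflow.keras.layers import Dense, Embedding, Flatten, LSTM
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.sequence import pad_sequences

from attention import Attention

max_features, max_len = 10000, 200  # illustrative values
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = pad_sequences(x_train, maxlen=max_len)
x_test = pad_sequences(x_test, maxlen=max_len)


def build_model(use_attention: bool) -> Model:
    # Same backbone for both variants; only the pooling block differs.
    i = Input(shape=(max_len,))
    x = Embedding(max_features, 32)(i)
    x = LSTM(32, return_sequences=True)(x)
    if use_attention:
        x = Attention(32)(x)  # attention pooling over the time steps
    else:
        x = Flatten()(x)  # fully connected baseline
        x = Dense(32, activation='relu')(x)
    x = Dense(1, activation='sigmoid')(x)
    model = Model(i, x)
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


for use_attention in (False, True):
    build_model(use_attention).fit(x_train, y_train, validation_data=(x_test, y_test),
                                   epochs=10, batch_size=128)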

Here are the results over 10 runs. For every run, we record the maximum test-set accuracy reached within 10 epochs of training.

Measure            No Attention (250K params)    Attention (250K params)
MAX Accuracy       88.22                         88.76
AVG Accuracy       87.02                         87.62
STDDEV Accuracy    0.18                          0.14

As expected, the model with attention gets a boost in accuracy. Attention also reduces the run-to-run variability, which is a nice property to have.

Adding two numbers

Let's consider the task of adding two numbers that come right after some delimiters (0 in this case):

x = [1, 2, 3, 0, 4, 5, 6, 0, 7, 8]. Result is y = 4 + 7 = 11.

The attention is expected to be highest right after the delimiters. An overview of the training shows the attention map at the top and the ground truth at the bottom: as training progresses, the model learns the task and the attention map converges to the ground truth.
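
A minimal sketch of how such a dataset could be generated is given below. The task definition comes from this section; the helper itself is illustrative, not the exact script from the examples folder.

import numpy as np


def make_addition_example(seq_len=10):
    rng = np.random.default_rng()
    # Random digits in 1..9, then one delimiter (0) in each half of the sequence.
    x = rng.integers(1, 10, size=seq_len)
    i = rng.integers(0, seq_len // 2 - 1)        # first delimiter position
    j = rng.integers(seq_len // 2, seq_len - 1)  # second delimiter position
    x[i], x[j] = 0, 0
    # The target is the sum of the two numbers right after the delimiters,
    # so the attention map should peak at positions i + 1 and j + 1.
    return x, x[i + 1] + x[j + 1]


x, y = make_addition_example()
print(x, '->', y)  # e.g. [1 2 3 0 4 5 6 0 7 8] -> 11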

Finding max of a sequence

We consider many 1D sequences of the same length. The task is to find the maximum of each sequence.

We feed the full sequence of RNN outputs to the attention layer and expect it to focus on the maximum of each sequence.

After a few epochs, the attention layer converges perfectly to what we expected.
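
A minimal sketch of a model for this task is shown below, assuming the same Attention layer as in the example above; the sequence length, layer sizes, and training settings are illustrative.

import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Model

from attention import Attention

seq_len = 20
x = np.random.uniform(size=(1000, seq_len, 1))
y = x.max(axis=1)  # target: the maximum of each sequence

i = Input(shape=(seq_len, 1))
h = LSTM(64, return_sequences=True)(i)  # the full sequence of RNN outputs...
h = Attention(32)(h)                    # ...goes to the attention layer
out = Dense(1)(h)

model = Model(i, out)
model.compile(loss='mae', optimizer='adam')
model.fit(x, y, epochs=10, validation_split=0.2)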
