
alokwhitewolf / Visual-Attention-Model

Licence: MIT license
Chainer implementation of Deepmind's Visual Attention Model paper

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Visual-Attention-Model

Textclassifier
Text classifier for Hierarchical Attention Networks for Document Classification
Stars: ✭ 985 (+3548.15%)
Mutual labels:  recurrent-neural-networks, attention-mechanism
Document Classifier Lstm
A bidirectional LSTM with attention for multiclass/multilabel text classification.
Stars: ✭ 136 (+403.7%)
Mutual labels:  recurrent-neural-networks, attention-mechanism
Simplednn
SimpleDNN is a machine learning lightweight open-source library written in Kotlin designed to support relevant neural network architectures in natural language processing tasks
Stars: ✭ 81 (+200%)
Mutual labels:  recurrent-neural-networks, attention-mechanism
automatic-personality-prediction
[AAAI 2020] Modeling Personality with Attentive Networks and Contextual Embeddings
Stars: ✭ 43 (+59.26%)
Mutual labels:  recurrent-neural-networks, attention-mechanism
Machine Learning Curriculum
💻 Make machines learn so that you don't have to struggle to program them; The ultimate list
Stars: ✭ 761 (+2718.52%)
Mutual labels:  chainer, recurrent-neural-networks
Da Rnn
📃 **Unofficial** PyTorch Implementation of DA-RNN (arXiv:1704.02971)
Stars: ✭ 256 (+848.15%)
Mutual labels:  recurrent-neural-networks, attention-mechanism
Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (+366.67%)
Mutual labels:  recurrent-neural-networks, attention-mechanism
Linear Attention Recurrent Neural Network
A recurrent attention module consisting of an LSTM cell which can query its own past cell states by the means of windowed multi-head attention. The formulas are derived from the BN-LSTM and the Transformer Network. The LARNN cell with attention can be easily used inside a loop on the cell state, just like any other RNN. (LARNN)
Stars: ✭ 119 (+340.74%)
Mutual labels:  recurrent-neural-networks, attention-mechanism
Attention is all you need
Transformer of "Attention Is All You Need" (Vaswani et al. 2017) by Chainer.
Stars: ✭ 303 (+1022.22%)
Mutual labels:  chainer, attention-mechanism
Multi-task-Conditional-Attention-Networks
A prototype version of our submitted paper: Conversion Prediction Using Multi-task Conditional Attention Networks to Support the Creation of Effective Ad Creatives.
Stars: ✭ 21 (-22.22%)
Mutual labels:  chainer, attention-mechanism
datastories-semeval2017-task6
Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".
Stars: ✭ 20 (-25.93%)
Mutual labels:  recurrent-neural-networks, attention-mechanism
DARNN
A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction
Stars: ✭ 90 (+233.33%)
Mutual labels:  recurrent-neural-networks, attention-mechanism
stanford-cs231n-assignments-2020
This repository contains my solutions to the assignments for Stanford's CS231n "Convolutional Neural Networks for Visual Recognition" (Spring 2020).
Stars: ✭ 84 (+211.11%)
Mutual labels:  recurrent-neural-networks, attention-mechanism
Keras Attention
Visualizing RNNs using the attention mechanism
Stars: ✭ 697 (+2481.48%)
Mutual labels:  recurrent-neural-networks, attention-mechanism
Attention Mechanisms
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
Stars: ✭ 203 (+651.85%)
Mutual labels:  recurrent-neural-networks, attention-mechanism
Chainer Rnn Ner
Named Entity Recognition with RNN, implemented by Chainer
Stars: ✭ 19 (-29.63%)
Mutual labels:  chainer, recurrent-neural-networks
Neural-Chatbot
A Neural Network based Chatbot
Stars: ✭ 68 (+151.85%)
Mutual labels:  recurrent-neural-networks, attention-mechanism
GuneyOzsanOutThereMusicVideo
Procedurally generated, real-time, demoscene style, open source music video made with Unity 3D for Out There by Guney Ozsan.
Stars: ✭ 26 (-3.7%)
Mutual labels:  visual
3dgan-chainer
📦 A Chainer implementation of 3D Generative Adversarial Network.
Stars: ✭ 25 (-7.41%)
Mutual labels:  chainer
Probabilistic-RNN-DA-Classifier
Probabilistic Dialogue Act Classification for the Switchboard Corpus using an LSTM model
Stars: ✭ 22 (-18.52%)
Mutual labels:  recurrent-neural-networks

Visual Attention Model

Chainer implementation of DeepMind's Recurrent Models of Visual Attention. Humans do not tend to process a whole scene in its entirety at once. Instead, we focus attention selectively on parts of the visual space to acquire information when and where it is needed, and combine information from different fixations over time to build up an internal representation of the scene. Focusing the computational resources on parts of a scene saves “bandwidth”, as fewer “pixels” need to be processed.

The model is a recurrent neural network (RNN) which processes inputs sequentially, attending to different locations within the images (or video frames) one at a time, and incrementally combines information from these fixations to build up a dynamic internal representation of the scene or environment. Instead of processing an entire image or even a bounding box at once, at each step the model selects the next location to attend to based on past information and the demands of the task. Both the number of parameters in the model and the amount of computation it performs can be controlled independently of the size of the input image, in contrast to convolutional networks whose computational demands scale linearly with the number of image pixels.
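To make the last point concrete, here is a rough back-of-the-envelope calculation; the glimpse sizes and number of fixations below are illustrative assumptions, not values prescribed by this repository.

```python
# Pixels the model actually "reads" per image, assuming 3 glimpse scales of
# 12x12 pixels and 6 fixations per image (illustrative numbers only).
pixels_per_glimpse = 3 * 12 * 12                      # 432 pixels per fixation
fixations = 6
pixels_per_image = fixations * pixels_per_glimpse     # 2592 pixels in total

# This cost is the same whether the input is a 28x28 MNIST digit (784 pixels)
# or a 640x480 photo (307,200 pixels), whereas a ConvNet's cost grows with the
# full pixel count of the input.
print(pixels_per_image)
```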

The Network Architecture

Network architecture (image from Sunner Li's blog post)
Glimpse Sensor


The Glimpse Sensor is the implementation of the retina. The idea is to allow the network to “take a glance” at the image around a given location, called a glimpse, then extract and resize this glimpse into image crops at several scales, each scale using the same resolution. For example, the glimpse in the example above contains 3 different scales, and each scale has the same resolution (a.k.a. sensor bandwidth), e.g. 12x12. Therefore, the smallest crop in the centre is the most detailed, whereas the largest crop in the outer ring is the most blurred. In summary, the Glimpse Sensor takes a full-sized image and a location, and outputs a “retina-like” representation of the image around the given location.
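A minimal NumPy sketch of this idea is shown below; the function and argument names are my own, the location is given in pixel coordinates, and the resizing is done with simple average pooling rather than whatever interpolation this repository actually uses.

```python
import numpy as np

def glimpse_sensor(image, center, base_size=12, n_scales=3):
    """Return n_scales square crops centred on `center` (row, col); each crop
    is twice as wide as the previous one, and all are average-pooled back to
    base_size x base_size."""
    max_size = base_size * 2 ** (n_scales - 1)
    pad = max_size // 2
    padded = np.pad(image, pad, mode="constant")      # keep crops inside the image
    cy, cx = int(center[0]) + pad, int(center[1]) + pad

    crops = []
    for s in range(n_scales):
        size = base_size * 2 ** s
        half = size // 2
        patch = padded[cy - half:cy + half, cx - half:cx + half]
        factor = size // base_size                    # downsample by average pooling
        patch = patch.reshape(base_size, factor, base_size, factor).mean(axis=(1, 3))
        crops.append(patch)
    return np.stack(crops)                            # (n_scales, base_size, base_size)
```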

Glimpse Network

Once the Glimpse Sensor is defined, the Glimpse Network is simply a wrapper around it: it takes a full-sized image and a location, extracts a retina representation of the image via the Glimpse Sensor, flattens it, and then combines the extracted retina representation with the glimpse location using hidden layers and ReLU, emitting a single vector g. This vector contains the information of both “what” (the retina representation) and “where” (the focused location within the image).
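In Chainer terms, the Glimpse Network could look roughly like the sketch below; the layer sizes and names are assumptions for illustration, not necessarily the ones used in this repository.

```python
import chainer
import chainer.functions as F
import chainer.links as L

class GlimpseNet(chainer.Chain):
    """Fuse the 'what' (flattened retina crops) and the 'where' (location) into g."""

    def __init__(self, n_hidden=128, g_size=256):
        super().__init__()
        with self.init_scope():
            self.fc_what = L.Linear(None, n_hidden)   # flattened retina representation
            self.fc_where = L.Linear(2, n_hidden)     # (y, x) location in [-1, 1]
            self.fc_out = L.Linear(2 * n_hidden, g_size)

    def __call__(self, retina, loc):
        h_what = F.relu(self.fc_what(retina))
        h_where = F.relu(self.fc_where(loc))
        return F.relu(self.fc_out(F.concat((h_what, h_where), axis=1)))
```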

Recurrent Network
The Recurrent Network takes the feature vector from the Glimpse Network as input and remembers the useful information via its hidden state (and memory cell).
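A matching Chainer sketch follows; the sizes are again assumptions, and an LSTM is chosen here because of the memory cell mentioned above.

```python
import chainer
import chainer.links as L

class CoreNet(chainer.Chain):
    """Recurrent core: integrates glimpse features g_t into a hidden state h_t."""

    def __init__(self, g_size=256, n_hidden=256):
        super().__init__()
        with self.init_scope():
            self.lstm = L.LSTM(g_size, n_hidden)   # holds hidden state and memory cell

    def __call__(self, g):
        return self.lstm(g)                        # h_t, carried across time steps

    def reset_state(self):
        self.lstm.reset_state()                    # call once per image / episode
```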

Location Network
The Location Network takes the hidden state from the Recurrent Network as input and tries to predict the next location to look at. This location prediction becomes the input to the Glimpse Network at the next time step in the unrolled recurrent network. The Location Network is the key component of the whole idea, since it directly determines where to pay attention at the next time step. To maximize the performance of the Location Network, the paper introduces a stochastic process (a Gaussian distribution) for generating the next location and uses reinforcement learning techniques to train it. This is also known as “hard” attention, since the stochastic process is non-differentiable (in contrast to “soft” attention). The intuition behind the stochasticity is to balance exploitation (predicting the future from history) against exploration (trying something unprecedented). Note that this stochasticity makes the component non-differentiable, which causes a problem during back-propagation; the REINFORCE policy gradient algorithm is used to solve it.
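The sampling step and the log-probability term needed for REINFORCE can be sketched as below; the fixed standard deviation, the tanh squashing, and the clipping to [-1, 1] are my assumptions, not necessarily this repository's choices.

```python
import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class LocationNet(chainer.Chain):
    """Predict a mean fixation point and sample the next location around it."""

    def __init__(self, n_hidden=256, sigma=0.1):
        super().__init__()
        self.sigma = sigma
        with self.init_scope():
            self.fc = L.Linear(n_hidden, 2)

    def __call__(self, h):
        mean = F.tanh(self.fc(h))                          # mean location in [-1, 1]
        noise = np.random.normal(0, self.sigma, mean.shape).astype(np.float32)
        loc = np.clip(mean.data + noise, -1.0, 1.0)        # sampled, non-differentiable
        # log N(loc | mean, sigma^2) up to a constant; multiplied by the reward
        # later, this term provides the REINFORCE gradient for the location policy.
        ln_p = -F.sum((mean - loc) ** 2, axis=1) / (2 * self.sigma ** 2)
        return loc, ln_p
```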

Activation Network
The Activation Network takes the hidden state from the Recurrent Network as input and tries to predict the digit. In addition, the prediction result is used to generate the reward signal, which is used to train the Location Network (since the stochasticity makes it non-differentiable).
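A sketch of the classifier head and the 0/1 reward it produces, assuming 10-way digit classification:

```python
import numpy as np
import chainer
import chainer.links as L

class ActionNet(chainer.Chain):
    """Classify the digit from the final hidden state."""

    def __init__(self, n_hidden=256, n_classes=10):
        super().__init__()
        with self.init_scope():
            self.fc = L.Linear(n_hidden, n_classes)

    def __call__(self, h):
        return self.fc(h)   # class logits; softmax_cross_entropy applies the softmax

def reward(logits, labels):
    """1 if the prediction is correct, else 0. Never differentiated; it only
    scales the log-probability term produced by the Location Network."""
    return (logits.data.argmax(axis=1) == labels).astype(np.float32)
```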



Architecture Combined
Combining all the elements illustrated above, we get the network architecture shown below.
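Continuing the sketches above, one forward pass plus the combined loss could look roughly like this; the number of fixations, the loss weighting, and the mapping from [-1, 1] locations to pixel coordinates are assumptions rather than this repository's exact code.

```python
import numpy as np
import chainer.functions as F

# glimpse_net, core, location_net and action_net are instances of the sketch
# classes defined above; glimpse_sensor is the NumPy function from earlier.

def run_episode(image, label, n_steps=6):
    """Run glimpse -> core -> location for n_steps fixations, then classify
    and combine the supervised loss with the REINFORCE term."""
    core.reset_state()
    loc = np.zeros((1, 2), dtype=np.float32)                 # start at the centre
    log_ps = []
    for _ in range(n_steps):
        # Map loc in [-1, 1] to pixel coordinates for the sensor.
        center = (loc[0] + 1.0) / 2.0 * (np.array(image.shape) - 1)
        rho = glimpse_sensor(image, center)                  # (n_scales, 12, 12)
        g = glimpse_net(rho.reshape(1, -1).astype(np.float32), loc)
        h = core(g)                                          # update recurrent state
        loc, ln_p = location_net(h)                          # sample next fixation
        log_ps.append(ln_p)
    logits = action_net(h)
    loss_cls = F.softmax_cross_entropy(logits, label)        # differentiable path
    r = reward(logits, label)                                 # 0/1, non-differentiable
    loss_reinforce = -sum(F.mean(p * r) for p in log_ps)     # policy-gradient term
    return loss_cls + loss_reinforce

# Hypothetical usage on a single 28x28 digit with label 3:
# loss = run_episode(np.zeros((28, 28), dtype=np.float32), np.array([3], dtype=np.int32))
# loss.backward()
```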

Experiments

  • MNIST
  • Translated MNIST
  • Cluttered MNIST
  • SVHN

Credits

Some of the text and images have been taken from Medium posts by Tristan and Sunner Li.
