
uzaymacar / Attention Mechanisms

License: MIT
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Attention Mechanisms

Tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Stars: ✭ 5,077 (+2400.99%)
Mutual labels:  natural-language-processing, language-model, natural-language-understanding
Keras Attention
Visualizing RNNs using the attention mechanism
Stars: ✭ 697 (+243.35%)
Mutual labels:  natural-language-processing, recurrent-neural-networks, attention-mechanism
Awesome Bert Nlp
A curated list of NLP resources focused on BERT, attention mechanism, Transformer networks, and transfer learning.
Stars: ✭ 567 (+179.31%)
Mutual labels:  natural-language-processing, attention-mechanism, language-model
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+1480.79%)
Mutual labels:  natural-language-processing, recurrent-neural-networks, sentiment-classification
Easy Bert
A Dead Simple BERT API for Python and Java (https://github.com/google-research/bert)
Stars: ✭ 106 (-47.78%)
Mutual labels:  natural-language-processing, language-model, natural-language-understanding
Zhihu
This repo contains the source code for my personal column (https://zhuanlan.zhihu.com/zhaoyeyu), implemented with Python 3.6. It includes natural language processing and computer vision projects, such as text generation, machine translation, deep convolutional GANs, and other hands-on code.
Stars: ✭ 3,307 (+1529.06%)
Mutual labels:  natural-language-processing, recurrent-neural-networks, machine-translation
Multimodal Sentiment Analysis
Attention-based multimodal fusion for sentiment analysis
Stars: ✭ 172 (-15.27%)
Mutual labels:  natural-language-processing, attention-mechanism, sentiment-classification
Deep-NLP-Resources
Curated list of all NLP Resources
Stars: ✭ 65 (-67.98%)
Mutual labels:  machine-translation, text-generation, language-model
Simplednn
SimpleDNN is a lightweight open-source machine learning library written in Kotlin, designed to support relevant neural network architectures for natural language processing tasks
Stars: ✭ 81 (-60.1%)
Mutual labels:  natural-language-processing, recurrent-neural-networks, attention-mechanism
Reading comprehension tf
Machine Reading Comprehension in Tensorflow
Stars: ✭ 37 (-81.77%)
Mutual labels:  natural-language-processing, recurrent-neural-networks, natural-language-understanding
Texar Pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 636 (+213.3%)
Mutual labels:  natural-language-processing, machine-translation, text-generation
Chars2vec
Character-based word embeddings model based on RNN for handling real world texts
Stars: ✭ 130 (-35.96%)
Mutual labels:  natural-language-processing, language-model, natural-language-understanding
Spacy Transformers
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
Stars: ✭ 919 (+352.71%)
Mutual labels:  natural-language-processing, language-model, natural-language-understanding
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+27359.11%)
Mutual labels:  natural-language-processing, language-model, natural-language-understanding
Texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 2,236 (+1001.48%)
Mutual labels:  natural-language-processing, machine-translation, text-generation
Pytorch Question Answering
Important paper implementations for Question Answering using PyTorch
Stars: ✭ 154 (-24.14%)
Mutual labels:  natural-language-processing, attention-mechanism
Natural Language Processing Specialization
This repo contains my coursework, assignments, and Slides for Natural Language Processing Specialization by deeplearning.ai on Coursera
Stars: ✭ 151 (-25.62%)
Mutual labels:  natural-language-processing, natural-language-understanding
Rnn lstm from scratch
How to build RNNs and LSTMs from scratch with NumPy.
Stars: ✭ 156 (-23.15%)
Mutual labels:  natural-language-processing, recurrent-neural-networks
Mtbook
Machine Translation: Foundations and Models (《机器翻译:基础与模型》) by Tong Xiao and Jingbo Zhu
Stars: ✭ 2,307 (+1036.45%)
Mutual labels:  natural-language-processing, machine-translation
Awesome Speech Recognition Speech Synthesis Papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Stars: ✭ 2,085 (+927.09%)
Mutual labels:  attention-mechanism, language-model

Attention Mechanisms

Table of Contents

  1. Introduction
  2. Attention Types
  3. Alignment Functions
  4. Implementation Details
  5. Examples
  6. Contributing
  7. Resources

Introduction

This repository includes custom layer implementations for a whole family of attention mechanisms, compatible with TensorFlow and Keras. Attention mechanisms have transformed the landscape of machine translation, and their use in other areas of natural language processing and understanding is increasing day by day. In a broader sense, they aim to eliminate the disadvantageous compression and loss of information that occur in RNNs, which stem from the fixed-length encoding of hidden states derived from input sequences by recurrent layers in sequence-to-sequence models. The layers in this repository can be used for both many-to-many and many-to-one sequence tasks; applications include sentiment classification, text generation, machine translation, and question answering. It is also worth mentioning that this project will soon be deployed as a Python package. Check the Contributing subsection for how to contribute to this project.

Attention Types

Self Attention

First introduced in Long Short-Term Memory-Networks for Machine Reading by Jianpeng Cheng et al. The idea is to relate different positions of the same hidden state space derived from the input sequence, based on the argument that multiple components together form the overall semantics of a sequence. This approach brings together this differently positioned information through multiple hops of attention. This particular implementation follows A Structured Self-Attentive Sentence Embedding by Zhouhan Lin et al., where the authors propose an additional loss term for regularization that penalizes redundancy in the embedding matrix when the attention mechanism repeatedly produces similar annotation weights.
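
As a rough, hedged sketch of that penalization idea (not the exact loss used by the layer in this repository), the Frobenius-norm penalty from Lin et al. can be computed from a hypothetical annotation matrix A of shape (batch, hops, timesteps):

import tensorflow as tf

def self_attention_penalty(A):
    """Penalty ||A A^T - I||_F^2 from Lin et al.; discourages different
    attention hops from assigning nearly identical annotation weights."""
    AAT = tf.matmul(A, A, transpose_b=True)                  # (batch, hops, hops)
    I = tf.eye(tf.shape(A)[1], batch_shape=tf.shape(A)[:1])  # identity per example
    return tf.reduce_sum(tf.square(AAT - I), axis=[1, 2])    # one penalty per example

In the paper, this term is added to the task loss with a small coefficient.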

Global (Soft) Attention

First introduced in Neural Machine Translation by Jointly Learning to Align and Translate by Dzmitry Bahdanau et al. The idea is to derive a context vector based on all hidden states of the encoder RNN. Hence, it is said that this type of attention attends to the entire input state space.
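
A minimal sketch of that idea with hypothetical tensors (dot-product scoring is used here purely for illustration; the layer in this repository supports the alignment functions listed further below):

import tensorflow as tf

batch, timesteps, units = 4, 10, 32
encoder_states = tf.random.normal((batch, timesteps, units))  # all encoder hidden states
decoder_state = tf.random.normal((batch, units))              # current decoder hidden state

scores = tf.einsum('bu,btu->bt', decoder_state, encoder_states)  # score against every state
weights = tf.nn.softmax(scores, axis=-1)                         # attend to the entire input
context = tf.einsum('bt,btu->bu', weights, encoder_states)       # weighted sum = context vector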

Local (Hard) Attention

First introduced in Show, Attend and Tell: Neural Image Caption Generation with Visual Attention by Kelvin Xu et al. and adapted to NLP in Effective Approaches to Attention-based Neural Machine Translation by Minh-Thang Luong et al. The idea is to eliminate the attentive cost of global attention by instead focusing on a small subset of the hidden states derived from the input sequence. This window is proposed as [p_t-D, p_t+D], where D is the window width, and positions that cross sequence boundaries are disregarded. The aligned position p_t is decided either through a) monotonic alignment: set p_t = t, or b) predictive alignment: set p_t = S*sigmoid(FC1(tanh(FC2(h_t)))), where the fully-connected layers FC1 and FC2 are trainable weight matrices. Since yielding an integer index value is non-differentiable (due to tf.cast() and similar methods), this implementation instead derives the aligned position as a float value and uses a Gaussian distribution to adjust the attention weights of all source hidden states, rather than slicing out the actual window. We also propose an experimental alignment type, c) completely predictive alignment: set p_t as in b), but apply it to all source hidden states (h_s) instead of the target hidden state (h_t); then choose the top @window_width positions to build the context vector and zero out the rest. Currently, this option is only available for many-to-one scenarios.
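
A rough sketch of predictive alignment and the Gaussian adjustment described above, with hypothetical tensors; FC1, FC2, and the dot-product scoring are stand-ins, not the repository's internal variables:

import tensorflow as tf

S, D, units = 20, 4, 32                        # source length, window width, hidden size
h_t = tf.random.normal((8, units))             # target hidden state (batch, units)
h_s = tf.random.normal((8, S, units))          # source hidden states (batch, S, units)

FC2 = tf.keras.layers.Dense(units, activation='tanh')
FC1 = tf.keras.layers.Dense(1)

p_t = S * tf.sigmoid(FC1(FC2(h_t)))            # p_t = S * sigmoid(FC1(tanh(FC2(h_t)))), a float

positions = tf.range(S, dtype=tf.float32)[tf.newaxis, :]               # (1, S)
gaussian = tf.exp(-tf.square(positions - p_t) / (2.0 * (D / 2) ** 2))  # centered at p_t
scores = tf.einsum('bu,bsu->bs', h_t, h_s)                             # any alignment function
weights = tf.nn.softmax(scores, axis=-1) * gaussian                    # favor the local window
context = tf.einsum('bs,bsu->bu', weights, h_s)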

Hierarchical Attention

First introduced in Hierarchical Attention Networks for Document Classification by Zichao Yang et al. The idea is to reflect the hierarchical structure that exists within documents. The original paper proposes a bottom-up approach, applying attention mechanisms sequentially at the word and sentence levels, but a top-down approach (e.g. word and character levels) is also applicable. Hence, this type of mechanism is said to attend differentially to more and less important content when constructing the document representation.
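
A bare-bones illustration of the bottom-up (word-level then sentence-level) idea, using hypothetical tensors and a simple trainable scorer; this is not the hierarchical layer in this repository:

import tensorflow as tf

docs = tf.random.normal((2, 5, 12, 64))        # (batch, sentences, words, features)

def soft_attention(x, scorer):
    """Weighted sum over the second-to-last axis using a trainable scorer."""
    scores = tf.squeeze(scorer(x), axis=-1)    # (..., steps)
    weights = tf.nn.softmax(scores, axis=-1)
    return tf.einsum('...s,...sd->...d', weights, x)

word_scorer = tf.keras.layers.Dense(1)
sentence_scorer = tf.keras.layers.Dense(1)

sentence_vectors = soft_attention(docs, word_scorer)                 # (batch, sentences, 64)
document_vector = soft_attention(sentence_vectors, sentence_scorer)  # (batch, 64)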

Alignment Functions

Each function computes an alignment score given a target hidden state (h_t) and the source hidden states (h_s).

Name | Formula | Defined by
Dot Product | score(h_t, h_s) = h_t^T · h_s | Luong et al. (2015)
Scaled Dot Product | score(h_t, h_s) = (h_t^T · h_s) / sqrt(H) | Vaswani et al. (2017)
General | score(h_t, h_s) = h_t^T · W_a · h_s | Luong et al. (2015)
Concat | score(h_t, h_s) = v_a^T · tanh(W_a · [h_t; h_s]) | Bahdanau et al. (2015)
Location | score(h_t) = W_a · h_t | Luong et al. (2015)

where H is the number of hidden states given by the encoder RNN, and where W_a and v_a are trainable weight matrices.
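
As a hedged illustration of these formulas (with hypothetical tensors and freshly created weight matrices, not the repository's internal variables), the five scores could be computed as:

import tensorflow as tf

units, H = 32, 10                                           # hidden size, number of source states
h_t = tf.random.normal((1, units))                          # target hidden state
h_s = tf.random.normal((1, H, units))                       # all source hidden states

W_general = tf.keras.layers.Dense(units, use_bias=False)    # W_a for the general form
W_concat = tf.keras.layers.Dense(units, use_bias=False)     # W_a for the concat form
v_a = tf.keras.layers.Dense(1, use_bias=False)
W_location = tf.keras.layers.Dense(H, use_bias=False)       # W_a for the location form

dot = tf.einsum('bu,bsu->bs', h_t, h_s)                     # h_t . h_s
scaled_dot = dot / tf.sqrt(float(H))                        # (h_t . h_s) / sqrt(H)
general = tf.einsum('bu,bsu->bs', h_t, W_general(h_s))      # h_t . (W_a h_s)
h_t_tiled = tf.tile(h_t[:, tf.newaxis, :], [1, H, 1])       # repeat h_t for every source step
concat = tf.squeeze(v_a(tf.tanh(W_concat(tf.concat([h_t_tiled, h_s], axis=-1)))), axis=-1)
location = W_location(h_t)                                  # depends on h_t only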

Implementation Details

  • As of now, all attention mechanisms in this repository have been successfully tested on both many-to-one and many-to-many sequence tasks. Check the Examples subsection for example applications.
  • It should be noted that there is no claim that the attention mechanisms presented in this repository (or their accompanying hyperparameters shown in the Examples subsection) are optimized in any way; there is still plenty of room for improvement from both a software development and a research perspective.
  • Every layer is a subclass of tf.keras.layers.Layer().
  • The __init__() method of each custom class calls the initialization method of its parent and defines additional attributes specific to each layer.
  • The get_config() method calls the configuration method of its parent and defines custom attributes introduced with the layer.
  • If a custom layer includes a build() method, then it contains trainable parameters. In the Attention() layer, for example, backpropagation of the loss signals which inputs deserve more attention and hence drives changes to the layer's weights.
  • The call() method is the actual operation that is performed on the input tensors.
  • compute_output_shape() methods are omitted for brevity; a minimal skeleton of this layer structure is sketched below.
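
The following skeleton illustrates that structure with a hypothetical layer; it is not the actual Attention() implementation, just the subclassing pattern described in the list above:

import tensorflow as tf

class MinimalAttention(tf.keras.layers.Layer):
    """Hypothetical layer mirroring the structure described above."""
    def __init__(self, alignment_type='global', **kwargs):
        super().__init__(**kwargs)               # initialize the parent class
        self.alignment_type = alignment_type     # layer-specific attribute

    def build(self, input_shape):
        # build() holds the trainable parameters
        self.W_a = self.add_weight(name='W_a',
                                   shape=(input_shape[-1], input_shape[-1]),
                                   initializer='glorot_uniform',
                                   trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        # call() is the actual operation performed on the input tensors
        scores = tf.einsum('bti,ij,btj->bt', inputs, self.W_a, inputs)  # toy general scoring
        weights = tf.nn.softmax(scores, axis=-1)
        return tf.einsum('bt,bti->bi', weights, inputs)                 # many-to-one context

    def get_config(self):
        config = super().get_config()            # parent configuration
        config.update({'alignment_type': self.alignment_type})
        return config

Calling MinimalAttention()(some_3d_tensor) then behaves like any other Keras layer in the functional API.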

Examples

These layers can be plugged into your projects (whether language models or other types of RNN-based models) within seconds, just like any other TensorFlow layer with Keras integration. See the general-purpose example below:

from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from layers import Attention

# Placeholder hyperparameters; substitute values appropriate for your task and dataset
sequence_length, batch_size = 100, 32
vocabulary_size, embedded_dimensions, recurrent_units = 10000, 128, 64
attention_type = 'global'

X = Input(shape=(sequence_length,), batch_size=batch_size) # define input layer for summary
## Token Embedding (Pretrained or Not) ##
embedding = Embedding(input_dim=vocabulary_size, output_dim=embedded_dimensions)(X)
## Encoding Recurrent Layers ##
encoder = LSTM(units=recurrent_units, return_sequences=True)(embedding) # keep timesteps
## Decoding with Attention ##
decoder, attention_weights = Attention(context='many-to-one', alignment_type=attention_type, model_api='functional')(encoder)
## Prediction Layer ##
Y = Dense(units=vocabulary_size, activation='softmax')(decoder)

where alignment_type is one of 'global', 'local-m', 'local-p', and 'local-p*'. For self attention, call the SelfAttention(size=attention_size) layer instead.

Check the subtopics below for more examples, analyses, and comparisons. For a fair comparison, all compared models use similar parameters; for instance, a batch size of 100 and 20 training epochs were used in some examples.

Sentiment Classification

You can find a sentiment classification (many-to-one, binary) example on the IMDB Reviews Dataset inside examples/sentiment_classification.py. This example compares three distinct tf.keras.Model() (Functional API) models (all word-level) and aims to measure the effectiveness of the implemented self-attention layer over a conventional MLP (Multi-Layer Perceptron) model. Refer to the table below for metrics:

Model ID | Maximum Validation Binary Accuracy
Simple Multi-Layer Perceptron Model | 0.8730
Simple Multi-Layer Perceptron Model w/ Self-Attention (Non-Penalized) | 0.8907
Simple Multi-Layer Perceptron Model w/ Self-Attention (Penalized) | 0.8870

Text Generation

You can find a text generation (many-to-one) example on the Shakespeare Dataset inside examples/text_generation.py. This example compares three distinct tf.keras.Model() (Functional API) models (all character-level) and aims to measure the effectiveness of the implemented attention and self-attention layers over conventional LSTM (Long Short-Term Memory) models. Refer to the table below for metrics:

Model ID | Maximum Validation Categorical Accuracy
LSTM Model | 0.5953
LSTM Model w/ Self-Attention (Non-Penalized) | 0.6049
LSTM Model w/ Local-p* Attention | 0.6234

Document Classification

You can find a document (news) classification (many-to-one, multi-class) example on the Reuters Dataset inside examples/document_classification.py. This example compares four distinct tf.keras.Model() (Functional API) models (all word-level) and aims to measure the effectiveness of the implemented attention and self-attention layers over conventional LSTM (Long Short-Term Memory) models. Refer to the table below for metrics:

Model ID | Maximum Validation Categorical Accuracy
LSTM Model | 0.7210
LSTM Model w/ Self-Attention (Non-Penalized) | 0.7790
LSTM Model w/ Global Attention | 0.7496
LSTM Model w/ Local-p* Attention | 0.7446

Machine Translation

You can find a machine translation (many-to-many) example on an English-to-Spanish Dataset inside examples/machine_translation.py. This example largely follows TensorFlow's Machine Translation Example with some adaptations. It compares four distinct tf.keras.Model() (Functional API) models (all word-level) and aims to measure the effectiveness of the implemented attention layer. Refer to the table below for metrics:

Model ID | Maximum Validation Categorical Accuracy
Encoder-Decoder Model | 0.8848
Encoder-Decoder Model w/ Global Attention | 0.8860
Encoder-Decoder Model w/ Local-m Attention | 0.9524
Encoder-Decoder Model w/ Local-p Attention | 0.8873

Contributing

Whether it is bugs you have encountered, performance concerns, or any kind of input you have in mind, this is the perfect time to share them! Check CONTRIBUTING.md for more information and guidelines on this topic.

Resources
