
lucidrains / memory-compressed-attention

License: MIT
Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia by Summarizing Long Sequences"

Programming Languages

Python

Projects that are alternatives of or similar to memory-compressed-attention

Lightnetplusplus
LightNet++: Boosted Light-weighted Networks for Real-time Semantic Segmentation
Stars: ✭ 218 (+363.83%)
Mutual labels:  attention-mechanism
lstm-attention
Attention-based bidirectional LSTM for Classification Task (ICASSP)
Stars: ✭ 87 (+85.11%)
Mutual labels:  attention-mechanism
SA-DL
Sentiment analysis with deep learning models, implemented with TensorFlow and Keras.
Stars: ✭ 35 (-25.53%)
Mutual labels:  attention-mechanism
X Transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
Stars: ✭ 211 (+348.94%)
Mutual labels:  attention-mechanism
Aoanet
Code for paper "Attention on Attention for Image Captioning". ICCV 2019
Stars: ✭ 242 (+414.89%)
Mutual labels:  attention-mechanism
Transformers-RL
An easy PyTorch implementation of "Stabilizing Transformers for Reinforcement Learning"
Stars: ✭ 107 (+127.66%)
Mutual labels:  attention-mechanism
Keras Attention Mechanism
Attention mechanism implementation for Keras.
Stars: ✭ 2,504 (+5227.66%)
Mutual labels:  attention-mechanism
STAM-pytorch
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification
Stars: ✭ 109 (+131.91%)
Mutual labels:  attention-mechanism
Attentionalpoolingaction
Code/Model release for NIPS 2017 paper "Attentional Pooling for Action Recognition"
Stars: ✭ 248 (+427.66%)
Mutual labels:  attention-mechanism
Im2LaTeX
An implementation of the Show, Attend and Tell paper in TensorFlow, for the OpenAI Im2LaTeX suggested problem
Stars: ✭ 16 (-65.96%)
Mutual labels:  attention-mechanism
Triplet Attention
Official PyTorch Implementation for "Rotate to Attend: Convolutional Triplet Attention Module." [WACV 2021]
Stars: ✭ 222 (+372.34%)
Mutual labels:  attention-mechanism
Linformer Pytorch
My take on a practical implementation of Linformer for PyTorch.
Stars: ✭ 239 (+408.51%)
Mutual labels:  attention-mechanism
question-generation
Neural Models for Key Phrase Detection and Question Generation
Stars: ✭ 29 (-38.3%)
Mutual labels:  attention-mechanism
Dalle Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
Stars: ✭ 3,661 (+7689.36%)
Mutual labels:  attention-mechanism
Video-Description-with-Spatial-Temporal-Attention
[ACM MM 2017 & IEEE TMM 2020] This is the Theano code for the paper "Video Description with Spatial Temporal Attention"
Stars: ✭ 53 (+12.77%)
Mutual labels:  attention-mechanism
Neat Vision
Neat (Neural Attention) Vision is a visualization tool for the attention mechanisms of deep-learning models for Natural Language Processing (NLP) tasks. (framework-agnostic)
Stars: ✭ 213 (+353.19%)
Mutual labels:  attention-mechanism
DARNN
A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction
Stars: ✭ 90 (+91.49%)
Mutual labels:  attention-mechanism
Neural-Chatbot
A Neural Network based Chatbot
Stars: ✭ 68 (+44.68%)
Mutual labels:  attention-mechanism
amta-net
Asymmetric Multi-Task Attention Network for Prostate Bed Segmentation in CT Images
Stars: ✭ 26 (-44.68%)
Mutual labels:  attention-mechanism
TianChi AIEarth
TianChi AIEarth Contest Solution
Stars: ✭ 57 (+21.28%)
Mutual labels:  attention-mechanism

Memory Compressed Attention

Implementation of the self-attention layer of the proposed Memory-Compressed Attention, in PyTorch. This repository offers both the causal and non-causal variants, and takes care of padding when the sequence length is not divisible by the compression factor.
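
As a rough illustration of that padding step (a minimal sketch, not the library's internals; compress_kv and its arguments are invented for this example), the keys/values can be right-padded so the sequence length becomes divisible by the compression factor before a strided 1D convolution compresses them, as in the paper:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical helper: pad the sequence dimension, then compress with a
# Conv1d whose kernel size and stride both equal the compression factor.
def compress_kv(kv, conv, compression_factor):
    b, n, d = kv.shape                       # kv: (batch, seq_len, dim)
    remainder = n % compression_factor
    if remainder > 0:
        kv = F.pad(kv, (0, 0, 0, compression_factor - remainder))  # pad seq dim
    # Conv1d expects (batch, dim, seq_len)
    return conv(kv.transpose(1, 2)).transpose(1, 2)

dim, factor = 512, 3
conv = nn.Conv1d(dim, dim, kernel_size = factor, stride = factor)
kv = torch.randn(1, 1024, dim)               # 1024 is not divisible by 3
print(compress_kv(kv, conv, factor).shape)   # torch.Size([1, 342, 512])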

The code also resolves an edge case where, in the auto-regressive scenario, the very first query has no keys to attend to. The solution is to append null key/values to the final compressed set, so that every query always has at least one key to attend to.
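
A sketch of that null key/value trick (illustrative shapes and names, not the repository's code): a null pair, learned in practice, is concatenated onto the compressed keys/values along the sequence dimension, so the attention softmax always has at least one position available:

import torch

b, h, n_c, d = 1, 8, 342, 64        # batch, heads, compressed length, head dim
k = torch.randn(b, h, n_c, d)
v = torch.randn(b, h, n_c, d)

# In the real layer these would be learned nn.Parameter tensors
null_k = torch.zeros(1, h, 1, d)
null_v = torch.zeros(1, h, 1, d)

k = torch.cat((null_k.expand(b, -1, -1, -1), k), dim = 2)  # (b, h, n_c + 1, d)
v = torch.cat((null_v.expand(b, -1, -1, -1), v), dim = 2)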

Install

$ pip install memory_compressed_attention

Usage

import torch
from memory_compressed_attention import MemoryCompressedAttention

attn = MemoryCompressedAttention(
    dim = 512,
    heads = 8,                 # number of heads
    causal = False,            # auto-regressive or not
    compression_factor = 3,    # compression ratio
    dropout = 0.1              # post-attention dropout
)

x = torch.randn(1, 1024, 512)
mask = torch.ones(1, 1024).bool()

attn(x, input_mask = mask) # (1, 1024, 512)
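
For the auto-regressive variant, the only change to the API shown above is the causal flag (a usage sketch, assuming input_mask is optional as in the example above):

causal_attn = MemoryCompressedAttention(
    dim = 512,
    heads = 8,
    causal = True,             # causal masking for auto-regressive use
    compression_factor = 3
)

causal_attn(torch.randn(1, 1024, 512)) # (1, 1024, 512)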

Citations

@misc{liu2018generating,
    title={Generating Wikipedia by Summarizing Long Sequences},
    author={Peter J. Liu and Mohammad Saleh and Etienne Pot and Ben Goodrich and Ryan Sepassi and Lukasz Kaiser and Noam Shazeer},
    year={2018},
    eprint={1801.10198},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}