
lucidrains / STAM-pytorch

License: MIT
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification

Programming Languages

Python
139,335 projects - #7 most used programming language

Projects that are alternatives to or similar to STAM-pytorch

uniformer-pytorch
Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks, debuted in ICLR 2022
Stars: ✭ 90 (-17.43%)
Mutual labels:  transformers, attention-mechanism, video-classification
RETRO-pytorch
Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch
Stars: ✭ 473 (+333.94%)
Mutual labels:  transformers, attention-mechanism
Reformer Pytorch
Reformer, the efficient Transformer, in Pytorch
Stars: ✭ 1,644 (+1408.26%)
Mutual labels:  transformers, attention-mechanism
keras-deep-learning
Various implementations and projects on CNN, RNN, LSTM, GAN, etc
Stars: ✭ 22 (-79.82%)
Mutual labels:  attention-mechanism, video-classification
long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
Stars: ✭ 103 (-5.5%)
Mutual labels:  transformers, attention-mechanism
nuwa-pytorch
Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch
Stars: ✭ 347 (+218.35%)
Mutual labels:  transformers, attention-mechanism
transganformer
Implementation of TransGanFormer, an all-attention GAN that combines the findings from the recent GanFormer and TransGan papers
Stars: ✭ 137 (+25.69%)
Mutual labels:  transformers, attention-mechanism
Vit Pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
Stars: ✭ 7,199 (+6504.59%)
Mutual labels:  transformers, attention-mechanism
Dalle Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Stars: ✭ 3,661 (+3258.72%)
Mutual labels:  transformers, attention-mechanism
DARNN
A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction
Stars: ✭ 90 (-17.43%)
Mutual labels:  attention-mechanism
SnowflakeNet
(TPAMI 2022) Snowflake Point Deconvolution for Point Cloud Completion and Generation with Skip-Transformer
Stars: ✭ 74 (-32.11%)
Mutual labels:  transformers
COCO-LM
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Stars: ✭ 109 (+0%)
Mutual labels:  transformers
KoBERT-Transformers
KoBERT on 🤗 Huggingface Transformers 🤗 (with Bug Fixed)
Stars: ✭ 162 (+48.62%)
Mutual labels:  transformers
thermostat
Collection of NLP model explanations and accompanying analysis tools
Stars: ✭ 126 (+15.6%)
Mutual labels:  transformers
lstm-attention
Attention-based bidirectional LSTM for Classification Task (ICASSP)
Stars: ✭ 87 (-20.18%)
Mutual labels:  attention-mechanism
naru
Neural Relation Understanding: neural cardinality estimators for tabular data
Stars: ✭ 76 (-30.28%)
Mutual labels:  transformers
nlp-papers
Must-read papers on Natural Language Processing (NLP)
Stars: ✭ 87 (-20.18%)
Mutual labels:  transformers
TA3N
[ICCV 2019 Oral] TA3N: https://github.com/cmhungsteve/TA3N (Most updated repo)
Stars: ✭ 45 (-58.72%)
Mutual labels:  video-classification
Video-Description-with-Spatial-Temporal-Attention
[ACM MM 2017 & IEEE TMM 2020] This is the Theano code for the paper "Video Description with Spatial Temporal Attention"
Stars: ✭ 53 (-51.38%)
Mutual labels:  attention-mechanism
Im2LaTeX
An implementation of the Show, Attend and Tell paper in Tensorflow, for the OpenAI Im2LaTeX suggested problem
Stars: ✭ 16 (-85.32%)
Mutual labels:  attention-mechanism

STAM - Pytorch

Implementation of STAM (Space Time Attention Model), yet another pure and simple SOTA attention model that bests all previous models in video classification. This corroborates the finding of TimeSformer. Attention is all we need.
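
The name spells out the architecture's two-stage factorization: a spatial transformer attends over the patch tokens of each frame independently, and a temporal transformer then attends over one summary token per frame to produce the video-level prediction. Below is a rough conceptual sketch of that flow, not the package's internal code: the SpaceTimeSketch class, its layer counts, and the mean pooling (standing in for the CLS tokens the real model uses) are all illustrative assumptions.

import torch
from torch import nn

class SpaceTimeSketch(nn.Module):
    def __init__(self, dim = 512, num_classes = 100):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(dim, nhead = 8, batch_first = True)
        self.space = nn.TransformerEncoder(make_layer(), num_layers = 2)  # attends within each frame
        self.time = nn.TransformerEncoder(make_layer(), num_layers = 2)   # attends across frames
        self.to_logits = nn.Linear(dim, num_classes)

    def forward(self, patch_tokens):                      # (batch, frames, patches, dim)
        b, f, p, d = patch_tokens.shape
        x = patch_tokens.reshape(b * f, p, d)             # fold frames into the batch dimension
        x = self.space(x)                                 # spatial attention, per frame
        frame_tokens = x.mean(dim = 1).reshape(b, f, d)   # one summary token per frame (mean-pooled here)
        video_token = self.time(frame_tokens).mean(dim = 1)  # temporal attention over frame tokens
        return self.to_logits(video_token)

sketch = SpaceTimeSketch()
logits = sketch(torch.randn(2, 5, 64, 512))               # 2 clips, 5 frames, 64 patch tokens per frame
print(logits.shape)                                       # torch.Size([2, 100])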

Install

$ pip install stam-pytorch

Usage

import torch
from stam_pytorch import STAM

model = STAM(
    dim = 512,
    image_size = 256,     # size of image
    patch_size = 32,      # patch size
    num_frames = 5,       # number of image frames, selected out of video
    space_depth = 12,     # depth of vision transformer
    space_heads = 8,      # heads of vision transformer
    space_mlp_dim = 2048, # feedforward hidden dimension of vision transformer
    time_depth = 6,       # depth of time transformer (kept shallower than the space transformer, as in the paper, which used 6)
    time_heads = 8,       # heads of time transformer
    time_mlp_dim = 2048,  # feedforward hidden dimension of time transformer
    num_classes = 100,    # number of output classes
    space_dim_head = 64,  # space transformer head dimension
    time_dim_head = 64,   # time transformer head dimension
    dropout = 0.,         # dropout
    emb_dropout = 0.      # embedding dropout
)

frames = torch.randn(2, 5, 3, 256, 256) # (batch x frames x channels x height x width)
pred = model(frames) # (2, 100)
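
As a follow-up, here is a minimal, hypothetical training step built on the snippet above, assuming a cross-entropy objective and dummy labels; the optimizer and learning rate are illustrative, and a real video dataloader would replace the random tensors in practice.

import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr = 3e-4)  # illustrative optimizer choice

labels = torch.randint(0, 100, (2,))           # dummy class labels for the 2 clips above
loss = F.cross_entropy(model(frames), labels)  # model output is raw logits, so cross entropy applies directly
loss.backward()
optimizer.step()
optimizer.zero_grad()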

Citations

@misc{sharir2021image,
    title   = {An Image is Worth 16x16 Words, What is a Video Worth?}, 
    author  = {Gilad Sharir and Asaf Noy and Lihi Zelnik-Manor},
    year    = {2021},
    eprint  = {2103.13915},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}