All Projects → lucidrains → uniformer-pytorch

lucidrains / uniformer-pytorch

Licence: MIT license
Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks, debuted in ICLR 2022

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to uniformer-pytorch

STAM-pytorch
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification
Stars: ✭ 109 (+21.11%)
Mutual labels:  transformers, attention-mechanism, video-classification
keras-deep-learning
Various implementations and projects on CNN, RNN, LSTM, GAN, etc
Stars: ✭ 22 (-75.56%)
Mutual labels:  attention-mechanism, video-classification
long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
Stars: ✭ 103 (+14.44%)
Mutual labels:  transformers, attention-mechanism
RETRO-pytorch
Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch
Stars: ✭ 473 (+425.56%)
Mutual labels:  transformers, attention-mechanism
nuwa-pytorch
Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch
Stars: ✭ 347 (+285.56%)
Mutual labels:  transformers, attention-mechanism
transganformer
Implementation of TransGanFormer, an all-attention GAN that combines the finding from the recent GanFormer and TransGan paper
Stars: ✭ 137 (+52.22%)
Mutual labels:  transformers, attention-mechanism
Vit Pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
Stars: ✭ 7,199 (+7898.89%)
Mutual labels:  transformers, attention-mechanism
Reformer Pytorch
Reformer, the efficient Transformer, in Pytorch
Stars: ✭ 1,644 (+1726.67%)
Mutual labels:  transformers, attention-mechanism
Dalle Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Stars: ✭ 3,661 (+3967.78%)
Mutual labels:  transformers, attention-mechanism
C3D-tensorflow
Action recognition with C3D network implemented in tensorflow
Stars: ✭ 34 (-62.22%)
Mutual labels:  video-classification
LSTM-Attention
A Comparison of LSTMs and Attention Mechanisms for Forecasting Financial Time Series
Stars: ✭ 53 (-41.11%)
Mutual labels:  attention-mechanism
Visual-Attention-Model
Chainer implementation of Deepmind's Visual Attention Model paper
Stars: ✭ 27 (-70%)
Mutual labels:  attention-mechanism
TransQuest
Transformer based translation quality estimation
Stars: ✭ 85 (-5.56%)
Mutual labels:  transformers
organic-chemistry-reaction-prediction-using-NMT
organic chemistry reaction prediction using NMT with Attention
Stars: ✭ 30 (-66.67%)
Mutual labels:  attention-mechanism
MiCT-Net-PyTorch
Video Recognition using Mixed Convolutional Tube (MiCT) on PyTorch with a ResNet backbone
Stars: ✭ 48 (-46.67%)
Mutual labels:  video-classification
conv3d-video-action-recognition
My experimentation around action recognition in videos. Contains Keras implementation for C3D network based on original paper "Learning Spatiotemporal Features with 3D Convolutional Networks", Tran et al. and it includes video processing pipelines coded using mPyPl package. Model is being benchmarked on popular UCF101 dataset and achieves result…
Stars: ✭ 50 (-44.44%)
Mutual labels:  video-classification
memory-compressed-attention
Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences"
Stars: ✭ 47 (-47.78%)
Mutual labels:  attention-mechanism
MinkLocMultimodal
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition
Stars: ✭ 65 (-27.78%)
Mutual labels:  3d-convolutional-network
clip-italian
CLIP (Contrastive Language–Image Pre-training) for Italian
Stars: ✭ 113 (+25.56%)
Mutual labels:  transformers
text
Using Transformers from HuggingFace in R
Stars: ✭ 66 (-26.67%)
Mutual labels:  transformers

Uniformer - Pytorch

Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks

Install

$ pip install uniformer-pytorch

Usage

Uniformer-S

import torch
from uniformer_pytorch import Uniformer

model = Uniformer(
    num_classes = 1000,                 # number of output classes
    dims = (64, 128, 256, 512),         # feature dimensions per stage (4 stages)
    depths = (3, 4, 8, 3),              # depth at each stage
    mhsa_types = ('l', 'l', 'g', 'g')   # aggregation type at each stage, 'l' stands for local, 'g' stands for global
)

video = torch.randn(1, 3, 8, 224, 224)  # (batch, channels, time, height, width)

logits = model(video) # (1, 1000)

Uniformer-B

import torch
from uniformer_pytorch import Uniformer

model = Uniformer(
    num_classes = 1000
    depths = (5, 8, 20, 7)
)

Citations

@inproceedings{anonymous2022uniformer,
    title   = {UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning},
    author  = {Anonymous},
    booktitle = {Submitted to The Tenth International Conference on Learning Representations },
    year    = {2022},
    url     = {https://openreview.net/forum?id=nBU_u6DLvoK},
    note    = {under review}
}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].