szq0214 / SReT

License: MIT
Official PyTorch implementation of our ECCV 2022 paper "Sliced Recursive Transformer"

Programming Languages

python

Projects that are alternatives of or similar to SReT

towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
Stars: ✭ 821 (+1509.8%)
Mutual labels:  vit, vision-transformer
mobilevit-pytorch
A PyTorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".
Stars: ✭ 349 (+584.31%)
Mutual labels:  vit, vision-transformer
transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
Stars: ✭ 201 (+294.12%)
Mutual labels:  vision-transformer, efficient-transformers
pytorch-vit
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Stars: ✭ 250 (+390.2%)
Mutual labels:  vit, vision-transformer
PASSL
PASSL includes self-supervised image algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, BEiT, and MAE, as well as fundamental vision models such as Vision Transformer, DeiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2.
Stars: ✭ 134 (+162.75%)
Mutual labels:  vit, vision-transformer
LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Stars: ✭ 1,566 (+2970.59%)
Mutual labels:  vit, vision-transformer
FFCSThingy2.0
A course scheduling tool for FFCS in VIT, Vellore. Easily adaptable to any schedule/timetable. https://discord.com/invite/Un4UanH
Stars: ✭ 15 (-70.59%)
Mutual labels:  vit
awesome-efficient-gnn
Code and resources on scalable and efficient Graph Neural Networks
Stars: ✭ 498 (+876.47%)
Mutual labels:  efficient-neural-networks
Visual-Transformer-Paper-Summary
Summary of Transformer applications for computer vision tasks.
Stars: ✭ 51 (+0%)
Mutual labels:  vit
PLSC
Paddle Large Scale Classification Tools; supports ArcFace, CosFace, PartialFC, and Data Parallel + Model Parallel. Models include ResNet, ViT, DeiT, and FaceViT.
Stars: ✭ 113 (+121.57%)
Mutual labels:  vit
cape
Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
Stars: ✭ 29 (-43.14%)
Mutual labels:  vit
libai
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
Stars: ✭ 284 (+456.86%)
Mutual labels:  vision-transformer
BA-Transformer
[MICCAI 2021] Boundary-aware Transformers for Skin Lesion Segmentation
Stars: ✭ 86 (+68.63%)
Mutual labels:  transformer-architecture
HugsVision
HugsVision is an easy-to-use HuggingFace wrapper for state-of-the-art computer vision
Stars: ✭ 154 (+201.96%)
Mutual labels:  vit
mmwave-gesture-recognition
Basic Gesture Recognition Using mmWave Sensor - TI AWR1642
Stars: ✭ 32 (-37.25%)
Mutual labels:  transformer-architecture
keras-vision-transformer
The Tensorflow, Keras implementation of Swin-Transformer and Swin-UNET
Stars: ✭ 91 (+78.43%)
Mutual labels:  vision-transformer
ChangeFormer
Official PyTorch implementation of our IGARSS'22 paper: A Transformer-Based Siamese Network for Change Detection
Stars: ✭ 220 (+331.37%)
Mutual labels:  transformer-architecture
swin-transformer-pytorch
Implementation of the Swin Transformer in PyTorch.
Stars: ✭ 610 (+1096.08%)
Mutual labels:  transformer-architecture
pytorch-cifar-model-zoo
Implementation of Conv-based and ViT-based networks designed for CIFAR.
Stars: ✭ 62 (+21.57%)
Mutual labels:  vision-transformer
SAN
[ECCV 2020] Scale Adaptive Network: Learning to Learn Parameterized Classification Networks for Scalable Input Images
Stars: ✭ 41 (-19.61%)
Mutual labels:  efficient-neural-networks

Sliced Recursive Transformer (SReT)

PyTorch implementation of our paper: Sliced Recursive Transformer (ECCV 2022), by Zhiqiang Shen, Zechun Liu, and Eric Xing.

FLOPs and Params Comparison

Our Approach

  • Recursion operation: reuse (share) a block's weights across the depth of the network, so effective depth grows without adding parameters (a sketch follows this list).
  • Sliced Group Self-Attention: apply self-attention within sliced token groups across the recursive layers to cut the extra computation introduced by recursion (a sketch follows the abstract below).
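The recursion operation can be illustrated with a short PyTorch sketch: one transformer block is applied repeatedly, so the same weights are traversed several times along the depth dimension. This is only a minimal illustration of the idea, not the repository's implementation; the block internals and the name RecursiveBlock are assumptions.

import torch
import torch.nn as nn

class RecursiveBlock(nn.Module):
    # Toy transformer block whose weights are reused across recursions (illustrative only).
    def __init__(self, dim, num_heads=4, num_recursions=2):
        super().__init__()
        self.num_recursions = num_recursions
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        # The same parameters are applied num_recursions times:
        # effective depth increases, parameter count does not.
        for _ in range(self.num_recursions):
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            x = x + self.mlp(self.norm2(x))
        return x

tokens = torch.randn(1, 196, 192)          # (batch, tokens, embedding dim)
print(RecursiveBlock(192)(tokens).shape)   # torch.Size([1, 196, 192])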

Abstract

We present a neat yet effective recursive operation on vision transformers that improves parameter utilization without introducing additional parameters. This is achieved by sharing weights across the depth of transformer networks. The proposed method obtains a substantial gain of about 2% simply by using a naive recursive operation, requires no special or sophisticated knowledge of network design principles, and introduces minimal computational overhead to the training procedure. To reduce the extra computation caused by the recursive operation while maintaining superior accuracy, we propose an approximation via multiple sliced group self-attentions across recursive layers, which reduces the cost by 10~30% with minimal performance loss. We call the resulting model the Sliced Recursive Transformer (SReT), a novel and parameter-efficient vision transformer design that is compatible with a broad range of other designs for efficient ViT architectures. Our best model establishes a significant improvement on ImageNet-1K over state-of-the-art methods while using fewer parameters. The flexible scalability shows great potential for scaling up models and constructing extremely deep vision transformers.
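The saving from sliced group self-attention comes from attending within token groups instead of over the full sequence: attention over N tokens scales roughly with N^2, while attention within G groups of N/G tokens scales with G x (N/G)^2 = N^2/G. The sketch below is a minimal, hedged illustration of this grouping; the function name sliced_group_attention and the group count are assumptions, not the repository's code.

import torch

def attend(q, k, v):
    # Plain scaled dot-product attention.
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return scores.softmax(dim=-1) @ v

def sliced_group_attention(q, k, v, num_groups):
    # Self-attention applied independently inside each token group (illustrative only).
    b, n, d = q.shape
    assert n % num_groups == 0, "token count must be divisible by the group count"
    split = lambda t: t.reshape(b * num_groups, n // num_groups, d)
    out = attend(split(q), split(k), split(v))
    return out.reshape(b, n, d)

q = k = v = torch.randn(2, 196, 192)
print(sliced_group_attention(q, k, v, num_groups=2).shape)  # torch.Size([2, 196, 192])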

SReT Models

Install timm using:

pip install git+https://github.com/rwightman/pytorch-image-models.git

Create SReT models:

import torch
import SReT

# Build an SReT-S model (random initialization; see below for loading pre-trained weights).
model = SReT.SReT_S(pretrained=False)
# Dummy forward pass on a single 224x224 RGB image.
print(model(torch.randn(1, 3, 224, 224)))
...
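As a quick sanity check against the #params column in the table below, you can count the model's parameters directly; the snippet only shows the standard PyTorch way of counting, and the exact number depends on the model variant.

num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")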

Load pre-trained SReT models:

import torch
import SReT

model = SReT.SReT_S(pretrained=False)
# The released checkpoint stores the weights under the 'model' key.
model.load_state_dict(torch.load('./pre-trained/SReT_S.pth')['model'])
print(model(torch.randn(1, 3, 224, 224)))
...
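When the pre-trained weights are used purely for inference, it is standard PyTorch practice (not specific to SReT) to switch the model to evaluation mode and disable gradient tracking before the forward pass:

model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))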

Train SReT models with knowledge distillation (we recommend training with FKD, which is faster and achieves higher performance):

import torch
import SReT
import kd_loss

criterion_kd = kd_loss.KDLoss()

model = SReT.SReT_S_distill(pretrained=False)
student_outputs = model(images)
...
# We use the soft label only for the distillation procedure, as in MEAL V2.
# Note that 'student_outputs' and 'teacher_outputs' are logits before softmax.
loss = criterion_kd(student_outputs/T, teacher_outputs/T)
...
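The repository's kd_loss.KDLoss is not reproduced here. As a rough mental model, a soft-label distillation loss of this kind is typically a KL divergence between temperature-softened student and teacher distributions; the sketch below illustrates that form under the same calling convention as above. The class name SoftKDLoss and the omission of the usual T^2 scaling factor are assumptions, not the repository's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftKDLoss(nn.Module):
    # KL divergence between softened student and teacher distributions (illustrative sketch).
    def forward(self, student_logits_over_T, teacher_logits_over_T):
        # Inputs are assumed to be logits already divided by the temperature T,
        # matching the call criterion_kd(student_outputs/T, teacher_outputs/T) above.
        log_p_student = F.log_softmax(student_logits_over_T, dim=-1)
        p_teacher = F.softmax(teacher_logits_over_T, dim=-1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

criterion = SoftKDLoss()
T = 1.0
student_outputs, teacher_outputs = torch.randn(8, 1000), torch.randn(8, 1000)
print(criterion(student_outputs / T, teacher_outputs / T))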

Pre-trained Model

We currently provide the last-epoch checkpoints and will add the best ones, together with more models, soon. (⋇ indicates without slice.) We notice that using a larger initial lr ($0.001 \times \frac{\text{batchsize}}{512}$) with a longer warmup of 30 epochs obtains better results on SReT.
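As a concrete example of that linear scaling rule (the batch size below is illustrative, not a recommended setting):

base_lr, base_batch = 0.001, 512
batch_size = 1024                            # illustrative total batch size
init_lr = base_lr * batch_size / base_batch
print(init_lr)                               # 0.002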

Model | FLOPs | #params | accuracy (%) | weights (last) | weights (best) | logs | configurations
SReT_⋇T | 1.4G | 4.8M | 76.1 | link | TBA | link | link
SReT_T | 1.1G | 4.8M | 76.0 | link | TBA | link | link
SReT_⋇LT | 1.4G | 5.0M | 76.8 | link | TBA | link | link
SReT_LT [8-4-1,2-1-1] | 1.2G | 5.0M | 76.7 | link | TBA | link | link
SReT_LT [16-14-1,1-1-1] | 1.2G | 5.0M | 76.6 | link | TBA | link | link
SReT_⋇S | 4.7G | 20.9M | 82.0 | link | TBA | link | link
SReT_S | 4.2G | 20.9M | 81.9 | link | TBA | link | link
SReT_⋇T_Distill | 1.4G | 4.8M | 77.7 | link | TBA | link | link
SReT_T_Distill | 1.1G | 4.8M | 77.6 | link | TBA | link | link
SReT_⋇LT_Distill | 1.4G | 5.0M | 77.9 | link | TBA | link | link
SReT_LT_Distill | 1.2G | 5.0M | 77.7 | link | TBA | link | link
SReT_⋇T_Distill_Finetune384 | 6.4G | 4.9M | 79.7 | link | TBA | link | link
SReT_⋇S_Distill_Finetune384 | 18.5G | 21.0M | 83.8 | link | TBA | link | link
SReT_⋇S_Distill_Finetune512 | 42.8G | 21.3M | 84.3 | link | TBA | link | link

Citation

If you find our code helpful for your research, please cite:

@article{shen2021sliced,
      title={Sliced Recursive Transformer}, 
      author={Zhiqiang Shen and Zechun Liu and Eric Xing},
      year={2021},
      journal={arXiv preprint arXiv:2111.05297}
}

Contact

Zhiqiang Shen (zhiqiangshen0214 at gmail.com or zhiqians at andrew.cmu.edu)
