zhuang-group / LIT

Licence: Apache-2.0 license
[AAAI 2022] This is the official PyTorch implementation of "Less is More: Pay Less Attention in Vision Transformers"

Programming Languages

Python
139335 projects - #7 most used programming language
CUDA
1817 projects
C++
36643 projects - #6 most used programming language
Shell
77523 projects

Projects that are alternatives of or similar to LIT

pytorch-vit
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Stars: ✭ 250 (+216.46%)
Mutual labels:  transformers, image-recognition
BottleneckTransformers
Bottleneck Transformers for Visual Recognition
Stars: ✭ 231 (+192.41%)
Mutual labels:  transformers, image-recognition
Simpletransformers
Transformers for Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
Stars: ✭ 2,881 (+3546.84%)
Mutual labels:  transformers
Transformer-Implementations
Library - Vanilla, ViT, DeiT, BERT, GPT
Stars: ✭ 34 (-56.96%)
Mutual labels:  transformers
COCO-LM
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Stars: ✭ 109 (+37.97%)
Mutual labels:  transformers
Nn
🧑‍🏫 50! Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
Stars: ✭ 5,720 (+7140.51%)
Mutual labels:  transformers
TensorFlow-Binary-Image-Classification-using-CNN-s
Binary Image Classification in TensorFlow
Stars: ✭ 26 (-67.09%)
Mutual labels:  image-recognition
Transmogrifai
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+2537.97%)
Mutual labels:  transformers
thermostat
Collection of NLP model explanations and accompanying analysis tools
Stars: ✭ 126 (+59.49%)
Mutual labels:  transformers
Learnable-Image-Resizing
TF 2 implementation Learning to Resize Images for Computer Vision Tasks (https://arxiv.org/abs/2103.09950v1).
Stars: ✭ 48 (-39.24%)
Mutual labels:  image-recognition
tensorflow-image-recognition-chrome-extension
Chrome browser extension for using TensorFlow image recognition on web pages
Stars: ✭ 88 (+11.39%)
Mutual labels:  image-recognition
nlp-papers
Must-read papers on Natural Language Processing (NLP)
Stars: ✭ 87 (+10.13%)
Mutual labels:  transformers
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+3962.03%)
Mutual labels:  transformers
object-flaw-detector-cpp
Detect various irregularities of a product as it moves along a conveyor belt.
Stars: ✭ 19 (-75.95%)
Mutual labels:  image-recognition
Dalle Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Stars: ✭ 3,661 (+4534.18%)
Mutual labels:  transformers
gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Stars: ✭ 216 (+173.42%)
Mutual labels:  transformers
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+3087.34%)
Mutual labels:  transformers
TF2DeepFloorplan
TF2 Deep FloorPlan Recognition using a Multi-task Network with Room-boundary-Guided Attention. Enable tensorboard, quantization, flask, tflite, docker, github actions and google colab.
Stars: ✭ 98 (+24.05%)
Mutual labels:  image-recognition
KoBERT-Transformers
KoBERT on 🤗 Huggingface Transformers 🤗 (with Bug Fixed)
Stars: ✭ 162 (+105.06%)
Mutual labels:  transformers
sharpmask
TensorFlow implementation of DeepMask and SharpMask
Stars: ✭ 31 (-60.76%)
Mutual labels:  image-recognition

Less is More: Pay Less Attention in Vision Transformers

This is the official PyTorch implementation of the AAAI 2022 paper: Less is More: Pay Less Attention in Vision Transformers.

By Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu and Jianfei Cai.

In our paper, we present a novel Less attention vIsion Transformer (LIT), building on the observation that in recent hierarchical vision Transformers the early self-attention layers still focus on local patterns and bring only minor benefits. LIT uses pure multi-layer perceptrons (MLPs) to encode rich local patterns in the early stages, while applying self-attention modules to capture longer-range dependencies in the deeper layers. Moreover, we propose a learned deformable token merging module to adaptively fuse informative patches in a non-uniform manner.
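
For intuition, the overall design can be pictured roughly as follows. This is a minimal, simplified sketch (block widths, depths, downsampling and the actual deformable token merging modules are placeholders), not the repository's implementation:

import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    """Early-stage block in this sketch: a pure MLP over each token, no self-attention."""
    def __init__(self, dim, hidden_ratio=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * hidden_ratio),
            nn.GELU(),
            nn.Linear(dim * hidden_ratio, dim),
        )

    def forward(self, x):  # x: (batch, tokens, channels)
        return x + self.mlp(self.norm(x))

class MSABlock(nn.Module):
    """Late-stage block in this sketch: standard multi-head self-attention followed by an MLP."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x):  # x: (batch, tokens, channels)
        h = self.norm1(x).transpose(0, 1)            # (tokens, batch, channels) for nn.MultiheadAttention
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out.transpose(0, 1)
        return x + self.mlp(self.norm2(x))

# Early stages use MLP-only blocks; later stages use self-attention blocks.
early_stages = nn.Sequential(*[MLPBlock(96) for _ in range(2)])
late_stages = nn.Sequential(*[MSABlock(96) for _ in range(2)])

tokens = torch.randn(1, 196, 96)                     # e.g. a 14x14 grid of 96-dim tokens
out = late_stages(early_stages(tokens))
print(out.shape)                                     # torch.Size([1, 196, 96])

In the actual model, the stages operate at different resolutions and channel widths and are connected by (deformable) token merging layers; see the paper and the classification code for the real configuration.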

If you use this code for a paper, please cite:

@inproceedings{pan2022litv1,
  title={Less is More: Pay Less Attention in Vision Transformers},
  author={Pan, Zizheng and Zhuang, Bohan and He, Haoyu and Liu, Jing and Cai, Jianfei},
  booktitle = {AAAI},
  year={2022}
}

Updates

  • 19/06/2022. We introduce LITv2, a faster and better Vision Transformer with a novel efficient HiLo attention. Code and pretrained weights have also been released here.

  • 10/03/2022. Added visualisation code for the attention maps in Figure 3; see here.

Usage

First, clone this repository.

git clone git@github.com:ziplab/LIT.git

Next, create a conda virtual environment.

# Make sure you have an NVIDIA GPU.
cd LIT/classification
bash setup_env.sh [conda_install_path] [env_name]

# For example
bash setup_env.sh /home/anaconda3 lit

Note: We use PyTorch 1.7.1 with CUDA 10.1 for all experiments. setup_env.sh lists all dependencies used in our experiments; you can edit this file to install a different version of PyTorch or any other packages.
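
Before launching any experiments, it can be worth confirming that the environment inside the new conda env matches what you expect; a quick check (the exact versions below are only needed to reproduce our setup exactly):

import torch

# Quick environment sanity check inside the activated conda environment.
print("PyTorch:", torch.__version__)        # 1.7.1 in our experiments
print("CUDA build:", torch.version.cuda)    # 10.1 in our experiments
print("GPU available:", torch.cuda.is_available())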

Image Classification on ImageNet

We provide baseline LIT models pretrained on ImageNet-1K. For training and evaluation code, please refer to classification.

Name   | Params (M) | FLOPs (G) | Top-1 Acc. (%) | Model               | Log
LIT-Ti | 19         | 3.6       | 81.1           | google drive/github | log
LIT-S  | 27         | 4.1       | 81.5           | google drive/github | log
LIT-M  | 48         | 8.6       | 83.0           | google drive/github | log
LIT-B  | 86         | 15.0      | 83.4           | google drive/github | log
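
As a quick sanity check after downloading a checkpoint, you can load it on the CPU and count its parameters. The snippet below is a rough sketch: it assumes the weights may be nested under a 'model' key (common for DeiT-style checkpoints), which may not match the exact file layout.

import torch

# Load a downloaded checkpoint on CPU and count its parameters.
ckpt = torch.load("lit_s.pth", map_location="cpu")    # example path to a downloaded checkpoint

# Some checkpoints nest the weights under a 'model' key; fall back to the raw dict otherwise.
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

num_params = sum(v.numel() for v in state_dict.values() if torch.is_tensor(v))
print(f"{num_params / 1e6:.1f}M parameters")          # LIT-S should be roughly 27M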

Object Detection on COCO

For training and evaluation code, please refer to detection.

RetinaNet

Backbone | Params (M) | Lr schd | box mAP | Config | Model  | Log
LIT-Ti   | 30         | 1x      | 41.6    | config | github | log
LIT-S    | 39         | 1x      | 41.6    | config | github | log

Mask R-CNN

Backbone | Params (M) | Lr schd | box mAP | mask mAP | Config | Model  | Log
LIT-Ti   | 40         | 1x      | 42.0    | 39.1     | config | github | log
LIT-S    | 48         | 1x      | 42.9    | 39.6     | config | github | log

Semantic Segmentation on ADE20K

For training and evaluation code, please refer to segmentation.

Semantic FPN

Backbone | Params (M) | Iters | mIoU | Config | Model  | Log
LIT-Ti   | 24         | 80k   | 41.3 | config | github | log
LIT-S    | 32         | 80k   | 41.7 | config | github | log

Offsets Visualisation

dpm_vis

We provide a script for visualising the offsets learned by the proposed deformable token merging (DTM) modules. For example:

# activate your virtual env
conda activate lit
cd classification/code_for_lit_ti

# visualise
python visualize_offset.py --model lit_ti --resume [path/to/lit_ti.pth] --vis_image visualization/demo.JPEG

The plots are automatically saved under visualization/, in a folder named after the example image.
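
For intuition about what these offsets represent, the sketch below shows one way a deformable token merging step can be written: a small network predicts a 2D offset for every output location, and the feature map is resampled at the shifted positions before being projected. It is illustrative only and does not reproduce the DTM module in this repository.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableTokenMergingSketch(nn.Module):
    """Illustrative only: merge tokens on a stride-2 grid at learned, shifted positions."""
    def __init__(self, dim, out_dim):
        super().__init__()
        # Predict a 2D offset (in pixels) for every output location on the stride-2 grid.
        self.offset_pred = nn.Conv2d(dim, 2, kernel_size=3, stride=2, padding=1)
        self.proj = nn.Conv2d(dim, out_dim, kernel_size=1)

    def forward(self, x):                       # x: (B, C, H, W) feature map
        B, C, H, W = x.shape
        offsets = self.offset_pred(x)           # (B, 2, H/2, W/2)
        _, _, h, w = offsets.shape

        # Regular stride-2 sampling grid in normalised [-1, 1] coordinates.
        ys = torch.linspace(-1, 1, h, device=x.device)
        xs = torch.linspace(-1, 1, w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs)
        grid = torch.stack((gx, gy), dim=-1).expand(B, -1, -1, -1)   # (B, h, w, 2)

        # Shift the grid by the predicted offsets (converted to [-1, 1] units).
        norm = torch.tensor([2.0 / max(W - 1, 1), 2.0 / max(H - 1, 1)], device=x.device)
        grid = grid + offsets.permute(0, 2, 3, 1) * norm

        sampled = F.grid_sample(x, grid, align_corners=True)         # (B, C, h, w)
        return self.proj(sampled)

dtm = DeformableTokenMergingSketch(dim=96, out_dim=192)
print(dtm(torch.randn(1, 96, 56, 56)).shape)    # torch.Size([1, 192, 28, 28])

The repository's visualize_offset.py plots the offsets learned by the trained model, not this toy module.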

Attention Map Visualisation

We provide the code for visualising the attention maps in Figure 3. To save you time, we also provide a pretrained PVT model that uses standard MSA in all stages.

Name       | Params (M) | FLOPs (G) | Top-1 Acc. (%) | Model  | Log
PVT w/ MSA | 20         | 8.4       | 80.9           | github | log

# activate your virtual env
conda activate lit
cd classification/code_for_lit_ti

# visualise
# by default, we save the results under 'classification/code_for_lit_ti/attn_results'
python generate_attention_maps.py --data-path [/path/to/imagenet] --resume [/path/to/pvt_full_msa.pth]

The resulting folder contains the following items:

.
├── attention_map
│   ├── stage-0
│   │   ├── block0
│   │   │   └── pixel-1260-block-0-head-0.png
│   │   ├── block1
│   │   │   └── pixel-1260-block-1-head-0.png
│   │   └── block2
│   │       └── pixel-1260-block-2-head-0.png
│   ├── stage-1
│   ├── stage-2
│   └── stage-3
└── full_msa_eval_maps.npy

where full_msa_eval_maps.npy stores the saved attention maps for each block in each stage, and the attention_map folder contains the visualisation results.
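
If you want to analyse the saved maps outside the provided script, the .npy file can be loaded directly. The exact layout of the stored object is not documented here, so the snippet below only loads and inspects it (treat the key structure as an assumption):

import numpy as np

# Load the saved attention maps; inspect the object before indexing into it,
# since the stored layout (plain array vs. pickled dict) is an assumption here.
maps = np.load("attn_results/full_msa_eval_maps.npy", allow_pickle=True)
print(type(maps), maps.dtype, maps.shape)

# If the file wraps a single Python object (e.g. a dict keyed by stage/block), unwrap it:
if maps.dtype == object and maps.shape == ():
    obj = maps.item()
    print(type(obj))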

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

This repository adapts code from DeiT, PVT and Swin. We thank the authors for open-sourcing their code.
