All Projects → zihangJiang → TokenLabeling

zihangJiang / TokenLabeling

Licence: Apache-2.0 license
Pytorch implementation of "All Tokens Matter: Token Labeling for Training Better Vision Transformers"

Programming Languages

Jupyter Notebook
11667 projects
python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to TokenLabeling

nested-transformer
Nested Hierarchical Transformer https://arxiv.org/pdf/2105.12723.pdf
Stars: ✭ 174 (-54.81%)
Mutual labels:  transformer, vision, imagenet
FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (-47.01%)
Mutual labels:  transformer, vision
image-classification
A collection of SOTA Image Classification Models in PyTorch
Stars: ✭ 70 (-81.82%)
Mutual labels:  transformer, imagenet
HRFormer
This is an official implementation of our NeurIPS 2021 paper "HRFormer: High-Resolution Transformer for Dense Prediction".
Stars: ✭ 357 (-7.27%)
Mutual labels:  transformer, vision
visualization
a collection of visualization function
Stars: ✭ 189 (-50.91%)
Mutual labels:  transformer, vision
Ghostnet
CV backbones including GhostNet, TinyNet and TNT, developed by Huawei Noah's Ark Lab.
Stars: ✭ 1,744 (+352.99%)
Mutual labels:  transformer, imagenet
EfficientMORL
EfficientMORL (ICML'21)
Stars: ✭ 22 (-94.29%)
Mutual labels:  vision
BAKE
Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification
Stars: ✭ 79 (-79.48%)
Mutual labels:  imagenet
keras-vision-transformer
The Tensorflow, Keras implementation of Swin-Transformer and Swin-UNET
Stars: ✭ 91 (-76.36%)
Mutual labels:  transformer
bytekit
Java 字节操作的工具库(不是字节码的工具库)
Stars: ✭ 40 (-89.61%)
Mutual labels:  transformer
sparql-transformer
A more handy way to use SPARQL data in your web app
Stars: ✭ 38 (-90.13%)
Mutual labels:  transformer
kaggle-champs
Code for the CHAMPS Predicting Molecular Properties Kaggle competition
Stars: ✭ 49 (-87.27%)
Mutual labels:  transformer
php-serializer
Serialize PHP variables, including objects, in any format. Support to unserialize it too.
Stars: ✭ 47 (-87.79%)
Mutual labels:  transformer
cozmo-tensorflow
🤖 Cozmo the Robot recognizes objects with TensorFlow
Stars: ✭ 61 (-84.16%)
Mutual labels:  imagenet
Cross-lingual-Summarization
Zero-Shot Cross-Lingual Abstractive Sentence Summarization through Teaching Generation and Attention
Stars: ✭ 28 (-92.73%)
Mutual labels:  transformer
Representation-Learning-for-Information-Extraction
Pytorch implementation of Paper by Google Research - Representation Learning for Information Extraction from Form-like Documents.
Stars: ✭ 82 (-78.7%)
Mutual labels:  transformer
TransBTS
This repo provides the official code for : 1) TransBTS: Multimodal Brain Tumor Segmentation Using Transformer (https://arxiv.org/abs/2103.04430) , accepted by MICCAI2021. 2) TransBTSV2: Towards Better and More Efficient Volumetric Segmentation of Medical Images(https://arxiv.org/abs/2201.12785).
Stars: ✭ 254 (-34.03%)
Mutual labels:  transformer
Awesome-low-level-vision-resources
A curated list of resources for Low-level Vision Tasks
Stars: ✭ 35 (-90.91%)
Mutual labels:  transformer
ru-dalle
Generate images from texts. In Russian
Stars: ✭ 1,606 (+317.14%)
Mutual labels:  transformer
cape
Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
Stars: ✭ 29 (-92.47%)
Mutual labels:  transformer

All Tokens Matter: Token Labeling for Training Better Vision Transformers (arxiv)

This is a Pytorch implementation of our paper.

Compare

Comparison between the proposed LV-ViT and other recent works based on transformers. Note that we only show models whose model sizes are under 100M.

Our codes are based on the pytorch-image-models by Ross Wightman.

Update

2021.7: Add script to generate label data.

2021.6: Support pip install tlt to use our Token Labeling Toolbox for image models.

2021.6: Release training code and segmentation model.

2021.4: Release LV-ViT models.

LV-ViT Models

Model layer dim Image resolution Param Top 1 Download
LV-ViT-T 12 240 224 8.53M 79.1 link
LV-ViT-S 16 384 224 26.15M 83.3 link
LV-ViT-S 16 384 384 26.30M 84.4 link
LV-ViT-M 20 512 224 55.83M 84.0 link
LV-ViT-M 20 512 384 56.03M 85.4 link
LV-ViT-M 20 512 448 56.13M 85.5 link
LV-ViT-L 24 768 448 150.47M 86.2 link
LV-ViT-L 24 768 512 150.66M 86.4 link

Requirements

torch>=1.4.0 torchvision>=0.5.0 pyyaml scipy timm==0.4.5

data prepare: ImageNet with the following folder structure, you can extract imagenet by this script.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......

Validation

Replace DATA_DIR with your imagenet validation set path and MODEL_DIR with the checkpoint path

CUDA_VISIBLE_DEVICES=0 bash eval.sh /path/to/imagenet/val /path/to/checkpoint

Label data

We provide NFNet-F6 generated dense label map in Google Drive and BaiDu Yun (password: y6j2). As NFNet-F6 are based on pure ImageNet data, no extra training data is involved.

Training

Train the LV-ViT-S:

If only 4 GPUs are available,

CUDA_VISIBLE_DEVICES=0,1,2,3 ./distributed_train.sh 4 /path/to/imagenet --model lvvit_s -b 256 --apex-amp --img-size 224 --drop-path 0.1 --token-label --token-label-data /path/to/label_data --token-label-size 14 --model-ema

If 8 GPUs are available:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet --model lvvit_s -b 128 --apex-amp --img-size 224 --drop-path 0.1 --token-label --token-label-data /path/to/label_data --token-label-size 14 --model-ema

Train the LV-ViT-M and LV-ViT-L (run on 8 GPUs):

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet --model lvvit_m -b 128 --apex-amp --img-size 224 --drop-path 0.2 --token-label --token-label-data /path/to/label_data --token-label-size 14 --model-ema
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet --model lvvit_l -b 128 --lr 1.e-3 --aa rand-n3-m9-mstd0.5-inc1 --apex-amp --img-size 224 --drop-path 0.3 --token-label --token-label-data /path/to/label_data --token-label-size 14 --model-ema

If you want to train our LV-ViT on images with 384x384 resolution, please use --img-size 384 --token-label-size 24.

Fine-tuning

To Fine-tune the pre-trained LV-ViT-S on images with 384x384 resolution:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet --model lvvit_s -b 64 --apex-amp --img-size 384 --drop-path 0.1 --token-label --token-label-data /path/to/label_data --token-label-size 24 --lr 5.e-6 --min-lr 5.e-6 --weight-decay 1.e-8 --finetune /path/to/checkpoint

To Fine-tune the pre-trained LV-ViT-S on other datasets without token labeling:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/dataset --model lvvit_s -b 64 --apex-amp --img-size 224 --drop-path 0.1 --token-label --token-label-size 14 --dense-weight 0.0 --num-classes $NUM_CLASSES --finetune /path/to/checkpoint

Segmentation

Our Segmentation model are fully based upon the MMSegmentation Toolkit. The model and config files are under seg/ folder which follow the same folder structure. You can simply drop in these file to get start.

git clone https://github.com/open-mmlab/mmsegmentation # and install

cp seg/mmseg/models/backbones/vit.py mmsegmentation/mmseg/models/backbones/
cp -r seg/configs/lvvit mmsegmentation/configs/

# test upernet+lvvit_s (add --aug-test to test on multi scale)
cd mmsegmentation
./tools/dist_test.sh configs/lvvit/upernet_lvvit_s_512x512_160k_ade20k.py /path/to/checkpoint 8 --eval mIoU [--aug-test]
Backbone Method Crop size Lr Schd mIoU mIoU(ms) Pixel Acc. Param Download
LV-ViT-S UperNet 512x512 160k 47.9 48.6 83.1 44M link
LV-ViT-M UperNet 512x512 160k 49.4 50.6 83.5 77M link
LV-ViT-L UperNet 512x512 160k 50.9 51.8 84.1 209M link

Visualization

We apply the visualization method in this repo to visualize the parts of the image that led to a certain classification for DeiT-Base and our LV-ViT-S. The parts of the image that used by the network to make the decision are highlighted in red.

Compare

Label generation

To generate token label data for training:

python3 generate_label.py /path/to/imagenet/train /path/to/save/label_top5_train_nfnet --model dm_nfnet_f6 --pretrained --img-size 576 -b 32 --crop-pct 1.0

Reference

If you use this repo or find it useful, please consider citing:

@inproceedings{NEURIPS2021_9a49a25d,
 author = {Jiang, Zi-Hang and Hou, Qibin and Yuan, Li and Zhou, Daquan and Shi, Yujun and Jin, Xiaojie and Wang, Anran and Feng, Jiashi},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {M. Ranzato and A. Beygelzimer and Y. Dauphin and P.S. Liang and J. Wortman Vaughan},
 pages = {18590--18602},
 publisher = {Curran Associates, Inc.},
 title = {All Tokens Matter: Token Labeling for Training Better Vision Transformers},
 url = {https://proceedings.neurips.cc/paper/2021/file/9a49a25d845a483fae4be7e341368e36-Paper.pdf},
 volume = {34},
 year = {2021}
}

Related projects

T2T-ViT, Re-labeling ImageNet, MMSegmentation, Transformer Explainability.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].