
yitu-opensource / T2t Vit

Licence: other

Projects that are alternatives to or similar to T2t Vit

Detectorch
Detectorch - detectron for PyTorch
Stars: ✭ 566 (-1.57%)
Mutual labels:  jupyter-notebook
Dogs vs cats
Cats vs. Dogs
Stars: ✭ 570 (-0.87%)
Mutual labels:  jupyter-notebook
Codereader
Reading source code together with the experts
Stars: ✭ 572 (-0.52%)
Mutual labels:  jupyter-notebook
Ttur
Two time-scale update rule for training GANs
Stars: ✭ 567 (-1.39%)
Mutual labels:  jupyter-notebook
Tutorials
CatBoost tutorials repository
Stars: ✭ 563 (-2.09%)
Mutual labels:  jupyter-notebook
Cleverhans
An adversarial example library for constructing attacks, building defenses, and benchmarking both
Stars: ✭ 5,356 (+831.48%)
Mutual labels:  jupyter-notebook
Fastcore
Python supercharged for the fastai library
Stars: ✭ 565 (-1.74%)
Mutual labels:  jupyter-notebook
Gym Trading
Environment for reinforcement-learning algorithmic trading models
Stars: ✭ 574 (-0.17%)
Mutual labels:  jupyter-notebook
Practical python programming
Course materials for BUPT's "Python Programming and Practice" course
Stars: ✭ 569 (-1.04%)
Mutual labels:  jupyter-notebook
Wgan Tensorflow
a tensorflow implementation of WGAN
Stars: ✭ 572 (-0.52%)
Mutual labels:  jupyter-notebook
Datasets For Recommender Systems
A repository of high-quality, topic-centric public data sources for Recommender Systems (RS)
Stars: ✭ 564 (-1.91%)
Mutual labels:  jupyter-notebook
Bert score
BERT score for text generation
Stars: ✭ 568 (-1.22%)
Mutual labels:  jupyter-notebook
Data Science Notes
Data science notes and collected resources
Stars: ✭ 6,072 (+956%)
Mutual labels:  jupyter-notebook
Machine Learning Specialization
Stars: ✭ 566 (-1.57%)
Mutual labels:  jupyter-notebook
Eng Edu
Stars: ✭ 573 (-0.35%)
Mutual labels:  jupyter-notebook
Hands On Machine Learning For Algorithmic Trading
Hands-On Machine Learning for Algorithmic Trading, published by Packt
Stars: ✭ 562 (-2.26%)
Mutual labels:  jupyter-notebook
Cs231
My corrections for the Stanford CS231n class assignments - Convolutional Neural Networks for Visual Recognition
Stars: ✭ 570 (-0.87%)
Mutual labels:  jupyter-notebook
Agegenderdeeplearning
Stars: ✭ 575 (+0%)
Mutual labels:  jupyter-notebook
Ml Design Patterns
Source code accompanying O'Reilly book: Machine Learning Design Patterns
Stars: ✭ 566 (-1.57%)
Mutual labels:  jupyter-notebook
Deeplearning.ai
Some work of Andrew Ng's course on Coursera
Stars: ✭ 572 (-0.52%)
Mutual labels:  jupyter-notebook

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, arXiv

Update:

2021/03/11: updated our results. Our T2T-ViT-14 with 21.5M parameters now reaches 81.5% top-1 accuracy at 224x224 image resolution, and 83.3% top-1 accuracy at 384x384 resolution.

2021/02/21: T2T-ViT can be trained stably on most common GPUs (1080Ti, 2080Ti, TITAN V, V100) with '--amp' (Automatic Mixed Precision). On some specific GPUs such as the Tesla T4, 'amp' can cause NaN loss when training T2T-ViT. If you get NaN loss during training, disable AMP by removing '--amp' from the training scripts; a small guard for catching this early is sketched below.
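As a defensive measure, a training loop can fail fast when the loss stops being finite. This is a minimal sketch, not part of the repo's scripts; `loss` stands for the training loss tensor of the current step:

```python
import math

import torch


def check_loss_finite(loss: torch.Tensor) -> None:
    # Fail fast instead of silently diverging when AMP yields a NaN/Inf loss.
    if not math.isfinite(loss.item()):
        raise RuntimeError("Non-finite loss detected; try training without --amp")
```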

2021/01/28: released the code and uploaded most of the pretrained T2T-ViT models.

Our code is based on the official ImageNet example by PyTorch and on pytorch-image-models by Ross Wightman.

Requirements

timm, pip install timm==0.3.4

torch>=1.4.0

torchvision>=0.5.0

pyyaml

Data preparation: ImageNet with the following folder structure; you can extract ImageNet with this script. A quick sanity check for the extracted layout is sketched after the tree.

```
│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
```
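One quick way to confirm the extracted layout is a small torchvision check (a sketch, not part of this repo; adjust the root path to your setup):

```python
# Sanity-check the extracted ImageNet layout with torchvision's ImageFolder.
from torchvision import datasets

for split in ("train", "val"):
    ds = datasets.ImageFolder(f"imagenet/{split}")
    print(f"{split}: {len(ds)} images across {len(ds.classes)} classes")
# Expect 1000 classes per split (about 1.28M train / 50k val images).
```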

T2T-ViT Models

| Model | T2T Transformer | Top1 Acc | #params | MACs | Download |
| :--- | :---: | :---: | :---: | :---: | :---: |
| T2T-ViT-14 | Performer | 81.5 | 21.5M | 5.2G | here |
| T2T-ViT-19 | Performer | 81.9 | 39.2M | 8.9G | here |
| T2T-ViT-24 | Performer | 82.3 | 64.1M | 14.1G | here |
| T2T-ViT-14, 384 | Performer | 83.3 | 21.7M | | here |
| T2T-ViT_t-14 | Transformer | 81.7 | 21.5M | 6.1G | here |
| T2T-ViT_t-19 | Transformer | 82.4 | 39.2M | 9.8G | here |
| T2T-ViT_t-24 | Transformer | 82.6 | 64.1M | 15.0G | here |

Here 'T2T-ViT-14, 384' means we train T2T-ViT-14 with an image size of 384 x 384.

The three lite variants of T2T-ViT (compared with MobileNets):

| Model | T2T Transformer | Top1 Acc | #params | MACs | Download |
| :--- | :---: | :---: | :---: | :---: | :---: |
| T2T-ViT-7 | Performer | 71.7 | 4.3M | 1.2G | here |
| T2T-ViT-10 | Performer | 75.2 | 5.9M | 1.8G | here |
| T2T-ViT-12 | Performer | 76.5 | 6.9M | 2.2G | here |
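After downloading a checkpoint, it can be loaded for local inference. The sketch below is hedged: it assumes the repository's `models` package registers its architectures with timm under the names used by the `--model` flag (e.g. `T2t_vit_14`), and that a checkpoint is either a plain state dict or nests one under 'state_dict'/'state_dict_ema'; adjust to the actual code if these assumptions do not hold.

```python
# Hedged sketch: build T2T-ViT-14 and load a downloaded checkpoint.
import torch
import timm
import models  # noqa: F401 -- importing registers the T2T-ViT architectures (assumption)

model = timm.create_model("T2t_vit_14", num_classes=1000)
ckpt = torch.load("path/to/checkpoint", map_location="cpu")
# The weights may be nested under 'state_dict' or 'state_dict_ema' (assumption).
state = ckpt.get("state_dict_ema") or ckpt.get("state_dict") or ckpt
model.load_state_dict(state)
model.eval()
```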

Validation

Test the T2T-ViT-14 (with Performer in the T2T module).

Download the T2T-ViT-14, then test it by running:

CUDA_VISIBLE_DEVICES=0 python main.py path/to/data --model T2t_vit_14 -b 100 --eval_checkpoint path/to/checkpoint

The results look like:

```
Test: [   0/499]  Time: 2.083 (2.083)  Loss:  0.3578 (0.3578)  Acc@1: 96.0000 (96.0000)  Acc@5: 99.0000 (99.0000)
Test: [  50/499]  Time: 0.166 (0.202)  Loss:  0.5823 (0.6404)  Acc@1: 85.0000 (86.1569)  Acc@5: 99.0000 (97.5098)
...
Test: [ 499/499]  Time: 0.272 (0.172)  Loss:  1.3983 (0.8261)  Acc@1: 62.0000 (81.5000)  Acc@5: 93.0000 (95.6660)
Top-1 accuracy of the model is: 81.5%
```
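For a quick spot check on a single image, a sketch like the following can be used (not the repo's evaluation script; `model` is a loaded, eval-mode T2T-ViT as in the loading sketch above, and the resize/normalization is the common ImageNet recipe, which should be matched to the repo's actual eval config):

```python
# Illustrative single-image spot check (not the repo's evaluation script).
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # standard ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("path/to/image.jpg").convert("RGB"))
with torch.no_grad():
    probs = model(img.unsqueeze(0)).softmax(dim=-1)
top5 = probs.topk(5)
print(top5.indices.tolist(), top5.values.tolist())
```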

Test the three lite variants T2T-ViT-7, T2T-ViT-10, and T2T-ViT-12 (with Performer in the T2T module).

Download the T2T-ViT-7, T2T-ViT-10 or T2T-ViT-12, then test it by running:

CUDA_VISIBLE_DEVICES=0 python main.py path/to/data --model T2t_vit_7 -b 100 --eval_checkpoint path/to/checkpoint

Test the model T2T-ViT-14, 384 with 83.3% top-1 accuracy:

CUDA_VISIBLE_DEVICES=0 python main.py path/to/data --model T2t_vit_14 --img-size 384 -b 100 --eval_checkpoint path/to/T2T-ViT-14-384 

Train

Train the three lite variants T2T-ViT-7, T2T-ViT-10, and T2T-ViT-12 (with Performer in the T2T module):

If only 4 GPUs are available:

CUDA_VISIBLE_DEVICES=0,1,2,3 ./distributed_train.sh 4 path/to/data --model T2t_vit_7 -b 128 --lr 1e-3 --weight-decay .03 --amp --img-size 224

The top-1 accuracy with 4 GPUs is slightly lower than with 8 GPUs (by around 0.1%-0.3%). Note that -b is the per-GPU batch size, so both commands use the same global batch size of 512 (4 x 128 vs. 8 x 64).

If 8 GPUs are available:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 path/to/data --model T2t_vit_7 -b 64 --lr 1e-3 --weight-decay .03 --amp --img-size 224

Train the T2T-ViT-14 and T2T-ViT_t-14:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 path/to/data --model T2t_vit_14 -b 64 --lr 5e-4 --weight-decay .05 --amp --img-size 224

If you want to train our T2T-ViT on images with 384x384 resolution, please use '--img-size 384'.

Train the T2T-ViT-19, T2T-ViT-24 or T2T-ViT_t-19, T2T-ViT_t-24:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 path/to/data --model T2t_vit_19 -b 64 --lr 5e-4 --weight-decay .065 --amp --img-size 224

Visualization

To visualize the image features of ResNet50, open and run the visualization_resnet.ipynb file in Jupyter Notebook or JupyterLab to reproduce the example feature visualizations.

To visualize the image features of ViT, open and run the visualization_vit.ipynb file in Jupyter Notebook or JupyterLab to reproduce the example feature visualizations.

To visualize attention maps, you can refer to this file. A simple example visualizes the attention maps in attention blocks 4 and 5; a hedged sketch of extracting those maps follows.
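One common way to grab attention maps for such a visualization is with forward hooks. The sketch below assumes a timm-style Attention module in which `attn_drop` is applied to the softmax attention weights, and ViT-style module naming (`model.blocks[i].attn`); adapt the module paths to the actual T2T-ViT code.

```python
# Hedged sketch: capture attention maps from blocks 4 and 5 via forward hooks.
import torch

attn_maps = {}

def save_attn(name):
    def hook(module, inputs, output):
        # In timm-style Attention, attn_drop's output is the
        # (batch, heads, tokens, tokens) attention matrix (assumption).
        attn_maps[name] = output.detach().cpu()
    return hook

for idx in (4, 5):
    model.blocks[idx].attn.attn_drop.register_forward_hook(save_attn(f"block{idx}"))

with torch.no_grad():
    model(img.unsqueeze(0))  # `img`: a preprocessed image as in the eval sketch

print({k: tuple(v.shape) for k, v in attn_maps.items()})
```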

Reference

If you find this repo useful, please consider citing:

```
@article{yuan2021tokens,
  title={Tokens-to-token vit: Training vision transformers from scratch on imagenet},
  author={Yuan, Li and Chen, Yunpeng and Wang, Tao and Yu, Weihao and Shi, Yujun and Tay, Francis EH and Feng, Jiashi and Yan, Shuicheng},
  journal={arXiv preprint arXiv:2101.11986},
  year={2021}
}
```