zhuang-group / HVT

Licence: Apache-2.0 License
[ICCV 2021] Official implementation of "Scalable Vision Transformers with Hierarchical Pooling"

Scalable Vision Transformers with Hierarchical Pooling

This is the official PyTorch implementation of the ICCV 2021 paper: Scalable Vision Transformers with Hierarchical Pooling.

By Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, and Jianfei Cai.

In our paper, we propose a Hierarchical Visual Transformer (HVT) that progressively pools visual tokens to shrink the sequence length and hence reduce the computational cost, analogous to feature map downsampling in Convolutional Neural Networks (CNNs). Moreover, we empirically find that the average-pooled visual tokens contain more discriminative information than the single class token.
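
The two ideas in the paragraph above can be illustrated with a minimal, dependency-free sketch. It is not the repository's implementation: the choice of max pooling, the kernel size of 3, the stride of 2, the absence of padding, and the function names max_pool_tokens and average_tokens are all assumptions made for illustration.

```python
from typing import List

Token = List[float]  # one visual token = one feature vector


def max_pool_tokens(tokens: List[Token], kernel_size: int = 3, stride: int = 2) -> List[Token]:
    """Pool along the sequence axis, independently for each feature dimension.

    Assumption: max pooling without padding. The pooling operator and its
    hyperparameters are illustrative, not taken from the repository.
    """
    if len(tokens) < kernel_size:
        raise ValueError("sequence is shorter than the pooling kernel")
    pooled = []
    for start in range(0, len(tokens) - kernel_size + 1, stride):
        window = tokens[start:start + kernel_size]
        # zip(*window) groups the values of each feature dimension together.
        pooled.append([max(column) for column in zip(*window)])
    return pooled


def average_tokens(tokens: List[Token]) -> Token:
    """Average all tokens into one vector, used here in place of a class token."""
    count = len(tokens)
    return [sum(column) / count for column in zip(*tokens)]


# Seven toy tokens with two features each.
tokens = [[float(i), float(-i)] for i in range(7)]
pooled = max_pool_tokens(tokens)      # sequence length shrinks from 7 to 3
summary = average_tokens(pooled)      # single vector for a classifier head
print(len(tokens), len(pooled), summary)
```

Each pooling step roughly halves the number of tokens, so every Transformer block placed after it processes a shorter sequence.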

If you use this code for a paper, please cite:

@InProceedings{Pan_2021_ICCV,
    author    = {Pan, Zizheng and Zhuang, Bohan and Liu, Jing and He, Haoyu and Cai, Jianfei},
    title     = {Scalable Vision Transformers With Hierarchical Pooling},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {377-386}
}

Updates

  • 30/12/2021: HVT has been integrated into PaddleViT. Check out the third-party implementation here!

Usage

First, clone the repository locally:

git clone https://github.com/MonashAI/HVT

Then, install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2

Data preparation

ImageNet

Download the ImageNet 2012 dataset from here, and prepare the dataset based on this script. The file structure should look like:

imagenet
├── train
│   ├── class1
│   │   ├── img1.jpeg
│   │   ├── img2.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img3.jpeg
│   │   └── ...
│   └── ...
└── val
    ├── class1
    │   ├── img4.jpeg
    │   ├── img5.jpeg
    │   └── ...
    ├── class2
    │   ├── img6.jpeg
    │   └── ...
    └── ...
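
Before launching a long training run, it can help to confirm that the folder really follows the layout above. The following is a small sketch using only the standard library; the function name summarize_imagenet_layout is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import Dict


def summarize_imagenet_layout(root: str) -> Dict[str, Dict[str, int]]:
    """Return the number of files in each class folder of root/train and root/val.

    Raises FileNotFoundError if either split folder is missing.
    """
    summary: Dict[str, Dict[str, int]] = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        summary[split] = {
            class_dir.name: sum(1 for item in class_dir.iterdir() if item.is_file())
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return summary
```

Calling summarize_imagenet_layout("path/to/imagenet") returns a dictionary keyed by split and class name, so an empty or missing class folder is easy to spot.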

CIFAR100

Download the CIFAR100 dataset from here.

Training

To train HVT-Ti-1 on ImageNet with 8 GPUs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --config config/hvt-ti-1.json --data-set IMNET --data-path [path/to/imagenet]

We also provide configuration files for HVT-S-1 and Scale HVT-Ti-4 under the config folder.

To train HVT-Ti-1 on CIFAR100, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --config config/hvt-ti-1.json --data-set CIFAR --data-path [path/to/cifar100]

Evaluation

To evaluate a model on ImageNet, e.g. HVT-S-1, run:

python main.py --config config/hvt-s-1.json --data-set IMNET --data-path [path/to/imagenet] --eval --resume [path/to/hvt_s_1.pth]

Scaling HVT

You can scale an HVT model by changing the following settings in the configuration file:

  • input_size: The input image size.
  • patch_size: The patch size used to split an image.
  • num_heads: The number of self-attention heads in an MSA layer.
  • num_blocks: The number of Transformer blocks in a model.
  • pool_kernel_size: The kernel size of the pooling layer.
  • pool_stride: The stride of the pooling layer.
  • pool_block_width: The number of blocks for each stage.
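
To get a feel for how these settings interact, the sketch below estimates the token sequence length after each pooling stage. It assumes non-overlapping square patches, pooling without padding, and no class token. The function name and the num_pool_stages argument are hypothetical, and the values in the usage line are illustrative rather than read from the files under config.

```python
from typing import List


def sequence_lengths(
    input_size: int,
    patch_size: int,
    num_pool_stages: int,
    pool_kernel_size: int,
    pool_stride: int,
) -> List[int]:
    """Token count before pooling, then after each pooling stage.

    Assumptions: square input, non-overlapping patches, and unpadded pooling,
    so each stage maps n tokens to floor((n - kernel) / stride) + 1 tokens.
    """
    if input_size % patch_size != 0:
        raise ValueError("input_size must be divisible by patch_size")
    tokens = (input_size // patch_size) ** 2
    lengths = [tokens]
    for _ in range(num_pool_stages):
        if tokens < pool_kernel_size:
            raise ValueError("too few tokens left for another pooling stage")
        tokens = (tokens - pool_kernel_size) // pool_stride + 1
        lengths.append(tokens)
    return lengths


# Illustrative values: 224x224 input, 16x16 patches, four pooling stages.
print(sequence_lengths(224, 16, 4, 3, 2))  # [196, 97, 48, 23, 11]
```

Because self-attention cost grows with the square of the sequence length, halving the token count early in the network removes a large share of the computation in all later blocks.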

Results on ImageNet

Main Results

Name            FLOPs (G)  Params (M)  Top-1 Acc. (%)  Top-5 Acc. (%)  Model   Log
HVT-Ti-1        0.64       5.74        69.64           89.40           github  log
Scale HVT-Ti-4  1.39       22.12       75.23           92.30           github  log
HVT-S-1         2.40       22.09       78.00           93.83           github  log

Note that model weights and logs for HVT-Ti-1 and HVT-S-1 have been retrained.

More Pooling Stages with HVT-S

Name     FLOPs (G)  Params (M)  Top-1 Acc. (%)  Top-5 Acc. (%)  Model   Log
HVT-S-0  4.57       22.05       80.39           95.13           github  log
HVT-S-1  2.40       22.09       78.00           93.83           github  log
HVT-S-2  1.94       22.11       77.36           93.55           github  log
HVT-S-3  1.62       22.11       76.32           92.90           github  log
HVT-S-4  1.39       22.12       75.23           92.30           github  log
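
The trade-off in this table is easier to read as relative numbers. The short script below uses only the FLOPs and Top-1 values listed above and compares each model with HVT-S-0, which has no pooling stage. The helper name compare_to_baseline is hypothetical.

```python
from typing import Dict, List, Tuple

# (name, FLOPs in G, Top-1 accuracy in %), copied from the table above.
RESULTS: List[Tuple[str, float, float]] = [
    ("HVT-S-0", 4.57, 80.39),
    ("HVT-S-1", 2.40, 78.00),
    ("HVT-S-2", 1.94, 77.36),
    ("HVT-S-3", 1.62, 76.32),
    ("HVT-S-4", 1.39, 75.23),
]


def compare_to_baseline(
    results: List[Tuple[str, float, float]], baseline_name: str = "HVT-S-0"
) -> Dict[str, Tuple[float, float]]:
    """Map each model to (FLOPs saved in %, Top-1 drop in points) vs. the baseline."""
    _, base_flops, base_top1 = next(row for row in results if row[0] == baseline_name)
    return {
        name: (100.0 * (1.0 - flops / base_flops), base_top1 - top1)
        for name, flops, top1 in results
    }


for name, (saved, drop) in compare_to_baseline(RESULTS).items():
    print(f"{name}: {saved:.1f}% fewer FLOPs, {drop:.2f} points lower Top-1")
```

For example, HVT-S-1 needs a little over half the FLOPs of HVT-S-0 and gives up 2.39 points of Top-1 accuracy.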

For CIFAR-100 results, please check out our paper for more details.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

This repository adopts code from DeiT. We thank the authors for open-sourcing their code.
