
microsoft / CvT

License: MIT License
This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives to or similar to CvT

Pytorch Mobilenet V3
MobileNetV3 in pytorch and ImageNet pretrained models
Stars: ✭ 616 (+135.11%)
Mutual labels:  classification, imagenet
Regnet
Pytorch implementation of network design paradigm described in the paper "Designing Network Design Spaces"
Stars: ✭ 129 (-50.76%)
Mutual labels:  classification, imagenet
Constrained attention filter
(ECCV 2020) Tensorflow implementation of A Generic Visualization Approach for Convolutional Neural Networks
Stars: ✭ 36 (-86.26%)
Mutual labels:  classification, imagenet
Caffe Model
Caffe models (including classification, detection and segmentation) and deploy files for famous networks
Stars: ✭ 1,258 (+380.15%)
Mutual labels:  classification, imagenet
Shufflenet V2 Tensorflow
A lightweight convolutional neural network
Stars: ✭ 145 (-44.66%)
Mutual labels:  classification, imagenet
Tensorflow object tracking video
Object Tracking in TensorFlow (Localization, Detection, Classification), developed to participate in the ImageNet VID competition
Stars: ✭ 491 (+87.4%)
Mutual labels:  classification, imagenet
Pytorch Classification
Classification with PyTorch.
Stars: ✭ 1,268 (+383.97%)
Mutual labels:  classification, imagenet
Pytorch Randaugment
Unofficial PyTorch Reimplementation of RandAugment.
Stars: ✭ 323 (+23.28%)
Mutual labels:  classification, imagenet
Efficientnet
Implementation of EfficientNet model. Keras and TensorFlow Keras.
Stars: ✭ 1,920 (+632.82%)
Mutual labels:  classification, imagenet
Vision4j Collection
Collection of computer vision models, ready to be included in a JVM project
Stars: ✭ 132 (-49.62%)
Mutual labels:  classification, imagenet
Pytorch Imagenet Cifar Coco Voc Training
Training examples and results for ImageNet (ILSVRC2012)/CIFAR100/COCO2017/VOC2007+VOC2012 datasets. Image classification/object detection. Includes ResNet/EfficientNet/VovNet/DarkNet/RegNet/RetinaNet/FCOS/CenterNet/YOLOv3.
Stars: ✭ 130 (-50.38%)
Mutual labels:  classification, imagenet
simpleAICV-pytorch-ImageNet-COCO-training
SimpleAICV: PyTorch training examples on ImageNet (ILSVRC2012)/COCO2017/VOC2007+2012 datasets. Includes ResNet/DarkNet/RetinaNet/FCOS/CenterNet/TTFNet/YOLOv3/YOLOv4/YOLOv5/YOLOX.
Stars: ✭ 276 (+5.34%)
Mutual labels:  classification, imagenet
Imgclsmob
Sandbox for training deep learning networks
Stars: ✭ 2,405 (+817.94%)
Mutual labels:  classification, imagenet
NIPS-Global-Paper-Implementation-Challenge
Selective Classification For Deep Neural Networks.
Stars: ✭ 11 (-95.8%)
Mutual labels:  classification, imagenet
Skin-cancer-recoginition
Recognizing and localizing melanoma from other skin diseases
Stars: ✭ 28 (-89.31%)
Mutual labels:  classification
cvt modeline calculator 12
CVT (Coordinated Video Timings) modeline calculator with CVT v1.2 timings.
Stars: ✭ 51 (-80.53%)
Mutual labels:  cvt
CNN-SoilTextureClassification
1-dimensional convolutional neural networks (CNN) for the classification of soil texture based on hyperspectral data
Stars: ✭ 35 (-86.64%)
Mutual labels:  classification
well-classified-examples-are-underestimated
Code for the AAAI 2022 publication "Well-classified Examples are Underestimated in Classification with Deep Neural Networks"
Stars: ✭ 21 (-91.98%)
Mutual labels:  classification
data-science-notes
Open-source project hosted at https://makeuseofdata.com to crowdsource a robust collection of notes related to data science (math, visualization, modeling, etc.)
Stars: ✭ 52 (-80.15%)
Mutual labels:  classification
text2class
Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT
Stars: ✭ 15 (-94.27%)
Mutual labels:  classification

Introduction

This is an official implementation of CvT: Introducing Convolutions to Vision Transformers. We present a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs. This is accomplished through two primary modifications: a hierarchy of Transformers containing a new convolutional token embedding, and a convolutional Transformer block leveraging a convolutional projection. These changes introduce desirable properties of convolutional neural networks (CNNs) to the ViT architecture (e.g. shift, scale, and distortion invariance) while maintaining the merits of Transformers (e.g. dynamic attention, global context, and better generalization). We validate CvT by conducting extensive experiments, showing that this approach achieves state-of-the-art performance over other Vision Transformers and ResNets on ImageNet-1k, with fewer parameters and lower FLOPs. In addition, performance gains are maintained when pre-trained on larger datasets (e.g. ImageNet-22k) and fine-tuned to downstream tasks. Pre-trained on ImageNet-22k, our CvT-W24 obtains a top-1 accuracy of 87.7% on the ImageNet-1k val set. Finally, our results show that the positional encoding, a crucial component in existing Vision Transformers, can be safely removed in our model, simplifying the design for higher-resolution vision tasks.
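
To make the convolutional projection concrete, here is a minimal PyTorch sketch of the idea, assuming the formulation described above: tokens are reshaped into a 2D feature map, passed through a depthwise separable convolution, and flattened back into a token sequence. The module name and defaults are illustrative, not the repository's actual API.

import torch
import torch.nn as nn

class ConvProjection(nn.Module):
    # Depthwise separable convolution applied to tokens laid out as a 2D map
    # (illustrative sketch, not the repository's module).
    def __init__(self, dim, kernel_size=3, stride=1):
        super().__init__()
        self.proj = nn.Sequential(
            # depthwise convolution over the spatial layout of the tokens
            nn.Conv2d(dim, dim, kernel_size, stride, kernel_size // 2,
                      groups=dim, bias=False),
            nn.BatchNorm2d(dim),
            # pointwise convolution mixes channels (the linear projection)
            nn.Conv2d(dim, dim, kernel_size=1),
        )

    def forward(self, x, h, w):
        # x: (batch, h * w, dim) token sequence with spatial layout h x w
        b, n, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, h, w)  # tokens -> feature map
        x = self.proj(x)                           # convolutional projection
        return x.flatten(2).transpose(1, 2)        # feature map -> tokens

tokens = torch.randn(2, 14 * 14, 64)
print(ConvProjection(64)(tokens, 14, 14).shape)  # torch.Size([2, 196, 64])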

Main results

Models pre-trained on ImageNet-1k

Model Resolution Params GFLOPs Top-1 (%)
CvT-13 224x224 20M 4.5 81.6
CvT-21 224x224 32M 7.1 82.5
CvT-13 384x384 20M 16.3 83.0
CvT-21 384x384 32M 24.9 83.3

Models pre-trained on ImageNet-22k

Model Resolution Params GFLOPs Top-1 (%)
CvT-13 384x384 20M 16.3 83.3
CvT-21 384x384 32M 24.9 84.9
CvT-W24 384x384 277M 193.2 87.6

You can download all the models from our model zoo.
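
As a quick sanity check after downloading, you can compare a checkpoint's total tensor size against the Params column above. This sketch assumes the checkpoint is a plain PyTorch state_dict; the filename is illustrative.

import torch

state = torch.load("CvT-13-224x224-IN-1k.pth", map_location="cpu")  # illustrative path
total = sum(v.numel() for v in state.values())  # elements across all saved tensors
print(f"{total / 1e6:.1f}M")  # should land near 20M for CvT-13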

Quick start

Installation

Assuming you have installed PyTorch and TorchVision (if not, please follow the official instructions to install them first), install the dependencies with:

python -m pip install -r requirements.txt --user -q

The code is developed and tested with PyTorch 1.7.1. Other PyTorch versions are not fully tested.
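
To verify the environment before launching jobs, a quick check (assuming a standard PyTorch/TorchVision install) is:

import torch, torchvision

print(torch.__version__, torchvision.__version__)  # developed against PyTorch 1.7.1
print(torch.cuda.is_available())                   # GPUs are required by the run commands below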

Data preparation

Please prepare the data as follows:

|-DATASET
  |-imagenet
    |-train
    | |-class1
    | | |-img1.jpg
    | | |-img2.jpg
    | | |-...
    | |-class2
    | | |-img3.jpg
    | | |-...
    | |-class3
    | | |-img4.jpg
    | | |-...
    | |-...
    |-val
      |-class1
      | |-img5.jpg
      | |-...
      |-class2
      | |-img6.jpg
      | |-...
      |-class3
      | |-img7.jpg
      | |-...
      |-...
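
The layout above follows torchvision's ImageFolder convention (one sub-directory per class), so you can validate the directory structure independently of this repository's own loader. Paths and transforms here are illustrative.

import torchvision.datasets as datasets
import torchvision.transforms as transforms

train_set = datasets.ImageFolder(
    "DATASET/imagenet/train",  # path from the layout above
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ToTensor(),
    ]),
)
print(len(train_set.classes), len(train_set))  # expect 1000 classes for ImageNet-1k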

Run

Each experiment is defined by a YAML config file saved under the experiments directory, which has a tree structure like this:

experiments
|-{DATASET_A}
| |-{ARCH_A}
| |-{ARCH_B}
|-{DATASET_B}
| |-{ARCH_A}
| |-{ARCH_B}
|-{DATASET_C}
| |-{ARCH_A}
| |-{ARCH_B}
|-...

We provide a run.sh script for running jobs on a local machine.

Usage: run.sh [run_options]
Options:
  -g|--gpus <1> - number of GPUs to use
  -t|--job-type <aml> - job type (train|test)
  -p|--port <9000> - master port
  -i|--install-deps - whether to install dependencies (default: False)

Training on local machine

bash run.sh -g 8 -t train --cfg experiments/imagenet/cvt/cvt-13-224x224.yaml

You can also override config parameters from the command line. For example, to change the learning rate to 0.1, run:

bash run.sh -g 8 -t train --cfg experiments/imagenet/cvt/cvt-13-224x224.yaml TRAIN.LR 0.1
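
The trailing KEY VALUE pairs suggest a yacs-style merge; the sketch below shows the mechanism under that assumption, with an illustrative config schema rather than the repository's actual one.

from yacs.config import CfgNode as CN

cfg = CN()
cfg.TRAIN = CN()
cfg.TRAIN.LR = 0.02  # illustrative default

# merge_from_list consumes a flat [KEY, VALUE, KEY, VALUE, ...] list,
# the same shape as the trailing command-line arguments above.
cfg.merge_from_list(["TRAIN.LR", "0.1"])
print(cfg.TRAIN.LR)  # 0.1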

Notes:

  • The checkpoint, model, and log files will be saved in OUTPUT/{dataset}/{training config} by default.

Testing pre-trained models

bash run.sh -t test --cfg experiments/imagenet/cvt/cvt-13-224x224.yaml TEST.MODEL_FILE ${PRETRAINED_MODEL_FILE}

Citation

If you find this work or code helpful in your research, please cite:

@article{wu2021cvt,
  title={Cvt: Introducing convolutions to vision transformers},
  author={Wu, Haiping and Xiao, Bin and Codella, Noel and Liu, Mengchen and Dai, Xiyang and Yuan, Lu and Zhang, Lei},
  journal={arXiv preprint arXiv:2103.15808},
  year={2021}
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].