youngwanLEE / MPViT

License: other
MPViT: Multi-Path Vision Transformer for Dense Prediction (CVPR 2022)

Programming Languages

Python
139335 projects - #7 most used programming language
CUDA
1817 projects
C++
36643 projects - #6 most used programming language
Shell
77523 projects

Projects that are alternatives to or similar to MPViT

semantic-segmentation
SOTA Semantic Segmentation Models in PyTorch
Stars: ✭ 464 (+140.41%)
Mutual labels:  vision-transformer
ai-background-remove
Cut out objects and remove backgrounds from pictures with artificial intelligence
Stars: ✭ 70 (-63.73%)
Mutual labels:  detectron2
visualization
a collection of visualization function
Stars: ✭ 189 (-2.07%)
Mutual labels:  vision-transformer
Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
Stars: ✭ 2,828 (+1365.28%)
Mutual labels:  vision-transformer
DA-RetinaNet
Official Detectron2 implementation of DA-RetinaNet of our Image and Vision Computing 2021 work 'An unsupervised domain adaptation scheme for single-stage artwork recognition in cultural sites'
Stars: ✭ 31 (-83.94%)
Mutual labels:  detectron2
image-classification
A collection of SOTA Image Classification Models in PyTorch
Stars: ✭ 70 (-63.73%)
Mutual labels:  vision-transformer
GDR-Net
GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation. (CVPR 2021)
Stars: ✭ 167 (-13.47%)
Mutual labels:  detectron2
Entity
EntitySeg Toolbox: Towards Open-World and High-Quality Image Segmentation
Stars: ✭ 313 (+62.18%)
Mutual labels:  detectron2
YOLOS
You Only Look at One Sequence (NeurIPS 2021)
Stars: ✭ 612 (+217.1%)
Mutual labels:  vision-transformer
InterpretDL
InterpretDL: Interpretation of Deep Learning Models, a model interpretability algorithm library based on PaddlePaddle.
Stars: ✭ 121 (-37.31%)
Mutual labels:  vision-transformer
PointPainting
This repository is an open-source PointPainting package which is easy to understand, deploy and run!
Stars: ✭ 152 (-21.24%)
Mutual labels:  mmsegmentation
TransMorph Transformer for Medical Image Registration
TransMorph: Transformer for Unsupervised Medical Image Registration (PyTorch)
Stars: ✭ 130 (-32.64%)
Mutual labels:  vision-transformer
detectron2 backbone
detectron2 backbone: resnet18, efficientnet, hrnet, mobilenet v2, resnest, bifpn
Stars: ✭ 171 (-11.4%)
Mutual labels:  detectron2
Evo-ViT
Official implement of Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Stars: ✭ 50 (-74.09%)
Mutual labels:  vision-transformer
LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Stars: ✭ 1,566 (+711.4%)
Mutual labels:  vision-transformer
SReT
Official PyTorch implementation of our ECCV 2022 paper "Sliced Recursive Transformer"
Stars: ✭ 51 (-73.58%)
Mutual labels:  vision-transformer
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
Stars: ✭ 821 (+325.39%)
Mutual labels:  vision-transformer
PASSL
PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, BEiT, and MAE, as well as fundamental vision models such as Vision Transformer, DeiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2.
Stars: ✭ 134 (-30.57%)
Mutual labels:  vision-transformer
koclip
KoCLIP: Korean port of OpenAI CLIP, in Flax
Stars: ✭ 80 (-58.55%)
Mutual labels:  vision-transformer
pytorch-vit
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Stars: ✭ 250 (+29.53%)
Mutual labels:  vision-transformer

MPViT: Multi-Path Vision Transformer for Dense Prediction

This repository includes the official implementations and model weights for MPViT.

[arXiv] [BibTeX]

MPViT: Multi-Path Vision Transformer for Dense Prediction
🏛️🏫 Youngwan Lee, 🏛️ Jonghee Kim, 🏫 Jeff Willette, 🏫 Sung Ju Hwang
ETRI 🏛️, KAIST 🏫

News

🎉 MPViT has been accepted to CVPR 2022.

Abstract

We explore multi-scale patch embedding and multi-path structure, constructing the Multi-Path Vision Transformer (MPViT). MPViT embeds features of the same size (i.e., sequence length) with patches of different scales simultaneously by using overlapping convolutional patch embedding. Tokens of different scales are then independently fed into the Transformer encoders via multiple paths and the resulting features are aggregated, enabling both fine and coarse feature representations at the same feature level. Thanks to the diverse and multi-scale feature representations, our MPViTs scaling from Tiny(5M) to Base(73M) consistently achieve superior performance over state-of-the-art Vision Transformers on ImageNet classification, object detection, instance segmentation, and semantic segmentation. These extensive results demonstrate that MPViT can serve as a versatile backbone network for various vision tasks.
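
To make the mechanism concrete, here is a minimal, self-contained PyTorch sketch of the multi-scale patch embedding and multi-path idea described above. It illustrates the mechanism only and is not the official architecture; the embedding dimension, number of paths, and encoder depth are made-up example values.

```python
import torch
import torch.nn as nn

class MultiPathBlock(nn.Module):
    """Illustrative sketch of MPViT's core idea (not the official code):
    overlapping conv patch embeddings with different kernel sizes yield
    tokens of different scales but the same sequence length; each path is
    encoded by its own transformer, and the outputs are aggregated."""

    def __init__(self, in_ch=3, dim=64, kernel_sizes=(3, 5, 7), stride=4):
        super().__init__()
        # Same stride => same output resolution (sequence length) per path;
        # different kernel sizes => different patch scales (receptive fields).
        self.embeds = nn.ModuleList(
            nn.Conv2d(in_ch, dim, k, stride=stride, padding=k // 2)
            for k in kernel_sizes
        )
        make_layer = lambda: nn.TransformerEncoderLayer(d_model=dim, nhead=4)
        self.encoders = nn.ModuleList(
            nn.TransformerEncoder(make_layer(), num_layers=1)
            for _ in kernel_sizes
        )
        # Aggregate the paths: concatenate along channels, project with 1x1 conv.
        self.aggregate = nn.Conv2d(dim * len(kernel_sizes), dim, kernel_size=1)

    def forward(self, x):
        outs = []
        for embed, encoder in zip(self.embeds, self.encoders):
            f = embed(x)                            # (B, dim, H, W)
            B, C, H, W = f.shape
            tokens = f.flatten(2).permute(2, 0, 1)  # (H*W, B, dim), seq-first
            tokens = encoder(tokens)
            outs.append(tokens.permute(1, 2, 0).reshape(B, C, H, W))
        return self.aggregate(torch.cat(outs, dim=1))

feats = MultiPathBlock()(torch.randn(1, 3, 224, 224))
print(feats.shape)  # torch.Size([1, 64, 56, 56])
```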

Main results on ImageNet-1K

🚀 All models are trained on ImageNet-1K with the same training recipe as DeiT and CoaT.

| model | resolution | acc@1 | #params | FLOPs | weight |
| :--- | :---: | :---: | :---: | :---: | :---: |
| MPViT-T | 224x224 | 78.2 | 5.8M | 1.6G | weight |
| MPViT-XS | 224x224 | 80.9 | 10.5M | 2.9G | weight |
| MPViT-S | 224x224 | 83.0 | 22.8M | 4.7G | weight |
| MPViT-B | 224x224 | 84.3 | 74.8M | 16.4G | weight |
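
Because the models follow the DeiT recipe, ImageNet evaluation at 224x224 conventionally uses a resize-then-center-crop pipeline with standard ImageNet normalization. Below is a common-practice torchvision sketch of that pipeline; the repository defines the exact settings.

```python
from torchvision import transforms

# DeiT-style ImageNet-1K evaluation preprocessing for 224x224 models
# (common-practice sketch; consult the repository for the exact recipe).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

eval_transform = transforms.Compose([
    transforms.Resize(256),      # 224 / 0.875 crop ratio
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])
```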

Main results on COCO object detection

🚀 All models are trained using ImageNet-1K pretrained weights.

☀️ MS denotes the same multi-scale training augmentation as in Swin Transformer, which in turn follows the MS augmentation of DETR and Sparse R-CNN. Accordingly, we follow the official implementations of DETR and Sparse R-CNN, which are also based on Detectron2.

Please refer to detectron2/ for the details.

| Backbone | Method | Lr Schd | box mAP | mask mAP | #params | FLOPs | weight |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| MPViT-T | RetinaNet | 1x | 41.8 | - | 17M | 196G | model \| metrics |
| MPViT-XS | RetinaNet | 1x | 43.8 | - | 20M | 211G | model \| metrics |
| MPViT-S | RetinaNet | 1x | 45.7 | - | 32M | 248G | model \| metrics |
| MPViT-B | RetinaNet | 1x | 47.0 | - | 85M | 482G | model \| metrics |
| MPViT-T | RetinaNet | MS+3x | 44.4 | - | 17M | 196G | model \| metrics |
| MPViT-XS | RetinaNet | MS+3x | 46.1 | - | 20M | 211G | model \| metrics |
| MPViT-S | RetinaNet | MS+3x | 47.6 | - | 32M | 248G | model \| metrics |
| MPViT-B | RetinaNet | MS+3x | 48.3 | - | 85M | 482G | model \| metrics |
| MPViT-T | Mask R-CNN | 1x | 42.2 | 39.0 | 28M | 216G | model \| metrics |
| MPViT-XS | Mask R-CNN | 1x | 44.2 | 40.4 | 30M | 231G | model \| metrics |
| MPViT-S | Mask R-CNN | 1x | 46.4 | 42.4 | 43M | 268G | model \| metrics |
| MPViT-B | Mask R-CNN | 1x | 48.2 | 43.5 | 95M | 503G | model \| metrics |
| MPViT-T | Mask R-CNN | MS+3x | 44.8 | 41.0 | 28M | 216G | model \| metrics |
| MPViT-XS | Mask R-CNN | MS+3x | 46.6 | 42.3 | 30M | 231G | model \| metrics |
| MPViT-S | Mask R-CNN | MS+3x | 48.4 | 43.9 | 43M | 268G | model \| metrics |
| MPViT-B | Mask R-CNN | MS+3x | 49.5 | 44.5 | 95M | 503G | model \| metrics |
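
For orientation, the snippet below shows the generic mechanism Detectron2 provides for plugging in a custom backbone such as MPViT. The toy class and registry name are placeholders; the actual MPViT integration lives in detectron2/.

```python
import torch.nn as nn
from detectron2.modeling import BACKBONE_REGISTRY, Backbone, ShapeSpec

class ToyBackbone(Backbone):
    """Stand-in for MPViT showing the interface Detectron2 expects:
    forward() returns named feature maps, output_shape() describes them."""

    def __init__(self, cfg, input_shape: ShapeSpec):
        super().__init__()
        self.stem = nn.Conv2d(input_shape.channels, 64, 3, stride=4, padding=1)

    def forward(self, image):
        # A real backbone would return multi-scale features for FPN/RetinaNet.
        return {"res2": self.stem(image)}

    def output_shape(self):
        return {"res2": ShapeSpec(channels=64, stride=4)}

@BACKBONE_REGISTRY.register()
def build_toy_backbone(cfg, input_shape: ShapeSpec):
    # Selected in a config via: MODEL.BACKBONE.NAME = "build_toy_backbone"
    return ToyBackbone(cfg, input_shape)
```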

Deformable-DETR

All models are trained using the same training recipe.

Please refer to deformable_detr/ for the details.

| Backbone | box mAP | epochs | link |
| :--- | :---: | :---: | :---: |
| ResNet-50 | 44.5 | 50 | - |
| CoaT-lite S | 47.0 | 50 | link |
| CoaT-S | 48.4 | 50 | link |
| MPViT-S | 49.0 | 50 | link |

Main results on ADE20K Semantic segmentation

All models are trained using ImageNet-1K pretrained weights.

Please refer to semantic_segmentation/ for the details.

| Backbone | Method | Crop Size | Lr Schd | mIoU | #params | FLOPs | weight |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| MPViT-S | UperNet | 512x512 | 160K | 48.3 | 52M | 943G | weight |
| MPViT-B | UperNet | 512x512 | 160K | 50.3 | 105M | 1185G | weight |
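
For orientation, mmsegmentation experiments are driven by Python configs that swap a backbone into shared UperNet/ADE20K base files. The sketch below shows the shape of such a config; the `MPViT` type string, checkpoint path, and base-file names are illustrative assumptions, and the real configs live in semantic_segmentation/.

```python
# Hypothetical mmsegmentation config sketch (illustrative only; the actual
# configs live in the repository's semantic_segmentation/ directory).
_base_ = [
    '../_base_/models/upernet_r50.py',       # UperNet decode/aux heads
    '../_base_/datasets/ade20k.py',          # ADE20K, 512x512 crops
    '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_160k.py',  # 160K-iteration schedule
]

model = dict(
    # 'MPViT' assumes the backbone class was registered with mmseg's
    # BACKBONES registry by the repository code; 'mpvit_small.pth' is a
    # placeholder for an ImageNet-1K pretrained checkpoint.
    backbone=dict(type='MPViT', pretrained='mpvit_small.pth'),
    # decode_head.in_channels must match the backbone's per-stage widths.
    decode_head=dict(num_classes=150),
    auxiliary_head=dict(num_classes=150),
)
```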

Getting Started

We use pytorch==1.7.0, torchvision==0.8.1, and cuda==10.1 on NVIDIA V100 GPUs. If you use a different CUDA version, you may obtain slightly different accuracies, but the differences are negligible.
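
A quick sanity check that an environment matches these versions:

```python
import torch
import torchvision

# Verify the environment matches the versions listed above.
print(torch.__version__)          # expected: 1.7.0
print(torchvision.__version__)    # expected: 0.8.1
print(torch.version.cuda)         # expected: 10.1
print(torch.cuda.is_available())  # True on a properly configured V100 node
```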

Acknowledgement

This repository is built on the timm library and the DeiT, CoaT, Detectron2, and mmsegmentation repositories.

This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-00004, Development of Previsional Intelligence based on Long-term Visual Memory Network and No. 2014-3-00123, Development of High Performance Visual BigData Discovery Platform for Large-Scale Realtime Data Analysis).

License

Please refer to the MPViT LSA license.

Citing MPViT

@inproceedings{lee2022mpvit,
  title     = {MPViT: Multi-Path Vision Transformer for Dense Prediction},
  author    = {Youngwan Lee and Jonghee Kim and Jeffrey Willette and Sung Ju Hwang},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2022}
}