Vision-Transformer-Papers
Image Classification · Semantic Segmentation · Object Detection · General Classification · Fine-Grained Image Classification · Depth Estimation · Instance Segmentation · Salient Object Detection · Person Re-Identification
GOTO PyTorch!
Papers
- [2020/10] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: ViT
- [2019/08] VisualBERT: A Simple and Performant Baseline for Vision and Language: VisualBERT
- [2021/08] Boosting Salient Object Detection with Transformer-based Asymmetric Bilateral U-Net: U-Net
- [2021/03] Dynamic Feature Regularized Loss for Weakly Supervised Semantic Segmentation
- [2020/11] AdaBins: Depth Estimation using Adaptive Bins: AdaBins
- [2021/04] VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text: VATT
- [2021/05] CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification: CrossViT
- [2021/04] Emerging Properties in Self-Supervised Vision Transformers
- [2021/06] PVTv2: Improved Baselines with Pyramid Vision Transformer: PVTv2
- [2021/01] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet: Tokens-to-Token ViT
- [2021/06] BEiT: BERT Pre-Training of Image Transformers: BEiT
- [2021/03] Vision Transformers for Dense Prediction
- [2021/02] TransReID: Transformer-based Object Re-Identification: TransReID
- [2021/04] All Tokens Matter: Token Labeling for Training Better Vision Transformers
- [2021/07] AutoFormer: Searching Transformers for Visual Recognition: AutoFormer
- [2021/04] Twins: Revisiting the Design of Spatial Attention in Vision Transformers: Twins
- [2021/06] You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection
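The list opens with ViT, whose core idea is stated in its title: treat an image as a sequence of 16x16 patch tokens fed to a standard Transformer. A minimal NumPy sketch of that patchify-and-project step is below; the image size, the 192-dim projection, and the `patchify` helper are illustrative assumptions, not code from any paper listed above:

```python
import numpy as np

def patchify(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an (N, patch*patch*C) array of patch tokens,
    where N = (H // patch) * (W // patch).
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "dims must divide by patch size"
    # reshape into a grid of patches, then flatten each patch into one token
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, c)
    return grid.reshape(-1, patch * patch * c)

# toy linear projection to an embedding space, as in ViT's patch embedding
rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224, 3))
tokens = patchify(img)                  # (196, 768): 14x14 patches of 16*16*3
W = rng.standard_normal((768, 192))     # hypothetical embedding dim of 192
embeddings = tokens @ W                 # (196, 192) token embeddings
print(tokens.shape, embeddings.shape)
```

In the full model, a learnable class token and position embeddings are added to this sequence before the Transformer encoder.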