AIprogrammer / Visual-Transformer-Paper-Summary

Licence: other
Summary of Transformer applications for computer vision tasks.

Awesome-Transformer-CV

If you have any problems, suggestions, or improvements, please submit an issue or PR.

Contents

Attention

  • Recurrent Models of Visual Attention [NIPS 2014, DeepMind]
  • Neural Machine Translation by Jointly Learning to Align and Translate [ICLR 2015]

OverallSurvey

  • Efficient Transformers: A Survey [paper]
  • A Survey on Visual Transformer [paper]
  • Transformers in Vision: A Survey [paper]

NLP

Language

  • Sequence to Sequence Learning with Neural Networks [NIPS 2014] [paper] [code]
  • End-To-End Memory Networks [NIPS 2015] [paper] [code]
  • Attention is all you need [NIPS 2017] [paper] [code]
  • Bidirectional Encoder Representations from Transformers: BERT [paper] [code] [pretrained-models]
  • Reformer: The Efficient Transformer [ICLR2020] [paper] [code]
  • Linformer: Self-Attention with Linear Complexity [AAAI2020] [paper] [code]
  • GPT-3: Language Models are Few-Shot Learners [NIPS 2020] [paper] [code]

Speech

  • Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation [INTERSPEECH 2020] [paper] [code]

CV

Backbone_Classification

Papers and Codes

  • CoaT: Co-Scale Conv-Attentional Image Transformers [arxiv 2021] [paper] [code]
  • SiT: Self-supervised vIsion Transformer [arxiv 2021] [paper] [code]
  • ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [VIT] [ICLR 2021] [paper] [code]
    • Trained with extra private data; does not generalize well when trained on insufficient amounts of data
  • DeiT: Data-efficient Image Transformers [arxiv2021] [paper] [code]
    • Token-based strategy; builds upon ViT and convolutional models
  • Transformer in Transformer [arxiv 2021] [paper] [code1] [code-official]
  • OmniNet: Omnidirectional Representations from Transformers [arxiv2021] [paper]
  • Gaussian Context Transformer [CVPR 2021] [paper]
  • General Multi-Label Image Classification With Transformers [CVPR 2021] [paper] [code]
  • Scaling Local Self-Attention for Parameter Efficient Visual Backbones [CVPR 2021] [paper]
  • T2T-ViT: Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [ICCV 2021] [paper] [code]
  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [ICCV 2021] [paper] [code]
  • Bias Loss for Mobile Neural Networks [ICCV 2021] [paper] [code]
  • Vision Transformer with Progressive Sampling [ICCV 2021] [paper] [code: https://github.com/yuexy/PS-ViT]
  • Rethinking Spatial Dimensions of Vision Transformers [ICCV 2021] [paper] [code]
  • Rethinking and Improving Relative Position Encoding for Vision Transformer [ICCV 2021] [paper] [code]

Interesting Repos

Self-Supervised

  • Emerging Properties in Self-Supervised Vision Transformers [ICCV 2021] [paper] [code]
  • An Empirical Study of Training Self-Supervised Vision Transformers [ICCV 2021] [paper] [code]

Interpretability and Robustness

  • Transformer Interpretability Beyond Attention Visualization [CVPR 2021] [paper] [code]
  • On the Adversarial Robustness of Visual Transformers [arxiv 2021] [paper]
  • Robustness Verification for Transformers [ICLR 2020] [paper] [code]
  • Pretrained Transformers Improve Out-of-Distribution Robustness [ACL 2020] [paper] [code]

Detection

  • DETR: End-to-End Object Detection with Transformers [ECCV2020] [paper] [code]
  • Deformable DETR: Deformable Transformers for End-to-End Object Detection [ICLR2021] [paper] [code]
  • End-to-End Object Detection with Adaptive Clustering Transformer [arxiv2020] [paper]
  • UP-DETR: Unsupervised Pre-training for Object Detection with Transformers [arxiv2020] [paper]
  • Rethinking Transformer-based Set Prediction for Object Detection [arxiv2020] [paper] [zhihu]
  • End-to-end Lane Shape Prediction with Transformers [WACV 2021] [paper] [code]
  • ViT-FRCNN: Toward Transformer-Based Object Detection [arxiv2020] [paper]
  • Line Segment Detection Using Transformers [CVPR 2021] [paper] [code]
  • Facial Action Unit Detection With Transformers [CVPR 2021] [paper] [code]
  • Adaptive Image Transformer for One-Shot Object Detection [CVPR 2021] [paper] [code]
  • Self-attention based Text Knowledge Mining for Text Detection [CVPR 2021] [paper] [code]
  • Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions [ICCV 2021] [paper] [code]
  • Group-Free 3D Object Detection via Transformers [ICCV 2021] [paper] [code]
  • Fast Convergence of DETR with Spatially Modulated Co-Attention [ICCV 2021] [paper] [code]

HOI

  • End-to-End Human Object Interaction Detection with HOI Transformer [CVPR 2021] [paper] [code]
  • HOTR: End-to-End Human-Object Interaction Detection with Transformers [CVPR 2021] [paper] [code]

Tracking

  • Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking [CVPR 2021] [paper] [code]
  • TransTrack: Multiple-Object Tracking with Transformer [CVPR 2021] [paper] [code]
  • Transformer Tracking [CVPR 2021] [paper] [code]
  • Learning Spatio-Temporal Transformer for Visual Tracking [ICCV 2021] [paper] [code]

Segmentation

  • SETR: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [CVPR 2021] [paper] [code]
  • Trans2Seg: Transparent Object Segmentation with Transformer [arxiv2021] [paper] [code]
  • End-to-End Video Instance Segmentation with Transformers [arxiv2020] [paper] [zhihu]
  • MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers [CVPR 2021] [paper] [official-code] [unofficial-code]
  • Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [arxiv 2020] [paper] [code]
  • SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation [CVPR 2021] [paper] [code]

Reid

  • Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer [CVPR 2021] [paper] [code]

Localization

  • LoFTR: Detector-Free Local Feature Matching with Transformers [CVPR 2021] [paper] [code]
  • MIST: Multiple Instance Spatial Transformer [CVPR 2021] [paper] [code]

Generation

Inpainting

  • STTN: Learning Joint Spatial-Temporal Transformations for Video Inpainting [ECCV 2020] [paper] [code]

Image enhancement

  • Pre-Trained Image Processing Transformer [CVPR 2021] [paper]
  • TTSR: Learning Texture Transformer Network for Image Super-Resolution [CVPR2020] [paper] [code]

Pose Estimation

  • Pose Recognition with Cascade Transformers [CVPR 2021] [paper] [code]
  • TransPose: Towards Explainable Human Pose Estimation by Transformer [arxiv 2020] [paper] [code]
  • Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation [ECCV 2020] [paper]
  • HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation [ACMMM 2020] [paper]
  • End-to-End Human Pose and Mesh Reconstruction with Transformers [CVPR 2021] [paper] [code]
  • 3D Human Pose Estimation with Spatial and Temporal Transformers [arxiv 2020] [paper] [code]
  • End-to-End Trainable Multi-Instance Pose Estimation with Transformers [arxiv 2020] [paper]

Face

  • Robust Facial Expression Recognition with Convolutional Visual Transformers [arxiv 2020] [paper]
  • Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition [CVPR 2021] [paper] [code]

Video Understanding

  • Is Space-Time Attention All You Need for Video Understanding? [arxiv 2020] [paper] [code]
  • Temporal-Relational CrossTransformers for Few-Shot Action Recognition [CVPR 2021] [paper] [code]
  • Self-Supervised Video Hashing via Bidirectional Transformers [CVPR 2021] [paper]
  • SSAN: Separable Self-Attention Network for Video Representation Learning [CVPR 2021] [paper]

Depth-Estimation

  • AdaBins: Depth Estimation using Adaptive Bins [CVPR 2021] [paper] [code]

Prediction

  • Multimodal Motion Prediction with Stacked Transformers [CVPR 2021] [paper] [code]
  • Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case [paper]
  • Transformer networks for trajectory forecasting [ICPR 2020] [paper] [code]
  • Spatial-Channel Transformer Network for Trajectory Prediction on the Traffic Scenes [arxiv 2021] [paper] [code]
  • Pedestrian Trajectory Prediction using Context-Augmented Transformer Networks [ICRA 2020] [paper] [code]
  • Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction [ECCV 2020] [paper] [code]
  • Hierarchical Multi-Scale Gaussian Transformer for Stock Movement Prediction [paper]
  • Single-Shot Motion Completion with Transformer [arxiv2021] [paper] [code]

NAS

PointCloud

  • Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [CVPR 2021] [paper] [code]
  • Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos [CVPR 2021] [paper]

Fashion

  • Kaleido-BERT: Vision-Language Pre-training on Fashion Domain [CVPR 2021] [paper] [code]

Medical

  • Lesion-Aware Transformers for Diabetic Retinopathy Grading [CVPR 2021] [paper]

Cross-Modal

  • Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers [CVPR 2021] [paper]
  • Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [CVPR2021] [paper] [code]
  • Topological Planning With Transformers for Vision-and-Language Navigation [CVPR 2021] [paper]
  • Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos [CVPR 2021] [paper]
  • VLN BERT: A Recurrent Vision-and-Language BERT for Navigation [CVPR 2021] [paper] [code]
  • Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling [CVPR 2021] [paper] [code]

Reference

Acknowledgement

Thanks to the authors of the awesome survey papers on Transformers.
