alohays / Awesome Visual Representation Learning With Transformers

Awesome Transformers (self-attention) in Computer Vision

Projects that are alternatives of or similar to Awesome Visual Representation Learning With Transformers

Asne
A sparsity aware and memory efficient implementation of "Attributed Social Network Embedding" (TKDE 2018).
Stars: ✭ 73 (-56.02%)
Mutual labels:  representation-learning
Sigver wiwd
Learned representation for Offline Handwritten Signature Verification. Models and code to extract features from signature images.
Stars: ✭ 112 (-32.53%)
Mutual labels:  representation-learning
Role2vec
A scalable Gensim implementation of "Learning Role-based Graph Embeddings" (IJCAI 2018).
Stars: ✭ 134 (-19.28%)
Mutual labels:  representation-learning
Ice
ICE: Item Concept Embedding
Stars: ✭ 83 (-50%)
Mutual labels:  representation-learning
Codesearchnet
Datasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (+730.12%)
Mutual labels:  representation-learning
Multi object datasets
Multi-object image datasets with ground-truth segmentation masks and generative factors.
Stars: ✭ 121 (-27.11%)
Mutual labels:  representation-learning
Graph 2d cnn
Code and data for the paper 'Classifying Graphs as Images with Convolutional Neural Networks' (new title: 'Graph Classification with 2D Convolutional Neural Networks')
Stars: ✭ 67 (-59.64%)
Mutual labels:  representation-learning
Attribute Aware Attention
[ACM MM 2018] Attribute-Aware Attention Model for Fine-grained Representation Learning
Stars: ✭ 143 (-13.86%)
Mutual labels:  representation-learning
Ampligraph
Python library for Representation Learning on Knowledge Graphs https://docs.ampligraph.org
Stars: ✭ 1,662 (+901.2%)
Mutual labels:  representation-learning
Hyte
EMNLP 2018: HyTE: Hyperplane-based Temporally aware Knowledge Graph Embedding
Stars: ✭ 130 (-21.69%)
Mutual labels:  representation-learning
Conmask
ConMask model described in paper Open-world Knowledge Graph Completion.
Stars: ✭ 84 (-49.4%)
Mutual labels:  representation-learning
Sert
Semantic Entity Retrieval Toolkit
Stars: ✭ 100 (-39.76%)
Mutual labels:  representation-learning
Pcl
PyTorch code for "Prototypical Contrastive Learning of Unsupervised Representations"
Stars: ✭ 124 (-25.3%)
Mutual labels:  representation-learning
Mklpy
A package for Multiple Kernel Learning in Python
Stars: ✭ 81 (-51.2%)
Mutual labels:  representation-learning
Kate
Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text"
Stars: ✭ 135 (-18.67%)
Mutual labels:  representation-learning
Self Supervised Learning Overview
📜 Self-Supervised Learning from Images: Up-to-date reading list.
Stars: ✭ 73 (-56.02%)
Mutual labels:  representation-learning
Declutr
The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations". Do not hesitate to open an issue if you run into any trouble!
Stars: ✭ 111 (-33.13%)
Mutual labels:  representation-learning
Deformable Kernels
Deforming kernels to adapt towards object deformation. In ICLR 2020.
Stars: ✭ 166 (+0%)
Mutual labels:  representation-learning
Autoregressive Predictive Coding
Autoregressive Predictive Coding: An unsupervised autoregressive model for speech representation learning
Stars: ✭ 138 (-16.87%)
Mutual labels:  representation-learning
Srl Zoo
State Representation Learning (SRL) zoo with PyTorch - Part of S-RL Toolbox
Stars: ✭ 125 (-24.7%)
Mutual labels:  representation-learning

Awesome Visual Representation Learning with Transformers

About transformers

  • Attention Is All You Need, NeurIPS 2017
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019
  • Efficient Transformers: A Survey, arXiv 2020
    • Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
    • [paper]
  • A Survey on Visual Transformer, arXiv 2020
    • Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao
    • [paper]
  • Transformers in Vision: A Survey, arXiv 2021
    • Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah
    • [paper]

Combining CNN with self-attention

  • Attention augmented convolutional networks, ICCV 2019, image classification
  • Self-Attention Generative Adversarial Networks, ICML 2019, generative model (GANs)
  • VideoBERT: A Joint Model for Video and Language Representation Learning, ICCV 2019, video processing
    • Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid
    • [paper]
  • Visual Transformers: Token-based Image Representation and Processing for Computer Vision, arXiv 2020, image classification
    • Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Masayoshi Tomizuka, Kurt Keutzer, Peter Vajda
    • [paper]
  • Feature Pyramid Transformer, ECCV 2020, detection and segmentation
  • Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers, arXiv 2020, depth estimation
    • Zhaoshuo Li, Xingtong Liu, Francis X. Creighton, Russell H. Taylor, Mathias Unberath
    • [paper] [official code]
  • End-to-end Lane Shape Prediction with Transformers, arXiv 2020, lane detection
  • Taming Transformers for High-Resolution Image Synthesis, arXiv 2020, image synthesis
  • TransPose: Towards Explainable Human Pose Estimation by Transformer, arXiv 2020, pose estimation
    • Sen Yang, Zhibin Quan, Mu Nie, Wankou Yang
    • [paper]
  • End-to-End Video Instance Segmentation with Transformers, arXiv 2020, video instance segmentation
    • Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia
    • [paper]
  • TransTrack: Multiple-Object Tracking with Transformer, arXiv 2020, MOT
    • Peize Sun, Yi Jiang, Rufeng Zhang, Enze Xie, Jinkun Cao, Xinting Hu, Tao Kong, Zehuan Yuan, Changhu Wang, Ping Luo
    • [paper] [official code]
  • TrackFormer: Multi-Object Tracking with Transformers, arXiv 2021, MOT
    • Tim Meinhardt, Alexander Kirillov, Laura Leal-Taixe, Christoph Feichtenhofer
    • [paper]
  • Line Segment Detection Using Transformers without Edges, arXiv 2021, line segment detection
    • Yifan Xu, Weijian Xu, David Cheung, Zhuowen Tu
    • [paper]
  • Segmenting Transparent Object in the Wild with Transformer, arXiv 2021, transparent object segmentation
  • Bottleneck Transformers for Visual Recognition, arXiv 2021, backbone design
    • Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani
    • [paper]

DETR Family

  • End-to-end object detection with transformers, ECCV 2020, object detection
  • Deformable DETR: Deformable Transformers for End-to-End Object Detection, ICLR 2021, object detection
  • End-to-End Object Detection with Adaptive Clustering Transformer, arXiv 2020, object detection
    • Minghang Zheng, Peng Gao, Xiaogang Wang, Hongsheng Li, Hao Dong
    • [paper]
  • UP-DETR: Unsupervised Pre-training for Object Detection with Transformers, arXiv 2020, object detection
    • Zhigang Dai, Bolun Cai, Yugeng Lin, Junying Chen
    • [paper]
  • DETR for Pedestrian Detection, arXiv 2020, pedestrian detection
    • Matthieu Lin, Chuming Li, Xingyuan Bu, Ming Sun, Chen Lin, Junjie Yan, Wanli Ouyang, Zhidong Deng
    • [paper]

Stand-alone transformers for Computer Vision

Self-attention only in local neighborhood

  • Image Transformer, ICML 2018
    • Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran
    • [paper] [official code]
  • Stand-alone self-attention in vision models, NeurIPS 2019
  • On the relationship between self-attention and convolutional layers, ICLR 2020
  • Exploring self-attention for image recognition, CVPR 2020

Scalable approximations to global self-attention

  • Generating long sequences with sparse transformers, arXiv 2019
  • Scaling autoregressive video models, ICLR 2020
    • Dirk Weissenborn, Oscar Täckström, Jakob Uszkoreit
    • [paper]
  • Axial attention in multidimensional transformers, arXiv 2019
  • Axial-DeepLab: Stand-alone axial-attention for panoptic segmentation, ECCV 2020
  • MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers, arXiv 2020
    • Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
    • [paper]

Global self-attention with image preprocessing

  • Generative pretraining from pixels, ICML 2020, iGPT
    • Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever
    • [paper] [official code]
  • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021, ViT
    • Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
    • [paper] [pytorch implementation]
  • Pre-Trained Image Processing Transformer, arXiv 2020, IPT
    • Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, Wen Gao
    • [paper]
  • Training data-efficient image transformers & distillation through attention, arXiv 2020, DeiT
    • Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Herve Jegou
    • [paper] [official code]
  • Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers, arXiv 2020, SETR
    • Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H.S. Torr, Li Zhang
    • [paper] [official code]
  • Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, arXiv 2021, T2T-ViT
    • Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Francis EH Tay, Jiashi Feng, Shuicheng Yan
    • [paper] [official code]
  • TransReID: Transformer-based Object Re-Identification, arXiv 2021
    • Shuting He, Hao Luo, Pichao Wang, Fan Wang, Hao Li, Wei Jiang
    • [paper]
  • Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, arXiv 2021, PVT
    • Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
    • [paper] [official code]

Global self-attention on 3D point clouds

  • Point Transformer, arXiv 2020, point cloud classification + part/semantic segmentation
    • Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun
    • [paper]

Unified text-vision tasks

Focused on VQA

  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, NeurIPS 2019
  • LXMERT: Learning Cross-Modality Encoder Representations from Transformers, EMNLP 2019
  • VisualBERT: A Simple and Performant Baseline for Vision and Language, arXiv 2019
  • VL-BERT: Pre-training of Generic Visual-Linguistic Representations, ICLR 2020
  • UNITER: UNiversal Image-TExt Representation Learning, ECCV 2020
    • Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu
    • [paper] [official code]
  • Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers, arXiv 2020
    • Zhicheng Huang, Zhaoyang Zeng, Bei Liu, Dongmei Fu, Jianlong Fu
    • [paper]
  • ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision, arXiv 2021
    • Wonjae Kim, Bokyung Son, Ildoo Kim
    • [paper]

Focused on Image Retrieval

  • Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training, AAAI 2020
  • ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data, arXiv 2020
    • Di Qi, Lin Su, Jia Song, Edward Cui, Taroon Bharti, Arun Sacheti
    • [paper]
  • Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks, ECCV 2020
    • Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiaowei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao
    • [paper] [official code]
  • Training Vision Transformers for Image Retrieval, arXiv 2021
    • Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, Herve Jegou
    • [paper]

Focused on OCR

  • LayoutLM: Pre-training of Text and Layout for Document Image Understanding, KDD 2020

Focused on Image Captioning

  • CPTR: Full Transformer Network for Image Captioning, arXiv 2021
    • Wei Liu, Sihan Chen, Longteng Guo, Xinxin Zhu, Jing Liu
    • [paper]

Multi-Task

  • 12-in-1: Multi-Task Vision and Language Representation Learning, CVPR 2020