junchen14 / Multi-Modal-Transformer

Licence: other
The repository collects a wide range of multi-modal Transformer architectures, including image Transformers, video Transformers, image-language Transformers, and video-language Transformers, together with related datasets. It also collects useful tutorials and tools in these domains.

Projects that are alternatives of or similar to Multi-Modal-Transformer

image-classification
A collection of SOTA Image Classification Models in PyTorch
Stars: ✭ 70 (+14.75%)
Mutual labels:  vision-transformer, mlp-mixer
MPViT
MPViT: Multi-Path Vision Transformer for Dense Prediction (CVPR 2022)
Stars: ✭ 193 (+216.39%)
Mutual labels:  vision-transformer
ViT-V-Net for 3D Image Registration Pytorch
Vision Transformer for 3D medical image registration (PyTorch).
Stars: ✭ 169 (+177.05%)
Mutual labels:  vision-transformer
OASIS
Official implementation of the paper "You Only Need Adversarial Supervision for Semantic Image Synthesis" (ICLR 2021)
Stars: ✭ 232 (+280.33%)
Mutual labels:  multi-modal
YOLOS
You Only Look at One Sequence (NeurIPS 2021)
Stars: ✭ 612 (+903.28%)
Mutual labels:  vision-transformer
visualization
a collection of visualization function
Stars: ✭ 189 (+209.84%)
Mutual labels:  vision-transformer
EGSC-IT
Tensorflow implementation of ICLR2019 paper "Exemplar Guided Unsupervised Image-to-Image Translation with Semantic Consistency"
Stars: ✭ 29 (-52.46%)
Mutual labels:  multi-modal
Splice
Official PyTorch implementation for "Splicing ViT Features for Semantic Appearance Transfer" presenting "Splice" (CVPR 2022)
Stars: ✭ 126 (+106.56%)
Mutual labels:  vision-transformer
PASSL
PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, BEiT, and MAE, as well as fundamental vision models such as Vision Transformer, DeiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2.
Stars: ✭ 134 (+119.67%)
Mutual labels:  vision-transformer
InterpretDL
InterpretDL: Interpretation of Deep Learning Models, a model interpretability algorithm library based on PaddlePaddle.
Stars: ✭ 121 (+98.36%)
Mutual labels:  vision-transformer
pytorch-vit
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (see the patch-embedding sketch after this list)
Stars: ✭ 250 (+309.84%)
Mutual labels:  vision-transformer
towhee
Towhee is a framework dedicated to making neural data processing pipelines simple and fast.
Stars: ✭ 821 (+1245.9%)
Mutual labels:  vision-transformer
LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code (see the usage sketch after this list).
Stars: ✭ 1,566 (+2467.21%)
Mutual labels:  vision-transformer
TransMorph Transformer for Medical Image Registration
TransMorph: Transformer for Unsupervised Medical Image Registration (PyTorch)
Stars: ✭ 130 (+113.11%)
Mutual labels:  vision-transformer
Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection
Code for the video deepfake detection model from "Combining EfficientNet and Vision Transformers for Video Deepfake Detection", available on arXiv and submitted to ICIAP 2021.
Stars: ✭ 39 (-36.07%)
Mutual labels:  vision-transformer
VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Stars: ✭ 41 (-32.79%)
Mutual labels:  video-language
iPerceive
Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Python3 | PyTorch | CNNs | Causality | Reasoning | LSTMs | Transformers | Multi-Head Self Attention | Published in IEEE Winter Conference on Applications of Computer Vision (WACV) 2021
Stars: ✭ 52 (-14.75%)
Mutual labels:  multi-modal
deep-text-recognition-benchmark
PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
Stars: ✭ 123 (+101.64%)
Mutual labels:  vision-transformer
GFNet
[NeurIPS 2021] Global Filter Networks for Image Classification
Stars: ✭ 199 (+226.23%)
Mutual labels:  vision-transformer
koclip
KoCLIP: Korean port of OpenAI CLIP, in Flax
Stars: ✭ 80 (+31.15%)
Mutual labels:  vision-transformer
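
Many of the entries above (pytorch-vit, MPViT, ViTSTR, LaTeX-OCR, and others) build on the core idea of "An Image is Worth 16x16 Words": split an image into fixed-size patches, linearly project each patch to a token embedding, and feed the resulting sequence to a standard Transformer encoder. Below is a minimal sketch of that patch-embedding step in PyTorch; the class and parameter names are illustrative and not taken from any of the listed repositories.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Turn an image into a sequence of patch tokens (ViT-style).

    A Conv2d whose kernel size and stride both equal the patch size is
    equivalent to cutting the image into non-overlapping patches and
    applying one shared linear projection to each flattened patch.
    """
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (B, embed_dim, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```

In a full ViT, a learnable class token and position embeddings are added to this sequence before it enters the Transformer encoder.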
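For LaTeX-OCR above, inference reduces to a single call once the model is loaded. The snippet below follows the Python API shown in the project's documentation at the time of writing; the import path and the image path are assumptions that may vary across versions.

```python
from PIL import Image
from pix2tex.cli import LatexOCR  # import path as documented; may change across versions

img = Image.open("path/to/equation.png")  # placeholder path to an equation image
model = LatexOCR()                        # loads (and on first use downloads) the ViT weights
print(model(img))                         # prints the predicted LaTeX string
```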

Reading list in Transformers

We are a team from the KAUST Vision-CAIR group, focusing on multi-modal representation learning.

This repo aims to collect recent popular Transformer papers, code, and learning resources in the domains of vision Transformers, NLP, multi-modal learning, and more.

Recent News

CVPR multi-modal papers are collected here

The code of VisualGPT has been open-sourced. It can be found here

The code and paper of LeViT have been released. They can be found here

The paper MLP-Mixer: An all-MLP Architecture for Vision is available here
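
Mixer replaces self-attention with two interleaved MLPs: a token-mixing MLP applied across patch positions and a channel-mixing MLP applied across features, each behind a LayerNorm and a residual connection. The following is a minimal sketch of one Mixer block in PyTorch; the hidden sizes are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One MLP-Mixer block: token mixing across patches, then channel
    mixing across features, each with pre-LayerNorm and a residual."""
    def __init__(self, num_patches, dim, token_hidden=256, channel_hidden=2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, token_hidden), nn.GELU(),
            nn.Linear(token_hidden, num_patches))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, channel_hidden), nn.GELU(),
            nn.Linear(channel_hidden, dim))

    def forward(self, x):                           # x: (B, num_patches, dim)
        y = self.norm1(x).transpose(1, 2)           # move patches to last axis
        x = x + self.token_mlp(y).transpose(1, 2)   # token mixing + residual
        return x + self.channel_mlp(self.norm2(x))  # channel mixing + residual

out = MixerBlock(num_patches=196, dim=512)(torch.randn(2, 196, 512))
print(out.shape)  # torch.Size([2, 196, 512])
```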

The code and paper of MDETR have been released. They can be found here

The code and paper of RelTransformer have been released. They can be found here

The code and paper of Twins-SVT have been released. They can be found here

A Vision Transformer for deepfake detection has been released. It can be found here

The code of VideoGPT has been open-sourced. It can be found here

The code of CoaT has been open-sourced. It can be found here

The code of Kaleido-BERT has been open-sourced. It can be found here

The code of TimeSformer has been open-sourced. It can be found here

The code of Swin Transformer has been open-sourced. It can be found here

Topics (paper and code)

Review papers on multi-modal learning

Tutorials and workshops

Datasets

Blogs

Tools

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].