junchen14 / Multi-Modal-Transformer

Licence: other
The repository collects a wide range of multi-modal Transformer architectures, including image Transformers, video Transformers, image-language Transformers, and video-language Transformers, together with related datasets. It also collects useful tutorials and tools in these domains.

Projects that are alternatives of or similar to Multi-Modal-Transformer

image-classification
A collection of SOTA Image Classification Models in PyTorch
Stars: ✭ 70 (+14.75%)
Mutual labels:  vision-transformer, mlp-mixer
MPViT
MPViT: Multi-Path Vision Transformer for Dense Prediction (CVPR 2022)
Stars: ✭ 193 (+216.39%)
Mutual labels:  vision-transformer
ViT-V-Net for 3D Image Registration Pytorch
Vision Transformer for 3D medical image registration (PyTorch).
Stars: ✭ 169 (+177.05%)
Mutual labels:  vision-transformer
OASIS
Official implementation of the paper "You Only Need Adversarial Supervision for Semantic Image Synthesis" (ICLR 2021)
Stars: ✭ 232 (+280.33%)
Mutual labels:  multi-modal
YOLOS
You Only Look at One Sequence (NeurIPS 2021)
Stars: ✭ 612 (+903.28%)
Mutual labels:  vision-transformer
visualization
a collection of visualization function
Stars: ✭ 189 (+209.84%)
Mutual labels:  vision-transformer
EGSC-IT
Tensorflow implementation of ICLR2019 paper "Exemplar Guided Unsupervised Image-to-Image Translation with Semantic Consistency"
Stars: ✭ 29 (-52.46%)
Mutual labels:  multi-modal
Splice
Official PyTorch implementation for "Splicing ViT Features for Semantic Appearance Transfer" presenting "Splice" (CVPR 2022)
Stars: ✭ 126 (+106.56%)
Mutual labels:  vision-transformer
PASSL
PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, BEiT, and MAE, as well as fundamental vision models such as Vision Transformer, DeiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2.
Stars: ✭ 134 (+119.67%)
Mutual labels:  vision-transformer
InterpretDL
InterpretDL: Interpretation of Deep Learning Models, a model interpretability algorithm library based on PaddlePaddle.
Stars: ✭ 121 (+98.36%)
Mutual labels:  vision-transformer
pytorch-vit
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (see the patch-embedding sketch after this list)
Stars: ✭ 250 (+309.84%)
Mutual labels:  vision-transformer
towhee
Towhee is a framework dedicated to making neural data processing pipelines simple and fast.
Stars: ✭ 821 (+1245.9%)
Mutual labels:  vision-transformer
LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code (see the usage sketch after this list).
Stars: ✭ 1,566 (+2467.21%)
Mutual labels:  vision-transformer
TransMorph Transformer for Medical Image Registration
TransMorph: Transformer for Unsupervised Medical Image Registration (PyTorch)
Stars: ✭ 130 (+113.11%)
Mutual labels:  vision-transformer
Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection
Code for the video deepfake detection model from "Combining EfficientNet and Vision Transformers for Video Deepfake Detection", available on arXiv and submitted to ICIAP 2021.
Stars: ✭ 39 (-36.07%)
Mutual labels:  vision-transformer
VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Stars: ✭ 41 (-32.79%)
Mutual labels:  video-language
iPerceive
Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Python3 | PyTorch | CNNs | Causality | Reasoning | LSTMs | Transformers | Multi-Head Self Attention | Published in IEEE Winter Conference on Applications of Computer Vision (WACV) 2021
Stars: ✭ 52 (-14.75%)
Mutual labels:  multi-modal
deep-text-recognition-benchmark
PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
Stars: ✭ 123 (+101.64%)
Mutual labels:  vision-transformer
GFNet
[NeurIPS 2021] Global Filter Networks for Image Classification
Stars: ✭ 199 (+226.23%)
Mutual labels:  vision-transformer
koclip
KoCLIP: Korean port of OpenAI CLIP, in Flax
Stars: ✭ 80 (+31.15%)
Mutual labels:  vision-transformer
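
Many of the entries above (pytorch-vit, MPViT, ViTSTR, LaTeX-OCR, and others) build on the core idea of "An Image is Worth 16x16 Words": split an image into fixed-size patches, linearly project each patch to a token embedding, and feed the resulting sequence to a standard Transformer encoder. Below is a minimal sketch of that patch-embedding step in PyTorch; the class and parameter names are illustrative and not taken from any of the listed repositories.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Turn an image into a sequence of patch tokens (ViT-style).

    A Conv2d whose kernel size and stride both equal the patch size is
    equivalent to cutting the image into non-overlapping patches and
    applying one shared linear projection to each flattened patch.
    """
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (B, embed_dim, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```

In a full ViT, a learnable class token and position embeddings are added to this sequence before it enters the Transformer encoder.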
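For LaTeX-OCR above, inference reduces to a single call once the model is loaded. The snippet below follows the Python API shown in the project's documentation at the time of writing; the import path and the image path are assumptions that may vary across versions.

```python
from PIL import Image
from pix2tex.cli import LatexOCR  # import path as documented; may change across versions

img = Image.open("path/to/equation.png")  # placeholder path to an equation image
model = LatexOCR()                        # loads (and on first use downloads) the ViT weights
print(model(img))                         # prints the predicted LaTeX string
```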

Reading list in Transformers

We are a team from the KAUST Vision-CAIR group, focusing on multi-modal representation learning.

This repo aims to collect recent popular Transformer papers, code, and learning resources in the domains of vision Transformers, NLP, multi-modal learning, and more.

Recent News

CVPR multi-modal papers are collected here

The code of VisualGPT has been open-sourced. It can be found here

The code and paper of LeViT have been released. They can be found here

The paper MLP-Mixer: An all-MLP Architecture for Vision is available here
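
Mixer replaces self-attention with two interleaved MLPs: a token-mixing MLP applied across patch positions and a channel-mixing MLP applied across features, each behind a LayerNorm and a residual connection. The following is a minimal sketch of one Mixer block in PyTorch; the hidden sizes are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One MLP-Mixer block: token mixing across patches, then channel
    mixing across features, each with pre-LayerNorm and a residual."""
    def __init__(self, num_patches, dim, token_hidden=256, channel_hidden=2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, token_hidden), nn.GELU(),
            nn.Linear(token_hidden, num_patches))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, channel_hidden), nn.GELU(),
            nn.Linear(channel_hidden, dim))

    def forward(self, x):                           # x: (B, num_patches, dim)
        y = self.norm1(x).transpose(1, 2)           # move patches to last axis
        x = x + self.token_mlp(y).transpose(1, 2)   # token mixing + residual
        return x + self.channel_mlp(self.norm2(x))  # channel mixing + residual

out = MixerBlock(num_patches=196, dim=512)(torch.randn(2, 196, 512))
print(out.shape)  # torch.Size([2, 196, 512])
```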

The code and paper of MDETR have been released. They can be found here

The code and paper of RelTransformer have been released. They can be found here

The code and paper of Twins-SVT have been released. They can be found here

A Vision Transformer for deepfake detection has been released. It can be found here

The code of VideoGPT has been open-sourced. It can be found here

The code of CoaT has been open-sourced. It can be found here

The code of Kaleido-BERT has been open-sourced. It can be found here

The code of TimeSformer has been open-sourced. It can be found here

The code of Swin Transformer has been open-sourced. It can be found here

Topics (paper and code)

Review papers on multi-modal learning

Tutorials and workshops

Datasets

Blogs

Tools

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].