
fnzhan / MISE

License: other
Multimodal Image Synthesis and Editing: A Survey

Programming Languages: TeX

Projects that are alternatives to or similar to MISE

PyTorch-Model-Compare
Compare neural networks by their feature similarity
Stars: ✭ 119 (-44.39%)
Mutual labels:  transformers
language-planner
Official Code for "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"
Stars: ✭ 84 (-60.75%)
Mutual labels:  transformers
Self-Supervised-Embedding-Fusion-Transformer
The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.
Stars: ✭ 57 (-73.36%)
Mutual labels:  multimodal-deep-learning
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-88.79%)
Mutual labels:  transformers
molecule-attention-transformer
Pytorch reimplementation of Molecule Attention Transformer, which uses a transformer to tackle the graph-like structure of molecules
Stars: ✭ 46 (-78.5%)
Mutual labels:  transformers
deepfrog
An NLP-suite powered by deep learning
Stars: ✭ 16 (-92.52%)
Mutual labels:  transformers
xpandas
Universal 1d/2d data containers with Transformers functionality for data analysis.
Stars: ✭ 25 (-88.32%)
Mutual labels:  transformers
optimum
🏎️ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools
Stars: ✭ 567 (+164.95%)
Mutual labels:  transformers
pytorch-vit
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Stars: ✭ 250 (+16.82%)
Mutual labels:  transformers
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (-29.44%)
Mutual labels:  transformers
modules
The official repository for our paper "Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks". We develop a method for analyzing emerging functional modularity in neural networks based on differentiable weight masks and use it to point out important issues in current-day neural networks.
Stars: ✭ 25 (-88.32%)
Mutual labels:  transformers
converse
Conversational text analysis using various NLP techniques
Stars: ✭ 147 (-31.31%)
Mutual labels:  transformers
course-content-dl
NMA deep learning course
Stars: ✭ 537 (+150.93%)
Mutual labels:  transformers
gnn-lspe
Source code for GNN-LSPE (Graph Neural Networks with Learnable Structural and Positional Representations), ICLR 2022
Stars: ✭ 165 (-22.9%)
Mutual labels:  transformers
chef-transformer
Chef Transformer 🍲
Stars: ✭ 29 (-86.45%)
Mutual labels:  transformers
DocSum
A tool to automatically summarize documents abstractively using the BART or PreSumm machine learning models.
Stars: ✭ 58 (-72.9%)
Mutual labels:  transformers
hateful memes-hate detectron
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975
Stars: ✭ 35 (-83.64%)
Mutual labels:  multimodal-deep-learning
remixer-pytorch
Implementation of the Remixer Block from the Remixer paper, in Pytorch
Stars: ✭ 37 (-82.71%)
Mutual labels:  transformers
iPerceive
Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Python3 | PyTorch | CNNs | Causality | Reasoning | LSTMs | Transformers | Multi-Head Self Attention | Published in IEEE Winter Conference on Applications of Computer Vision (WACV) 2021
Stars: ✭ 52 (-75.7%)
Mutual labels:  transformers
TransCenter
This is the official implementation of TransCenter. The code and pretrained models are now available here: https://gitlab.inria.fr/yixu/TransCenter_official.
Stars: ✭ 82 (-61.68%)
Mutual labels:  transformers

Multimodal Image Synthesis and Editing: A Survey


[Teaser figure]

This project accompanies our survey paper, which comprehensively reviews recent advances in multimodal image synthesis and editing (MISE) and formulates taxonomies according to data modality and model architecture.

Multimodal Image Synthesis and Editing: A Survey [Paper]
Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu

Table of Contents (Work in Progress)

Methods:
- Transformer-based Methods
- Image Quantizer
- NeRF-based Methods
- Diffusion-based Methods
- GAN-Inversion Methods
- GAN-based Methods
- Other Methods

Modalities & Datasets:
- Text Encoding
- Audio Encoding
- Datasets

Transformer-based Methods

MaskGIT: Masked Generative Image Transformer
Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman
arxiv 2022 [Paper]

ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
arxiv 2021 [Paper] [Project]

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan
arxiv 2021 [Paper] [Code] [Video]

L-Verse: Bidirectional Generation Between Image and Text
Taehoon Kim, Gwangmo Song, Sihaeng Lee, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae
arxiv 2021 [Paper] [Code]

M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis
Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang
NeurIPS 2021 [Paper]

ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis
Patrick Esser, Robin Rombach, Andreas Blattmann, Björn Ommer
NeurIPS 2021 [Paper] [Code] [Project]

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation
Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu
ACM MM 2021 [Paper] [Code]

Unifying Multimodal Transformer for Bi-directional Image and Text Generation
Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu
ACM MM 2021 [Paper] [Code]

Taming Transformers for High-Resolution Image Synthesis
Patrick Esser, Robin Rombach, Björn Ommer
CVPR 2021 [Paper] [Code] [Project]

RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP
Alex Shonenkov and Michael Konstantinov
arxiv 2022 [Code]

Generate Images from Texts in Russian (ruDALL-E)
[Code] [Project]

Zero-Shot Text-to-Image Generation
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
arxiv 2021 [Paper] [Code] [Project]

Compositional Transformers for Scene Generation
Drew A. Hudson, C. Lawrence Zitnick
NeurIPS 2021 [Paper] [Code]

X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, Aniruddha Kembhavi
EMNLP 2020 [Paper] [Code]

One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning
Suzhen Wang, Lincheng Li, Yu Ding, Xin Yu
AAAI 2022 [Paper]
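Most methods in this section share one recipe: represent the image as a sequence of discrete tokens (produced by a quantizer, see the next section), concatenate them with text or other modality tokens, and train a transformer to predict the next image token. Below is a minimal PyTorch sketch of that recipe; the class, all sizes, and the tiny backbone are illustrative assumptions, not code from any listed paper.

```python
import torch
import torch.nn as nn

class ToyTextToImageTransformer(nn.Module):
    """Decoder-only transformer over [text tokens ; image tokens].
    All sizes are illustrative; real models are orders of magnitude larger."""
    def __init__(self, text_vocab=1000, image_vocab=512, dim=256, layers=4):
        super().__init__()
        self.text_emb = nn.Embedding(text_vocab, dim)
        self.image_emb = nn.Embedding(image_vocab, dim)
        self.pos_emb = nn.Embedding(1024, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=layers)
        self.to_logits = nn.Linear(dim, image_vocab)

    def forward(self, text_tokens, image_tokens):
        # One sequence: text first, then the image's discrete codes.
        x = torch.cat([self.text_emb(text_tokens),
                       self.image_emb(image_tokens)], dim=1)
        pos = torch.arange(x.size(1), device=x.device)
        x = x + self.pos_emb(pos)
        # Causal mask: each position attends only to earlier positions.
        L = x.size(1)
        mask = torch.triu(torch.full((L, L), float("-inf"), device=x.device), 1)
        h = self.backbone(x, mask=mask)
        # Hidden states at positions T-1 .. T+I-2 predict image tokens 0 .. I-1.
        t = text_tokens.size(1)
        return self.to_logits(h[:, t - 1 : -1])
```

At sampling time, image tokens are drawn one at a time from these logits and finally decoded back to pixels by the quantizer's decoder.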


Image Quantizer

[TE-VQGAN] Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation
Woncheol Shin, Gyubok Lee, Jiyoung Lee, Joonseok Lee, Edward Choi
arxiv 2021 [Paper] [Code]

[ViT-VQGAN] Vector-quantized Image Modeling with Improved VQGAN
Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu
arxiv 2021 [Paper]

[PeCo] PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu
arxiv 2021 [Paper]

[VQ-GAN] Taming Transformers for High-Resolution Image Synthesis
Patrick Esser, Robin Rombach, Björn Ommer
CVPR 2021 [Paper] [Code]

[Gumbel-VQ] vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Alexei Baevski, Steffen Schneider, Michael Auli
ICLR 2020 [Paper] [Code]

[EM VQ-VAE] Theory and Experiments on Vector Quantized Autoencoders
Aurko Roy, Ashish Vaswani, Arvind Neelakantan, Niki Parmar
arxiv 2018 [Paper] [Code]

[VQ-VAE] Neural Discrete Representation Learning
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu
NIPS 2017 [Paper] [Code]

[VQ-VAE2 or EMA-VQ] Generating Diverse High-Fidelity Images with VQ-VAE-2
Ali Razavi, Aaron van den Oord, Oriol Vinyals
NeurIPS 2019 [Paper] [Code]

[Discrete VAE] Discrete Variational Autoencoders
Jason Tyler Rolfe
ICLR 2017 [Paper] [Code]

[DVAE++] DVAE++: Discrete Variational Autoencoders with Overlapping Transformations
Arash Vahdat, William G. Macready, Zhengbing Bian, Amir Khoshaman, Evgeny Andriyash
ICML 2018 [Paper] [Code]

[DVAE#] DVAE#: Discrete Variational Autoencoders with Relaxed Boltzmann Priors
Arash Vahdat, Evgeny Andriyash, William G. Macready
NeurIPS 2018 [Paper] [Code]
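The quantizers above differ in how the codebook is learned and relaxed, but the core operation is the same: snap each continuous encoder feature to its nearest codebook entry and pass gradients straight through. A minimal PyTorch sketch with illustrative sizes (not an implementation from any listed paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through gradient,
    in the spirit of VQ-VAE; sizes are illustrative."""
    def __init__(self, num_codes=512, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):  # z: (B, N, dim) continuous encoder features
        w = self.codebook.weight
        # Squared Euclidean distance from every feature to every code.
        d = (z.pow(2).sum(-1, keepdim=True) - 2 * z @ w.t()
             + w.pow(2).sum(-1))                      # (B, N, num_codes)
        idx = d.argmin(dim=-1)                        # discrete token ids
        z_q = self.codebook(idx)                      # quantized features
        # Codebook loss pulls codes toward features; the commitment term
        # (weighted by beta) pulls features toward their chosen codes.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        # Straight-through estimator: gradients flow to the encoder unchanged.
        z_q = z + (z_q - z).detach()
        return z_q, idx, loss
```

The returned `idx` grid is exactly the discrete token sequence that the transformer-based methods above model autoregressively.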


NeRF-based Methods

IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis
Jingxiang Sun, Xuan Wang, Yichun Shi, Lizhen Wang, Jue Wang, Yebin Liu
arxiv 2022 [Paper] [Code] [Project]

CG-NeRF: Conditional Generative Neural Radiance Fields
Kyungmin Jo, Gyumin Shim, Sanghun Jung, Soyoung Yang, Jaegul Choo
arxiv 2021 [Paper]

Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields
Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
arxiv 2022 [Paper] [Code] [Project]

Zero-Shot Text-Guided Object Generation with Dream Fields
Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole
arxiv 2021 [Paper] [Project]

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
Can Wang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao
arxiv 2021 [Paper] [Code] [Project]

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis
Yudong Guo, Keyu Chen, Sen Liang, Yong-Jin Liu, Hujun Bao, Juyong Zhang
ICCV 2021 [Paper] [Code] [Project] [Video]
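Whatever the conditioning signal (text prompts, audio, semantic masks), the NeRF-based methods above rest on the same volume-rendering quadrature: composite sampled colours along a camera ray, weighted by density. A minimal PyTorch sketch of that quadrature; the conditioned MLP that produces `rgb` and `sigma` is omitted, and the epsilon is an illustrative numerical guard.

```python
import torch

def composite_ray(rgb, sigma, deltas):
    """Standard NeRF quadrature along one ray (Mildenhall et al., 2020):
    alpha_i = 1 - exp(-sigma_i * delta_i),  T_i = prod_{j<i} (1 - alpha_j),
    C = sum_i T_i * alpha_i * c_i.
    rgb: (N, 3) sampled colours, sigma: (N,) densities, deltas: (N,) spacings.
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)          # opacity per sample
    ones = torch.ones(1, device=sigma.device)
    trans = torch.cumprod(torch.cat([ones, 1.0 - alpha + 1e-10]), dim=0)[:-1]
    weights = trans * alpha                           # per-sample contribution
    return (weights[:, None] * rgb).sum(dim=0)        # (3,) composited colour
```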


Diffusion-based Methods

Text2Human: Text-Driven Controllable Human Image Generation
Yuming Jiang, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy, Ziwei Liu
SIGGRAPH 2022 [Paper] [Project] [Code]

[DALL-E 2] Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
arxiv 2022 [Paper] [Code]

High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
arxiv 2021 [Paper] [Code]

v-objective diffusion
Katherine Crowson
[Code]

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen
arxiv 2021 [Paper] [Code]

Vector Quantized Diffusion Model for Text-to-Image Synthesis
Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo
arxiv 2021 [Paper] [Code]

DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
Gwanghyun Kim, Jong Chul Ye
arxiv 2021 [Paper]

Blended Diffusion for Text-driven Editing of Natural Images
Omri Avrahami, Dani Lischinski, Ohad Fried
CVPR 2022 [Paper] [Project] [Code]
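A key sampling-time ingredient shared by GLIDE, DALL-E 2, and related text-to-image diffusion models listed above is classifier-free guidance: extrapolate from the unconditional to the text-conditional noise estimate. A minimal sketch; `model` and the embedding arguments are hypothetical placeholders, not an API from any listed codebase.

```python
import torch

@torch.no_grad()
def guided_noise_estimate(model, x_t, t, text_emb, null_emb, scale=3.0):
    # Classifier-free guidance: push the conditional estimate away from
    # the unconditional one by a guidance scale s:
    #   eps = eps(x_t, empty) + s * (eps(x_t, text) - eps(x_t, empty))
    # `model` is any noise-prediction network eps_theta(x_t, t, cond).
    eps_text = model(x_t, t, text_emb)    # text-conditional estimate
    eps_null = model(x_t, t, null_emb)    # unconditional (empty prompt)
    return eps_null + scale * (eps_text - eps_null)
```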


GAN-Inversion Methods

HairCLIP: Design Your Hair by Text and Reference Image
Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Zhentao Tan, Lu Yuan, Weiming Zhang, Nenghai Yu
arxiv 2021 [Paper] [Code]

FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
Xingchao Liu, Chengyue Gong, Lemeng Wu, Shujian Zhang, Hao Su, Qiang Liu
arxiv 2021 [Paper] [Code]

StyleMC: Multi-Channel Based Fast Text-Guided Image Generation and Manipulation
Umut Kocasari, Alara Dirik, Mert Tiftikci, Pinar Yanardag
WACV 2022 [Paper] [Code] [Project]

Cycle-Consistent Inverse GAN for Text-to-Image Synthesis
Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao
ACM MM 2021 [Paper]

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski
ICCV 2021 [Paper] [Code] [Video]

Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu
ICCV 2021 [Paper] [Code] [Project]

TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu
CVPR 2021 [Paper] [Code] [Video]

Paint by Word
David Bau, Alex Andonian, Audrey Cui, YeonHwan Park, Ali Jahanian, Aude Oliva, Antonio Torralba
arxiv 2021 [Paper]
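The GAN-inversion methods above (StyleCLIP's latent-optimization variant, FuseDream, and others) keep a pretrained generator frozen and optimize a latent code against a CLIP similarity score. A minimal sketch of that loop; `generator`, `clip_model`, and all hyperparameters are stand-ins, and the resizing/normalization CLIP expects on its image input is omitted for brevity.

```python
import torch

def clip_guided_latent_opt(generator, clip_model, text_feat, w_init,
                           steps=200, lr=0.05, lam=0.1):
    # Freeze the pretrained generator; only the latent code w is optimized.
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = generator(w)                               # (1, 3, H, W)
        img_feat = clip_model.encode_image(img)          # CLIP image embedding
        sim = torch.cosine_similarity(img_feat, text_feat, dim=-1).mean()
        # Maximize CLIP similarity; an L2 penalty keeps w near its start
        # point so the edit stays on the generator's manifold.
        loss = -sim + lam * (w - w_init).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```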


GAN-based Methods

GauGAN2
NVIDIA
[Project] [Video]

Multimodal Conditional Image Synthesis with Product-of-Experts GANs
Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu
arxiv 2021 [Paper]

RiFeGAN2: Rich Feature Generation for Text-to-Image Synthesis from Constrained Prior Knowledge
Jun Cheng, Fuxiang Wu, Yanling Tian, Lei Wang, Dapeng Tao
TCSVT 2021 [Paper]

TRGAN: Text to Image Generation Through Optimizing Initial Image
Liang Zhao, Xinwei Li, Pingda Huang, Zhikui Chen, Yanqi Dai, Tianyu Li
ICONIP 2021 [Paper]

Audio-Driven Emotional Video Portraits [Audio2Image]
Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu
CVPR 2021 [Paper] [Code] [Project]

Direct Speech-to-Image Translation [Audio2Image]
Jiguo Li, Xinfeng Zhang, Chuanmin Jia, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao
JSTSP 2020 [Paper] [Code] [Project]

MirrorGAN: Learning Text-to-image Generation by Redescription [Text2Image]
Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao
CVPR 2019 [Paper] [Code]

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks [Text2Image]
Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He
CVPR 2018 [Paper] [Code]

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space
Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski
CVPR 2017 [Paper] [Code]

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks [Text2Image]
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
TPAMI 2018 [Paper] [Code]

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks [Text2Image]
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
ICCV 2017 [Paper] [Code]
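The classic text-to-image GANs above (StackGAN, AttnGAN, and their successors) condition the generator by injecting a sentence embedding alongside the noise vector. A bare-bones PyTorch sketch of that conditioning; real models add conditioning augmentation, multi-stage refinement, and word-level attention, and all sizes here are illustrative.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Noise + sentence embedding -> 32x32 image, via transposed convolutions."""
    def __init__(self, z_dim=100, text_dim=256, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + text_dim, ch * 4, 4, 1, 0),  # 1 -> 4
            nn.BatchNorm2d(ch * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1),            # 4 -> 8
            nn.BatchNorm2d(ch * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1),                # 8 -> 16
            nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1), nn.Tanh(),          # 16 -> 32
        )

    def forward(self, z, text_emb):
        # Concatenate noise and text embedding, reshape to a 1x1 feature map.
        x = torch.cat([z, text_emb], dim=1)[:, :, None, None]
        return self.net(x)  # (B, 3, 32, 32) image in [-1, 1]
```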


Other Methods

Language-Driven Image Style Transfer
Tsu-Jui Fu, Xin Eric Wang, William Yang Wang
arxiv 2021 [Paper]

CLIPstyler: Image Style Transfer with a Single Text Condition
Gihyun Kwon, Jong Chul Ye
arxiv 2021 [Paper] [Code]



Text Encoding

FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, Douwe Kiela
arxiv 2021 [Paper]

Learning Transferable Visual Models From Natural Language Supervision (CLIP)
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
arxiv 2021 [Paper] [Code]
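CLIP is the text encoder underpinning many of the methods above (StyleCLIP, CLIP-NeRF, GLIDE, FuseDream, and others): it embeds images and text into a shared space where cosine similarity measures alignment. A minimal usage sketch with the openai/CLIP package; the image path is a placeholder.

```python
# pip install git+https://github.com/openai/CLIP.git
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder path
texts = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    img_feat = model.encode_image(image)                 # (1, 512)
    txt_feat = model.encode_text(texts)                  # (2, 512)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    similarity = (img_feat @ txt_feat.T).squeeze(0)      # cosine similarities
```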


Audio Encoding

Wav2CLIP: Learning Robust Audio Representations From CLIP (Wav2CLIP)
Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello
ICASSP 2022 [Paper] [Code]

Datasets

Multimodal CelebA-HQ (https://github.com/IIGROUP/MM-CelebA-HQ-Dataset)

DeepFashion MultiModal (https://github.com/yumingj/DeepFashion-MultiModal)
