
mhw32 / multimodal-vae-public

License: MIT
A PyTorch implementation of "Multimodal Generative Models for Scalable Weakly-Supervised Learning" (https://arxiv.org/abs/1802.05335)

Programming Languages

python

Projects that are alternatives of or similar to multimodal-vae-public

Tensorflow Generative Model Collections
Collection of generative models in Tensorflow
Stars: ✭ 3,785 (+3762.24%)
Mutual labels:  variational-autoencoder, generative-models
precision-recall-distributions
Assessing Generative Models via Precision and Recall (official repository)
Stars: ✭ 80 (-18.37%)
Mutual labels:  variational-autoencoder, generative-models
Generative Continual Learning
No description or website provided.
Stars: ✭ 51 (-47.96%)
Mutual labels:  variational-autoencoder, generative-models
CVAE-AnomalyDetection-PyTorch
Example of Anomaly Detection using Convolutional Variational Auto-Encoder (CVAE)
Stars: ✭ 23 (-76.53%)
Mutual labels:  variational-autoencoder
playing with vae
Comparing FC VAE / FCN VAE / PCA / UMAP on MNIST / FMNIST
Stars: ✭ 53 (-45.92%)
Mutual labels:  variational-autoencoder
TopicNet
Interface for easier topic modelling.
Stars: ✭ 127 (+29.59%)
Mutual labels:  multimodal-learning
shared-latent-space
Shared Latent Space VAE's
Stars: ✭ 15 (-84.69%)
Mutual labels:  variational-autoencoder
lego-face-VAE
Variational autoencoder for Lego minifig faces
Stars: ✭ 15 (-84.69%)
Mutual labels:  variational-autoencoder
paccmann rl
Code pipeline for the PaccMann^RL in iScience: https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6
Stars: ✭ 22 (-77.55%)
Mutual labels:  generative-models
linguistic-style-transfer-pytorch
Implementation of "Disentangled Representation Learning for Non-Parallel Text Style Transfer (ACL 2019)" in PyTorch
Stars: ✭ 55 (-43.88%)
Mutual labels:  variational-autoencoder
continuous Bernoulli
C programs for the simulator, transformation, and test statistic of the continuous Bernoulli distribution; also covers the continuous Binomial and continuous Trinomial distributions.
Stars: ✭ 22 (-77.55%)
Mutual labels:  variational-autoencoder
fewshot-font-generation
The unified repository for few-shot font generation methods. This repository includes FUNIT (ICCV'19), DM-Font (ECCV'20), LF-Font (AAAI'21) and MX-Font (ICCV'21).
Stars: ✭ 76 (-22.45%)
Mutual labels:  generative-models
STEP
Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits
Stars: ✭ 39 (-60.2%)
Mutual labels:  variational-autoencoder
Generative-Model
Repository for implementation of generative models with Tensorflow 1.x
Stars: ✭ 66 (-32.65%)
Mutual labels:  generative-models
cfg-gan
CFG-GAN: Composite functional gradient learning of generative adversarial models
Stars: ✭ 15 (-84.69%)
Mutual labels:  generative-models
PanoDR
Code and models for "PanoDR: Spherical Panorama Diminished Reality for Indoor Scenes" presented at the OmniCV workshop of CVPR21.
Stars: ✭ 22 (-77.55%)
Mutual labels:  generative-models
adVAE
Implementation of 'Self-Adversarial Variational Autoencoder with Gaussian Anomaly Prior Distribution for Anomaly Detection'
Stars: ✭ 17 (-82.65%)
Mutual labels:  variational-autoencoder
lffont
Official PyTorch implementation of LF-Font (Few-shot Font Generation with Localized Style Representations and Factorization) AAAI 2021
Stars: ✭ 110 (+12.24%)
Mutual labels:  generative-models
CVAE Dial
CVAE_XGate model in paper "Xu, Dusek, Konstas, Rieser. Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity"
Stars: ✭ 16 (-83.67%)
Mutual labels:  variational-autoencoder
overlord
Official pytorch implementation of "Scaling-up Disentanglement for Image Translation", ICCV 2021.
Stars: ✭ 35 (-64.29%)
Mutual labels:  generative-models

Multimodal Variational Autoencoder

A PyTorch implementation of Multimodal Generative Models for Scalable Weakly-Supervised Learning (https://arxiv.org/abs/1802.05335).

Setup/Installation

Create a new conda environment and install the necessary dependencies. See here for more details on installing dlib.

conda create -n multimodal python=2.7 anaconda
# activate the environment
source activate multimodal

# install pytorch
conda install pytorch torchvision -c pytorch

pip install tqdm
pip install scikit-image
pip install opencv-python
pip install imutils

# install dlib
brew install cmake
brew install boost
pip install dlib

Some additional setup is needed for the CelebA-related datasets. Download the aligned-and-cropped version here, along with the annotation files. For the computer vision experiment, we need to precompute a few transformed versions of the CelebA images. The dlib model we use to extract landmarks is from a PyImageSearch tutorial; you can download it here. After downloading CelebA, try the following:

cd vision
# assuming CelebA images are stored in ./data/images
python setup.py grayscale ./data/images ./data/grayscale
python setup.py edge ./data/images ./data/edge
python setup.py mask ./data/images ./data/mask

Example Experiments

This repository contains a subset of the experiments mentioned in the paper. In each folder, there are 3 scripts that one can run: train.py to fit the MVAE; sample.py to (conditionally) reconstruct from samples in the latent space; and loglike.py to compute the marginal log likelihood log p(x) using q(z|x,y) as the inference network.
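
Such an estimate is typically computed by importance sampling with q as the proposal distribution. Below is a minimal sketch of that estimator for a single image modality, assuming a Gaussian q(z|x) and a Bernoulli decoder; the names are illustrative, not the repository's exact API.

import math
import torch
import torch.nn.functional as F

def log_marginal_estimate(x, mu, logvar, decode, n_samples=100):
    # Importance-sampling estimate of log p(x) with q(z|x) as the proposal.
    # x: (batch, n_features) binary data; mu, logvar: inference-network outputs;
    # decode: callable mapping z to Bernoulli logits over x.
    std = torch.exp(0.5 * logvar)
    log_weights = []
    for _ in range(n_samples):
        z = mu + std * torch.randn_like(std)                          # z ~ q(z|x)
        log_px_z = -F.binary_cross_entropy_with_logits(
            decode(z), x, reduction='none').sum(dim=1)                # log p(x|z)
        log_pz = -0.5 * (z ** 2 + math.log(2 * math.pi)).sum(dim=1)   # log p(z)
        log_qz_x = -0.5 * ((z - mu) ** 2 / std ** 2 + logvar
                           + math.log(2 * math.pi)).sum(dim=1)        # log q(z|x)
        log_weights.append(log_px_z + log_pz - log_qz_x)
    # log( (1/S) * sum_s w_s ), computed stably
    log_w = torch.stack(log_weights, dim=1)
    return torch.logsumexp(log_w, dim=1) - math.log(n_samples)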

By default, we anneal the KL weight from 0 to 1. The user can customize the learning rate (--lr), the number of latent dimensions (--n-latents), the annealing rate (--annealing-epochs), etc. from the command line. Notably, the user can set lambda_image and lambda_text, which balance the reconstruction terms; this tends to be important in practice. Training the model will save weights to the filesystem. Run python train.py -h for details.
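
Roughly speaking, the per-example loss being optimized has the following shape (a simplified sketch with illustrative names, not the exact code in train.py):

import torch

def annealed_elbo_loss(recon_image_loss, recon_text_loss, mu, logvar,
                       lambda_image=1.0, lambda_text=10.0, annealing_factor=1.0):
    # recon_image_loss / recon_text_loss: per-example reconstruction losses
    # annealing_factor: ramps from 0 to 1 over --annealing-epochs
    # KL( q(z|x,y) || N(0, I) ) for a diagonal Gaussian posterior
    kl_divergence = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    loss = (lambda_image * recon_image_loss
            + lambda_text * recon_text_loss
            + annealing_factor * kl_divergence)
    return torch.mean(loss)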

(Figure: experiment reconstructions)

MNIST

Treat images as one modality and the label (integer 0 to 9) as a second modality.

cd mnist
CUDA_VISIBLE_DEVICES=0 python train.py --lambda-text 50. --cuda
# model is stored in ./trained_models
CUDA_VISIBLE_DEVICES=0 python sample.py ./trained_models/model_best.pth.tar --cuda
# you can also condition on the label
CUDA_VISIBLE_DEVICES=0 python sample.py ./trained_models/model_best.pth.tar --condition-on-text 5 --cuda

FashionMNIST

Very similar to MNIST, except the labels correspond to categories of fashion items.

cd fashionmnist
CUDA_VISIBLE_DEVICES=0 python train.py --lambda-text 50. --cuda
# model is stored in ./trained_models
CUDA_VISIBLE_DEVICES=0 python sample.py ./trained_models/model_best.pth.tar --cuda
# you can also condition on the label
CUDA_VISIBLE_DEVICES=0 python sample.py ./trained_models/model_best.pth.tar --condition-on-text 1 --cuda

MultiMNIST

Again, an MNIST derivative, except each image contains up to 4 digits in fixed locations. The second modality is a string of digits representing the character(s) in the image. We employ an RNN in the label inference network q(z|y).
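
For illustration, such a label encoder can be a GRU whose final hidden state is mapped to the Gaussian parameters of q(z|y). A minimal sketch (names and sizes are illustrative, not the repository's exact module):

import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    # Encodes a digit string into the parameters of q(z|y) with a GRU.
    def __init__(self, n_latents, n_characters=10, embed_dim=64, hidden_dim=256):
        super(TextEncoder, self).__init__()
        self.embed = nn.Embedding(n_characters, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc_mu = nn.Linear(hidden_dim, n_latents)
        self.fc_logvar = nn.Linear(hidden_dim, n_latents)

    def forward(self, y):
        # y: LongTensor of digit indices, shape (batch, seq_len)
        embedded = self.embed(y)
        _, hidden = self.gru(embedded)      # hidden: (1, batch, hidden_dim)
        hidden = hidden.squeeze(0)
        return self.fc_mu(hidden), self.fc_logvar(hidden)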

cd multimnist
CUDA_VISIBLE_DEVICES=0 python train.py --lambda-text 10. --cuda
# model is stored in ./trained_models
CUDA_VISIBLE_DEVICES=0 python sample.py ./trained_models/model_best.pth.tar --cuda
# you can also condition on the digits
CUDA_VISIBLE_DEVICES=0 python sample.py ./trained_models/model_best.pth.tar --condition-on-text 1773 --cuda

CelebA

Treat images of celebrity faces as one modality and 18 attributes pertaining to the celebrity (e.g. gender, hair color) as a second modality.

cd celeba
CUDA_VISIBLE_DEVICES=0 python train.py --lambda-attrs 10. --cuda
# model is stored in ./trained_models
CUDA_VISIBLE_DEVICES=0 python sample.py ./trained_models/model_best.pth.tar --cuda
# you can also condition on the attribute
CUDA_VISIBLE_DEVICES=0 python sample.py ./trained_models/model_best.pth.tar --condition-on-attrs Male --cuda

CelebA-19

Similar to CelebA, except we treat each attribute as its own expert in the product-of-experts. Here we begin to explore more than 2 modalities. See the code for an example of the MVAE training paradigm (mentioned in the paper), which subsamples multimodal ELBO terms.
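
The product-of-experts fuses the Gaussian parameters from each observed modality (plus the prior expert) in closed form, following the formula in the paper. A sketch of that fusion step with illustrative names:

import torch

def product_of_experts(mu, logvar, eps=1e-8):
    # Fuse Gaussian experts q(z|x_i) = N(mu_i, var_i) into a single Gaussian.
    # mu, logvar: tensors of shape (n_experts, batch, n_latents); the prior
    # expert N(0, I) should be included as one of the rows.
    var = torch.exp(logvar) + eps
    precision = 1.0 / var                                       # T_i = 1 / var_i
    joint_mu = torch.sum(mu * precision, dim=0) / torch.sum(precision, dim=0)
    joint_var = 1.0 / torch.sum(precision, dim=0)
    return joint_mu, torch.log(joint_var)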

cd celeba
CUDA_VISIBLE_DEVICES=0 python train.py --lambda-attrs 10. --approx-m 1 --cuda

Here --approx-m sets the number of additional ELBO terms to sample beyond the complete and individual terms.
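
A sketch of how those extra modality subsets might be drawn each training step (illustrative only, not the repository's exact sampling code):

import random

def sample_elbo_subsets(n_modalities, approx_m):
    # Return the modality subsets whose ELBO terms are summed each step:
    # the complete set, every singleton, and approx_m random extra subsets.
    # Assumes more than two modalities, as in CelebA-19.
    complete = tuple(range(n_modalities))
    subsets = [complete] + [(i,) for i in range(n_modalities)]
    for _ in range(approx_m):
        k = random.randint(2, n_modalities - 1)     # proper, non-singleton subset
        subsets.append(tuple(sorted(random.sample(range(n_modalities), k))))
    return subsets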

Computer Vision Transformations

We learn a series of image processing transformations (i.e. colorization, image completion, edge detection, watermark removal, and facial landmark segmentation) as modalities. We curate a dataset by applying off-the-shelf tools to CelebA. For simplicity, in this implementation we only include the complete ELBO term (using all 6 modalities) and the 6 individual ELBO terms in the objective (in other words, k = 0). One can also subsample more ELBO terms to better approximate the true MVAE objective (as in /celeba19/train.py).
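
Concretely, the per-batch objective is then a sum of seven ELBO terms: one with all six modalities observed and one for each modality alone. A schematic of that loop, where model and elbo stand in for the MVAE forward pass and the weighted loss, and the modality names are illustrative:

# Schematic of the vision objective (k = 0): the complete ELBO term plus
# the six unimodal terms. `model` and `elbo` are placeholders; missing
# modalities are passed as None, as in the MVAE training paradigm.
MODALITIES = ['image', 'grayscale', 'edge', 'mask', 'watermark', 'landmarks']

def vision_objective(model, elbo, inputs):
    # inputs: dict mapping each modality name to its tensor
    total_loss = 0.0
    # complete term: all six modalities observed
    total_loss = total_loss + elbo(model(inputs), inputs)
    # individual terms: one modality observed, the rest missing (None)
    for name in MODALITIES:
        partial = {m: (inputs[m] if m == name else None) for m in MODALITIES}
        total_loss = total_loss + elbo(model(partial), inputs)
    return total_loss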

cd vision
CUDA_VISIBLE_DEVICES=0 python train.py --cuda
# model is stored in ./trained_models
CUDA_VISIBLE_DEVICES=0 python sample.py ./trained_models/model_best.pth.tar --cuda
# this will reconstruct all the modalities from the image
CUDA_VISIBLE_DEVICES=0 python sample.py ./trained_models/model_best.pth.tar --condition-file <path_to_file> --condition-type image --cuda
# we can also go in the other directions
CUDA_VISIBLE_DEVICES=0 python sample.py ./trained_models/model_best.pth.tar --condition-file <path_to_file> --condition-type watermark --cuda

(Figure: vision reconstructions)

Questions?

Please report any bugs and I will get to them ASAP. For any additional questions, feel free to email [email protected].
