
yiyang92 / vae_captioning

Licence: other
Implementation of Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space


Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space

Overview

TensorFlow implementation of "Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space" (NIPS). This implementation includes a VGG16-LSTM baseline with beam search, a Normal-prior CVAE, a GMM-prior CVAE and AG-CVAE.
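To make the difference between the GMM and AG priors concrete, here is a minimal pure-Python sketch. It assumes the formulation usually given for these models: the GMM prior picks one cluster Gaussian according to the cluster weights, while the additive Gaussian prior uses a single Gaussian whose mean is the weighted sum of the cluster means. The function names, the shared isotropic sigma and the toy values are illustrative and are not taken from this repository.

```python
import random


def ag_prior_mean(cluster_weights, cluster_means):
    """Additive Gaussian prior mean: weighted sum of the per-cluster means."""
    dim = len(cluster_means[0])
    return [
        sum(w * mu[d] for w, mu in zip(cluster_weights, cluster_means))
        for d in range(dim)
    ]


def sample_ag_prior(cluster_weights, cluster_means, sigma, rng):
    """Draw z ~ N(sum_k c_k * mu_k, sigma^2 I) via reparameterization."""
    mean = ag_prior_mean(cluster_weights, cluster_means)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def sample_gmm_prior(cluster_weights, cluster_means, sigma, rng):
    """Draw z from a mixture: pick one cluster, then sample around its mean."""
    k = rng.choices(range(len(cluster_means)), weights=cluster_weights)[0]
    return [m + sigma * rng.gauss(0.0, 1.0) for m in cluster_means[k]]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.5, 0.5]                 # hypothetical cluster vector
    means = [[0.0, 0.0], [2.0, 4.0]]     # hypothetical cluster means
    print(ag_prior_mean(weights, means))
    print(sample_ag_prior(weights, means, sigma=0.1, rng=rng))
    print(sample_gmm_prior(weights, means, sigma=0.1, rng=rng))
```

With two active clusters, the AG sample lands between the two cluster means, whereas the GMM sample lands near one of them.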

Usage

Training:

You will need to download the ImageNet weights for VGG16 first: https://yadi.sk/d/V6Rfzfei3TdKCH

Specify your MSCOCO directory in utils/parameters.py and launch:

python main.py --gpu 'your gpu'

This trains the Normal-prior CVAE model without fine-tuning. The best result achieved with cluster vectors and without fine-tuning is CIDEr ~0.8; better results should be possible with some fine-tuning. To train a model with fine-tuning, specify the --fine_tune parameter.

Note: the train/validation split can be changed by setting the gen_val_captions parameter. The default is 4000, which leaves ~120000 in the training set.
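The arithmetic behind that note can be sketched as follows. This assumes the counts refer to images and that the MSCOCO 2014 train and val image sets (82,783 and 40,504 images) are pooled before gen_val_captions images are held out; the function name is hypothetical and the repository's actual split logic may differ.

```python
# MSCOCO 2014 image counts.
COCO_TRAIN2014_IMAGES = 82783
COCO_VAL2014_IMAGES = 40504


def split_sizes(gen_val_captions=4000):
    """Return (train_size, val_size) when holding out gen_val_captions images.

    Assumption: train2014 and val2014 are pooled before the hold-out.
    """
    total = COCO_TRAIN2014_IMAGES + COCO_VAL2014_IMAGES
    if not 0 <= gen_val_captions <= total:
        raise ValueError("gen_val_captions must be between 0 and %d" % total)
    return total - gen_val_captions, gen_val_captions


if __name__ == "__main__":
    print(split_sizes())       # (119287, 4000)
    print(split_sizes(5000))   # (118287, 5000)
```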

Note 2: You will need to run the preprocess.py script first to obtain the images HDF5 file. This speeds up image loading when fine-tuning the model.

Parameters

Parameters can be set directly in the utils/parameters.py file or specified through command-line parameters. For example, to train the AG-CVAE model, which uses cluster vectors as input to the encoder and decoder, you can call:

python main.py --gpu 0 --embed_dim 256 --dec_hid 512 --epochs 50 --temperature 0.6 --gen_name ag --dec_drop 0.7 --dec_lstm_drop 0.7 --lr 0.001 --checkpoint ag_cv_test1 --coco_dir "/home/username/mscoco/coco/" --optimizer Adam --sample_gen greedy --c_v --prior AG
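The general pattern of file defaults overridden by command-line flags can be sketched with argparse. The class below is a hypothetical stand-in, not the repository's utils/parameters.py: only a subset of the flags from the command above is reproduced, and the default values are made up for illustration.

```python
import argparse


class Parameters:
    """Hypothetical stand-in for a parameters class with file-level defaults."""

    # Illustrative defaults only; the real defaults live in utils/parameters.py.
    embed_dim = 256
    dec_hid = 512
    epochs = 50
    lr = 0.001
    prior = "Normal"
    c_v = False

    def parse_args(self, argv=None):
        """Override the defaults with any flags given on the command line."""
        parser = argparse.ArgumentParser()
        parser.add_argument("--gpu", default=None)
        parser.add_argument("--embed_dim", type=int, default=self.embed_dim)
        parser.add_argument("--dec_hid", type=int, default=self.dec_hid)
        parser.add_argument("--epochs", type=int, default=self.epochs)
        parser.add_argument("--lr", type=float, default=self.lr)
        parser.add_argument("--prior", default=self.prior)
        parser.add_argument("--c_v", action="store_true", default=self.c_v)
        args = parser.parse_args(argv)
        for name, value in vars(args).items():
            setattr(self, name, value)
        return self


if __name__ == "__main__":
    params = Parameters().parse_args()
    print(vars(params))
```

Flags that are not passed keep the value defined in the class, so the file remains the single place for defaults.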

Generation

Two options:

  1. Using main.py

After some training just launch:

python main.py --gpu 'your gpu' --mode inference

If you used fine-tuning, just add --fine_tune to the parameters:

python main.py --gpu 'your gpu' --mode inference --fine_tune

It will produce a JSON file ready to use with the MSCOCO evaluation tool.
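For reference, the MSCOCO caption evaluation tool expects results as a JSON list of objects with image_id and caption fields. The sketch below writes that format; the helper names and the caption text are placeholders, and the file written by main.py may be produced differently.

```python
import json


def to_coco_results(captions_by_image_id):
    """Convert {image_id: caption} into the COCO caption results list."""
    items = sorted(captions_by_image_id.items(), key=lambda kv: int(kv[0]))
    return [
        {"image_id": int(image_id), "caption": caption.strip()}
        for image_id, caption in items
    ]


def write_coco_results(captions_by_image_id, path):
    """Write the results list to a JSON file."""
    with open(path, "w") as f:
        json.dump(to_coco_results(captions_by_image_id), f)


if __name__ == "__main__":
    # Placeholder caption, not real model output.
    example = {233527: " placeholder caption "}
    print(json.dumps(to_coco_results(example)))
```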

  2. Using the separate gen_caption.py script. It doesn't support fine-tuned models for now (will be modified soon). It can be used to generate captions for any image.

For the list of required parameters:

python gen_caption.py -h

For example:

python -i gen_caption.py --img_path ./images/COCO_val2014_000000233527.jpg --checkpoint ./checkpoints/gaussian_nocv.ckpt --params_path ./pickles/params_Normal_False_gaussian_nocv_False

Where:

  • --params_path: path to the saved Parameters class, which can be saved by calling main.py --save_params
  • --checkpoint: path to the saved checkpoint
  • --img_path: path to the image
  • -i: launches Python in interactive mode, so captions can be generated by calling generator.generate_caption(img_path). This can also be used in an IPython notebook

A trained CVAE checkpoint (without cluster vectors) and its parameters file can be downloaded at: https://yadi.sk/d/TCyXUmKk3SPVtc

Implementation progress

  • LSTM baseline (implemented)
  • CVAE baseline (implemented)
  • cluster vectors (implemented; vectors for the test set are generated using the TensorFlow Object Detection API and Faster R-CNN)
  • beam search (implemented)
  • AG-CVAE (partially implemented)
  • GMM-CVAE (implemented)
  • Caption generation for new photos (partially implemented; the cluster vector generation process still needs to be automated)
  • fine_tune for better result (implemented)
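As an illustration of the beam search item in the list above, here is a minimal, framework-free sketch. It is not the repository's implementation: step_fn is a hypothetical stand-in for one decoder step that returns next-token log-probabilities, and the toy table exists only to show a case where beam search and greedy decoding disagree.

```python
import math


def beam_search(step_fn, start_token, end_token, beam_size=3, max_len=20):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(prefix) must return a dict mapping next token -> log-probability.
    """
    beams = [(0.0, [start_token])]   # (cumulative log-prob, tokens)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for token, logp in step_fn(seq).items():
                candidates.append((score + logp, seq + [token]))
        if not candidates:
            break
        # Keep only the beam_size best expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            if seq[-1] == end_token:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    finished.extend(beams)           # keep unfinished hypotheses at max_len
    return max(finished, key=lambda c: c[0])[1]


# Toy next-token table keyed by the last token (illustrative values only).
TOY_MODEL = {
    "<s>": {"a": math.log(0.6), "the": math.log(0.4)},
    "a": {"cat": math.log(0.5), "dog": math.log(0.5)},
    "the": {"cat": math.log(0.9), "dog": math.log(0.1)},
    "cat": {"</s>": 0.0},
    "dog": {"</s>": 0.0},
}


def toy_step(prefix):
    return TOY_MODEL[prefix[-1]]


if __name__ == "__main__":
    print(beam_search(toy_step, "<s>", "</s>", beam_size=1))  # greedy
    print(beam_search(toy_step, "<s>", "</s>", beam_size=2))
```

With beam_size=1 the search is greedy and commits to "a" at the first step; with beam_size=2 it keeps "the" alive and finds the sequence with the higher overall probability.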

Specific requirements

Other files

  • prepare_cluster_vectors_train_val.ipynb - takes the MSCOCO dataset JSON files and generates cluster vectors
  • prepare_test_vectors.ipynb - takes the test set cluster vector file, prepared using the tf.models API, and generates cluster vectors
  • gen_caption_example.ipynb - generates a caption for a photo (without cluster vector inputs)