chahuja / mix-stage

Licence: other
Official Repository for the paper Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach published in ECCV 2020 (https://arxiv.org/abs/2007.12553)

Programming Languages

python
139335 projects - #7 most used programming language
HTML
75241 projects

Projects that are alternatives of or similar to mix-stage

Msg Net
Multi-style Generative Network for Real-time Transfer
Stars: ✭ 152 (+590.91%)
Mutual labels:  style-transfer, generative-model
Adain Style
Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
Stars: ✭ 1,049 (+4668.18%)
Mutual labels:  style-transfer, generative-model
favorite-research-papers
Listing my favorite research papers 📝 from different fields as I read them.
Stars: ✭ 12 (-45.45%)
Mutual labels:  style-transfer, generative-model
Vincent Ai Artist
Style transfer using deep convolutional neural nets
Stars: ✭ 176 (+700%)
Mutual labels:  style-transfer, generative-model
MMD-GAN
Improving MMD-GAN training with repulsive loss function
Stars: ✭ 82 (+272.73%)
Mutual labels:  generative-model
StyleTransfer-PyTorch
Implementation of image style transfer in PyTorch
Stars: ✭ 18 (-18.18%)
Mutual labels:  style-transfer
PREREQ-IAAI-19
Inferring Concept Prerequisite Relations from Online Educational Resources (IAAI-19)
Stars: ✭ 22 (+0%)
Mutual labels:  generative-model
StyleGAN demo
The re-implementation of style-based generator idea
Stars: ✭ 22 (+0%)
Mutual labels:  style-transfer
GatedPixelCNNPyTorch
PyTorch implementation of "Conditional Image Generation with PixelCNN Decoders" by van den Oord et al. 2016
Stars: ✭ 68 (+209.09%)
Mutual labels:  generative-model
Android-Tensorflow-Style-Transfer
An Android app built with an artistic style transfer neural network
Stars: ✭ 31 (+40.91%)
Mutual labels:  style-transfer
Artistic-Style-Transfer-using-Keras-Tensorflow
Art to Image Style Transfer using Keras and Tensorflow.
Stars: ✭ 22 (+0%)
Mutual labels:  style-transfer
ImgFastNeuralStyleTransfer TensorFlow
Hands-on learning implementation of fast style transfer
Stars: ✭ 22 (+0%)
Mutual labels:  style-transfer
color-aware-style-transfer
Reference code for the paper CAMS: Color-Aware Multi-Style Transfer.
Stars: ✭ 36 (+63.64%)
Mutual labels:  style-transfer
AC-VRNN
PyTorch code for CVIU paper "AC-VRNN: Attentive Conditional-VRNN for Multi-Future Trajectory Prediction"
Stars: ✭ 21 (-4.55%)
Mutual labels:  generative-model
GLStyleNet
Semantic style transfer, code and data for "GLStyleNet: Exquisite Style Transfer Combining Global and Local Pyramid Features" (IET Computer Vision 2020)
Stars: ✭ 48 (+118.18%)
Mutual labels:  style-transfer
caffe-simnets
The SimNets Architecture's Implementation in Caffe
Stars: ✭ 13 (-40.91%)
Mutual labels:  generative-model
texturize
🤖🖌️ Generate photo-realistic textures based on source images. Remix, remake, mashup! Useful if you want to create variations on a theme or elaborate on an existing texture.
Stars: ✭ 495 (+2150%)
Mutual labels:  generative-model
TitleStylist
Source code for our "TitleStylist" paper at ACL 2020
Stars: ✭ 72 (+227.27%)
Mutual labels:  style-transfer
feed forward vqgan clip
Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt
Stars: ✭ 135 (+513.64%)
Mutual labels:  generative-model
eccv16 attr2img
Torch Implemention of ECCV'16 paper: Attribute2Image
Stars: ✭ 93 (+322.73%)
Mutual labels:  generative-model

Mix-Stage

This is the official repository for the paper Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach.

Chaitanya Ahuja, Dong Won Lee, Yukiko Nakano, Louis-Philippe Morency - ECCV2020

License: MIT

Links: Paper (https://arxiv.org/abs/2007.12553), Demo+Project Website, Dataset Website

Bibtex:

@inproceedings{ahuja2020style,
  title={Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach},
  author={Ahuja, Chaitanya and Lee, Dong Won and Nakano, Yukiko I and Morency, Louis-Philippe},
  booktitle={European Conference on Computer Vision},
  year={2020}
}

Overview

(Figure: model overview)

This repo contains the training code and pre-trained models.

For the dataset, we refer you to the Dataset Website linked above.

For the purposes of this repository, we assume that the dataset is downloaded to ../data/

This repo is divided into the following sections: Clone, Set up Environment, Training, Inference, and Rendering.

This is followed by additional informational sections: Experiment Files, Inception Score for pose sequences, Other cool stuff, and Issues.

Clone

As the project website is also hosted on this repository, clone only the master branch,

git clone -b master --single-branch https://github.com/chahuja/mix-stage.git

Set up Environment

  • pycasper
cd mix-stage
mkdir ../pycasper
git clone https://github.com/chahuja/pycasper ../pycasper

cd src
ln -s ../../pycasper/pycasper .  ## create a symlink
  • Create an anaconda or a virtual environment, activate it, and install the dependencies (a quick import check is sketched below)
pip install -r requirements.txt
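
A quick, optional sanity check, run from src, that the pinned dependencies and the pycasper symlink resolve. This is only an illustrative snippet; it assumes PyTorch is installed via requirements.txt:

# optional sanity check -- run from mix-stage/src after the steps above
# assumes torch comes from requirements.txt and that the ./pycasper symlink
# created above is visible on the import path
import torch
import pycasper  # resolved through the ./pycasper symlink

print('torch:', torch.__version__, '| cuda available:', torch.cuda.is_available())
print('pycasper imported from:', pycasper.__file__)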

Training

To train a model from scratch, run the following command after changing directory to src,

python train.py \
 -cpk JointLateClusterSoftStyle4_G \ ## checkpoint name which is a part of experiment file PREFIX
 -exp 1 \ ## creates a unique experiment number
 -path2data ../data \ ## path to data files
 -speaker '["corden", "lec_cosmic", "ytch_prof", "oliver"]' \ ## List of speakers
 -model JointLateClusterSoftStyle4_G \ ## Name of the model
 -modelKwargs '{"lambda_id": 0.1, "argmax": 1, "some_grad_flag": 1, "train_only": 1}' \ ## dictionary of extra arguments used to instantiate the model
 -note mix-stage \ ## unique identifier for the model to group results
 -save_dir save/mix-stage \ ## save directory
 -modalities '["pose/normalize", "audio/log_mel_400"]' \ ## all modalities as a list. output modality first, then input modalities
 -fs_new '[15, 15]' \ ## frame rate of each modality
 -input_modalities '["audio/log_mel_400"]' \ ## List of input modalities
 -output_modalities '["pose/normalize"]' \ ## List of output modalities
 -gan 1 \ ## Flag to train with a discriminator on the output
 -loss L1Loss \ ## Choice of loss function. Any loss function torch.nn.* will work here
 -window_hop 5 \ ## Hop size of the window for the dataloader
 -render 0 \ ## flag to render. Default 0
 -batch_size 16 \ ## batch size
 -num_epochs 20 \ ## total number of epochs
 -overfit 0 \ ## flag to overfit (for debugging)
 -early_stopping 0 \ ## flag to perform early stopping 
 -dev_key dev_spatialNorm \ ## metric used to choose the best model
 -num_clusters 8 \ ## number of clusters in the Conditional Mix-GAN
 -feats '["pose", "velocity", "speed"]' \ ## Features used to make the clusters
 -style_iters 3000 \ ## Number of training iterations per epoch
 -num_iters 3000 ## Maximum number of validation iterations per epoch

Scripts for training models in the paper can be found as follows,

Inference

Inference for quantitative evaluation

python sample.py \
-load <path2weights> \ ## path to PREFIX_weights.p file
-path2data ../data ## path to data

Sampling gestures with many-to-many style transfers

python sample.py \
-load <path2weights> \ ## path to PREFIX_weights.p file
-sample_all_styles 20 \ ## if value > 0, samples `value` number of intervals in all styles (= number of speakers)
-path2data ../data ## path to data

Pre-trained models (UPDATE: March 17, 2021)

Download pretrained models and unzip them in the src folder.

cd mix-stage/src
wget -O pretrained.zip https://cmu.box.com/shared/static/gw9i4qvj2vykcq3krkkvq6nickb4chem.zip
unzip pretrained.zip

Once you unzip them, all the pretrained models can be found in save/pretrained_models. For the multi-speaker scenario in Table 1 and part of Table 2 of the paper, look for the weights in save/pretrained_models/multi-speaker. For the attribute-level training, look for the weights in save/pretrained_models/attribute.

An example of sampling gesture animations from a pretrained model:

python sample.py \
-load save/pretrained_models/multi-speaker/exp_3659_cpk_JointLateClusterSoftStyle4_G_speaker_\[\'corden\',\ \'lec_cosmic\'\]_model_JointLateClusterSoftStyle4_G_note_s2g_gst_mixgan15_weights.p \
-path2data ../data

We also release a script to extract the reported results from the pretrained models in eccv2020-results.ipynb which requires the latest version of pycasper.

Rendering

python render.py \
-render 20 \ ## number of intervals to render
-load <path2weights> \ ## path to PREFIX_weights.p file
-render_text 0 \ ## if 1, render text on the video as well
-path2data ../data ## path to data

Experiment Files

Every experiment generates multiple files with the same PREFIX:

Training files

  • PREFIX_args.args - arguments stored as a dictionary
  • PREFIX_res.json - results for every epoch (see the reading sketch after this list)
  • PREFIX_weights.p - weights of the best model
  • PREFIX_log.log - log file
  • PREFIX_name.name - name file to restore value of PREFIX
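
The results file is plain JSON, so it can be inspected directly. A minimal sketch for peeking at the per-epoch results of an experiment; the PREFIX below is hypothetical and the exact set of logged keys may differ:

import json

# hypothetical PREFIX -- substitute the prefix of your own experiment files
prefix = 'save/mix-stage/exp_1_cpk_JointLateClusterSoftStyle4_G'

# PREFIX_res.json stores results for every epoch; the schema is not
# documented here, so we only list what was logged
with open(prefix + '_res.json') as f:
    res = json.load(f)

print('logged keys:', list(res.keys()))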

Inference files

  • PREFIX/ - directory containing sampled h5 files and eventually renders (see the sketch after this list)
  • PREFIX_cummMetrics.json - metrics estimated at inference
  • PREFIX_metrics.json - metrics estimated at inference for every style transfer separately
  • PREFIX_style.pkl - style space conditioned gesture regions to compute t-SNE plots
  • PREFIX_histogram.json - histogram of each generator in the conditional Mix-GAN, giving an idea of which generators were important for which style
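
The metric JSONs and the sampled h5 files can be inspected with standard tooling. A rough sketch: h5py is an assumption (install it separately if it is not already a dependency), the PREFIX path is hypothetical, and the group layout inside the h5 files is repo-specific, so it is only listed:

import glob
import json

import h5py  # assumption: not necessarily pinned in requirements.txt

prefix = 'save/pretrained_models/multi-speaker/<PREFIX>'  # hypothetical; use your own PREFIX

# aggregate metrics estimated at inference
with open(prefix + '_cummMetrics.json') as f:
    print(json.load(f))

# peek inside one sampled h5 file; group/dataset names are repo-specific,
# so we only print the hierarchy
for path in glob.glob(prefix + '/**/*.h5', recursive=True)[:1]:
    with h5py.File(path, 'r') as h5:
        h5.visit(print)  # prints every group/dataset name in the file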

Inception Score for pose sequences

To measure inception scores for pose sequences (or gestures), we refer you to the class InceptionScoreStyle.
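
For reference, the inception score itself is the standard quantity exp(E_x[KL(p(y|x) || p(y))]) computed over a classifier's per-sample class probabilities. The sketch below is a generic NumPy implementation of that formula, not the repo's InceptionScoreStyle class, whose classifier and pose-sequence inputs are specific to this codebase:

import numpy as np

def inception_score(probs, eps=1e-12):
    # probs: (n_samples, n_classes) class probabilities from any classifier
    # generic definition, not the InceptionScoreStyle API
    probs = np.asarray(probs, dtype=np.float64)
    p_y = probs.mean(axis=0, keepdims=True)  # marginal p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))  # exp of the expected KL divergence

# toy usage with random 'classifier' outputs
rng = np.random.default_rng(0)
logits = rng.normal(size=(32, 8))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(inception_score(probs))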

Other cool stuff

If you enjoyed this work, I would recommend the following projects which study different axes of nonverbal grounding,

Issues

All research has a tag of work in progress. If you find any issues with this code, feel free to raise issues or pull requests (even better) and I will get to it as soon as humanly possible.
