snap-research / MoCoGAN-HD

Licence: other
[ICLR 2021 Spotlight] A Good Image Generator Is What You Need for High-Resolution Video Synthesis

MoCoGAN-HD

Project | OpenReview | arXiv | Talk | Slides

(AFHQ, VoxCeleb)

PyTorch implementation of our method for high-resolution (e.g., 1024x1024) and cross-domain video synthesis.
A Good Image Generator Is What You Need for High-Resolution Video Synthesis
Yu Tian1, Jian Ren2, Menglei Chai2, Kyle Olszewski2, Xi Peng3, Dimitris N. Metaxas1, Sergey Tulyakov2
1Rutgers University, 2Snap Inc., 3University of Delaware
In ICLR 2021, Spotlight.

Pre-trained Image Generator & Video Datasets

In-domain Video Synthesis

UCF-101: image generator, video data, motion generator
FaceForensics: image generator, video data, motion generator
Sky-Timelapse: image generator, video data, motion generator

Cross-domain Video Synthesis

(FFHQ, VoxCeleb): FFHQ image generator, VoxCeleb, motion generator
(AFHQ, VoxCeleb): AFHQ image generator, VoxCeleb, motion generator
(Anime, VoxCeleb): Anime image generator, VoxCeleb, motion generator
(FFHQ-1024, VoxCeleb): FFHQ-1024 image generator, VoxCeleb, motion generator
(LSUN-Church, TLVDB): LSUN-Church image generator, TLVDB

The calculated PCA stats are saved here.

Training

Organise the video dataset as follows:

Video dataset
|-- video1
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- video2
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- video3
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- ...
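
Before training, it can help to confirm that a dataset actually follows this layout. The sketch below is not part of the repository: `scan_video_dataset` and `find_gaps` are hypothetical helper names, and the `img_NNNN.png` naming is taken from the tree above. It lists the frame indices found in each video folder and reports any missing frames.

```python
import re
from pathlib import Path

# Frame naming assumed from the layout above: img_0000.png, img_0001.png, ...
FRAME_PATTERN = re.compile(r"img_(\d+)\.png$")


def scan_video_dataset(root):
    """Return {video_name: sorted frame indices} for each sub-directory of root."""
    videos = {}
    for video_dir in sorted(Path(root).iterdir()):
        if not video_dir.is_dir():
            continue  # ignore stray files next to the video folders
        indices = sorted(
            int(match.group(1))
            for frame in video_dir.iterdir()
            if (match := FRAME_PATTERN.match(frame.name))
        )
        videos[video_dir.name] = indices
    return videos


def find_gaps(indices):
    """Return the frame indices missing from the run 0..max(indices)."""
    if not indices:
        return []
    return sorted(set(range(indices[-1] + 1)) - set(indices))


# Example usage (the path is a placeholder):
# for name, frames in scan_video_dataset("/path/to/ucf_101").items():
#     print(name, len(frames), "frames, missing:", find_gaps(frames))
```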

In-domain Video Synthesis

UCF-101

Collect the PCA components from a pre-trained image generator.

python get_stats_pca.py --batchSize 4000 \
  --save_pca_path pca_stats/ucf_101 \
  --pca_iterations 250 \
  --latent_dimension 512 \
  --img_g_weights /path/to/ucf_101_image_generator \
  --style_gan_size 256 \
  --gpu 0
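
To make the idea of "PCA components of the latent space" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the contents of get_stats_pca.py: the latent samples are random Gaussian stand-ins rather than outputs of a real image generator, `pca_basis` is a hypothetical name, and a single SVD replaces whatever iterative procedure `--pca_iterations` controls. The sample count and dimension mirror `--batchSize 4000` and `--latent_dimension 512`.

```python
import numpy as np


def pca_basis(latents, n_components):
    """Return (mean, components, explained_variance) of the latent samples.

    components has shape (n_components, latent_dim); its rows are orthonormal
    principal directions sorted by decreasing variance.
    """
    mean = latents.mean(axis=0)
    centered = latents - mean
    # Rows of vt are the principal directions of the centered data.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values**2 / (latents.shape[0] - 1)
    return mean, vt[:n_components], explained_variance[:n_components]


rng = np.random.default_rng(0)
# Stand-in for latent codes sampled from a pre-trained generator (assumption).
w_samples = rng.standard_normal((4000, 512))
mean, components, variance = pca_basis(w_samples, n_components=384)
print(components.shape)
```

With real latents, the leading directions would be the ones along which the generator's latent codes vary the most.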

Train the model

python -W ignore train.py --name ucf_101 \
  --time_step 2 \
  --lr 0.0001 \
  --save_pca_path pca_stats/ucf_101 \
  --latent_dimension 512 \
  --dataroot /path/to/ucf_101 \
  --checkpoints_dir checkpoints/ucf_101 \
  --img_g_weights /path/to/ucf_101_image_generator \
  --multiprocessing_distributed --world_size 1 --rank 0 \
  --batchSize 16 \
  --workers 8 \
  --style_gan_size 256 \
  --total_epoch 100

Inference

python -W ignore evaluate.py  \
  --save_pca_path pca_stats/ucf_101 \
  --latent_dimension 512 \
  --style_gan_size 256 \
  --img_g_weights /path/to/ucf_101_image_generator \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch the_epoch_for_testing \
  --results results/ucf_101 \
  --num_test_videos 10

Replace the_epoch_for_testing with the epoch you want to evaluate (an integer >= 0).

FaceForensics

Collect the PCA components from a pre-trained image generator.

sh script/faceforensics/run_get_stats_pca.sh

Train the model

sh script/faceforensics/run_train.sh

Inference

sh script/faceforensics/run_evaluate.sh

Sky-Timelapse

Collect the PCA components from a pre-trained image generator.

sh script/sky_timelapse/run_get_stats_pca.sh

Train the model

sh script/sky_timelapse/run_train.sh

Inference

sh script/sky_timelapse/run_evaluate.sh

Cross-domain Video Synthesis

(FFHQ, VoxCeleb)

Collect the PCA components from a pre-trained image generator.

python get_stats_pca.py --batchSize 4000 \
  --save_pca_path pca_stats/ffhq_256 \
  --pca_iterations 250 \
  --latent_dimension 512 \
  --img_g_weights /path/to/ffhq_image_generator \
  --style_gan_size 256 \
  --gpu 0

Train the model

python -W ignore train.py --name ffhq_256-voxel \
  --time_step 2 \
  --lr 0.0001 \
  --save_pca_path pca_stats/ffhq_256 \
  --latent_dimension 512 \
  --dataroot /path/to/voxel_dataset \
  --checkpoints_dir checkpoints \
  --img_g_weights /path/to/ffhq_image_generator \
  --multiprocessing_distributed --world_size 1 --rank 0 \
  --batchSize 16 \
  --workers 8 \
  --style_gan_size 256 \
  --total_epoch 25 \
  --cross_domain

Inference

python -W ignore evaluate.py  \
  --save_pca_path pca_stats/ffhq_256 \
  --latent_dimension 512 \
  --style_gan_size 256 \
  --img_g_weights /path/to/ffhq_image_generator \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch the_epoch_for_testing \
  --results results/ffhq_256 \
  --num_test_videos 10

Replace the_epoch_for_testing with the epoch you want to evaluate (an integer >= 0).

(FFHQ-1024, VoxCeleb)

Collect the PCA components from a pre-trained image generator.

sh script/ffhq-vox/run_get_stats_pca_1024.sh

Train the model

sh script/ffhq-vox/run_train_1024.sh

Inference

sh script/ffhq-vox/run_evaluate_1024.sh

(AFHQ, VoxCeleb)

Collect the PCA components from a pre-trained image generator.

sh script/afhq-vox/run_get_stats_pca.sh

Train the model

sh script/afhq-vox/run_train.sh

Inference

sh script/afhq-vox/run_evaluate.sh

(Anime, VoxCeleb)

Collect the PCA components from a pre-trained image generator.

sh script/anime-vox/run_get_stats_pca.sh

Train the model

sh script/anime-vox/run_train.sh

Inference

sh script/anime-vox/run_evaluate.sh

(LSUN-Church, TLVDB)

Collect the PCA components from a pre-trained image generator.

sh script/lsun_church-tlvdb/run_get_stats_pca.sh

Train the model

sh script/lsun_church-tlvdb/run_train.sh

Inference

sh script/lsun_church-tlvdb/run_evaluate.sh

Fine-tuning

To resume interrupted training or fine-tune a pre-trained model, run the following (using UCF-101 as an example):

python -W ignore train.py --name ucf_101 \
  --time_step 2 \
  --lr 0.0001 \
  --save_pca_path pca_stats/ucf_101 \
  --latent_dimension 512 \
  --dataroot /path/to/ucf_101 \
  --checkpoints_dir checkpoints \
  --img_g_weights /path/to/ucf_101_image_generator \
  --multiprocessing_distributed --world_size 1 --rank 0 \
  --batchSize 16 \
  --workers 8 \
  --style_gan_size 256 \
  --total_epoch 100 \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch 0

Training Control With Options

--w_residual: step size of the motion residual. Default: 0.2; we recommend values <= 0.5.
--n_pca: number of PCA basis vectors used in the motion residual calculation. Default: 384 (out of the 512 dimensions of the StyleGAN2 w space); we recommend values >= 256.
--q_len: size of the queue that stores the logits used in the contrastive loss. Default: 4,096.
--video_frame_size: spatial size of the video frames used for training. All synthesized video clips are down-sampled to this size before being fed to the video discriminator. Default: 128; a larger size may lead to better motion modeling.
--cross_domain: enables cross-domain video synthesis. Default: False.
--w_match: weight of the feature matching loss. Default: 1.0; a larger value improves content matching.
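
The interaction between --w_residual and --n_pca can be pictured as a latent code that moves a small step along PCA directions at each frame. The sketch below is an assumption inferred from the option descriptions above, not code from the repository: `apply_motion_residual` is a hypothetical name, the orthonormal basis is random rather than computed from a generator, and the per-frame coefficients are random where a trained motion generator would predict them.

```python
import numpy as np


def apply_motion_residual(w_prev, coeffs, pca_components, w_residual=0.2):
    """Assumed update: w_t = w_{t-1} + w_residual * (coeffs @ pca_components).

    pca_components has shape (n_pca, latent_dim); coeffs has shape (n_pca,).
    """
    return w_prev + w_residual * (coeffs @ pca_components)


rng = np.random.default_rng(0)
latent_dim, n_pca = 512, 384

# Stand-in orthonormal basis with n_pca rows (a real one would come from PCA).
q, _ = np.linalg.qr(rng.standard_normal((latent_dim, n_pca)))
pca_components = q.T

w = rng.standard_normal(latent_dim)  # latent code of the first frame
trajectory = [w]
for _ in range(15):
    coeffs = rng.standard_normal(n_pca)  # placeholder for predicted coefficients
    w = apply_motion_residual(w, coeffs, pca_components)
    trajectory.append(w)
trajectory = np.stack(trajectory)  # one latent code per frame
print(trajectory.shape)
```

Under this reading, a smaller --w_residual keeps consecutive frames closer together in latent space, and a smaller --n_pca restricts motion to fewer directions.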

Long Sequence Generation

LSTM Unrolling

At inference time, you can generate a long sequence by LSTM unrolling with --n_frames_G:

python -W ignore evaluate.py  \
  --save_pca_path pca_stats/ffhq_256 \
  --latent_dimension 512 \
  --style_gan_size 256 \
  --img_g_weights /path/to/ffhq_image_generator \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch 0 \
  --n_frames_G 32
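
Unrolling means running the same recurrent cell for more steps than were used during training. The toy NumPy sketch below illustrates only that idea: the cell has random weights and tiny dimensions, `lstm_step` and `unroll` are hypothetical names, and none of it reflects the repository's actual motion generator.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h, c, weights, bias):
    """One LSTM cell step; weights maps [x, h] to the four stacked gates."""
    gates = np.concatenate([x, h]) @ weights + bias
    i, f, g, o = np.split(gates, 4)
    c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_next = sigmoid(o) * np.tanh(c_next)
    return h_next, c_next


def unroll(n_frames, hidden_size=8, input_size=4, seed=0):
    """Run the same cell for n_frames steps and return one hidden state per frame."""
    rng = np.random.default_rng(seed)
    weights = 0.1 * rng.standard_normal((input_size + hidden_size, 4 * hidden_size))
    bias = np.zeros(4 * hidden_size)
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    states = []
    for _ in range(n_frames):
        x = rng.standard_normal(input_size)  # per-frame noise input
        h, c = lstm_step(x, h, c, weights, bias)
        states.append(h)
    return np.stack(states)


short = unroll(16)
long = unroll(32)  # same cell, simply stepped for twice as many frames
print(short.shape, long.shape)
```

Because the cell and its inputs are identical for the shared steps, the first 16 states of the longer run match the shorter run exactly; the extra frames continue from the same hidden state.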

Interpolation

At inference time, you can generate a long sequence by interpolation with --interpolation:

python -W ignore evaluate.py  \
  --save_pca_path pca_stats/ffhq_256 \
  --latent_dimension 512 \
  --style_gan_size 256 \
  --img_g_weights /path/to/ffhq_image_generator \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch 0 \
  --interpolation
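
This README does not say which interpolation scheme --interpolation uses. As one plausible illustration, the sketch below lengthens a sequence by linearly interpolating between consecutive latent codes; `interpolate_latents` is a hypothetical helper, and the linear scheme is an assumption rather than the repository's confirmed behavior.

```python
import numpy as np


def interpolate_latents(latents, factor):
    """Insert factor - 1 linearly interpolated codes between consecutive latents.

    latents: array of shape (n, latent_dim). Returns ((n - 1) * factor + 1, latent_dim).
    """
    latents = np.asarray(latents, dtype=float)
    out = []
    for start, end in zip(latents[:-1], latents[1:]):
        for k in range(factor):
            t = k / factor
            out.append((1.0 - t) * start + t * end)
    out.append(latents[-1])  # keep the final original code
    return np.stack(out)


codes = np.array([[0.0, 0.0], [2.0, 4.0], [4.0, 0.0]])
dense = interpolate_latents(codes, factor=2)
print(dense.shape)
```

Each interpolated code would then be passed through the image generator to produce an in-between frame.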

Examples of Generated Videos

UCF-101

FaceForensics

Sky Timelapse

(FFHQ, VoxCeleb)

(FFHQ-1024, VoxCeleb)

(Anime, VoxCeleb)

(LSUN-Church, TLVDB)

Citation

If you use the code for your work, please cite our paper.

@inproceedings{
tian2021a,
title={A Good Image Generator Is What You Need for High-Resolution Video Synthesis},
author={Yu Tian and Jian Ren and Menglei Chai and Kyle Olszewski and Xi Peng and Dimitris N. Metaxas and Sergey Tulyakov},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=6puCSjH3hwA}
}

Acknowledgments

This code borrows from the StyleGAN2 image generator, the BigGAN discriminator, and the PatchGAN discriminator.
