akanimax / Msg Gan V1

Licence: mit
MSG-GAN: Multi-Scale Gradients GAN (architecture inspired by ProGAN, but without layer-wise growing)

Please note that this is not the repo for the MSG-GAN research paper. Please head over to the msg-stylegan-tf repository for the official code and trained models for the MSG-GAN paper.

MSG-GAN

MSG-GAN (Multi-Scale Gradients GAN): a network architecture inspired by ProGAN.

The architecture of this GAN contains connections between the intermediate layers of a single Generator and a single Discriminator. The network is not trained by progressively growing the layers; all the layers are trained at the same time.

Implementation uses the PyTorch framework.

Celeba samples

celeba generated samples

Please note that all the samples at various scales are generated by the network simultaneously.

Multi-Scale Gradients architecture

proposed MSG-GAN architecture

The figure above describes the architecture of the proposed Multi-Scale Gradients GAN. From every intermediate layer of the Generator, an image at that layer's resolution is extracted through (1 x 1) convolutions. These extracted images are in turn fed to the corresponding layers of the Discriminator, which allows gradients to flow from the Discriminator to the Generator at multiple scales.


For the discrimination process, appropriately downsampled versions of the real images are fed to corresponding layers of the discriminator as shown in the diagram.
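
The text does not say how the real images are downsampled, so the following is only a minimal sketch that assumes 2 x 2 average pooling and uses plain Python lists in place of tensors. The helper names `downsample_2x` and `build_real_pyramid` are hypothetical and are not taken from the repository.

```python
def downsample_2x(image):
    """Halve the height and width of a 2D grayscale image by 2 x 2 average pooling."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def build_real_pyramid(image, num_scales):
    """Return [full resolution, half, quarter, ...] with num_scales entries."""
    pyramid = [image]
    for _ in range(num_scales - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid


# A 4 x 4 toy "real image" (assumed data, for illustration only).
# Each scale would be fed to the discriminator layer of matching resolution.
real = [[float(4 * r + c) for c in range(4)] for r in range(4)]
pyramid = build_real_pyramid(real, num_scales=3)
sizes = [(len(level), len(level[0])) for level in pyramid]
```

In a real training loop the same idea would apply per channel to image tensors, with one pyramid entry for each scale the Generator outputs.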


The problem of random gradients occurring at the higher resolutions of a GAN is tackled by layer-wise training in the ProGAN paper. I present another solution to it, and I have run the following experiment, which preliminarily validates the proposed approach.

gradients explanation


The figure above explains how meaningful gradients penetrate the Generator from the bottom up. Initially, only the lower-resolution gradients are meaningful, so the network starts generating good images at those resolutions first; eventually, all the scales synchronize and start producing images. This results in more stable training at the higher resolutions.

Celeba Experiment

I ran the experiment on a trimmed-down version of the architecture described in the ProGAN paper. The following table summarizes the details of the networks:

detailed_architecture


For extracting images after every 3-layer block at that resolution, I used 1 x 1 convolutions. A similar operation is performed for feeding the images to the intermediate layers of the discriminator.

The architecture of the discriminator is the same (mirrored in reverse), with the distinction that half of the channels come from the (1 x 1 convolution) transformed downsampled real images and half from the conventional top-to-bottom path.

All the 3 x 3 convolution weights have a forward hook that applies spectral normalization to them. Apart from that, the 4 x 4 layer of the discriminator has a MinibatchStd layer for improving sample diversity. No other stabilization techniques are applied.
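
To illustrate what a minibatch standard deviation layer contributes, here is a simplified sketch that works on flat feature vectors rather than on 4 x 4 feature maps. It follows the general ProGAN idea (per-feature standard deviation across the batch, averaged into one scalar and appended to every sample); the function name `minibatch_stddev` and the `eps` value are assumptions, and this is not the repository's layer.

```python
import math


def minibatch_stddev(batch, eps=1e-8):
    """Append the batch-wide average feature standard deviation to every sample.

    batch: list of equal-length feature vectors, one per sample.
    """
    num_samples = len(batch)
    num_features = len(batch[0])
    stds = []
    for j in range(num_features):
        column = [sample[j] for sample in batch]
        mean = sum(column) / num_samples
        variance = sum((x - mean) ** 2 for x in column) / num_samples
        stds.append(math.sqrt(variance + eps))
    avg_std = sum(stds) / num_features
    # Every sample receives the same extra feature: a summary of batch diversity.
    return [list(sample) + [avg_std] for sample in batch]


identical = minibatch_stddev([[1.0, 2.0], [1.0, 2.0]])  # collapsed batch
varied = minibatch_stddev([[0.0, 2.0], [2.0, 6.0]])     # diverse batch
```

A collapsed batch yields a near-zero extra feature, while a diverse batch yields a larger one, which gives the discriminator a direct signal about sample diversity.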

64 x 64 experiment

Loss Plot


128 x 128 experiment

Loss Plot


The diagrams above are the loss plots obtained while training the networks adversarially. The loss function used is Relativistic Hinge-GAN. Apart from some initial aberrations, the training stayed smooth.
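
The text only names the loss, so the sketch below follows the standard relativistic average hinge formulation, in which each score is compared against the mean score of the opposite class. It operates on plain lists of floats, and the function name `relativistic_hinge_losses` is hypothetical; treat it as an illustration rather than the code used for the plots.

```python
def _mean(values):
    return sum(values) / len(values)


def _relu(x):
    return max(0.0, x)


def relativistic_hinge_losses(real_scores, fake_scores):
    """Return (discriminator_loss, generator_loss) from raw discriminator scores."""
    mean_real = _mean(real_scores)
    mean_fake = _mean(fake_scores)
    # Each score is taken relative to the average score of the other class.
    real_vs_fake = [r - mean_fake for r in real_scores]
    fake_vs_real = [f - mean_real for f in fake_scores]

    dis_loss = (_mean([_relu(1.0 - d) for d in real_vs_fake])
                + _mean([_relu(1.0 + d) for d in fake_vs_real]))
    gen_loss = (_mean([_relu(1.0 + d) for d in real_vs_fake])
                + _mean([_relu(1.0 - d) for d in fake_vs_real]))
    return dis_loss, gen_loss


# Discriminator clearly separates real from fake: it pays nothing, G pays a lot.
d_loss, g_loss = relativistic_hinge_losses([2.0, 2.0], [-2.0, -2.0])
# Discriminator cannot tell them apart: both losses sit at the margin value.
d_tie, g_tie = relativistic_hinge_losses([0.0], [0.0])
```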

Running the Code

Please use learning_rate=0.0003 for both G and D in all experiments. From experience, TTUR (the two time-scale update rule, i.e. different learning rates for G and D) doesn't work with this architecture. You may find other learning rates that work better, but 0.0003 always seems to work.

Running the training is simple: run the train.py script in the sourcecode/ directory. The following parameters can be tweaked for your own use:

 -h, --help            show this help message and exit
 --generator_file GENERATOR_FILE
                    pretrained weights file for generator
 --discriminator_file DISCRIMINATOR_FILE
                    pretrained_weights file for discriminator
 --images_dir IMAGES_DIR
                    path for the images directory
 --sample_dir SAMPLE_DIR
                    path for the generated samples directory
 --model_dir MODEL_DIR
                    path for saved models directory
 --loss_function LOSS_FUNCTION
                    loss function to be used: 'hinge', 'relativistic-
                    hinge'
 --depth DEPTH         Depth of the GAN
 --latent_size LATENT_SIZE
                    latent size for the generator
 --batch_size BATCH_SIZE
                    batch_size for training
 --start START         starting epoch number
 --num_epochs NUM_EPOCHS
                    number of epochs for training
 --feedback_factor FEEDBACK_FACTOR
                    number of logs to generate per epoch
 --num_samples NUM_SAMPLES
                    number of samples to generate for creating the grid
                    should be a square number preferably
 --gen_dilation GEN_DILATION
                    amount of dilation for the generator
 --dis_dilation DIS_DILATION
                    amount of dilation for the discriminator
 --checkpoint_factor CHECKPOINT_FACTOR
                    save model per n epochs
 --g_lr G_LR           learning rate for generator
 --d_lr D_LR           learning rate for discriminator
 --adam_beta1 ADAM_BETA1
                    value of beta_1 for adam optimizer
 --adam_beta2 ADAM_BETA2
                    value of beta_2 for adam optimizer
 --use_spectral_norm USE_SPECTRAL_NORM
                    Whether to use spectral normalization or not
 --data_percentage DATA_PERCENTAGE
                    percentage of data to use
 --num_workers NUM_WORKERS
                    number of parallel workers for reading files
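
As a hedged illustration of the advice above, the sketch below assembles a train.py invocation that passes the same learning rate to both `--g_lr` and `--d_lr`. Only flags listed in the help text are used; the helper name `build_train_command`, the `data/celeba` path, and the `depth=6` value are placeholders chosen for the example.

```python
import shlex


def build_train_command(images_dir, depth, learning_rate=0.0003,
                        loss_function="relativistic-hinge"):
    """Assemble a train.py invocation with one learning rate for both G and D."""
    return [
        "python", "train.py",
        f"--images_dir={images_dir}",
        f"--depth={depth}",
        f"--loss_function={loss_function}",
        f"--g_lr={learning_rate}",
        f"--d_lr={learning_rate}",
    ]


# Placeholder dataset path and depth; adjust both to your own setup.
command = build_train_command("data/celeba", depth=6)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, cwd="sourcecode")` to launch training from the sourcecode/ directory.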

Running 1024 x 1024 architecture

To train a network as in the ProGAN CelebaHQ experiment, use the following arguments:

$ python train.py --depth=9 \
                  --latent_size=512 \
                  --images_dir=<path to CelebaHQ images> \
                  --sample_dir=samples/CelebaHQ_experiment \
                  --model_dir=models/CelebaHQ_experiment

Set batch_size, feedback_factor and checkpoint_factor to suit your setup. I carried out this experiment on a DGX-1 machine. The samples displayed in Figure 1 of this readme are the output of this experiment. You can use the models pretrained for 3 epochs at [1024 x 1024] as a starting point for your own training. They are available at -> https://drive.google.com/drive/folders/119n0CoMDGq2K1dnnGpOA3gOf4RwFAGFs
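
The link between `--depth=9` in the command above and the 1024 x 1024 output can be checked with a little arithmetic. The sketch assumes, as in ProGAN, that the generator starts at 4 x 4 and that each further block doubles the side length; that assumption is consistent with the command but is not stated explicitly in this readme, and `output_resolutions` is a hypothetical helper.

```python
def output_resolutions(depth, base_resolution=4):
    """List the image side length produced at each of the `depth` scales."""
    return [base_resolution * 2 ** level for level in range(depth)]


# Under the stated assumption, depth 9 ends at a side length of 1024.
resolutions = output_resolutions(9)
print(resolutions)
```

Under the same assumption, every entry in the list is a scale at which the Generator emits an image and the Discriminator receives one.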

Trained weights for generating cool faces :)

Please refer to models/Celeba/1/GAN_GEN_3.pth for the saved weights of this model in PyTorch format.

Other links

medium blog -> https://medium.com/@animeshsk3/msg-gan-multi-scale-gradients-gan-ee2170f55d50
Training video -> https://www.youtube.com/watch?v=dx7ZHRcbFr8

Thanks

Please feel free to open PRs here if you train on other datasets using this architecture.

Best regards,
@akanimax :)
