Python 3.7 | TensorFlow 1.10 | cuDNN 7.3.1 | License: CC BY-NC

GANsformer: Generative Adversarial Transformers

Drew A. Hudson* & C. Lawrence Zitnick

*I wish to thank Christopher D. Manning for the fruitful discussions and constructive feedback in developing the Bipartite Transformer, especially when explored within the language representation area and also in the visual context, as well as for providing the kind financial support that allowed this work to happen! 🌻

This is an implementation of the GANsformer model, a novel and efficient type of transformer, explored for the task of image generation. The network employs a bipartite structure that enables long-range interactions across the image while maintaining linear computational efficiency, so it can readily scale to high-resolution synthesis. The model iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of each in light of the other and to encourage the emergence of compositional representations of objects and scenes. In contrast to the classic transformer architecture, it utilizes multiplicative integration that allows flexible region-based modulation, and can thus be seen as a generalization of the successful StyleGAN network.

Paper: https://arxiv.org/pdf/2103.01209
Contact: [email protected]
Implementation: network.py

Update: All code is now ready!

✅ Uploading initial code and readme
✅ Image sampling and visualization script
✅ Code clean-up and refactoring, adding documentation
✅ Training and data-preparation instructions
✅ Pretrained networks for all datasets
✅ Extra visualizations and evaluations

If you experience any issues or have suggestions for improvements or extensions, feel free to contact me either through the issues page or at [email protected].

Bibtex

@article{hudson2021gansformer,
  title={Generative Adversarial Transformers},
  author={Hudson, Drew A and Zitnick, C. Lawrence},
  journal={arXiv preprint arXiv:2103.01209},
  year={2021}
}

Requirements

  • Python 3.6 or 3.7 is supported.
  • We recommend TensorFlow 1.14, which was used for development, but TensorFlow 1.15 is also supported.
  • The code was tested with the CUDA 10.0 toolkit and cuDNN 7.5 (a quick sanity-check snippet follows this list).
  • We performed experiments on a Titan V GPU. We assume 12GB of GPU memory (more memory can expedite training).
  • See requirements.txt for the required python packages and run pip install -r requirements.txt to install them.
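
As an optional sanity check that TensorFlow sees the GPU and reports the expected versions, a minimal sketch (our own illustration, not part of the repository) could be:

import tensorflow as tf

# Expect 1.14 or 1.15, built with CUDA, and at least one visible GPU.
print("TensorFlow version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("GPU available:", tf.test.is_gpu_available())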

Quickstart & Overview

A minimal example of using a pre-trained GANsformer can be found at generate.py. When executed, this 10-line program downloads a pre-trained model and uses it to generate some images:

python generate.py --gpus 0 --model gdrive:bedrooms-snapshot.pkl --output-dir images --images-num 8

You can use --truncation-psi to control the quality/diversity trade-off of the generated images.
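
For instance, a hypothetical side-by-side comparison with the same pretrained snapshot (the output directory names are arbitrary; lower psi values trade diversity for fidelity):

python generate.py --gpus 0 --model gdrive:bedrooms-snapshot.pkl --output-dir images-psi-1.0 --images-num 8 --truncation-psi 1.0
python generate.py --gpus 0 --model gdrive:bedrooms-snapshot.pkl --output-dir images-psi-0.5 --images-num 8 --truncation-psi 0.5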

You can train and evaluate new or pretrained models, both quantitatively and qualitatively, with run_network.py.
The model architecture can be found at network.py. The training procedure is implemented at training_loop.py.

Data preparation

We explored the GANsformer model on 4 datasets for images and scenes: CLEVR, LSUN-Bedrooms, Cityscapes and FFHQ. The model can be trained on other datasets as well. We trained the model at 256×256 resolution, though higher resolutions are supported too; the model will automatically adapt to the resolution of the images in the dataset.

The prepare_data.py script can either prepare the datasets from our catalog or create new datasets.

Default Datasets

To prepare the datasets from the catalog, run the following command:

python prepare_data.py --ffhq --cityscapes --clevr --bedroom --max-images 100000

See table below for details about the datasets in the catalog.

Useful options:

  • --data-dir the output data directory (default: datasets)
  • --shards-num to select the number of shards for the data (default: adapted to each dataset)
  • --max-images to store only a subset of the dataset, in order to reduce the size of the stored tfrecord files (default: max).
    This can be particularly useful to save space in the case of large datasets, such as LSUN-Bedrooms (which originally contains 3M images); see the example after this list.
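
For example, a hypothetical invocation that prepares only CLEVR under a custom data directory and caps the number of stored images (the directory name is arbitrary):

python prepare_data.py --clevr --data-dir datasets-small --max-images 20000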

Custom Datasets

You can also use the script to create new custom datasets. For instance:

python prepare_data.py --task <dataset-name> --images-dir <source-dir> --format png --ratio 0.7 --shards-num 5

The script supports several formats: png, jpg, npy, hdf5, tfds and lmdb.
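
If you only want to smoke-test the pipeline, the following sketch (our own illustration, not part of the repository; it relies on NumPy and Pillow and uses a made-up directory name) writes a folder of random PNG images that prepare_data.py can then package:

import os
import numpy as np
from PIL import Image

source_dir = "toy-images"  # hypothetical <source-dir>
os.makedirs(source_dir, exist_ok=True)

for i in range(32):
    # Random 256x256 RGB noise, just to exercise the data-preparation script.
    arr = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)
    Image.fromarray(arr).save(os.path.join(source_dir, "img_%04d.png" % i))

The images could then be packaged with, e.g., python prepare_data.py --task toy --images-dir toy-images --format png --shards-num 1.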

Dataset Catalog

Dataset        # Images   Resolution  Download Size  TFrecords Size  Gamma
FFHQ           70,000     256×256     13GB           13GB            10
CLEVR          100,015    256×256     18GB           15.5GB          40
Cityscapes     24,998     256×256     1.8GB          8GB             20
LSUN-Bedrooms  3,033,042  256×256     42.8GB         Up to 480GB     100

Use --max-images to reduce the size of the tfrecord files.

Training

Models are trained by using the --train option. To fine-tune a pretrained GANsformer model:

python run_network.py --train --gpus 0 --gansformer-default --expname clevr-pretrained --dataset clevr  

To train a GANsformer in its default configuration from scratch:

python run_network.py --train --gpus 0 --gansformer-default --expname clevr-scratch --dataset clevr \
  --pretrained-pkl None

By default, model training is resumed from the latest snapshot. Use --restart to start a new experiment, or --pretrained-pkl to select a particular snapshot to load.

For comparing to the state of the art, we compute metric scores using 50,000 sample images. To expedite training, though, we recommend setting --eval-images-num to a lower number. Note that this can impact the precision of the metrics, so we recommend using a lower value during training and increasing it back up in the final evaluation.
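
For example, a hypothetical training run that computes metrics over 10,000 images instead of 50,000 (the experiment name is arbitrary):

python run_network.py --train --gpus 0 --gansformer-default --expname clevr-fast-metrics --dataset clevr --eval-images-num 10000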

We support a large variety of command-line options to adjust the model, training, and evaluation. Run python run_network.py -h for the full list of options!

Logging

  • During training, sample images and attention maps will be generated and stored at results/- (--keep-samples).
  • Metrics will also be regularly computed and reported in a metric-<name>.txt file. --metrics can be set to fid for FID, is for Inception Score, and pr for Precision/Recall.
  • Tensorboard logs are also created (--summarize) that track the metrics, loss values for the generator and discriminator, and other useful statistics over the course of training.

Baseline models

The codebase supports multiple baselines in addition to the GANsformer (example commands for the other baselines follow the list below). For instance, to run a vanilla GAN model:

python run_network.py --train --gpus 0 --baseline GAN --expname clevr-gan --dataset clevr 
  • Vanilla GAN: --baseline GAN, a standard GAN without style modulation.
  • StyleGAN2: --baseline StyleGAN2, with one global latent that modulates the image features.
  • k-GAN: --baseline kGAN, which generates multiple image layers independently and then merges them into one shared image.
  • SAGAN: --baseline SAGAN, which performs self-attention between all image features in a low-resolution layer (e.g. 32x32).
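
The other baselines are launched in the same way by changing the --baseline value, for example (experiment names are arbitrary):

python run_network.py --train --gpus 0 --baseline StyleGAN2 --expname clevr-stylegan2 --dataset clevr
python run_network.py --train --gpus 0 --baseline kGAN --expname clevr-kgan --dataset clevr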

Evaluation

To evaluate a model, use the --eval option:

python run_network.py --eval --gpus 0 --expname clevr-exp --dataset clevr

Add --pretrained-network gdrive:<dataset>-snapshot.pkl to evaluate a pretrained model.
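
For instance, a hypothetical evaluation of the pretrained CLEVR checkpoint from the catalog:

python run_network.py --eval --gpus 0 --expname clevr-pretrained --dataset clevr --pretrained-network gdrive:clevr-snapshot.pkl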

Below we provide the FID-50k scores for the GANsformer (using the pretrained checkpoints above) as well as for the baseline models.
Note that these scores differ from the scores reported in the StyleGAN2 paper, since they run their experiments for up to 7x more training steps (5k-15k kimg steps in our experiments over all models, which takes about 3-4 days with 4 GPUs, vs. 50-70k kimg steps in their experiments, which take over 90 GPU-days).

Model       CLEVR  LSUN-Bedroom  FFHQ   Cityscapes
GAN         25.02  12.16         13.18  11.57
kGAN        28.28  69.9          61.14  51.08
SAGAN       26.04  14.06         16.21  12.81
StyleGAN2   16.05  11.53         16.21  8.35
VQGAN       32.60  59.63         63.12  173.80
GANsformer  9.24   6.15          7.42   5.23

Visualization

The code supports producing qualitative results and visualizations. For instance, to create attention maps for each layer:

python run_network.py --gpus 0 --eval --expname clevr-exp --dataset clevr --vis-layer-maps

Below you can see sample images and attention maps produced by the GANsformer:

Command-line Options

In the following we list some of the most useful model options.

Training

  • --gamma: We recommend exploring different values for the chosen dataset (default: 10)
  • --truncation-psi: Controls the image quality/diversity trade-off. (default: 0.65)
  • --eval-images-num: Number of images to compute metrics over. We recommend selecting a lower number to expedite training (default: 50,000)
  • --restart: To restart training from scratch instead of resuming from the latest snapshot
  • --pretrained-pkl: To load a pretrained model, either a local one or, for the datasets in the catalog, from Google Drive via gdrive:<dataset>-snapshot.pkl.
  • --data-dir and --result-dir: Directory names for the datasets (tfrecords) and logging/results.

Model (most useful)

  • --transformer: To add transformer layers to the generator (GANsformer)
  • --components-num: Number of latent components, which will attend to the image. We recommend values in the range of 8-16 (default: 1)
  • --latent-size: Overall latent size (default: 512). The size of each latent component will then be latent_size/components_num
  • --num-heads: Number of attention heads (default: 1)
  • --integration: Integration of information in the transformer layer, e.g. add or mul (default: mul). These options can be combined, as in the example after this list.
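
For example, a hypothetical configuration combining these options (the experiment name and values are illustrative, not recommended settings):

python run_network.py --train --gpus 0 --transformer --components-num 16 --latent-size 512 --num-heads 4 --integration mul --expname clevr-custom --dataset clevr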

Model (others)

  • --g-start-res and --g-end-res: Start and end resolution for the transformer layers (default: all layers up to resolution 28)
  • --kmeans: Track and update image-to-latents assignment centroids, used in the duplex attention
  • --mapping-ltnt2ltnt: Perform self-attention over latents in the mapping network
  • --use-pos: Use trainable positional encodings for the latents.
  • --style False: To turn off one-vector global style modulation (StyleGAN2).

Visualization

  • Sample images
    • --vis-images: Generate image samples
    • --vis-latents: Save source latent vectors
  • Attention maps
    • --vis-maps: Visualize attention maps of last layer and first head
    • --vis-layer-maps: Visualize attention maps of all layers and heads
    • --blending-alpha: Alpha weight when visualizing a blending of images and attention maps
  • Image interpolations
    • --vis-interpolations: Generate interpolations between pairs of source latents
    • --interpolation-density: Number of samples in between two end points of an interpolation (default: 8)
  • Others
    • --vis-noise-var: Create noise variation visualization
    • --vis-style-mix: Create style mixing visualization
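
Several of these visualization flags can be combined in a single evaluation run, for example:

python run_network.py --gpus 0 --eval --expname clevr-exp --dataset clevr --vis-images --vis-maps --vis-interpolations --interpolation-density 8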

Run python run_network.py -h for the full options list.

Sample images

CUDA / Installation

The model relies on custom TensorFlow ops that are compiled on the fly using NVCC.

To set up the environment:

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

To test that your NVCC installation is working correctly, run:

nvcc test_nvcc.cu -o test_nvcc -run
| CPU says hello.
| GPU says hello.

Architecture Overview

The GANsformer consists of two networks:

Generator: which produces the images (x) given randomly sampled latents (z). The latent z has a shape [batch_size, component_num, latent_dim], where component_num = 1 by default (Vanilla GAN, StyleGAN) but is > 1 for the GANsformer model. We can define the latent components by splitting z along the second dimension to obtain z_1,...,z_k latent components. The generator likewise consists of two parts:

  • Mapping network: converts sampled latents from a normal distribution (z) to the intermediate space (w) via a series of feed-forward layers. The k latent components are either mapped independently from the z space to the w space, or interact with each other through self-attention (optional flag).
  • Synthesis network: the intermediate latents w are used to guide the generation of new images. Image features begin from a small constant/sampled 4x4 grid, and then go through multiple layers of convolution and up-sampling until reaching the desired resolution (e.g. 256x256). After each convolution, the image features are modulated (meaning that their variance and bias are controlled) by the intermediate latent vectors w. While in the StyleGAN model there is one global w vector that controls all the features equally, the GANsformer uses attention so that the k latent components specialize to control different regions in the image and create it cooperatively, and it therefore performs better especially when generating images of multi-object scenes.
  • Attention can be used in several ways (a toy sketch of this bipartite attention follows the list below):
    • Simplex Attention: when attention is applied in one direction only from the latents to the image features (top-down).
    • Duplex Attention: when attention is applied in the two directions: latents to image features (top-down) and then image features back to latents (bottom-up), so that each representation informs the other iteratively.
    • Self Attention between latents: can also be used to enable direct interactions between the latents.
    • Self Attention between image features (SAGAN model): prior approaches used attention directly between the image features, but this method does not scale well due to the quadratic cost in the number of features, which becomes very high at high resolutions.
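
As a rough illustration of the simplex/duplex attention described above, here is a toy NumPy sketch. It is not the repository's implementation (which uses learned query/key/value projections, positional encodings and k-means-based duplex assignments); it only shows the basic information flow between k latent components and an n-cell feature grid:

import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def simplex_step(X, Y):
    # X: [n, d] image features (n = H*W grid cells); Y: [k, d] latent components.
    d = X.shape[-1]
    att = softmax(X @ Y.T / np.sqrt(d))      # [n, k]: soft assignment of each cell to the latents
    update = att @ Y                         # [n, d]: per-cell aggregation of latent information
    # Multiplicative integration: region-wise scale/shift of the normalized features.
    X_norm = (X - X.mean(-1, keepdims=True)) / (X.std(-1, keepdims=True) + 1e-8)
    return (1.0 + update) * X_norm, att

def duplex_step(X, Y):
    # Duplex attention adds the bottom-up direction: the latents are first updated from the
    # image features before the next top-down (simplex) step.
    d = Y.shape[-1]
    Y_new = softmax(Y @ X.T / np.sqrt(d)) @ X    # [k, d]
    return simplex_step(X, Y_new)

X = np.random.randn(16 * 16, 32)   # a 16x16 feature grid with 32 channels
Y = np.random.randn(8, 32)         # 8 latent components
X_new, att = duplex_step(X, Y)
print(X_new.shape, att.shape)      # (256, 32) (256, 8)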

Discriminator: Receives an image and has to predict whether it is real or fake, i.e. originating from the dataset or from the generator. The model performs multiple layers of convolution and downsampling on the image, gradually reducing the representation's resolution until making a final prediction. Optionally, attention can be incorporated into the discriminator as well, in which case it has multiple (k) aggregator variables that use attention to adaptively collect information from the image while it is being processed. We observe small improvements in model performance when attention is used in the discriminator, although note that, based on our observations, most of the gain from using attention arises from the generator.

Codebase

This codebase builds on top of and extends the great StyleGAN2 repository by Karras et al.

The GANsformer model can also be seen as a generalization of StyleGAN: while StyleGAN has one global latent vector that controls the style of all image features globally, the GANsformer has k latent vectors that cooperate through attention to control regions within the image, and thereby better model images of multi-object and compositional scenes.

If you have questions, comments or feedback, please feel free to contact me at [email protected]. Thank you! :)
