kcosta42 / VQGAN-CLIP-Docker

License: MIT License
Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

Programming Languages

Python

Projects that are alternatives to or similar to VQGAN-CLIP-Docker

Dalle Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Stars: ✭ 3,661 (+6212.07%)
Mutual labels:  transformers, text-to-image
text2image-benchmark
Performance comparison of existing GAN based Text To Image algorithms. (GAN-CLS, StackGAN, TAC-GAN)
Stars: ✭ 25 (-56.9%)
Mutual labels:  generative-adversarial-network, text2image
VQGAN-CLIP
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.
Stars: ✭ 2,369 (+3984.48%)
Mutual labels:  text-to-image, text2image
Cyclegan Qp
Official PyTorch implementation of "Artist Style Transfer Via Quadratic Potential"
Stars: ✭ 59 (+1.72%)
Mutual labels:  generative-adversarial-network, generative-art
keras-text-to-image
Translate text to image in Keras using GAN and Word2Vec as well as recurrent neural networks
Stars: ✭ 60 (+3.45%)
Mutual labels:  generative-adversarial-network, text-to-image
CogView
Text-to-Image generation. The repo for NeurIPS 2021 paper "CogView: Mastering Text-to-Image Generation via Transformers".
Stars: ✭ 708 (+1120.69%)
Mutual labels:  transformers, text-to-image
feed forward vqgan clip
Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt
Stars: ✭ 135 (+132.76%)
Mutual labels:  text-to-image, vqgan
Conditional Animegan
Conditional GAN for Anime face generation.
Stars: ✭ 70 (+20.69%)
Mutual labels:  generative-adversarial-network, generative-art
CLIP-Guided-Diffusion
Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab.
Stars: ✭ 328 (+465.52%)
Mutual labels:  text-to-image, text2image
awesome-generative-deep-art
A curated list of generative deep learning tools, works, models, etc. for artistic uses
Stars: ✭ 172 (+196.55%)
Mutual labels:  generative-art, text2image
Cartoongan Tensorflow
Generate your own cartoon-style images with CartoonGAN (CVPR 2018), powered by TensorFlow 2.0 Alpha.
Stars: ✭ 587 (+912.07%)
Mutual labels:  generative-adversarial-network, generative-art
KoDALLE
🇰🇷 Text to Image in Korean
Stars: ✭ 55 (-5.17%)
Mutual labels:  text-to-image, vqgan
vqgan-clip-app
Local image generation using VQGAN-CLIP or CLIP guided diffusion
Stars: ✭ 94 (+62.07%)
Mutual labels:  generative-art, text2image
Introduction-to-Deep-Learning-and-Neural-Networks-Course
Code snippets and solutions for the Introduction to Deep Learning and Neural Networks Course hosted in educative.io
Stars: ✭ 33 (-43.1%)
Mutual labels:  transformers, generative-adversarial-network
py-msa-kdenlive
Python script to load a Kdenlive (OSS NLE video editor) project file, and conform the edit on video or numpy arrays.
Stars: ✭ 25 (-56.9%)
Mutual labels:  generative-adversarial-network, generative-art
Awesome-Text-to-Image
A Survey on Text-to-Image Generation/Synthesis.
Stars: ✭ 251 (+332.76%)
Mutual labels:  generative-adversarial-network, text-to-image
keras-3dgan
Keras implementation of 3D Generative Adversarial Network.
Stars: ✭ 20 (-65.52%)
Mutual labels:  generative-adversarial-network
DLSS
Deep Learning Super Sampling with Deep Convolutional Generative Adversarial Networks.
Stars: ✭ 88 (+51.72%)
Mutual labels:  generative-adversarial-network
transganformer
Implementation of TransGanFormer, an all-attention GAN that combines the finding from the recent GanFormer and TransGan paper
Stars: ✭ 137 (+136.21%)
Mutual labels:  transformers
DeepFlow
Pytorch implementation of "DeepFlow: History Matching in the Space of Deep Generative Models"
Stars: ✭ 24 (-58.62%)
Mutual labels:  generative-adversarial-network

VQGAN-CLIP-Docker

About

Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

This is a stripped-down repository with minimal dependencies for running VQGAN+CLIP locally or in production.

For a Google Colab notebook see the original repository.

Samples

Setup

Clone this repository and cd inside.

git clone https://github.com/kcosta42/VQGAN-CLIP-Docker.git
cd VQGAN-CLIP-Docker

You can download a pretrained VQGAN model and put it in the ./models folder.

Dataset                   Checkpoint                       Config
ImageNet (f=16), 16384    vqgan_imagenet_f16_16384.ckpt    ./configs/models/vqgan_imagenet_f16_16384.json
ImageNet (f=16), 1024     vqgan_imagenet_f16_1024.ckpt     ./configs/models/vqgan_imagenet_f16_1024.json
FacesHQ (f=16)            vqgan_faceshq_f16_1024.ckpt      ./configs/models/vqgan_faceshq_f16_1024.json
COCO-Stuff (f=16)         vqgan_coco_f16_8192.ckpt         ./configs/models/vqgan_coco_f16_8192.json
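
For example, assuming you have a download URL for the checkpoint you want, something along these lines places it where the scripts expect it (the URL below is a placeholder, not a real link):

mkdir -p ./models
# Replace <CHECKPOINT_URL> with the actual download link for the checkpoint.
wget -O ./models/vqgan_imagenet_f16_16384.ckpt "<CHECKPOINT_URL>"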

To use a GPU, make sure you have CUDA installed on your system (tested with CUDA 11.1+).

  • 6 GB of VRAM is required to generate 256x256 images.
  • 11 GB of VRAM is required to generate 512x512 images.
  • 24 GB of VRAM is required to generate 1024x1024 images. (Untested)
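
If you are unsure how much VRAM your card has, nvidia-smi (installed with the NVIDIA driver) reports it along with the driver and CUDA versions, for example:

nvidia-smi --query-gpu=name,memory.total --format=csv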

Local

Install the Python requirements

python3 -m pip install -r requirements.txt

To check whether you can run this on your GPU, the following command must print True.

python3 -c "import torch; print(torch.cuda.is_available());"

Docker

Make sure you have docker and docker-compose v1.28.0+ installed. nvidia-docker is needed if you want to run this on your GPU through Docker.

A Makefile is provided for ease of use.

make build  # Build the docker image
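
As a quick sanity check that containers can see your GPU (a sketch; pick a CUDA image tag that matches your driver, the one below is only an example):

docker run --rm --gpus all nvidia/cuda:11.1.1-base-ubuntu20.04 nvidia-smi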

Usage

Inference

Two configuration files are provided: ./configs/local.json and ./configs/docker.json. They are ready to use, but you may want to edit them to meet your needs. Check the Configuration section to understand each field.

By default, the resulting generations can be found in the ./outputs folder.

GPU

To run locally:

python3 -m scripts.generate -c ./configs/local.json

To run on docker:

make generate

CPU

To run locally:

DEVICE=cpu python3 -m scripts.generate -c ./configs/local.json

To run on docker:

make generate-cpu

Configuration

Argument              Type            Description
prompts               List[str]       Text prompts
image_prompts         List[FilePath]  Image prompts / target image paths
max_iterations        int             Number of iterations
save_freq             int             Save an image every save_freq iterations
size                  [int, int]      Image size (width, height)
pixelart              [int, int]      Pixelart image size (width, height) (optional; remove this field to disable)
init_image            FilePath        Initial image
init_noise            str             Initial noise image ["gradient", "pixels", "fractal"]
init_weight           float           Initial weight
mse_decay_rate        int             Decay the MSE loss every mse_decay_rate iterations until it reaches about 0
output_dir            FilePath        Path to the output directory
models_dir            FilePath        Path to the models cache directory
clip_model            FilePath        CLIP model path or name
vqgan_checkpoint      FilePath        VQGAN checkpoint path
vqgan_config          FilePath        VQGAN config path
noise_prompt_seeds    List[int]       Noise prompt seeds
noise_prompt_weights  List[float]     Noise prompt weights
step_size             float           Learning rate
cutn                  int             Number of cuts
cut_pow               float           Cut power
seed                  int             Seed (-1 for a random seed)
optimizer             str             Optimizer ["Adam", "AdamW", "Adagrad", "Adamax", "DiffGrad", "AdamP", "RAdam"]
nwarm_restarts        int             Number of times the learning rate is reset (-1 to disable LR decay)
augments              List[str]       Enabled augments ["Ji", "Sh", "Gn", "Pe", "Ro", "Af", "Et", "Ts", "Cr", "Er", "Re", "Hf"]
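
As a rough sketch of what such a configuration might look like (field names are taken from the table above; the values are illustrative placeholders, not the defaults shipped in ./configs/local.json):

{
  "prompts": ["a painting of a lighthouse at sunset"],
  "image_prompts": [],
  "max_iterations": 500,
  "save_freq": 50,
  "size": [256, 256],
  "init_noise": "fractal",
  "init_weight": 0.0,
  "mse_decay_rate": 50,
  "output_dir": "./outputs",
  "models_dir": "./models",
  "clip_model": "ViT-B/32",
  "vqgan_checkpoint": "./models/vqgan_imagenet_f16_16384.ckpt",
  "vqgan_config": "./configs/models/vqgan_imagenet_f16_16384.json",
  "noise_prompt_seeds": [],
  "noise_prompt_weights": [],
  "step_size": 0.1,
  "cutn": 32,
  "cut_pow": 1.0,
  "seed": -1,
  "optimizer": "Adam",
  "nwarm_restarts": -1,
  "augments": ["Af", "Pe", "Ji", "Er"]
}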

Training

These are instructions for training a new VQGAN model. You can also fine-tune the pretrained models, but you may need to tweak the training script.

Two model configuration files are provided: ./configs/models/vqgan_custom.json and ./configs/models/vqgan_custom_docker.json. They are ready to use, but you may want to edit them to meet your needs. Check the Model Configuration section to understand each field.

By default, the models are saved in the ./models/checkpoints folder.

Dataset

Put your images in a folder inside the data directory (./data by default).

The dataset must be structured as follows:

./data/
├── class_x/
│   ├── xxx.png
│   ├── xxy.jpg
│   ├── ...
│   └── xxz.ppm
└── class_y/
    ├── 123.bmp
    ├── nsdf3.tif
    ├── ...
    └── asd932_.webp
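
For example, the layout can be created like this (class_x and class_y are placeholders; use whatever categories your dataset has):

mkdir -p ./data/class_x ./data/class_y
cp /path/to/your/images/*.jpg ./data/class_x/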

GPU

To run locally:

python3 -m scripts.train -c ./configs/models/vqgan_custom.json

To run on docker:

make train

CPU

To run locally:

DEVICE=cpu python3 -m scripts.train -c ./configs/models/vqgan_custom.json

To run on docker:

make train-cpu

Model Configuration

Argument            Type      Description
base_learning_rate  float     Initial learning rate
batch_size          int       Batch size (adjust based on your GPU capability)
epochs              int       Maximum number of epochs
output_dir          FilePath  Path to the directory where training images are saved
models_dir          FilePath  Path to the directory where the model is saved
data_dir            FilePath  Path to the data directory
seed                int       Seed (-1 for a random seed)
resume_checkpoint   FilePath  Path to a pretrained model
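
A rough sketch of such a model configuration (field names are taken from the table above; the values are illustrative placeholders, and the provided files also contain model and loss parameters such as lossconfig.params.disc_start, discussed below):

{
  "base_learning_rate": 4.5e-06,
  "batch_size": 8,
  "epochs": 100,
  "output_dir": "./outputs",
  "models_dir": "./models/checkpoints",
  "data_dir": "./data",
  "seed": -1,
  "resume_checkpoint": null
}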

Notes

  • Let the Generator train without the Discriminator for a few epochs (~3-5 epochs for ImageNet), then enable the Discriminator.
    The variable lossconfig.params.disc_start corresponds to the number of global steps (i.e. batch iterations) before the Discriminator is enabled (see the example after this list).
  • Once enabled, the Discriminator loss will stagnate around ~1.0; this is normal behaviour. The loss will decrease in later epochs (this can take a very long time).
  • If you enable the Discriminator too soon, the Generator will take much longer to train.
  • There is no hard rule for the number of epochs: if your dataset is large enough, there is little risk of overfitting, so the more you train, the better.
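
To make the disc_start arithmetic concrete, with purely illustrative numbers: a dataset of 10,000 images and a batch size of 8 gives 1,250 global steps per epoch, so waiting 4 epochs before enabling the Discriminator means setting disc_start to 5000. In the model configuration JSON this would appear as a fragment like the following (the surrounding structure is inferred from the variable path above):

{
  "lossconfig": {
    "params": {
      "disc_start": 5000
    }
  }
}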

Acknowledgments

VQGAN+CLIP

Taming Transformers

CLIP

DALLE-PyTorch

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}
@misc{esser2020taming,
    title   = {Taming Transformers for High-Resolution Image Synthesis},
    author  = {Patrick Esser and Robin Rombach and Björn Ommer},
    year    = {2020},
    eprint  = {2012.09841},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}
@misc{ramesh2021zeroshot,
    title   = {Zero-Shot Text-to-Image Generation},
    author  = {Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
    year    = {2021},
    eprint  = {2102.12092},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}