
nerdyrodent / CLIP-Guided-Diffusion

Licence: other
Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab.

Programming Languages

  • Python
  • Shell

Projects that are alternatives to, or similar to, CLIP-Guided-Diffusion

VQGAN-CLIP-Docker
Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized
Stars: ✭ 58 (-82.32%)
Mutual labels:  text-to-image, text2image
feed forward vqgan clip
Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt
Stars: ✭ 135 (-58.84%)
Mutual labels:  text-to-image, openai-clip
VQGAN-CLIP
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.
Stars: ✭ 2,369 (+622.26%)
Mutual labels:  text-to-image, text2image
clip-guided-diffusion
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
Stars: ✭ 260 (-20.73%)
Mutual labels:  text-to-image, openai-clip
universum-contracts
text-to-image generation gems / libraries incl. moonbirds, cyberpunks, coolcats, shiba inu doge, nouns & more
Stars: ✭ 17 (-94.82%)
Mutual labels:  text-to-image
KoDALLE
🇰🇷 Text to Image in Korean
Stars: ✭ 55 (-83.23%)
Mutual labels:  text-to-image
keras-text-to-image
Translate text to image in Keras using GAN and Word2Vec as well as recurrent neural networks
Stars: ✭ 60 (-81.71%)
Mutual labels:  text-to-image
text-to-image
Text to Image Synthesis using Generative Adversarial Networks
Stars: ✭ 72 (-78.05%)
Mutual labels:  text-to-image
text-to-image
Re-implementation of https://github.com/zsdonghao/text-to-image
Stars: ✭ 25 (-92.38%)
Mutual labels:  text-to-image
Dalle Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Stars: ✭ 3,661 (+1016.16%)
Mutual labels:  text-to-image
vqgan-clip-app
Local image generation using VQGAN-CLIP or CLIP guided diffusion
Stars: ✭ 94 (-71.34%)
Mutual labels:  text2image
CogView
Text-to-Image generation. The repo for NeurIPS 2021 paper "CogView: Mastering Text-to-Image Generation via Transformers".
Stars: ✭ 708 (+115.85%)
Mutual labels:  text-to-image
text2painting
Convert text into beautiful artistic images
Stars: ✭ 55 (-83.23%)
Mutual labels:  text-to-image
ru-dalle
Generate images from texts. In Russian
Stars: ✭ 1,606 (+389.63%)
Mutual labels:  text-to-image
im2txt2im
I2T2I: Text-to-Image Synthesis with textual data augmentation
Stars: ✭ 29 (-91.16%)
Mutual labels:  text-to-image
text2image-benchmark
Performance comparison of existing GAN based Text To Image algorithms. (GAN-CLS, StackGAN, TAC-GAN)
Stars: ✭ 25 (-92.38%)
Mutual labels:  text2image
Data-Whisperer
An NLP text-to-visualization builder for Tableau.
Stars: ✭ 13 (-96.04%)
Mutual labels:  text-to-image
awesome-generative-deep-art
A curated list of generative deep learning tools, works, models, etc. for artistic uses
Stars: ✭ 172 (-47.56%)
Mutual labels:  text2image
Text2Image
The most useful & easy2use PHP library for converting any text into image
Stars: ✭ 29 (-91.16%)
Mutual labels:  text2image
idg
Document image generator
Stars: ✭ 40 (-87.8%)
Mutual labels:  text-to-image

CLIP-Guided-Diffusion

Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab.

Original colab notebooks by Katherine Crowson (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings):

  • Original 256x256 notebook: Open In Colab

It uses OpenAI's 256x256 unconditional ImageNet diffusion model (https://github.com/openai/guided-diffusion)

  • Original 512x512 notebook: Open In Colab

It uses a 512x512 unconditional ImageNet diffusion model fine-tuned from OpenAI's 512x512 class-conditional ImageNet diffusion model (https://github.com/openai/guided-diffusion)

Together with CLIP (https://github.com/openai/CLIP), these diffusion models connect text prompts with images.

Either the 256 or 512 model can be used here (by setting --output_size to either 256 or 512).
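
Under the hood, each diffusion step is nudged towards the text prompt using the gradient of a CLIP similarity loss, in the style of classifier guidance. A minimal sketch of the idea (not this repo's actual code; clip_model follows the OpenAI CLIP API, and resize_and_normalize stands in for the resizing/normalisation CLIP expects):

import torch
import torch.nn.functional as F

def clip_guidance_grad(x, clip_model, text_embed, resize_and_normalize, scale):
    # x: the sampler's current image estimate, shape (N, 3, H, W)
    x = x.detach().requires_grad_()
    image_embed = clip_model.encode_image(resize_and_normalize(x))
    # Cosine distance between image and prompt embeddings: lower = closer match
    loss = (1 - F.cosine_similarity(image_embed, text_embed)).mean() * scale
    # The sampler shifts its predicted mean along the negative loss gradient
    return -torch.autograd.grad(loss, x)[0]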

Some example images:

"A woman standing in a park":

"An alien landscape":

"A painting of a man":

*images enhanced with Real-ESRGAN

You may also be interested in VQGAN-CLIP

Environment

  • Ubuntu 20.04 (Windows untested but should work)
  • Anaconda
  • Nvidia RTX 3090

Typical VRAM requirements:

  • 256 defaults: 10 GB
  • 512 defaults: 18 GB

Set up

This example uses Anaconda to manage virtual Python environments.

Create a new virtual Python environment for CLIP-Guided-Diffusion:

conda create --name cgd python=3.9
conda activate cgd

Download and change directory:

git clone https://github.com/nerdyrodent/CLIP-Guided-Diffusion.git
cd CLIP-Guided-Diffusion

Run the setup file:

./setup.sh

Or if you want to run the commands manually:

# Install dependencies

pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
git clone https://github.com/openai/CLIP
git clone https://github.com/crowsonkb/guided-diffusion
pip install -e ./CLIP
pip install -e ./guided-diffusion
pip install lpips matplotlib

# Download the diffusion models

curl -OL 'https://the-eye.eu/public/AI/models/512x512_diffusion_unconditional_ImageNet/512x512_diffusion_uncond_finetune_008100.pt'
curl -OL 'https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt'
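
After installing, a quick sanity check (a minimal snippet, not part of the repo's setup) confirms the CUDA build of PyTorch is active:

import torch
print(torch.__version__)          # expect 1.9.0+cu111
print(torch.cuda.is_available())  # True means PyTorch can see the GPU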

Run

The simplest way to run is just to pass in your text prompt. For example:

python generate_diffuse.py -p "A painting of an apple"

Multiple prompts

Text and image prompts can be split using the pipe symbol (|) to allow multiple prompts. You can also use a colon followed by a number to set a weight for that prompt. For example:

python generate_diffuse.py -p "A painting of an apple:1.5|a surreal painting of a weird apple:0.5"
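
Read the syntax as |-separated prompts, each with an optional :weight suffix. A hypothetical parser, just to illustrate the format (not the repo's own function):

def split_prompts(prompt_arg):
    # Illustrative only: '|' separates prompts, ':<number>' sets a weight
    prompts = []
    for part in prompt_arg.split("|"):
        text, sep, weight = part.rpartition(":")
        if sep and weight.replace(".", "", 1).isdigit():
            prompts.append((text, float(weight)))
        else:
            prompts.append((part, 1.0))  # weight defaults to 1 when omitted
    return prompts

# split_prompts("A painting of an apple:1.5|a surreal painting of a weird apple:0.5")
# -> [('A painting of an apple', 1.5), ('a surreal painting of a weird apple', 0.5)]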

Other options

There are a variety of other options to play with. Use the -h flag to display them:

python generate_diffuse.py -h
usage: generate_diffuse.py [-h] [-p PROMPTS] [-ip IMAGE_PROMPTS] [-ii INIT_IMAGE]
[-st SKIP_TIMESTEPS] [-is INIT_SCALE] [-m CLIP_MODEL] [-t TIMESTEPS]
[-ds DIFFUSION_STEPS] [-se SAVE_EVERY] [-bs BATCH_SIZE] [-nb N_BATCHES] [-cuts CUTN]
[-cutb CUTN_BATCHES] [-cutp CUT_POW] [-cgs CLIP_GUIDANCE_SCALE]
[-tvs TV_SCALE] [-rgs RANGE_SCALE] [-os IMAGE_SIZE] [-s SEED] [-o OUTPUT] [-nfp] [-pl]

init_image

  • 'skip_timesteps' needs to be between approx. 200 and 500 when using an init image.
  • 'init_scale' enhances the effect of the init image, a good value is 1000.
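
For example, combining the init image flags from the usage output above (the image filename is just a placeholder):

python generate_diffuse.py -p "A painting of an apple" -ii my_photo.png -st 350 -is 1000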

Timesteps

The number of timesteps (or, when using DDIM, the number in one of ddim25, ddim50, ddim150, ddim250, ddim500, ddim1000) must divide exactly into diffusion_steps. For example, 200 timesteps divide 1000 diffusion steps exactly (1000 / 200 = 5), whereas 300 does not.

image guidance

  • 'clip_guidance_scale' Controls how much the image should look like the prompt.
  • 'tv_scale' Controls the smoothness of the final output.
  • 'range_scale' Controls how far out of range RGB values are allowed to be.

Examples using a number of options:

python generate_diffuse.py -p "An amazing fractal" -os=256 -cgs=1000 -tvs=50 -rgs=50 -cuts=16 -cutb=4 -t=200 -se=200 -m=ViT-B/32 -o=my_fractal.png

python generate_diffuse.py -p "An impressionist painting of a cat:1.75|trending on artstation:0.25" -cgs=500 -tvs=55 -rgs=50 -cuts=16 -cutb=2 -t=100 -ds=2000 -m=ViT-B/32 -pl -o=cat_100.png

(Funny looking cat, but hey!)

Videos

Using the -vid option saves the diffusion steps and makes a video. The steps can also be upscaled if you have the portable version of https://github.com/xinntao/Real-ESRGAN installed locally, and opt to do so.
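
For example (a normal run with -vid as the only addition):

python generate_diffuse.py -p "An alien landscape" -vid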

Other repos

You may also be interested in https://github.com/afiaka87/clip-guided-diffusion

For upscaling images, try https://github.com/xinntao/Real-ESRGAN

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}