IIGROUP / MM-CelebA-HQ-Dataset

Licence: other
[CVPR 2021] A large-scale face image dataset that supports text-to-image generation, text-guided image manipulation, sketch-to-image generation, GANs for face generation and editing, image captioning, and VQA

Multi-Modal-CelebA-HQ

Multi-Modal-CelebA-HQ is a large-scale face image dataset of 30,000 high-resolution face images selected from the CelebA dataset, following CelebA-HQ. Each image is accompanied by a high-quality segmentation mask, a sketch, descriptive text, and a version with a transparent background.

Multi-Modal-CelebA-HQ can be used to train and evaluate algorithms for text-to-image generation, text-guided image manipulation, sketch-to-image generation, image captioning, and VQA. This dataset was proposed and used in TediGAN.

Data Generation

  • The textual descriptions are generated using a probabilistic context-free grammar (PCFG) based on the given attributes. We create ten unique single-sentence descriptions per image to obtain more training data, following the format of the popular CUB and COCO datasets. A previous study proposed CelebTD-HQ, but it is not publicly available.
  • For labels, we use the CelebAMask-HQ dataset, which contains manually-annotated semantic masks of facial attributes corresponding to CelebA-HQ.
  • For sketches, we follow the same data generation pipeline as in DeepFaceDrawing. We first apply the Photocopy filter in Photoshop to extract edges, which preserves facial details but introduces excessive noise, and then apply sketch simplification to obtain edge maps resembling hand-drawn sketches.
  • For background removal, we use the open-source tool Rembg and the commercial software remove.bg. Different backgrounds can then be added using image composition or harmonization methods such as DoveNet.
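The PCFG step above can be sketched with a toy grammar. Note that the nonterminals, templates, and attribute words below are illustrative assumptions, not the actual grammar used by the authors:

```python
import random

# Toy probabilistic context-free grammar: each nonterminal maps to a list of
# (expansion, probability) pairs. Tokens not in the grammar are terminals.
# These templates are illustrative assumptions, not the authors' grammar.
GRAMMAR = {
    "S": [("the NP has VP", 0.5), ("this NP is wearing ACC", 0.5)],
    "NP": [("young woman", 0.5), ("man", 0.5)],
    "VP": [("HAIR hair", 0.6), ("HAIR hair and a slight smile", 0.4)],
    "HAIR": [("wavy", 0.5), ("straight", 0.5)],
    "ACC": [("earrings", 0.5), ("lipstick", 0.5)],
}

def expand(symbol, rng):
    """Recursively expand a symbol by sampling one of its productions."""
    if symbol not in GRAMMAR:
        return symbol  # terminal word, emitted as-is
    expansions, weights = zip(*GRAMMAR[symbol])
    choice = rng.choices(expansions, weights=weights)[0]
    return " ".join(expand(token, rng) for token in choice.split())

def describe(rng):
    """Sample one single-sentence description, as the dataset stores ten per image."""
    sentence = expand("S", rng)
    return sentence[0].upper() + sentence[1:] + "."
```

In the real pipeline, the grammar's terminal choices would be conditioned on each image's annotated CelebA attributes rather than sampled freely.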

Overview

image

Note: Upon request, the download links for the raw data and annotations have been removed from this repo. Please refer to the original sites for the raw data and email me for the post-processing scripts.

All data is hosted on Google Drive (not available).

Path                 Size     Files    Format  Description
multi-modal-celeba   ~20 GB   420,002          Main folder
├  image             ~2 GB    30,000   JPG     images from CelebA-HQ of size 512×512
├  text              11 MB    300,000  TXT     10 descriptions of each image in CelebA-HQ
├  train             347 KB   1        PKL     filenames of training images
├  test              81 KB    1        PKL     filenames of test images
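A minimal sketch of consuming this layout: loading a split file and locating the ten captions of an image. The `<stem>_<i>.txt` caption naming is an assumption for illustration; check the actual layout of the `text` folder after downloading:

```python
import pickle
from pathlib import Path

def load_split(pkl_path):
    """Load the list of image filenames stored in train.pkl / test.pkl."""
    with open(pkl_path, "rb") as f:
        return pickle.load(f)

def caption_paths(image_name, text_dir="text"):
    """Return the assumed paths of the ten caption files for one image.

    The '<stem>_<i>.txt' naming is a hypothetical convention used here for
    illustration; verify it against the downloaded `text` folder.
    """
    stem = Path(image_name).stem
    return [Path(text_dir) / f"{stem}_{i}.txt" for i in range(10)]
```

For example, `caption_paths("0.jpg")` yields ten paths such as `text/0_0.txt` through `text/0_9.txt`, one per description.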

Pretrained Models

We provide the pretrained models of AttnGAN, ControlGAN, DM-GAN, DF-GAN, and ManiGAN. Please consider citing our paper if you use these pretrained models. Feel free to open a pull request if you have any updates.

Method      FID     LPIPS   Download
AttnGAN     125.98  0.512   Google Drive
ControlGAN  116.32  0.522   Google Drive
DF-GAN      137.60  0.581   Google Drive
DM-GAN      131.05  0.544   Google Drive
TediGAN     106.37  0.456   Google Drive
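The FID values above are Fréchet distances between Gaussians fit to Inception-v3 activations of real versus generated images. A minimal numpy/scipy sketch of the distance formula itself (the feature-extraction step is omitted):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID-style Fréchet distance between two Gaussians N(mu, sigma).

    For the actual FID metric, mu/sigma are the mean and covariance of
    Inception-v3 activations over real vs. generated images; here they
    are taken as given.
    """
    diff = mu1 - mu2
    # Matrix square root of the covariance product; numerical error can
    # introduce a tiny imaginary component, which we discard.
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```

Identical distributions give a distance of zero; lower FID means the generated-image statistics are closer to the real ones.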

The pretrained model of ManiGAN is here. The training scripts and pretrained models on faces for sketch-to-image and label-to-image generation can be found here. Those who have trouble accessing Google Drive can use the alternative link at Baidu Cloud (code: b273) for the dataset and pretrained models.

Related Works

  • CelebA dataset:
    Ziwei Liu, Ping Luo, Xiaogang Wang and Xiaoou Tang, "Deep Learning Face Attributes in the Wild", in IEEE International Conference on Computer Vision (ICCV), 2015
  • CelebA-HQ was collected from CelebA and further post-processed by the following paper:
    Karras et al., "Progressive Growing of GANs for Improved Quality, Stability, and Variation", in International Conference on Learning Representations (ICLR), 2018
  • CelebAMask-HQ contains manually-annotated masks with a size of 512 × 512 and 19 classes covering all facial components and accessories, such as skin, nose, eyes, eyebrows, ears, mouth, lip, hair, hat, eyeglasses, earring, necklace, neck, and cloth. It was collected by the following paper:
    Lee et al., "MaskGAN: Towards Diverse and Interactive Facial Image Manipulation", in Computer Vision and Pattern Recognition (CVPR), 2020

License and Citation

If you find the dataset and pretrained models helpful for your research, please consider citing:

@inproceedings{xia2021tedigan,
  title={TediGAN: Text-Guided Diverse Face Image Generation and Manipulation},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

@article{xia2021open,
  title={Towards Open-World Text-Guided Face Image Generation and Manipulation},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
  journal={arXiv preprint arXiv:2104.08910},
  year={2021}
}

@inproceedings{karras2017progressive,
  title={Progressive Growing of GANs for Improved Quality, Stability, and Variation},
  author={Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2018}
}

@inproceedings{liu2015faceattributes,
 title = {Deep Learning Face Attributes in the Wild},
 author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
 booktitle = {Proceedings of International Conference on Computer Vision (ICCV)},
 year = {2015} 
}

If you use the labels, please cite:

@inproceedings{CelebAMask-HQ,
  title={MaskGAN: Towards Diverse and Interactive Facial Image Manipulation},
  author={Lee, Cheng-Han and Liu, Ziwei and Wu, Lingyun and Luo, Ping},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

The use of this software is RESTRICTED to non-commercial research and educational purposes. The license is the same as in CelebAMask-HQ.
