
cmu-mlsp / Reconstructing_faces_from_voices

License: GPL-3.0
An example implementation of the paper "Reconstructing faces from voices"

Programming Languages

python
139335 projects - #7 most used programming language

Labels

Projects that are alternatives to or similar to Reconstructing faces from voices

Segan
Speech Enhancement Generative Adversarial Network in TensorFlow
Stars: ✭ 661 (+420.47%)
Mutual labels:  gan, speech
hifigan-denoiser
HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Stars: ✭ 88 (-30.71%)
Mutual labels:  speech, gan
Tfg Voice Conversion
Deep Learning-based Voice Conversion system
Stars: ✭ 115 (-9.45%)
Mutual labels:  speech
Tensorflow Mnist Cgan Cdcgan
Tensorflow implementation of conditional Generative Adversarial Networks (cGAN) and conditional Deep Convolutional Adversarial Networks (cDCGAN) for the MNIST dataset.
Stars: ✭ 122 (-3.94%)
Mutual labels:  gan
Tts
Text-to-Speech for Arduino
Stars: ✭ 118 (-7.09%)
Mutual labels:  speech
Msg Gan V1
MSG-GAN: Multi-Scale Gradients GAN (Architecture inspired from ProGAN but doesn't use layer-wise growing)
Stars: ✭ 116 (-8.66%)
Mutual labels:  gan
Generate to adapt
Implementation of "Generate To Adapt: Aligning Domains using Generative Adversarial Networks"
Stars: ✭ 120 (-5.51%)
Mutual labels:  gan
Sketch To Art
🖼 Create artwork from your casual sketch with GAN and style transfer
Stars: ✭ 115 (-9.45%)
Mutual labels:  gan
Kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+8680.31%)
Mutual labels:  speech
Pi Rec
🔥 PI-REC: Progressive Image Reconstruction Network With Edge and Color Domain. 🔥 Image translation, conditional GAN, AI painting
Stars: ✭ 1,619 (+1174.8%)
Mutual labels:  gan
Code Switching Papers
A curated list of research papers and resources on code-switching
Stars: ✭ 122 (-3.94%)
Mutual labels:  speech
Speech And Text Unity Ios Android
Speech to text in Unity iOS using native speech recognition
Stars: ✭ 117 (-7.87%)
Mutual labels:  speech
Vae Gan Tensorflow
Tensorflow code of "autoencoding beyond pixels using a learned similarity metric"
Stars: ✭ 116 (-8.66%)
Mutual labels:  gan
Capsule Gan
Code for my Master thesis on "Capsule Architecture as a Discriminator in Generative Adversarial Networks".
Stars: ✭ 120 (-5.51%)
Mutual labels:  gan
Impersonator
PyTorch implementation of our ICCV 2019 paper: Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis
Stars: ✭ 1,605 (+1163.78%)
Mutual labels:  gan
Mlds2018spring
Machine Learning and having it Deep and Structured (MLDS) in 2018 spring
Stars: ✭ 124 (-2.36%)
Mutual labels:  gan
Hccg Cyclegan
Handwritten Chinese Characters Generation
Stars: ✭ 115 (-9.45%)
Mutual labels:  gan
O Gan
O-GAN: Extremely Concise Approach for Auto-Encoding Generative Adversarial Networks
Stars: ✭ 117 (-7.87%)
Mutual labels:  gan
Nucleisegmentation
cGAN-based Multi Organ Nuclei Segmentation
Stars: ✭ 120 (-5.51%)
Mutual labels:  gan
Cyclegan
Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.
Stars: ✭ 10,933 (+8508.66%)
Mutual labels:  gan

Reconstructing faces from voices

Implementation of the paper "Reconstructing faces from voices"

Yandong Wen, Rita Singh, and Bhiksha Raj

Machine Learning for Signal Processing Group

Carnegie Mellon University

Requirements

This implementation is based on Python 3.7 and PyTorch 1.1.

We recommend using conda to install the dependencies. All requirements are listed in requirements.txt. Run the following command to create a new conda environment with all the dependencies:

$ ./install.sh

After running the script, you need to activate the environment in which the packages have been installed. The environment is called voice2face and can be activated with:

$ source activate voice2face

NOTE: If you get an error complaining that "webrtcvad" cannot be found, make sure the pip in your PATH is the one inside the environment. This can happen if you have multiple pip installations (inside and outside the environment).
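
A quick way to confirm that the active interpreter (and hence pip) comes from the voice2face environment is to check it from Python. This is only a minimal sketch; the exact paths depend on where conda is installed on your machine:

import sys

# Both paths should point inside the .../envs/voice2face/ directory.
print(sys.executable)  # path of the Python interpreter currently running
print(sys.prefix)      # prefix of the active environment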

Processed data

The following are the processed training data we used for this paper. Please feel free to download them.

Voice data (log mel-spectrograms): google drive

Face data (aligned face images): google drive

Once downloaded, update the voice_dir and face_dir variables (see Configurations below) with the corresponding paths.

Configurations

See config.py for how to change the configuration.
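
As a rough illustration, the configuration variables referenced in this README (voice_dir, face_dir, test_data, and model_path) could be set along the following lines. This is only a sketch; the actual layout of config.py may differ:

# Sketch only; the real config.py may organize these differently.
voice_dir = '/path/to/voice_data'       # downloaded log mel-spectrograms
face_dir = '/path/to/face_data'         # downloaded aligned face images
test_data = '/path/to/my_recordings'    # folder with your own mono 16 kHz .wav files
model_path = 'models/generator.pth'     # generator to use at test time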

Train

We provide pretrained models, including a voice embedding network and a trained generator, in pretrained_models/. Alternatively, you can train your own generator by running the training script:

$ python gan_train.py

The trained model is saved to models/generator.pth.
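
If you want to inspect the saved generator outside of the provided scripts, a PyTorch checkpoint can typically be loaded as follows. This is a sketch; whether you get a full module or a state_dict depends on how the repository serializes the model:

import torch

# Load the checkpoint on CPU; models/generator.pth is produced by gan_train.py.
checkpoint = torch.load('models/generator.pth', map_location='cpu')
print(type(checkpoint))  # nn.Module if the whole model was saved, otherwise a state_dict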

Test

We provide some examples of faces (in data/example_data/) generated with the model in pretrained_models/. To generate faces for your own voice recordings with the trained model, set the test_data variable (the folder containing your voice recordings) and the model_path variable (the path of the generator) in config.py, then run:

$ python gan_test.py

Results are written to the test_data folder. For each voice recording named <filename>.wav, a face image named <filename>.png is generated.

Note: currently, only single-channel (mono) voice recordings at a 16 kHz sample rate are supported. Voice and face files whose names start with A-E belong to the validation/testing set, while those starting with F-Z belong to the training set.
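
If your recordings are stereo or use a different sample rate, they can be converted before testing. The sketch below assumes librosa and soundfile are available, which are not necessarily part of this repository's requirements:

import librosa
import soundfile as sf

# Downmix to mono and resample to 16 kHz, the only format gan_test.py accepts.
audio, sr = librosa.load('my_recording.wav', sr=16000, mono=True)
sf.write('my_recording_16k.wav', audio, 16000)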

Citation

@article{wen2019reconstructing,
  title={Reconstructing faces from voices},
  author={Wen, Yandong and Singh, Rita and Raj, Bhiksha},
  journal={arXiv preprint arXiv:1905.10604},
  year={2019}
}

Contribution

We welcome contributions from everyone and are always working to make this project better. Please open a pull request or raise an issue, and we will be happy to help.

License

This repository is licensed under GNU GPL-3.0. Please refer to LICENSE.md.
