choyingw / Voice2Mesh

License: MIT
CVPR 2022: Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?

Cross-Modal Perceptionist

CVPR 2022 "Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?"

Cho-Ying Wu, Chin-Cheng Hsu, Ulrich Neumann, University of Southern California

[Paper] [Project page] [Voxceleb-3D Data]

[TODO]:

  1. Direct voice input demo
  2. Evaluation code
  3. Training code

We study cross-modal learning and analyze the correlation between voices and 3D face geometry. Previous methods that study the correlation between voices and faces work only in the 2D domain. We instead choose a 3D representation, which can better validate the supporting evidence from physiology: voices correlate with skeletal and articulator structures, which potentially affect facial geometry.

Comparison of recovered 3D face meshes with the baseline.

Consistency for the same identity using different utterances.

Demo

We tested on Ubuntu 16.04 LTS with an NVIDIA 2080 Ti (only GPU execution is supported) and use Anaconda to install packages.

Install packages

  1. conda create --name CMP python=3.8

  2. Install a PyTorch build compatible with your machine. We tested on PyTorch v1.9 (other 1.0+ versions should also work).

  3. Install the other dependencies: opencv-python, scipy, PIL (provided by the Pillow package), Cython

    Or use the environment.yml we provide instead:

    • conda env create -f environment.yml
    • conda activate CMP
  4. Build the rendering toolkit (written in C++ and Cython), which is used to overlay 3D meshes on images:

    cd Sim3DR
    bash build_sim3dr.sh
    cd ..
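
After the steps above, a quick sanity check of the environment can save a failed demo run. The following is a minimal sketch, not part of the repository: the helper names `missing_modules` and `cuda_status` are hypothetical, and the module list only mirrors the dependencies named in the install steps (`cv2` is the import name of opencv-python, `PIL` is provided by Pillow).

```python
import importlib.util

# Import names for the packages listed in the install steps (an assumption
# about what the demo imports; the repository may need more than these).
REQUIRED_MODULES = ["torch", "cv2", "scipy", "PIL", "Cython"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the modules from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_status():
    """Describe whether PyTorch sees a CUDA device, without failing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: no CUDA device visible"
    return f"torch {torch.__version__}: CUDA device {torch.cuda.get_device_name(0)}"


print("missing modules:", missing_modules() or "none")
print(cuda_status())
```

Because only GPU execution is supported, a "no CUDA device visible" message here most likely means the installed PyTorch build does not match the local driver or CUDA setup.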
    

Download pretrained models and 3DMM configuration data

  1. Download the archive from [here] (~160 MB) and unzip it under the repository root folder

Run

  1. python demo.py (this fetches the preprocessed MFCCs and uses them as network inputs)
  2. Results will be generated under data/results/ (pre-generated references are under data/results_reference)
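
To confirm that a run produced everything the references contain, the two folders can be compared by file name. This is a rough sketch under one unconfirmed assumption: that data/results/ and data/results_reference use the same layout and file names. The function names are hypothetical, and the check looks only at names, not at file contents.

```python
from pathlib import Path


def relative_files(folder):
    """Collect the relative paths of all files under `folder` (empty if it is missing)."""
    root = Path(folder)
    if not root.is_dir():
        return set()
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_result_folders(results_dir="data/results",
                           reference_dir="data/results_reference"):
    """List files present in the references but not in the results, and vice versa."""
    produced = relative_files(results_dir)
    expected = relative_files(reference_dir)
    return {
        "missing": sorted(expected - produced),  # in the references only
        "extra": sorted(produced - expected),    # in the results only
    }


report = compare_result_folders()
print("missing from results:", report["missing"])
print("not in references:", report["extra"])
```

Run it from the repository root after `python demo.py`. An empty "missing" list only says the file names match; inspect the outputs themselves to judge quality.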

More preprocessed pairs of MFCCs and 3D meshes (3DMM parameters) can be downloaded from [Voxceleb-3D Data].
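
For readers unfamiliar with MFCCs (mel-frequency cepstral coefficients), the sketch below shows the textbook pipeline: windowed frames, power spectrum, mel filterbank, log, DCT. It is not the preprocessing used for the provided data. The frame length, hop, filter count, and coefficient count are illustrative assumptions, and the code uses only the standard library (with a slow naive DFT) so it runs anywhere.

```python
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum of one real-valued frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        # Rising slope up to the center, falling slope after it, zero outside.
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_hz])
    return bank


def mfcc(signal, sample_rate, frame_len=64, hop=32, n_mels=10, n_coeffs=5):
    """Return one list of `n_coeffs` cepstral coefficients per frame."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]  # Hamming window
    bank = mel_filterbank(n_mels, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spec = power_spectrum(frame)
        # Log filterbank energies, floored to avoid log(0) on silence.
        log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                 for row in bank]
        # Unnormalized DCT-II of the log energies gives the cepstral coefficients.
        features.append([sum(e * math.cos(math.pi * k * (m + 0.5) / n_mels)
                             for m, e in enumerate(log_e))
                         for k in range(n_coeffs)])
    return features
```

With these defaults, `mfcc(samples, sample_rate=8000)` on 256 samples yields 7 frames of 5 coefficients each. Real pipelines use longer frames, more filters, and an FFT from a numerical library.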

Citation

If you find our work useful, please consider citing us.

@inproceedings{wu2022cross,
title={Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?},
author={Wu, Cho-Ying and Hsu, Chin-Cheng and Neumann, Ulrich},
booktitle={CVPR},
year={2022}
}

This project builds on [SynergyNet], [3DDFA-V2], and [reconstruction-faces-from-voice].
