# Cross-Modal Perceptionist

CVPR 2022 "Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?"

Cho-Ying Wu, Chin-Cheng Hsu, Ulrich Neumann, University of Southern California

[Paper] [Project page] [Voxceleb-3D Data]
[TODO]:
- Direct voice input demo
- Evaluation code
- Training code
We study cross-modal learning and analyze the correlation between voices and 3D face geometry. Unlike previous methods, which study the correlation between voices and faces only in the 2D domain, we adopt a 3D representation, which can better validate the physiological evidence for a correlation between voices and skeletal and articulator structures that potentially affect facial geometry.
Comparison of recovered 3D face meshes with the baseline.
Consistency for the same identity using different utterances.
## Demo
We test on Ubuntu 16.04 LTS with an NVIDIA 2080 Ti (only GPU is supported) and use Anaconda to install packages.
### Install packages

- Create a conda environment:

  ```
  conda create --name CMP python=3.8
  ```

- Install a PyTorch version compatible with your machine; we test on PyTorch v1.9 (other 1.0+ versions should also be compatible).
- Install the other dependencies: opencv-python, scipy, Pillow (PIL), and Cython.

  Alternatively, use the environment.yml we provide:

  ```
  conda env create -f environment.yml
  conda activate CMP
  ```

- Build the rendering toolkit (C++ and Cython) for overlaying 3D meshes on images:

  ```
  cd Sim3DR
  bash build_sim3dr.sh
  cd ..
  ```
### Download pretrained models and 3DMM configuration data

- Download from [here] (~160M) and unzip under the root folder.
### Run

```
python demo.py
```

This fetches the preprocessed MFCC features and uses them as network inputs. Results will be generated under `data/results/` (pre-generated references are under `data/results_reference/`).
More preprocessed MFCC and 3D mesh (3DMM parameter) pairs can be downloaded from [Voxceleb-3D Data].
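For reference, MFCC features like the preprocessed inputs above can be computed from raw audio. The following is a minimal NumPy/SciPy sketch of the standard MFCC pipeline (frame, window, power spectrum, mel filterbank, log, DCT); the parameters shown (16 kHz sample rate, 512-point FFT, 26 mel bands, 13 coefficients) are illustrative assumptions, not the exact settings of this repository's preprocessing:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    """Minimal MFCC: frame -> window -> power spectrum -> mel filterbank -> log -> DCT."""
    window = np.hanning(n_fft)
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.stack([signal[s:s + n_fft] * window for s in starts])
    # Power spectrum of each frame
    spec = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 .. sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT; keep the first n_mfcc coefficients
    mel_energy = np.maximum(spec @ fbank.T, 1e-10)
    return dct(np.log(mel_energy), type=2, axis=1, norm="ortho")[:, :n_mfcc]
```

For one second of 16 kHz audio this yields a (frames x 13) matrix, which is the kind of 2D feature map a voice encoder consumes.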
## Citation

If you find our work useful, please consider citing us:
```bibtex
@inproceedings{wu2022cross,
  title={Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?},
  author={Wu, Cho-Ying and Hsu, Chin-Cheng and Neumann, Ulrich},
  booktitle={CVPR},
  year={2022}
}
```
This project is developed based on [SynergyNet], [3DDFA-V2], and [reconstruction-faces-from-voice].