# Cross-Modal Perceptionist

CVPR 2022 "Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?"

Cho-Ying Wu, Chin-Cheng Hsu, Ulrich Neumann, University of Southern California

[Paper] [Project page] [Voxceleb-3D Data]
[TODO]:
- Direct voice input demo
- Evaluation code
- Training code
We study cross-modal learning and analyze the correlation between voices and 3D face geometry. Unlike previous methods, which study the correlation between voices and faces only in the 2D domain, we adopt a 3D representation, which can better validate the physiological evidence for a correlation between voices and skeletal and articulator structures that potentially affect facial geometry.
Comparison of recovered 3D face meshes with the baseline.
Consistency for the same identity using different utterances.
## Demo
We test on Ubuntu 16.04 LTS with an NVIDIA 2080 Ti (only GPU is supported) and use Anaconda to install packages.
### Install packages

- Create a conda environment:

  ```
  conda create --name CMP python=3.8
  ```

- Install a PyTorch version compatible with your machine; we test on PyTorch v1.9 (other 1.0+ versions should also be compatible).
- Install the other dependencies: opencv-python, scipy, Pillow (PIL), and Cython.

  Alternatively, use the environment.yml we provide:

  ```
  conda env create -f environment.yml
  conda activate CMP
  ```

- Build the rendering toolkit (C++ and Cython) for overlaying 3D meshes on images:

  ```
  cd Sim3DR
  bash build_sim3dr.sh
  cd ..
  ```
### Download pretrained models and 3DMM configuration data

- Download from [here] (~160M) and unzip under the root folder.
### Run

```
python demo.py
```

This fetches the preprocessed MFCC features and uses them as network inputs. Results will be generated under `data/results/` (pre-generated references are under `data/results_reference/`).
More preprocessed MFCC and 3D mesh (3DMM parameter) pairs can be downloaded from [Voxceleb-3D Data].
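For reference, MFCC features like the preprocessed inputs above can be computed from raw audio. The following is a minimal NumPy/SciPy sketch of the standard MFCC pipeline (frame, window, power spectrum, mel filterbank, log, DCT); the parameters shown (16 kHz sample rate, 512-point FFT, 26 mel bands, 13 coefficients) are illustrative assumptions, not the exact settings of this repository's preprocessing:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    """Minimal MFCC: frame -> window -> power spectrum -> mel filterbank -> log -> DCT."""
    window = np.hanning(n_fft)
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.stack([signal[s:s + n_fft] * window for s in starts])
    # Power spectrum of each frame
    spec = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 .. sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT; keep the first n_mfcc coefficients
    mel_energy = np.maximum(spec @ fbank.T, 1e-10)
    return dct(np.log(mel_energy), type=2, axis=1, norm="ortho")[:, :n_mfcc]
```

For one second of 16 kHz audio this yields a (frames x 13) matrix, which is the kind of 2D feature map a voice encoder consumes.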
## Citation

If you find our work useful, please consider citing us:
```bibtex
@inproceedings{wu2022cross,
  title={Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?},
  author={Wu, Cho-Ying and Hsu, Chin-Cheng and Neumann, Ulrich},
  booktitle={CVPR},
  year={2022}
}
```
This project is developed based on [SynergyNet], [3DDFA-V2], and [reconstruction-faces-from-voice].