
Wendison / VQMIVC

Licence: MIT license
Official implementation of VQMIVC: one-shot (any-to-any) voice conversion (Interspeech 2021), with an online demo.

Programming Languages

Jupyter Notebook, Python, Shell, Perl

Projects that are alternatives to or similar to VQMIVC

Phomeme
Simple sentence mixing tool (work in progress)
Stars: ✭ 18 (-93.53%)
Mutual labels:  speech, voice-conversion
ppg-vc
PPG-Based Voice Conversion
Stars: ✭ 154 (-44.6%)
Mutual labels:  voice-conversion, one-shot
CVC
CVC: Contrastive Learning for Non-parallel Voice Conversion (INTERSPEECH 2021, in PyTorch)
Stars: ✭ 45 (-83.81%)
Mutual labels:  speech, voice-conversion
Shifter
Pitch shifter using WSOLA and resampling implemented by Python3
Stars: ✭ 22 (-92.09%)
Mutual labels:  speech, voice-conversion
JD-NMF
Joint Dictionary Learning-based Non-Negative Matrix Factorization for Voice Conversion (TBME 2016)
Stars: ✭ 20 (-92.81%)
Mutual labels:  speech, voice-conversion
Tts Cube
End-2-end speech synthesis with recurrent neural networks
Stars: ✭ 213 (-23.38%)
Mutual labels:  speech
Tacotron pytorch
PyTorch implementation of Tacotron speech synthesis model.
Stars: ✭ 242 (-12.95%)
Mutual labels:  speech
Edgedict
Working online speech recognition based on RNN Transducer. ( Trained model release available in release )
Stars: ✭ 205 (-26.26%)
Mutual labels:  speech
Esp8266sam
Speech synthesis for ESP8266 using S.A.M. port
Stars: ✭ 199 (-28.42%)
Mutual labels:  speech
Naver-AI-Hackathon-Speech
2019 Clova AI Hackathon : Speech - Rank 12 / Team Kai.Lib
Stars: ✭ 26 (-90.65%)
Mutual labels:  speech
Voice Gender
Gender recognition by voice and speech analysis
Stars: ✭ 248 (-10.79%)
Mutual labels:  speech
Gcc Nmf
Real-time GCC-NMF Blind Speech Separation and Enhancement
Stars: ✭ 231 (-16.91%)
Mutual labels:  speech
Speech Enhancement
Deep learning for audio denoising
Stars: ✭ 207 (-25.54%)
Mutual labels:  speech
Kerasdeepspeech
A Keras CTC implementation of Baidu's DeepSpeech for model experimentation
Stars: ✭ 245 (-11.87%)
Mutual labels:  speech
Neural Voice Cloning With Few Samples
Implementation of Neural Voice Cloning with Few Samples Research Paper by Baidu
Stars: ✭ 211 (-24.1%)
Mutual labels:  speech
lectures-all
Central repository for all lectures on deep learning at UPC ETSETB TelecomBCN.
Stars: ✭ 46 (-83.45%)
Mutual labels:  speech
Timit
The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Stars: ✭ 202 (-27.34%)
Mutual labels:  speech
Setk
Tools for Speech Enhancement integrated with Kaldi
Stars: ✭ 227 (-18.35%)
Mutual labels:  speech
Wavegrad
Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
Stars: ✭ 245 (-11.87%)
Mutual labels:  speech
Source separation
Deep learning based speech source separation using Pytorch
Stars: ✭ 226 (-18.71%)
Mutual labels:  speech

VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion (Interspeech 2021)


Run VQMIVC on Replicate

Integrated into Hugging Face Spaces with Gradio. See the Gradio web demo.

Pre-trained models: Google Drive or here | Paper demo

This paper proposes a speech representation disentanglement framework for one-shot (any-to-any) voice conversion, which converts speech between arbitrary speakers using only a single utterance of the target speaker as reference. Vector quantization with contrastive predictive coding (VQCPC) is used for content encoding, and mutual information (MI) is introduced as a correlation metric during training. Reducing the MI between the content, speaker and pitch representations lowers their inter-dependencies, which achieves proper disentanglement in an unsupervised manner.
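To make the vector-quantization step more concrete, here is a minimal sketch of the nearest-codeword lookup that VQ-based content encoders rely on. It is not the VQMIVC code: the toy 2-D codebook and the helper name `quantize` are illustrative assumptions, and the learned codebook, the CPC objective and the gradient handling of the real encoder are all omitted.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Vector],
             codebook: Sequence[Vector]) -> Tuple[List[int], List[List[float]]]:
    """Replace each frame by its nearest codeword (hypothetical helper).

    Returns the chosen codeword indices (a discrete content sequence)
    and the quantized vectors themselves.
    """
    if not codebook:
        raise ValueError("codebook must contain at least one codeword")
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # argmin over codewords of the squared distance to this frame
        best = min(range(len(codebook)),
                   key=lambda k: squared_distance(frame, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


if __name__ == "__main__":
    toy_codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # illustrative values only
    toy_frames = [[0.1, -0.2], [0.9, 1.2], [4.0, 6.0]]
    idx, _ = quantize(toy_frames, toy_codebook)
    print(idx)  # [0, 1, 2]
```

Because every frame collapses onto one of a limited set of codewords, fine-grained detail is discarded; this information bottleneck is the usual motivation for using VQ in a content encoder.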

📢 Update

Many thanks to ericguizzo & AK391!

  1. A Replicate demo is available online, where you can try our pre-trained models. Have fun!
  2. VQMIVC can now be trained and tested inside a Docker environment via Cog.
  3. A Gradio web demo is also available as another online demo.

TODO

  • Add more details on how to use Cog for development

Requirements

Python 3.6 is used. Installing apex is optional but speeds up training. The other requirements are listed in 'requirements.txt':

pip install -r requirements.txt
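Since Python 3.6 is the stated version and apex is optional, a quick check of what the current environment provides is sketched below. The function name `environment_report` is hypothetical, and finding a top-level `apex` module only shows that the package is importable, not that its compiled extensions were built.

```python
import importlib.util
import sys


def environment_report() -> dict:
    """Report the interpreter version and whether the optional apex package is importable."""
    major, minor = sys.version_info[:2]
    return {
        "python": "{}.{}".format(major, minor),
        # The README states that Python 3.6 is used.
        "matches_readme_python": (major, minor) == (3, 6),
        # find_spec returns None when a top-level module cannot be found.
        "apex_available": importlib.util.find_spec("apex") is not None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(key, value)
```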

Quick start with pre-trained models

ParallelWaveGAN is used as the vocoder, so please install ParallelWaveGAN first to try the pre-trained models:

python convert_example.py -s {source-wav} -r {reference-wav} -c {converted-wavs-save-path} -m {model-path} 

For example:

python convert_example.py -s test_wavs/p225_038.wav -r test_wavs/p334_047.wav -c converted -m checkpoints/useCSMITrue_useCPMITrue_usePSMITrue_useAmpTrue/VQMIVC-model.ckpt-500.pt 

The converted wav is saved in the 'converted' directory.
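To convert several source utterances to the same reference speaker, one option is to call the script once per source file. The sketch below reuses only the `-s`, `-r`, `-c` and `-m` flags shown above; the helper names are hypothetical, and it assumes `convert_example.py` is run from the repository root and handles one source wav per call.

```python
import subprocess
import sys
from typing import List, Sequence


def build_convert_cmd(source: str, reference: str, out_dir: str, model: str) -> List[str]:
    """Argument list for one convert_example.py call (flags as given in the README)."""
    return [sys.executable, "convert_example.py",
            "-s", source, "-r", reference, "-c", out_dir, "-m", model]


def convert_all(sources: Sequence[str], reference: str, out_dir: str, model: str,
                dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run is set, run) one conversion per source wav."""
    cmds = [build_convert_cmd(src, reference, out_dir, model) for src in sources]
    if not dry_run:
        for cmd in cmds:
            # check=True stops at the first failing conversion
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    for cmd in convert_all(
        ["test_wavs/p225_038.wav"],
        reference="test_wavs/p334_047.wav",
        out_dir="converted",
        model="checkpoints/useCSMITrue_useCPMITrue_usePSMITrue_useAmpTrue/VQMIVC-model.ckpt-500.pt",
    ):
        print(" ".join(cmd))
```

With the default `dry_run=True` the commands are only printed; pass `dry_run=False` to launch them once they look right.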

Training and inference:

  • Step1. Data preparation & preprocessing
  1. Put the VCTK corpus under the directory 'Dataset/'.

  2. Split the speakers into training/testing sets and extract features (mel + lf0):

     python preprocess.py
    
  • Step2. Model training:
  1. Training with mutual information minimization (MIM):

     python train.py use_CSMI=True use_CPMI=True use_PSMI=True
    
  2. Training without MIM:

     python train.py use_CSMI=False use_CPMI=False use_PSMI=False 
    
  • Step3. Model testing:
  1. Put the ParallelWaveGAN (PWG) vocoder under the directory 'vocoder/'.

  2. Inference with the model trained with MIM:

     python convert.py checkpoint=checkpoints/useCSMITrue_useCPMITrue_usePSMITrue_useAmpTrue/model.ckpt-500.pt
    
  3. Inference with the model trained without MIM:

     python convert.py checkpoint=checkpoints/useCSMIFalse_useCPMIFalse_usePSMIFalse_useAmpTrue/model.ckpt-500.pt
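The checkpoint directories in the commands above appear to encode the training flags (`useCSMI`, `useCPMI`, `usePSMI`, `useAmp`). The sketch below reproduces that naming so the training and inference commands stay consistent. The pattern is inferred from the example paths only, the helper names are hypothetical, and what the number 500 counts and what `useAmp` toggles are not stated here, so treat both as assumptions.

```python
import sys
from typing import List


def run_dir(use_csmi: bool, use_cpmi: bool, use_psmi: bool, use_amp: bool = True) -> str:
    """Directory-name pattern inferred from the README's example checkpoint paths."""
    return f"useCSMI{use_csmi}_useCPMI{use_cpmi}_usePSMI{use_psmi}_useAmp{use_amp}"


def checkpoint_path(use_mim: bool, step: int = 500, root: str = "checkpoints") -> str:
    """Checkpoint path for a run with all three MI terms on (MIM) or all off."""
    return f"{root}/{run_dir(use_mim, use_mim, use_mim)}/model.ckpt-{step}.pt"


def train_cmd(use_mim: bool) -> List[str]:
    """train.py call with the three MI flags set together, as in Step2."""
    flags = [f"{name}={use_mim}" for name in ("use_CSMI", "use_CPMI", "use_PSMI")]
    return [sys.executable, "train.py", *flags]


def convert_cmd(use_mim: bool, step: int = 500) -> List[str]:
    """convert.py call pointing at the matching checkpoint, as in Step3."""
    return [sys.executable, "convert.py", f"checkpoint={checkpoint_path(use_mim, step)}"]


if __name__ == "__main__":
    print(" ".join(train_cmd(True)[1:]))
    print(" ".join(convert_cmd(True)[1:]))
```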
    

Citation

If the code is used in your research, please star our repo and cite our paper:

@inproceedings{wang21n_interspeech,
  author={Disong Wang and Liqun Deng and Yu Ting Yeung and Xiao Chen and Xunying Liu and Helen Meng},
  title={{VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-Shot Voice Conversion}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1344--1348},
  doi={10.21437/Interspeech.2021-283}
}

Acknowledgements:

  • The content encoder is borrowed from VectorQuantizedCPC, which also inspired the within-utterance negative sampling for CPC;
  • The speaker encoder is borrowed from AdaIN-VC;
  • The decoder is modified from AutoVC;
  • Estimation of mutual information is modified from CLUB;
  • Speech feature extraction is based on espnet and Pyworld.
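Since the MI estimation follows CLUB, a minimal sketch of CLUB's sampled estimate may help. For paired samples (x_i, y_i) and a Gaussian variational approximation q(y|x_i) with mean mu_i and variance var_i, it averages log q(y_i|x_i) minus the mean of log q(y_j|x_i) over all j. This is a 1-D, pure-Python illustration, not the repository's estimator: the per-sample means and variances are taken as given here, whereas in practice a separate network predicts them and is fitted to the data. The flag names in Step2 suggest one such estimate each for the content/speaker, content/pitch and pitch/speaker pairs.

```python
from typing import Sequence


def _log_q(value: float, mean: float, variance: float) -> float:
    """Gaussian log-density without the normalising constant.

    The constant cancels in club_estimate because both terms for a given i
    use the same variance.
    """
    return -((value - mean) ** 2) / (2.0 * variance)


def club_estimate(mu: Sequence[float], var: Sequence[float], y: Sequence[float]) -> float:
    """Sampled CLUB estimate of I(x; y) for scalar y.

    mu[i], var[i]: mean and variance of q(y | x_i), assumed to be supplied by a
    separately trained variational network (not shown).
    y[i]: the sample paired with x_i, so (x_i, y_i) are the positive pairs.
    """
    n = len(y)
    if n == 0 or len(mu) != n or len(var) != n:
        raise ValueError("mu, var and y must be non-empty and of equal length")
    total = 0.0
    for i in range(n):
        positive = _log_q(y[i], mu[i], var[i])                      # matched pair
        negative = sum(_log_q(y_j, mu[i], var[i]) for y_j in y) / n  # all y_j as negatives
        total += positive - negative
    return total / n


if __name__ == "__main__":
    y = [1.0, -1.0, 0.0]
    ones = [1.0, 1.0, 1.0]
    # q(y|x) ignores x entirely: the estimate is approximately zero
    print(club_estimate([0.0, 0.0, 0.0], ones, y))
    # q(y|x) predicts y exactly: the estimate is positive (2/3 here)
    print(club_estimate(y, ones, y))
```

When such estimates are used for MI minimization, they are added to the training loss while the variational network is updated separately; how VQMIVC schedules these updates is not described here.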