
BrightGu / MediumVC

Licence: other
Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features

Programming Languages

Python

Projects that are alternatives of or similar to MediumVC

SingleVC
Any-to-one voice conversion using the data augmentation strategy: pitch-shifted and duration-remained.
Stars: ✭ 25 (-45.65%)
Mutual labels:  speech-synthesis, voice-conversion, vc
YourTTS
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Stars: ✭ 217 (+371.74%)
Mutual labels:  speech-synthesis, voice-conversion
voice-conversion
A tutorial implementation of voice conversion using PyTorch
Stars: ✭ 26 (-43.48%)
Mutual labels:  speech-synthesis, voice-conversion
ppg-vc
PPG-Based Voice Conversion
Stars: ✭ 154 (+234.78%)
Mutual labels:  speech-synthesis, voice-conversion
Espnet
End-to-End Speech Processing Toolkit
Stars: ✭ 4,533 (+9754.35%)
Mutual labels:  speech-synthesis, voice-conversion
sova-tts-engine
Tacotron2 based engine for the SOVA-TTS project
Stars: ✭ 63 (+36.96%)
Mutual labels:  speech-synthesis
vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Stars: ✭ 1,604 (+3386.96%)
Mutual labels:  speech-synthesis
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+541.3%)
Mutual labels:  speech-synthesis
VQMIVC
Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo!
Stars: ✭ 278 (+504.35%)
Mutual labels:  voice-conversion
Zero-Shot-TTS
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Stars: ✭ 33 (-28.26%)
Mutual labels:  speech-synthesis
Khronos
The open source intelligent personal assistant
Stars: ✭ 25 (-45.65%)
Mutual labels:  speech-synthesis
Shifter
Pitch shifter using WSOLA and resampling implemented by Python3
Stars: ✭ 22 (-52.17%)
Mutual labels:  voice-conversion
wiki2ssml
Wiki2SSML provides the WikiVoice markup language used for fine-tuning synthesised voice.
Stars: ✭ 31 (-32.61%)
Mutual labels:  speech-synthesis
Expressive-FastSpeech2
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
Stars: ✭ 139 (+202.17%)
Mutual labels:  speech-synthesis
GlottDNN
GlottDNN vocoder and tools for training DNN excitation models
Stars: ✭ 30 (-34.78%)
Mutual labels:  speech-synthesis
StyleSpeech
Official implementation of Meta-StyleSpeech and StyleSpeech
Stars: ✭ 161 (+250%)
Mutual labels:  speech-synthesis
sam
Software Automatic Mouth - Tiny Speech Synthesizer
Stars: ✭ 316 (+586.96%)
Mutual labels:  speech-synthesis
Catch-A-Waveform
Official pytorch implementation of the paper: "Catch-A-Waveform: Learning to Generate Audio from a Single Short Example" (NeurIPS 2021)
Stars: ✭ 117 (+154.35%)
Mutual labels:  speech-synthesis
TFGAN
TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis
Stars: ✭ 65 (+41.3%)
Mutual labels:  speech-synthesis
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (+15.22%)
Mutual labels:  speech-synthesis

MediumVC

MediumVC is an utterance-level method for any-to-any voice conversion (VC). As a first stage, we propose SingleVC to perform any-to-one (A2O) conversion (Xi → Ŷi, where Xi is utterance i spoken by speaker X); the resulting Ŷi are regarded as SSIF. To build SingleVC, we employ a novel data augmentation strategy, pitch-shifted and duration-remained (PSDR), to produce paired asymmetrical training data. Then, on top of the pre-trained SingleVC, MediumVC performs an asymmetrical reconstruction task (Ŷi → X̂i). Thanks to this asymmetrical reconstruction mode, MediumVC achieves more effective feature decoupling and fusion. Experiments demonstrate that MediumVC remains robust to unseen speakers across multiple public datasets. This is the official implementation of the paper, MediumVC.
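To make the PSDR idea concrete, here is a minimal sketch of producing a pitch-shifted, duration-preserving training pair with librosa; the 4-semitone shift is only an illustrative value, not necessarily the setting used in the paper.

```python
# A minimal sketch of the PSDR idea: shift the pitch of an utterance while
# keeping its duration unchanged, so (shifted, original) forms an asymmetrical
# training pair. The 4-semitone shift is only an illustrative value.
import librosa

def psdr_pair(wav_path, n_steps=4, sr=22050):
    wav, _ = librosa.load(wav_path, sr=sr)
    # pitch_shift changes pitch but preserves the number of samples (duration)
    shifted = librosa.effects.pitch_shift(wav, sr=sr, n_steps=n_steps)
    return shifted, wav   # (augmented input, original target)

# augmented, target = psdr_pair("speaker_x/utt_001.wav", n_steps=4)
```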

The figure below shows the overall model architecture.

Model architecture

For audio samples, please refer to our demo page. More converted speeches can be found in "Demo/ConvertedSpeeches/".

Envs

You can install the dependencies with

pip install -r requirements.txt

Speaker Encoder

Dvector is a robust speaker verification (SV) system pre-trained on VoxCeleb1 with the GE2E loss, and it produces 256-dimensional speaker embeddings. In our evaluation on multiple datasets (VCTK with 30,000 pairs, LibriSpeech with 30,000 pairs, and VCC2020 with 10,000 pairs), the equal error rates (EERs) and thresholds (THRs) are recorded in the table below. Dvector with these THRs is then used to compute the speaker verification accuracy (ACC) of pairs produced by MediumVC and the contrast methods for objective evaluation. More details can be found in the paper.

| Dataset    | VCTK       | LibriSpeech | VCC2020    |
|------------|------------|-------------|------------|
| EER(%)/THR | 7.71/0.462 | 7.95/0.337  | 1.06/0.432 |
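As a rough sketch of how these THRs can be used for objective evaluation, the snippet below embeds two utterances with Dvector and compares their cosine similarity against a dataset threshold. It assumes the TorchScript wav2mel/dvector checkpoints from the public Dvector release; the file names are placeholders.

```python
# A rough sketch of speaker verification with Dvector embeddings and the THR
# values above; checkpoint file names are placeholders for the TorchScript
# wav2mel/dvector models from the public Dvector release.
import torch
import torchaudio

wav2mel = torch.jit.load("wav2mel.pt")          # waveform -> log mel frames
dvector = torch.jit.load("dvector.pt").eval()   # mel frames -> 256-dim embedding

def embed(path):
    wav, sr = torchaudio.load(path)
    mel = wav2mel(wav, sr)
    return dvector.embed_utterance(mel)

emb_a = embed("converted.wav")
emb_b = embed("target_reference.wav")
score = torch.nn.functional.cosine_similarity(emb_a, emb_b, dim=-1).item()

# A pair counts as the same speaker when the similarity exceeds the dataset
# threshold, e.g. 0.462 for VCTK in the table above.
print("same speaker" if score > 0.462 else "different speaker")
```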

Vocoder

The HiFi-GAN vocoder is employed to convert log mel-spectrograms to waveforms. The model, with 13.93M parameters, is trained on universal datasets. In our evaluation, it synthesizes 22.05 kHz high-fidelity speech with a MOS above 4.0, even in cross-language or noisy conditions.
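As a rough sketch, mel-to-waveform synthesis with a HiFi-GAN generator typically looks like the following. The Generator/AttrDict imports and checkpoint layout follow the public HiFi-GAN reference implementation and are assumptions here, not necessarily the exact API of the checkpoint shipped with this project.

```python
# A rough sketch of log-mel -> waveform synthesis with a pretrained HiFi-GAN
# generator; module names and checkpoint layout follow the public HiFi-GAN
# reference implementation and are assumptions.
import json
import torch
from env import AttrDict        # HiFi-GAN reference repo
from models import Generator    # HiFi-GAN reference repo

with open("config.json") as f:
    h = AttrDict(json.load(f))

generator = Generator(h)
state = torch.load("generator.ckpt", map_location="cpu")
generator.load_state_dict(state["generator"])
generator.eval()
generator.remove_weight_norm()

@torch.no_grad()
def mel_to_wav(log_mel):                    # log_mel: (n_mels, frames) tensor
    wav = generator(log_mel.unsqueeze(0))   # -> (1, 1, samples) at 22.05 kHz
    return wav.squeeze()
```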

Infer

You can download the pretrained model and then edit "Any2Any/infer/infer_config.yaml". Test samples should be organized as "wav22050/$figure$/*.wav".
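For example, a layout along these lines would match that pattern (the speaker/figure directory names below are placeholders):

```
wav22050/
├── speaker_A/
│   ├── utt_001.wav
│   └── utt_002.wav
└── speaker_B/
    └── utt_001.wav
```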

python Any2Any/infer/infer.py

Train from scratch

Preprocessing

The corpus should be organized as "VCTK22050/$figure$/*.wav"; then edit the config file "Any2Any/pre_feature/preprocess_config.yaml". The output "spk_emb_mel_label.pkl" will be used for training.

python Any2Any/pre_feature/figure_spkemb_mel.py
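Conceptually, this step pairs each utterance's mel-spectrogram with its Dvector speaker embedding and serializes the collection to a pickle. The sketch below only illustrates that idea; the actual keys and layout of "spk_emb_mel_label.pkl" written by figure_spkemb_mel.py may differ.

```python
# A conceptual sketch of the preprocessing step: for every
# "VCTK22050/$figure$/*.wav" file, extract a log mel-spectrogram and a Dvector
# speaker embedding, then pickle the collection. The real pickle layout used
# by this repo may differ; checkpoint names are placeholders.
import pickle
from pathlib import Path
import torch
import torchaudio

wav2mel = torch.jit.load("wav2mel.pt")
dvector = torch.jit.load("dvector.pt").eval()

records = []
for wav_path in sorted(Path("VCTK22050").glob("*/*.wav")):
    wav, sr = torchaudio.load(str(wav_path))
    mel = wav2mel(wav, sr)                               # (frames, n_mels)
    spk_emb = dvector.embed_utterance(mel).detach()      # (256,)
    records.append({"figure": wav_path.parent.name,      # speaker directory
                    "mel": mel, "spk_emb": spk_emb})

with open("spk_emb_mel_label.pkl", "wb") as f:
    pickle.dump(records, f)
```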

Training

Please first edit the paths of the pretrained HiFi-GAN model, wav2mel, dvector, and SingleVC in the config file "Any2Any/config.yaml".

python Any2Any/solver.py