
auspicious3000 / Autovc

License: MIT
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

Programming Languages

Python
139,335 projects; #7 most used programming language

Projects that are alternatives of or similar to Autovc

VAENAR-TTS
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.
Stars: ✭ 66 (-86.39%)
Mutual labels:  speech-synthesis, unsupervised-learning
Athena
An open-source implementation of a sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+11.75%)
Mutual labels:  unsupervised-learning, speech-synthesis
Espeak
eSpeak NG is an open source speech synthesizer that supports 101 languages and accents.
Stars: ✭ 339 (-30.1%)
Mutual labels:  speech-synthesis
Enlightengan
[IEEE TIP'2021] "EnlightenGAN: Deep Light Enhancement without Paired Supervision" by Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, Zhangyang Wang
Stars: ✭ 434 (-10.52%)
Mutual labels:  unsupervised-learning
Recycle Gan
Unsupervised Video Retargeting (e.g. face to face, flower to flower, clouds and winds, sunrise and sunset)
Stars: ✭ 367 (-24.33%)
Mutual labels:  unsupervised-learning
Pytorch Cortexnet
PyTorch implementation of the CortexNet predictive model
Stars: ✭ 349 (-28.04%)
Mutual labels:  unsupervised-learning
Disentangling Vae
Experiments for understanding disentanglement in VAE latent representations
Stars: ✭ 398 (-17.94%)
Mutual labels:  unsupervised-learning
Mlxtend
A library of extension and helper modules for Python's data analysis and machine learning libraries.
Stars: ✭ 3,729 (+668.87%)
Mutual labels:  unsupervised-learning
Gantts
PyTorch implementation of GAN-based text-to-speech synthesis and voice conversion (VC)
Stars: ✭ 460 (-5.15%)
Mutual labels:  speech-synthesis
Espnet
End-to-End Speech Processing Toolkit
Stars: ✭ 4,533 (+834.64%)
Mutual labels:  speech-synthesis
Sprocket
Voice Conversion Tool Kit
Stars: ✭ 425 (-12.37%)
Mutual labels:  speech-synthesis
Voice Builder
An opensource text-to-speech (TTS) voice building tool
Stars: ✭ 362 (-25.36%)
Mutual labels:  speech-synthesis
Pase
Problem Agnostic Speech Encoder
Stars: ✭ 348 (-28.25%)
Mutual labels:  unsupervised-learning
Awesome Vaes
A curated list of awesome work on VAEs, disentanglement, representation learning, and generative models.
Stars: ✭ 418 (-13.81%)
Mutual labels:  unsupervised-learning
Mmt
[ICLR-2020] Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification.
Stars: ✭ 345 (-28.87%)
Mutual labels:  unsupervised-learning
Corex topic
Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
Stars: ✭ 439 (-9.48%)
Mutual labels:  unsupervised-learning
Paragraph Vectors
📄 A PyTorch implementation of Paragraph Vectors (doc2vec).
Stars: ✭ 337 (-30.52%)
Mutual labels:  unsupervised-learning
Libfaceid
libfaceid is a research framework for prototyping of face recognition solutions. It seamlessly integrates multiple detection, recognition and liveness models w/ speech synthesis and speech recognition.
Stars: ✭ 354 (-27.01%)
Mutual labels:  speech-synthesis
Contrastive Predictive Coding
Keras implementation of Representation Learning with Contrastive Predictive Coding
Stars: ✭ 369 (-23.92%)
Mutual labels:  unsupervised-learning
Sc Sfmlearner Release
Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video (NeurIPS 2019)
Stars: ✭ 468 (-3.51%)
Mutual labels:  unsupervised-learning

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

Check out our new project: Unsupervised Speech Decomposition for Rhythm, Pitch, and Timbre Conversion https://github.com/auspicious3000/SpeechSplit

This repository provides a PyTorch implementation of AUTOVC.

AUTOVC is a many-to-many non-parallel voice conversion framework.

If you find this work useful and use it in your research, please consider citing our paper.

@InProceedings{pmlr-v97-qian19c,
  title     = {{A}uto{VC}: Zero-Shot Voice Style Transfer with Only Autoencoder Loss},
  author    = {Qian, Kaizhi and Zhang, Yang and Chang, Shiyu and Yang, Xuesong and Hasegawa-Johnson, Mark},
  pages     = {5210--5219},
  year      = {2019},
  editor    = {Kamalika Chaudhuri and Ruslan Salakhutdinov},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  address   = {Long Beach, California, USA},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/qian19c/qian19c.pdf},
  url       = {http://proceedings.mlr.press/v97/qian19c.html}
}

Audio Demo

The audio demo for AUTOVC can be found here

Dependencies

  • Python 3
  • Numpy
  • PyTorch >= v0.4.1
  • TensorFlow >= v1.3 (only for TensorBoard)
  • librosa
  • tqdm
  • wavenet_vocoder (pip install wavenet_vocoder); for more information, please refer to https://github.com/r9y9/wavenet_vocoder

Pre-trained models

AUTOVC: link
Speaker Encoder: link
WaveNet Vocoder: link

0. Convert Mel-Spectrograms

Download the pre-trained AUTOVC model and run conversion.ipynb in the same directory.
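Conceptually, the conversion follows the autoencoder pipeline from the paper: a content encoder squeezes the source mel-spectrogram through a narrow, temporally downsampled bottleneck, and a decoder reconstructs mel frames conditioned on the target speaker's embedding. A minimal NumPy sketch of that information flow (function and variable names, dimensions, and the random projections are illustrative only, not the repo's actual API):

```python
import numpy as np

def convert_sketch(mel_source, emb_target, bottleneck=16, downsample=32):
    """Toy illustration of the AutoVC bottleneck: compress the content,
    then decode it conditioned on the target speaker embedding."""
    T, n_mels = mel_source.shape
    # "Content encoder": project each frame to a narrow code, then keep
    # only every `downsample`-th code (the temporal bottleneck).
    W_enc = np.random.randn(n_mels, bottleneck) * 0.01
    codes = (mel_source @ W_enc)[::downsample]
    # Upsample the codes back to frame rate by repetition.
    codes_up = np.repeat(codes, downsample, axis=0)[:T]
    # "Decoder": every frame sees its content code plus the target
    # speaker embedding, and is mapped back to mel dimensions.
    dec_in = np.concatenate([codes_up, np.tile(emb_target, (T, 1))], axis=1)
    W_dec = np.random.randn(dec_in.shape[1], n_mels) * 0.01
    return dec_in @ W_dec  # converted mel-spectrogram, shape (T, n_mels)

mel = np.random.randn(128, 80)  # fake source utterance: 128 frames, 80 mel bands
emb = np.random.randn(256)      # fake target speaker embedding (GE2E-sized)
out = convert_sketch(mel, emb)
print(out.shape)                # (128, 80)
```

The bottleneck is the key design choice: it is too narrow to carry speaker identity, so the decoder is forced to take timbre from the supplied embedding, which is what makes zero-shot conversion possible.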

1. Mel-Spectrograms to Waveform

Download the pre-trained WaveNet Vocoder model and run vocoder.ipynb in the same directory.

Please note that the training metadata and the testing metadata have different formats.
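The two formats differ because they serve different jobs: training metadata only needs a speaker embedding plus paths to that speaker's spectrogram files, while conversion-time metadata carries the actual spectrogram of each test utterance. A hedged sketch of the two shapes (the exact field layout here is illustrative; check make_metadata.py and the demo notebooks for the format this repo actually uses):

```python
import pickle
import numpy as np

# Training metadata: one entry per speaker,
# roughly [speaker_id, speaker_embedding, spectrogram_path, ...]
train_meta = [
    ["p225", np.random.randn(256).astype(np.float32),
     "p225/p225_001.npy", "p225/p225_002.npy"],
]

# Conversion (test) metadata: one entry per utterance,
# with the mel-spectrogram stored inline instead of a path
test_meta = [
    ["p225", np.random.randn(256).astype(np.float32),
     np.random.randn(128, 80).astype(np.float32)],
]

# Both are typically pickled; round-trip one to show the structure survives
restored = pickle.loads(pickle.dumps(test_meta))
print(restored[0][0], restored[0][2].shape)  # p225 (128, 80)
```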

2. Train Model

We have included a small set of training audio files in the wav folder. However, this data is very small and is intended for code verification purposes only. Please prepare your own dataset for training.

1. Generate spectrogram data from the wav files: python make_spect.py

2. Generate training metadata, including the GE2E speaker embeddings (use one-hot embeddings instead if you are not doing zero-shot conversion): python make_metadata.py

3. Run the main training script: python main.py
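Step 1 boils down to turning each wav file into a log-mel-spectrogram. The same idea can be sketched from scratch in NumPy; the parameters below (n_fft=1024, hop=256, 80 mel bands at 16 kHz) are typical choices for this kind of pipeline, but verify them against make_spect.py before relying on them:

```python
import numpy as np

def stft_mag(x, n_fft=1024, hop=256):
    # Frame the signal with a Hann window and take the magnitude FFT
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

def mel_filterbank(sr=16000, n_fft=1024, n_mels=80):
    # Triangular filters spaced evenly on the mel scale
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    return fb

x = np.random.randn(16000)                    # stand-in for 1 s of audio at 16 kHz
S = stft_mag(x)                               # (frames, n_fft // 2 + 1)
mel = np.log(S @ mel_filterbank().T + 1e-6)   # log-mel, (frames, 80)
print(mel.shape)
```

In practice the repo relies on librosa for loading and signal processing; this sketch only shows what the saved .npy spectrograms represent.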

The model converges when the reconstruction loss falls to around 0.0001.
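That 0.0001 figure refers to the self-reconstruction term. The full objective in the paper combines mel reconstruction (before and after the post-net) with a content-code consistency term; a condensed NumPy version of that combination (the post-net term is folded into the reconstruction term here, and the weighting is illustrative):

```python
import numpy as np

def autovc_loss(mel, mel_hat, codes, codes_hat, lam=1.0):
    # Self-reconstruction: the converted-to-self output should match the input
    l_recon = np.mean((mel - mel_hat) ** 2)
    # Content consistency: re-encoding the output should reproduce the codes
    l_content = np.mean(np.abs(codes - codes_hat))
    return l_recon + lam * l_content

mel = np.random.randn(128, 80)
codes = np.random.randn(4, 16)
print(autovc_loss(mel, mel, codes, codes))  # 0.0 for a perfect reconstruction
```

Watching the reconstruction component alone is a reasonable convergence signal, which is why the README quotes a single loss value.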

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].