leimao / Voice Converter Cyclegan

Licence: mit
Voice Converter Using CycleGAN and Non-Parallel Data

Programming Languages

python

Projects that are alternatives of or similar to Voice Converter Cyclegan

CVC
CVC: Contrastive Learning for Non-parallel Voice Conversion (INTERSPEECH 2021, in PyTorch)
Stars: ✭ 45 (-88.28%)
Mutual labels:  speech, cyclegan
Cyclegan
Tensorflow implementation of CycleGAN
Stars: ✭ 348 (-9.37%)
Mutual labels:  cyclegan
Speech Aligner
speech-aligner is a tool that generates phoneme-level time alignments between human speech and its transcription.
Stars: ✭ 259 (-32.55%)
Mutual labels:  speech
Css10
CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages
Stars: ✭ 302 (-21.35%)
Mutual labels:  speech
Hands On Deep Learning Algorithms With Python
Master Deep Learning Algorithms with Extensive Math by Implementing them using TensorFlow
Stars: ✭ 272 (-29.17%)
Mutual labels:  cyclegan
Cyclegan Tensorflow 2
CycleGAN Tensorflow 2
Stars: ✭ 330 (-14.06%)
Mutual labels:  cyclegan
Noise2Noise-audio denoising without clean training data
Source code for the paper titled "Speech Denoising without Clean Training Data: a Noise2Noise Approach". Paper accepted at the INTERSPEECH 2021 conference. This paper tackles the problem of the heavy dependence of clean speech data required by deep learning based audio denoising methods by showing that it is possible to train deep speech denoisi…
Stars: ✭ 49 (-87.24%)
Mutual labels:  speech
Voice Builder
An opensource text-to-speech (TTS) voice building tool
Stars: ✭ 362 (-5.73%)
Mutual labels:  speech
Attentiongan
AttentionGAN for Unpaired Image-to-Image Translation & Multi-Domain Image-to-Image Translation
Stars: ✭ 341 (-11.2%)
Mutual labels:  cyclegan
Pocketsphinx Python
Python interface to CMU Sphinxbase and Pocketsphinx libraries
Stars: ✭ 298 (-22.4%)
Mutual labels:  speech
Pysptk
A python wrapper for Speech Signal Processing Toolkit (SPTK).
Stars: ✭ 297 (-22.66%)
Mutual labels:  speech
Generative models tutorial with demo
Generative Models Tutorial with Demo: Bayesian Classifier Sampling, Variational Auto Encoder (VAE), Generative Adversarial Networks (GANs), Popular GANs Architectures, Auto-Regressive Models, Important Generative Model Papers, Courses, etc.
Stars: ✭ 276 (-28.12%)
Mutual labels:  cyclegan
Ios 10 Sampler
Code examples for new APIs of iOS 10.
Stars: ✭ 3,341 (+770.05%)
Mutual labels:  speech
Speech Vad Demo
Integrates the WebRTC VAD to split audio files.
Stars: ✭ 259 (-32.55%)
Mutual labels:  speech
Inaspeechsegmenter
CNN-based audio segmentation toolkit. Allows to detect speech, music and speaker gender. Has been designed for large scale gender equality studies based on speech time per gender.
Stars: ✭ 352 (-8.33%)
Mutual labels:  speech
Amazing Python Scripts
🚀 Curated collection of Amazing Python scripts from Basics to Advance with automation task scripts.
Stars: ✭ 229 (-40.36%)
Mutual labels:  speech
Pix2depth
DEPRECATED: Depth Map Estimation from Monocular Images
Stars: ✭ 293 (-23.7%)
Mutual labels:  cyclegan
Android Speech
Android speech recognition and text to speech made easy
Stars: ✭ 310 (-19.27%)
Mutual labels:  speech
Tts
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Stars: ✭ 305 (-20.57%)
Mutual labels:  speech
Pycadl
Python package with source code from the course "Creative Applications of Deep Learning w/ TensorFlow"
Stars: ✭ 356 (-7.29%)
Mutual labels:  cyclegan

Voice Converter CycleGAN

Lei Mao

University of Chicago

Introduction

Cycle-consistent adversarial networks (CycleGAN) have been widely used for image-to-image conversion. It turns out that they can also be used for voice conversion. This is an implementation of CycleGAN for human speech conversion. The network uses a 1D gated convolutional neural network (Gated CNN) as the generator and a 2D Gated CNN as the discriminator. The model takes Mel-cepstral coefficients (MCEPs), which represent the spectral envelope, as input for voice conversion.
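
As a rough illustration of the gating idea behind a Gated CNN, the sketch below applies a gated linear unit (GLU) to plain Python lists. It is not taken from model.py or module.py: the function names and the toy values are assumptions, and the two input lists merely stand in for the outputs of two parallel convolutions.

```python
import math


def sigmoid(x):
    """Logistic function, used here as the gate activation."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_linear_unit(features, gates):
    """Element-wise GLU: scale each feature by sigmoid(gate), a value in (0, 1)."""
    if len(features) != len(gates):
        raise ValueError("features and gates must have the same length")
    return [f * sigmoid(g) for f, g in zip(features, gates)]


# Toy values (hypothetical) standing in for two parallel convolution outputs.
features = [1.0, -2.0, 0.5]
gates = [0.0, 10.0, -10.0]

# A gate of 0 halves the feature, a large positive gate passes it almost
# unchanged, and a large negative gate suppresses it almost entirely.
print(gated_linear_unit(features, gates))
```

The gate lets the network decide, per element, how much of each feature to pass on to the next layer.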

Dependencies

  • Python 3.5
  • Numpy 1.14
  • TensorFlow 1.8
  • ProgressBar2 3.37.1
  • LibROSA 0.6
  • FFmpeg 4.0
  • PyWorld

Files

.
├── convert.py
├── demo
├── download.py
├── figures
├── LICENSE.md
├── model.py
├── module.py
├── preprocess.py
├── README.md
├── train_log
├── train.py
└── utils.py

Usage

Docker Container

Build the Docker container image using the following command.

$ docker build --rm -t tensorflow-cyclegan-vc:1.0 -f Dockerfile .

Start the Docker container for CycleGAN-VC using the following command.

$ nvidia-docker run -it --rm -v $(pwd):/mnt tensorflow-cyclegan-vc:1.0

Because the model was implemented with TensorFlow 1.8, running the programs may produce warnings about deprecated functions.

Download Dataset

Download and unzip the VCC2016 dataset to the designated directories.

$ python download.py --help
usage: download.py [-h] [--download_dir DOWNLOAD_DIR] [--data_dir DATA_DIR]
                   [--datasets DATASETS]

Download CycleGAN voice conversion datasets.

optional arguments:
  -h, --help            show this help message and exit
  --download_dir DOWNLOAD_DIR
                        Download directory for zipped data
  --data_dir DATA_DIR   Data directory for unzipped data
  --datasets DATASETS   Datasets available: vcc2016

For example, to download the dataset to the download directory and extract it to the data directory:

$ python download.py --download_dir ./download --data_dir ./data --datasets vcc2016
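
After extraction, a quick sanity check can confirm that the speaker directories contain audio. The sketch below is not part of the repository. It assumes each speaker has its own subdirectory of .wav files under ./data/vcc2016_training and ./data/evaluation_all, as the paths used in the training example suggest; the helper name count_wavs_per_speaker is hypothetical.

```python
from pathlib import Path


def count_wavs_per_speaker(root):
    """Return {speaker_directory_name: number of .wav files} under root.

    Returns an empty dict if root does not exist or is not a directory.
    """
    root = Path(root)
    if not root.is_dir():
        return {}
    return {
        d.name: sum(1 for _ in d.glob("*.wav"))
        for d in sorted(root.iterdir())
        if d.is_dir()
    }


if __name__ == "__main__":
    # Directory names assumed from the training example in this README.
    for split in ("./data/vcc2016_training", "./data/evaluation_all"):
        print(split, count_wavs_per_speaker(split))
```

An empty dictionary or a zero count for a speaker suggests that the download or extraction step did not complete.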

Train Model

To achieve good conversion quality, training takes at least 1000 epochs, which can take a very long time even on an NVIDIA GTX TITAN X graphics card.

$ python train.py --help
usage: train.py [-h] [--train_A_dir TRAIN_A_DIR] [--train_B_dir TRAIN_B_DIR]
                [--model_dir MODEL_DIR] [--model_name MODEL_NAME]
                [--random_seed RANDOM_SEED]
                [--validation_A_dir VALIDATION_A_DIR]
                [--validation_B_dir VALIDATION_B_DIR]
                [--output_dir OUTPUT_DIR]
                [--tensorboard_log_dir TENSORBOARD_LOG_DIR]

Train CycleGAN model for datasets.

optional arguments:
  -h, --help            show this help message and exit
  --train_A_dir TRAIN_A_DIR
                        Directory for A.
  --train_B_dir TRAIN_B_DIR
                        Directory for B.
  --model_dir MODEL_DIR
                        Directory for saving models.
  --model_name MODEL_NAME
                        File name for saving model.
  --random_seed RANDOM_SEED
                        Random seed for model training.
  --validation_A_dir VALIDATION_A_DIR
                        Convert validation A after each training epoch. If set
                        none, no conversion would be done during the training.
  --validation_B_dir VALIDATION_B_DIR
                        Convert validation B after each training epoch. If set
                        none, no conversion would be done during the training.
  --output_dir OUTPUT_DIR
                        Output directory for converted validation voices.
  --tensorboard_log_dir TENSORBOARD_LOG_DIR
                        TensorBoard log directory.

For example, to train a CycleGAN model for voice conversion between SF1 and TM1:

$ python train.py --train_A_dir ./data/vcc2016_training/SF1 --train_B_dir ./data/vcc2016_training/TM1 --model_dir ./model/sf1_tm1 --model_name sf1_tm1.ckpt --random_seed 0 --validation_A_dir ./data/evaluation_all/SF1 --validation_B_dir ./data/evaluation_all/TM1 --output_dir ./validation_output --tensorboard_log_dir ./log

With validation_A_dir, validation_B_dir, and output_dir set, we can monitor the conversion of the validation voices by ear after each epoch.
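
When training several speaker pairs, it can help to generate the train.py command line instead of typing it by hand. The sketch below uses only the flags listed in the help output above. The helper name build_train_command and the naming scheme for the model directory are assumptions modeled on the SF1/TM1 example, not something the repository provides.

```python
def build_train_command(speaker_a, speaker_b, data_root="./data", seed=0):
    """Assemble the train.py argument list for one speaker pair (A, B)."""
    # Model naming follows the README example: "SF1", "TM1" -> "sf1_tm1".
    pair = "{}_{}".format(speaker_a.lower(), speaker_b.lower())
    return [
        "python", "train.py",
        "--train_A_dir", "{}/vcc2016_training/{}".format(data_root, speaker_a),
        "--train_B_dir", "{}/vcc2016_training/{}".format(data_root, speaker_b),
        "--model_dir", "./model/{}".format(pair),
        "--model_name", "{}.ckpt".format(pair),
        "--random_seed", str(seed),
        "--validation_A_dir", "{}/evaluation_all/{}".format(data_root, speaker_a),
        "--validation_B_dir", "{}/evaluation_all/{}".format(data_root, speaker_b),
        "--output_dir", "./validation_output",
        "--tensorboard_log_dir", "./log",
    ]


cmd = build_train_command("SF1", "TM1")
# Prints the same command as the SF1/TM1 example, without the "$ " prompt.
print(" ".join(cmd))
```

The list form can be passed to subprocess.run(cmd, check=True) to launch training from a script.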

Voice Conversion

Convert voices using pre-trained models.

$ python convert.py --help
usage: convert.py [-h] [--model_dir MODEL_DIR] [--model_name MODEL_NAME]
                  [--data_dir DATA_DIR]
                  [--conversion_direction CONVERSION_DIRECTION]
                  [--output_dir OUTPUT_DIR]

Convert voices using pre-trained CycleGAN model.

optional arguments:
  -h, --help            show this help message and exit
  --model_dir MODEL_DIR
                        Directory for the pre-trained model.
  --model_name MODEL_NAME
                        Filename for the pre-trained model.
  --data_dir DATA_DIR   Directory for the voices for conversion.
  --conversion_direction CONVERSION_DIRECTION
                        Conversion direction for CycleGAN. A2B or B2A. The
                        first object in the model file name is A, and the
                        second object in the model file name is B.
  --output_dir OUTPUT_DIR
                        Directory for the converted voices.

To convert voices, put the speech files in WAV format into data_dir and run the following command in the terminal. The converted speech is saved in output_dir:

$ python convert.py --model_dir ./model/sf1_tm1 --model_name sf1_tm1.ckpt --data_dir ./data/evaluation_all/SF1 --conversion_direction A2B --output_dir ./converted_voices

The convention for conversion_direction is that the first object in the model filename is A, and the second object in the model filename is B. In this case, SF1 = A and TM1 = B.
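
The naming convention can be made explicit with a small helper that maps a model filename and a direction to the source and target speakers. This is a sketch only: resolve_direction is a hypothetical function, not part of convert.py, and it assumes model filenames of the form a_b.ckpt as in the example above.

```python
import os


def resolve_direction(model_name, conversion_direction):
    """Return (source_speaker, target_speaker) for a model filename and direction."""
    # "sf1_tm1.ckpt" -> "sf1_tm1"
    stem = os.path.splitext(os.path.basename(model_name))[0]
    parts = stem.split("_")
    if len(parts) != 2:
        raise ValueError("expected a model name like 'sf1_tm1.ckpt'")
    # The first name in the filename is A, the second is B.
    speaker_a, speaker_b = (p.upper() for p in parts)
    if conversion_direction == "A2B":
        return speaker_a, speaker_b
    if conversion_direction == "B2A":
        return speaker_b, speaker_a
    raise ValueError("conversion_direction must be 'A2B' or 'B2A'")


print(resolve_direction("sf1_tm1.ckpt", "A2B"))  # ('SF1', 'TM1')
print(resolve_direction("sf1_tm1.ckpt", "B2A"))  # ('TM1', 'SF1')
```

With A2B, the files in data_dir should therefore be recordings of speaker A (here SF1), and with B2A they should be recordings of speaker B.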

Demo

VCC2016 SF1 and TF2 Conversion

In the demo directory, there are voice conversions between the validation data of SF1 and TF2 using the pre-trained model.

200001_SF1.wav and 200001_TF2.wav are real voices for the same speech from SF1 and TF2, respectively.

200001_SF1toTF2.wav and 200001_TF2.wav are the converted voices produced with the pre-trained model.

200001_SF1toTF2_author.wav is the converted voice from the NTT website for comparison with our model performance.

The conversion performance is extremely good and the converted speech sounds real to me.

Download the pre-trained SF1-TF2 conversion model and conversion of all the validation samples from Google Drive.

Reference

  • Takuhiro Kaneko, Hirokazu Kameoka. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks. 2017. (Voice Conversion CycleGAN)
  • Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, Zehan Wang. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. 2016. (Pixel Shuffler)
  • Yann Dauphin, Angela Fan, Michael Auli, David Grangier. Language Modeling with Gated Convolutional Networks. 2017. (Gated CNN)
  • Takuhiro Kaneko, Hirokazu Kameoka, Kaoru Hiramatsu, Kunio Kashino. Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks. 2017. (1D Gated CNN)
  • Kun Liu, Jianping Zhang, Yonghong Yan. High Quality Voice Conversion through Phoneme-based Linear Mapping Functions with STRAIGHT for Mandarin. 2007. (Fundamental Frequency Transformation)
  • PyWorld and SPTK Comparison
  • Gated CNN TensorFlow

To-Do List

  • [ ] Parallelize data preprocessing
  • [ ] Evaluation metrics
  • [x] Hyper parameter tuning
  • [ ] Train more conversion models
  • [x] Argparse