
jxzhanggg / Nonparaseq2seqvc_code

License: MIT
Implementation code of non-parallel sequence-to-sequence VC

Programming Languages

python

Projects that are alternatives to, or similar to, Nonparaseq2seqvc_code

Wavernn
WaveRNN Vocoder + TTS
Stars: ✭ 1,636 (+962.34%)
Mutual labels:  text-to-speech
Marytts
MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure java
Stars: ✭ 1,699 (+1003.25%)
Mutual labels:  text-to-speech
Diffwave
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Stars: ✭ 139 (-9.74%)
Mutual labels:  text-to-speech
Crystal
Crystal - C++ implementation of a unified framework for multilingual TTS synthesis engine with SSML specification as interface.
Stars: ✭ 108 (-29.87%)
Mutual labels:  text-to-speech
Nlp Pretrained Model
A collection of Natural language processing pre-trained models.
Stars: ✭ 122 (-20.78%)
Mutual labels:  text-to-speech
Talkify
Javascript Text to speech library
Stars: ✭ 132 (-14.29%)
Mutual labels:  text-to-speech
Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (-33.12%)
Mutual labels:  text-to-speech
Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
Stars: ✭ 2,382 (+1446.75%)
Mutual labels:  text-to-speech
Pytorch Dc Tts
Text to Speech with PyTorch (English and Mongolian)
Stars: ✭ 122 (-20.78%)
Mutual labels:  text-to-speech
Vonage Python Sdk
Vonage Server SDK for Python. API support for SMS, Voice, Text-to-Speech, Numbers, Verify (2FA) and more.
Stars: ✭ 134 (-12.99%)
Mutual labels:  text-to-speech
Durian
Implementation of "Duration Informed Attention Network for Multimodal Synthesis" (https://arxiv.org/pdf/1909.01700.pdf) paper.
Stars: ✭ 111 (-27.92%)
Mutual labels:  text-to-speech
Tts
Text-to-Speech for Arduino
Stars: ✭ 118 (-23.38%)
Mutual labels:  text-to-speech
Awesome Ai Services
An overview of the AI-as-a-service landscape
Stars: ✭ 133 (-13.64%)
Mutual labels:  text-to-speech
Cross Lingual Voice Cloning
Tacotron 2 - PyTorch implementation with faster-than-realtime inference modified to enable cross lingual voice cloning.
Stars: ✭ 106 (-31.17%)
Mutual labels:  text-to-speech
Wavegrad
A fast, high-quality neural vocoder.
Stars: ✭ 138 (-10.39%)
Mutual labels:  text-to-speech
Tacotron Pytorch
A Pytorch Implementation of Tacotron: End-to-end Text-to-speech Deep-Learning Model
Stars: ✭ 104 (-32.47%)
Mutual labels:  text-to-speech
Alan Sdk Pcf
Alan AI Power Apps SDK adds a voice assistant or chatbot to your Microsoft Power Apps project.
Stars: ✭ 128 (-16.88%)
Mutual labels:  text-to-speech
Aeneas
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Stars: ✭ 1,942 (+1161.04%)
Mutual labels:  text-to-speech
Amazon Polly Sample
Sample application for Amazon Polly. Allows you to convert any blog into an audio podcast.
Stars: ✭ 139 (-9.74%)
Mutual labels:  text-to-speech
Androidmarytts
Android MARY TTS - an open-source, offline HMM-Based text-to-speech synthesis system based on MaryTTS
Stars: ✭ 134 (-12.99%)
Mutual labels:  text-to-speech

Non-parallel Seq2seq Voice Conversion

Implementation code of Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations.

For audio samples, please visit our demo page.

Overview of the model structure

Dependencies

  • Python 3.6
  • PyTorch 1.0.1
  • CUDA 10.0

Data

It is recommended you download the VCTK and CMU-ARCTIC datasets.

Usage

Installation

Install Python dependencies.

$ pip install -r requirements.txt

Feature Extraction

Extract mel-spectrograms, linear spectrograms, and phoneme sequences.

You can use extract_features.py for this step.
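
For reference, here is a minimal sketch of the mel-spectrogram part of this step using librosa; the parameter values (sampling rate, n_fft, hop_length, n_mels) are illustrative assumptions, not necessarily the settings used by extract_features.py:

import librosa
import numpy as np

def extract_mel(wav_path, sr=16000, n_fft=1024, hop_length=256, n_mels=80):
    """Load a waveform and compute a log-mel-spectrogram (illustrative parameters)."""
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    # Convert the power spectrogram to a log scale for training stability.
    return librosa.power_to_db(mel, ref=np.max)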

Customize data reader

Write a short script that walks through the dataset and generates list files for the train, validation, and test sets, as in the sketch below.
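
A minimal sketch of such a script, assuming a VCTK-style layout (wav48/<speaker>/<utterance>.wav); the directory layout, split sizes, and output file names are illustrative assumptions, not the repo's conventions:

import os
import random

def make_lists(wav_root, out_dir, n_valid=5, n_test=5, seed=0):
    """Walk wav_root and write train/valid/test list files, one wav path per line."""
    paths = []
    for dirpath, _, filenames in os.walk(wav_root):
        paths.extend(os.path.join(dirpath, f)
                     for f in filenames if f.endswith('.wav'))
    random.Random(seed).shuffle(paths)
    splits = {'valid': paths[:n_valid],
              'test': paths[n_valid:n_valid + n_test],
              'train': paths[n_valid + n_test:]}
    for name, split in splits.items():
        with open(os.path.join(out_dir, name + '_list.txt'), 'w') as f:
            f.write('\n'.join(split) + '\n')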

Then you will need to modify the data reader to read your training data. The following are the scripts you will need to modify; an illustrative reader sketch follows.

For pre-training:

For fine-tuning:
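
To illustrate the kind of change involved, here is a minimal PyTorch Dataset that reads a list file like the one generated above and loads one pre-computed mel-spectrogram per utterance; the class name and the .mel.npy storage convention are hypothetical, not the repo's actual reader:

import numpy as np
import torch
from torch.utils.data import Dataset

class MelListDataset(Dataset):
    """Reads wav paths from a list file; expects <path>.mel.npy next to each wav."""
    def __init__(self, list_file):
        with open(list_file) as f:
            self.paths = [line.strip() for line in f if line.strip()]

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Hypothetical convention: the extraction step saved features
        # alongside each wav as <path>.mel.npy.
        mel = np.load(self.paths[idx] + '.mel.npy')
        return torch.from_numpy(mel)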

Pre-train the model

Add the correct paths to your local data, then run the bash script:

$ cd pre-train
$ bash run.sh

Run the inference code to generate audio samples on the multi-speaker dataset. During inference, the model can run in either TTS mode (using text inputs) or VC mode (using mel-spectrogram inputs).

$ python inference.py
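
Conceptually, the two modes share the same decoder and differ only in which encoder produces the linguistic representation; the sketch below is schematic only, with hypothetical module names rather than the repo's API:

# Schematic only: text_encoder, audio_encoder, speaker_embedding, and decoder
# are hypothetical stand-ins for the model's components.
def synthesize(model, speaker_id, text=None, mel=None):
    if text is not None:                  # TTS mode: linguistic content from text
        linguistic = model.text_encoder(text)
    else:                                 # VC mode: linguistic content from audio
        linguistic = model.audio_encoder(mel)
    speaker = model.speaker_embedding(speaker_id)
    return model.decoder(linguistic, speaker)  # predicted mel, fed to a vocoder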

Fine-tune the model

Fine-tune the model and generate audio samples for a conversion pair. As in pre-training, inference can run in either TTS mode (using text inputs) or VC mode (using mel-spectrogram inputs).

$ cd fine-tune
$ bash run.sh

Training Time

On a single NVIDIA GTX 1080 Ti GPU with a batch size of 32, pre-training on VCTK takes approximately 64 hours of wall-clock time. Fine-tuning on two speakers (500 utterances per speaker) with a batch size of 8 takes approximately 6 hours.

Citation

If you use this code, please cite:

@article{zhangnonpara2020,
  author={Jing-Xuan {Zhang} and Zhen-Hua {Ling} and Li-Rong {Dai}},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title={Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations},
  year={2020},
  volume={28},
  number={1},
  pages={540--552}
}

Acknowledgements

Part of the code was adapted from the following project:
