
keonlee9420 / VAENAR-TTS

License: MIT
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.


Projects that are alternatives of or similar to VAENAR-TTS

Parallel-Tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Stars: ✭ 149 (+125.76%)
Mutual labels:  text-to-speech, duration, tts, speech-synthesis, vae, self-attention, neural-tts, non-autoregressive
WaveGrad2
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Stars: ✭ 55 (-16.67%)
Mutual labels:  text-to-speech, duration, tts, speech-synthesis, neural-tts, non-autoregressive
Cross-Speaker-Emotion-Transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Stars: ✭ 107 (+62.12%)
Mutual labels:  text-to-speech, tts, speech-synthesis, neural-tts, non-autoregressive, non-ar
Daft-Exprt
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis
Stars: ✭ 41 (-37.88%)
Mutual labels:  text-to-speech, tts, speech-synthesis, neural-tts, non-autoregressive
Expressive-FastSpeech2
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
Stars: ✭ 139 (+110.61%)
Mutual labels:  text-to-speech, tts, speech-synthesis, non-autoregressive
StyleSpeech
Official implementation of Meta-StyleSpeech and StyleSpeech
Stars: ✭ 161 (+143.94%)
Mutual labels:  text-to-speech, tts, speech-synthesis, neural-tts
Comprehensive-Tacotron2
PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single- and multi-speaker TTS and several techniques to improve the robustness and efficiency of the model.
Stars: ✭ 22 (-66.67%)
Mutual labels:  text-to-speech, tts, speech-synthesis, neural-tts
WaveRNN
WaveRNN Vocoder + TTS
Stars: ✭ 1,636 (+2378.79%)
Mutual labels:  text-to-speech, tts, speech-synthesis
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-19.7%)
Mutual labels:  text-to-speech, tts, speech-synthesis
vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Stars: ✭ 1,604 (+2330.3%)
Mutual labels:  text-to-speech, tts, speech-synthesis
DurIAN
Implementation of the "Duration Informed Attention Network for Multimodal Synthesis" paper (https://arxiv.org/pdf/1909.01700.pdf).
Stars: ✭ 111 (+68.18%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (+56.06%)
Mutual labels:  text-to-speech, tts, speech-synthesis
CS224n GPU That Talks
Attention, I'm Trying to Speak: End-to-end speech synthesis (CS224n '18)
Stars: ✭ 52 (-21.21%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Wsay
Windows "say"
Stars: ✭ 36 (-45.45%)
Mutual labels:  text-to-speech, tts, speech-synthesis
LightSpeech
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Stars: ✭ 31 (-53.03%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Crystal
Crystal - C++ implementation of a unified framework for multilingual TTS synthesis engine with SSML specification as interface.
Stars: ✭ 108 (+63.64%)
Mutual labels:  text-to-speech, tts, speech-synthesis
TensorVox
Desktop application for neural speech synthesis written in C++
Stars: ✭ 140 (+112.12%)
Mutual labels:  text-to-speech, tts, speech-synthesis
MaryTTS
MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure Java
Stars: ✭ 1,699 (+2474.24%)
Mutual labels:  text-to-speech, tts, speech-synthesis
TensorFlowTTS
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German; easy to adapt to other languages)
Stars: ✭ 2,382 (+3509.09%)
Mutual labels:  text-to-speech, tts, speech-synthesis
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+346.97%)
Mutual labels:  text-to-speech, tts, speech-synthesis

VAENAR-TTS - PyTorch Implementation

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

The validation logs of the synthesized mel-spectrogram and alignment up to 70K steps are shown below (LJSpeech_val_dec_attn_0_LJ029-0157 and LJSpeech_val_step_LJ029-0157, from top to bottom).
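
As the name implies, the model is trained as a conditional VAE: a posterior network infers latent variables from the text and the target mel-spectrogram, a text-conditioned prior regularizes them, and a non-autoregressive decoder reconstructs the mel. Below is a minimal sketch of such an objective; the names are illustrative and are not this repository's actual classes or functions.

import torch.nn.functional as F

def vae_nar_loss(mel, mel_pred, post_mu, post_logvar, prior_mu, prior_logvar, kl_weight):
    # Reconstruction term: how well the non-autoregressive decoder
    # reproduces the target mel-spectrogram.
    recon = F.l1_loss(mel_pred, mel)
    # KL term: divergence between the mel-conditioned posterior
    # N(post_mu, exp(post_logvar)) and the text-conditioned prior
    # N(prior_mu, exp(prior_logvar)), averaged over all elements.
    kl = 0.5 * (
        prior_logvar - post_logvar
        + (post_logvar.exp() + (post_mu - prior_mu) ** 2) / prior_logvar.exp()
        - 1.0
    ).mean()
    return recon + kl_weight * kl

VAE-based TTS models typically anneal kl_weight from near zero at the start of training to avoid posterior collapse.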

Quickstart

Dependencies

You can install the Python dependencies with

pip3 install -r requirements.txt

Inference

You have to download the pretrained models and put them in output/ckpt/LJSpeech/.

For English single-speaker TTS, run

python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step RESTORE_STEP --mode single -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

The generated utterances will be put in output/result/.
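
To sanity-check a result, you can load the synthesized audio with any audio library. Here is a minimal sketch using soundfile; the file name is hypothetical and depends on your input text.

import soundfile as sf

# Adjust the path to an actual file that appears in output/result/.
wav, sr = sf.read("output/result/YOUR_DESIRED_TEXT.wav")
print(f"{len(wav) / sr:.2f} s at {sr} Hz")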

Batch Inference

Batch inference is also supported; try

python3 synthesize.py --source preprocessed_data/LJSpeech/val.txt --restore_step RESTORE_STEP --mode batch -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

to synthesize all utterances in preprocessed_data/LJSpeech/val.txt.

Training

Datasets

The supported datasets are

  • LJSpeech: a single-speaker English dataset consisting of 13,100 short audio clips of a female speaker reading passages from 7 non-fiction books, approximately 24 hours in total.

Preprocessing

First, run

python3 prepare_align.py config/LJSpeech/preprocess.yaml

for some initial preparation. Then run the preprocessing script:

python3 preprocess.py config/LJSpeech/preprocess.yaml

Training

Train your model with

python3 train.py -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

TensorBoard

Use

tensorboard --logdir output/log/LJSpeech

to serve TensorBoard on your localhost. The loss curves, synthesized mel-spectrograms, and audio samples are shown.

Implementation Issues

  • The following arguments and methods were removed while converting the TensorFlow code to PyTorch: name, kwargs, training, and get_config().
  • in_features must be specified explicitly in LinearNorm, which corresponds to tf.keras.layers.Dense; likewise, in_channels is specified explicitly in Conv1D (see the sketch below).
  • The get_mask_from_lengths() function returns the logical NOT of FastSpeech2's mask (see the sketch below).
  • In this implementation, the Griffin-Lim algorithm is used to convert mel-spectrograms into waveforms. You can use HiFi-GAN as the vocoder by setting the config, but you need to train it from scratch (the provided pre-trained HiFi-GAN model cannot be used).
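
A minimal sketch of the mask and layer conventions above (illustrative code, not the repository's exact implementation):

import torch
import torch.nn as nn

def get_mask_from_lengths(lengths, max_len=None):
    # Convention described above: True marks valid positions, i.e. the
    # logical NOT of FastSpeech2's mask, which marks padding as True.
    if max_len is None:
        max_len = int(lengths.max().item())
    ids = torch.arange(max_len, device=lengths.device)
    return ids.unsqueeze(0) < lengths.unsqueeze(1)  # (batch, max_len)

class LinearNorm(nn.Module):
    # Unlike tf.keras.layers.Dense, which infers the input width on the
    # first call, nn.Linear needs in_features up front (and nn.Conv1d
    # likewise needs in_channels).
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=bias)

    def forward(self, x):
        return self.linear(x)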

Citation

@misc{lee2021vaenar-tts,
  author = {Lee, Keon},
  title = {VAENAR-TTS},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/VAENAR-TTS}}
}
