
keonlee9420 / VAENAR-TTS

License: MIT
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.


Projects that are alternatives of or similar to VAENAR-TTS

Parallel-Tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Stars: ✭ 149 (+125.76%)
Mutual labels:  text-to-speech, duration, tts, speech-synthesis, vae, self-attention, neural-tts, non-autoregressive
WaveGrad2
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Stars: ✭ 55 (-16.67%)
Mutual labels:  text-to-speech, duration, tts, speech-synthesis, neural-tts, non-autoregressive
Cross-Speaker-Emotion-Transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Stars: ✭ 107 (+62.12%)
Mutual labels:  text-to-speech, tts, speech-synthesis, neural-tts, non-autoregressive, non-ar
Daft-Exprt
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis
Stars: ✭ 41 (-37.88%)
Mutual labels:  text-to-speech, tts, speech-synthesis, neural-tts, non-autoregressive
Expressive-FastSpeech2
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
Stars: ✭ 139 (+110.61%)
Mutual labels:  text-to-speech, tts, speech-synthesis, non-autoregressive
StyleSpeech
Official implementation of Meta-StyleSpeech and StyleSpeech
Stars: ✭ 161 (+143.94%)
Mutual labels:  text-to-speech, tts, speech-synthesis, neural-tts
Comprehensive-Tacotron2
PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single- and multi-speaker TTS and several techniques to improve the robustness and efficiency of the model.
Stars: ✭ 22 (-66.67%)
Mutual labels:  text-to-speech, tts, speech-synthesis, neural-tts
WaveRNN
WaveRNN Vocoder + TTS
Stars: ✭ 1,636 (+2378.79%)
Mutual labels:  text-to-speech, tts, speech-synthesis
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-19.7%)
Mutual labels:  text-to-speech, tts, speech-synthesis
vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Stars: ✭ 1,604 (+2330.3%)
Mutual labels:  text-to-speech, tts, speech-synthesis
DurIAN
Implementation of the "Duration Informed Attention Network for Multimodal Synthesis" paper (https://arxiv.org/pdf/1909.01700.pdf).
Stars: ✭ 111 (+68.18%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (+56.06%)
Mutual labels:  text-to-speech, tts, speech-synthesis
CS224n GPU That Talks
Attention, I'm Trying to Speak: End-to-end speech synthesis (CS224n '18)
Stars: ✭ 52 (-21.21%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Wsay
Windows "say"
Stars: ✭ 36 (-45.45%)
Mutual labels:  text-to-speech, tts, speech-synthesis
LightSpeech
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Stars: ✭ 31 (-53.03%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Crystal
Crystal - C++ implementation of a unified framework for multilingual TTS synthesis engine with SSML specification as interface.
Stars: ✭ 108 (+63.64%)
Mutual labels:  text-to-speech, tts, speech-synthesis
TensorVox
Desktop application for neural speech synthesis written in C++
Stars: ✭ 140 (+112.12%)
Mutual labels:  text-to-speech, tts, speech-synthesis
MaryTTS
MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure Java
Stars: ✭ 1,699 (+2474.24%)
Mutual labels:  text-to-speech, tts, speech-synthesis
TensorFlowTTS
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German; easy to adapt to other languages)
Stars: ✭ 2,382 (+3509.09%)
Mutual labels:  text-to-speech, tts, speech-synthesis
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+346.97%)
Mutual labels:  text-to-speech, tts, speech-synthesis

VAENAR-TTS - PyTorch Implementation

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

The validation logs of the synthesized mel-spectrogram and alignment up to 70K steps are shown below (LJSpeech_val_dec_attn_0_LJ029-0157 and LJSpeech_val_step_LJ029-0157, from top to bottom).
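
As the name implies, the model is trained as a conditional VAE: a posterior network infers latent variables from the text and the target mel-spectrogram, a text-conditioned prior regularizes them, and a non-autoregressive decoder reconstructs the mel. Below is a minimal sketch of such an objective; the names are illustrative and are not this repository's actual classes or functions.

import torch.nn.functional as F

def vae_nar_loss(mel, mel_pred, post_mu, post_logvar, prior_mu, prior_logvar, kl_weight):
    # Reconstruction term: how well the non-autoregressive decoder
    # reproduces the target mel-spectrogram.
    recon = F.l1_loss(mel_pred, mel)
    # KL term: divergence between the mel-conditioned posterior
    # N(post_mu, exp(post_logvar)) and the text-conditioned prior
    # N(prior_mu, exp(prior_logvar)), averaged over all elements.
    kl = 0.5 * (
        prior_logvar - post_logvar
        + (post_logvar.exp() + (post_mu - prior_mu) ** 2) / prior_logvar.exp()
        - 1.0
    ).mean()
    return recon + kl_weight * kl

VAE-based TTS models typically anneal kl_weight from near zero at the start of training to avoid posterior collapse.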

Quickstart

Dependencies

You can install the Python dependencies with

pip3 install -r requirements.txt

Inference

You have to download the pretrained models and put them in output/ckpt/LJSpeech/.

For English single-speaker TTS, run

python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step RESTORE_STEP --mode single -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

The generated utterances will be put in output/result/.
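
To sanity-check a result, you can load the synthesized audio with any audio library. Here is a minimal sketch using soundfile; the file name is hypothetical and depends on your input text.

import soundfile as sf

# Adjust the path to an actual file that appears in output/result/.
wav, sr = sf.read("output/result/YOUR_DESIRED_TEXT.wav")
print(f"{len(wav) / sr:.2f} s at {sr} Hz")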

Batch Inference

Batch inference is also supported; try

python3 synthesize.py --source preprocessed_data/LJSpeech/val.txt --restore_step RESTORE_STEP --mode batch -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

to synthesize all utterances in preprocessed_data/LJSpeech/val.txt.

Training

Datasets

The supported datasets are

  • LJSpeech: a single-speaker English dataset consisting of 13,100 short audio clips of a female speaker reading passages from 7 non-fiction books, approximately 24 hours in total.

Preprocessing

First, run

python3 prepare_align.py config/LJSpeech/preprocess.yaml

for some initial preparation. Then run the preprocessing script:

python3 preprocess.py config/LJSpeech/preprocess.yaml

Training

Train your model with

python3 train.py -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

TensorBoard

Use

tensorboard --logdir output/log/LJSpeech

to serve TensorBoard on your localhost. The loss curves, synthesized mel-spectrograms, and audio samples are shown.

Implementation Issues

  • The following arguments and methods were removed while converting the TensorFlow code to PyTorch: name, kwargs, training, and get_config().
  • in_features must be specified explicitly in LinearNorm, which corresponds to tf.keras.layers.Dense; likewise, in_channels is specified explicitly in Conv1D (see the sketch below).
  • The get_mask_from_lengths() function returns the logical NOT of FastSpeech2's mask (see the sketch below).
  • In this implementation, the Griffin-Lim algorithm is used to convert mel-spectrograms into waveforms. You can use HiFi-GAN as the vocoder by setting the config, but you need to train it from scratch (the provided pre-trained HiFi-GAN model cannot be used).
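
A minimal sketch of the mask and layer conventions above (illustrative code, not the repository's exact implementation):

import torch
import torch.nn as nn

def get_mask_from_lengths(lengths, max_len=None):
    # Convention described above: True marks valid positions, i.e. the
    # logical NOT of FastSpeech2's mask, which marks padding as True.
    if max_len is None:
        max_len = int(lengths.max().item())
    ids = torch.arange(max_len, device=lengths.device)
    return ids.unsqueeze(0) < lengths.unsqueeze(1)  # (batch, max_len)

class LinearNorm(nn.Module):
    # Unlike tf.keras.layers.Dense, which infers the input width on the
    # first call, nn.Linear needs in_features up front (and nn.Conv1d
    # likewise needs in_channels).
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=bias)

    def forward(self, x):
        return self.linear(x)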

Citation

@misc{lee2021vaenar-tts,
  author = {Lee, Keon},
  title = {VAENAR-TTS},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/VAENAR-TTS}}
}
