Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

Created with love in Canada, visit hostnodejs.com today

Feel like to post an Ad? Learn Details

All Projects → keonlee9420 → WaveGrad2

keonlee9420 / WaveGrad2

Licence: MIT license

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Programming Languages

139335 projects - #7 most used programming language

Labels

audio text-to-speech duration end-to-end pytorch tts speech-synthesis robust synthesis neural-tts non-autoregressive text-to-audio score-matching phoneme-to-waveform

Projects that are alternatives of or similar to WaveGrad2

Parallel-Tacotron2

PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

Stars: ✭ 149 (+170.91%)

Mutual labels: text-to-speech, duration, tts, speech-synthesis, neural-tts, non-autoregressive

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Stars: ✭ 66 (+20%)

Mutual labels: text-to-speech, duration, tts, speech-synthesis, neural-tts, non-autoregressive

PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis

Stars: ✭ 41 (-25.45%)

Mutual labels: text-to-speech, tts, speech-synthesis, neural-tts, non-autoregressive

Cross-Speaker-Emotion-Transfer

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Stars: ✭ 107 (+94.55%)

Mutual labels: text-to-speech, tts, speech-synthesis, neural-tts, non-autoregressive

Official implementation of Meta-StyleSpeech and StyleSpeech

Stars: ✭ 161 (+192.73%)

Mutual labels: text-to-speech, tts, speech-synthesis, neural-tts

Comprehensive-Tacotron2

PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single-, multi-speaker TTS and several techniques to enforce the robustness and efficiency of the model.

Stars: ✭ 22 (-60%)

Mutual labels: text-to-speech, tts, speech-synthesis, neural-tts

Expressive-FastSpeech2

PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.

Stars: ✭ 139 (+152.73%)

Mutual labels: text-to-speech, tts, speech-synthesis, non-autoregressive

Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Stars: ✭ 33 (-40%)

Mutual labels: text-to-speech, tts, speech-synthesis

Implementation of "Duration Informed Attention Network for Multimodal Synthesis" (https://arxiv.org/pdf/1909.01700.pdf) paper.

Stars: ✭ 111 (+101.82%)

Mutual labels: text-to-speech, tts, speech-synthesis

Text to Speech with PyTorch (English and Mongolian)

Stars: ✭ 122 (+121.82%)

Mutual labels: text-to-speech, tts, speech-synthesis

😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)

Stars: ✭ 2,382 (+4230.91%)

Mutual labels: text-to-speech, tts, speech-synthesis

Desktop application for neural speech synthesis written in C++

Stars: ✭ 140 (+154.55%)

Mutual labels: text-to-speech, tts, speech-synthesis

WaveRNN Vocoder + TTS

Stars: ✭ 1,636 (+2874.55%)

Mutual labels: text-to-speech, tts, speech-synthesis

MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure java

Stars: ✭ 1,699 (+2989.09%)

Mutual labels: text-to-speech, tts, speech-synthesis

Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.

Stars: ✭ 245 (+345.45%)

Mutual labels: text-to-speech, tts, speech-synthesis

Crystal - C++ implementation of a unified framework for multilingual TTS synthesis engine with SSML specification as interface.

Stars: ✭ 108 (+96.36%)

Mutual labels: text-to-speech, tts, speech-synthesis

react-native-spokestack

Spokestack: give your React Native app a voice interface!

Stars: ✭ 53 (-3.64%)

Mutual labels: text-to-speech, tts, speech-synthesis

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Stars: ✭ 1,604 (+2816.36%)

Mutual labels: text-to-speech, tts, speech-synthesis

Spokestack Python

Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.

Stars: ✭ 103 (+87.27%)

Mutual labels: text-to-speech, tts, speech-synthesis

Tacotron Pytorch

A Pytorch Implementation of Tacotron: End-to-end Text-to-speech Deep-Learning Model

Stars: ✭ 104 (+89.09%)

Mutual labels: text-to-speech, end-to-end, speech-synthesis

View All Similar Projects ➔

WaveGrad2 - PyTorch Implementation

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.

Quickstart

Dependencies

You can install the Python dependencies with

pip3 install -r requirements.txt

Inference

You have to download the pretrained models and put them in output/ckpt/LJSpeech/.

For English single-speaker TTS, run

python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step RESTORE_STEP --mode single -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

The generated utterances will be put in output/result/.

Batch Inference

Batch inference is also supported, try

python3 synthesize.py --source preprocessed_data/LJSpeech/val.txt --restore_step RESTORE_STEP --mode batch -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

to synthesize all utterances in preprocessed_data/LJSpeech/val.txt

Controllability

The speaking rate of the synthesized utterances can be controlled by specifying the desired duration ratios. For example, one can increase the speaking rate by 20 % by

python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step RESTORE_STEP --mode single -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml --duration_control 0.8

Training

Datasets

The supported datasets are

LJSpeech: a single-speaker English dataset consists of 13100 short audio clips of a female speaker reading passages from 7 non-fiction books, approximately 24 hours in total.

Preprocessing

First, run

python3 prepare_align.py config/LJSpeech/preprocess.yaml

for some preparations.

As described in the paper, Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences. Alignments for the LJSpeech datasets are provided here (thanks to ming024's FastSpeech2). You have to unzip the files in preprocessed_data/LJSpeech/TextGrid/.

After that, run the preprocessing script by

python3 preprocess.py config/LJSpeech/preprocess.yaml

Alternately, you can align the corpus by yourself. Download the official MFA package and run

./montreal-forced-aligner/bin/mfa_align raw_data/LJSpeech/ lexicon/librispeech-lexicon.txt english preprocessed_data/LJSpeech

or

./montreal-forced-aligner/bin/mfa_train_and_align raw_data/LJSpeech/ lexicon/librispeech-lexicon.txt preprocessed_data/LJSpeech

to align the corpus and then run the preprocessing script.

python3 preprocess.py config/LJSpeech/preprocess.yaml

Training

Train your model with

python3 train.py -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

TensorBoard

Use

tensorboard --logdir output/log/LJSpeech

to serve TensorBoard on your localhost. The loss curves, synthesized mel-spectrograms, and audios are shown.

Implementation Issues

Use 22050Hz instead of 24KHz and follow general LJSpeech configurations.
No ZoneOutBiLSTM in TextEncoder. Use nn.LSTM instead.
Preprocess text input without silence tokens inserted at word boundaries.

Citation

@misc{lee2021wavegrad2,
  author = {Lee, Keon},
  title = {WaveGrad2},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/WaveGrad2}}
}

References

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 55

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (0) 🔗