BogiHsu / Tacotron2-PyTorch

Licence: MIT license

Yet another PyTorch implementation of Tacotron 2 with reduction factor and faster training speed.

Labels

text-to-speech pytorch tts pretrained-models tacotron ljspeech tacotron2-pytorch tacotron2 reduction-factor

Tacotron2-PyTorch

Yet another PyTorch implementation of Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. The project is heavily based on the works listed under References. I made some modifications to improve the speed and performance of both training and inference.

TODO

Add Colab demo.
Update README.
Upload pretrained models.
Compatible with WaveGlow and HiFi-GAN.

Requirements

Python >= 3.5.2
torch >= 1.0.0
numpy
scipy
pillow
inflect
librosa
Unidecode
matplotlib
tensorboardX

Preprocessing

Currently only LJ Speech is supported. You can modify hparams.py for different sampling rates. prep decides whether to preprocess all utterances before training or to preprocess them online during training. pth specifies the path where preprocessed data is stored.
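
The trade-off behind prep can be illustrated with a small, self-contained sketch. This is not the repository's code: ToyDataset, its cache, and the stand-in feature function are hypothetical names invented for illustration. It only shows the difference between computing all features up front and computing them on demand.

```python
class ToyDataset:
    """Hypothetical dataset illustrating eager vs. on-the-fly preprocessing."""

    def __init__(self, utterances, prep=True):
        self.utterances = list(utterances)
        self.prep = prep
        self.calls = 0  # counts how often features are computed
        # prep=True: pay the full preprocessing cost once, before training.
        self.cache = [self._features(u) for u in self.utterances] if prep else None

    def _features(self, text):
        # Stand-in for real feature extraction (e.g. a mel spectrogram).
        self.calls += 1
        return [ord(c) for c in text]

    def __len__(self):
        return len(self.utterances)

    def __getitem__(self, idx):
        if self.prep:
            return self.cache[idx]  # cheap lookup of the precomputed result
        return self._features(self.utterances[idx])  # recomputed on every access


eager = ToyDataset(["hi", "yo"], prep=True)
lazy = ToyDataset(["hi", "yo"], prep=False)
```

Eager preprocessing costs more time before the first step but makes each access cheap. Online preprocessing starts immediately but repeats the work whenever an item is requested.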

Training

For training Tacotron2, run the following command.

python3 train.py \
    --data_dir=<dir/to/dataset> \
    --ckpt_dir=<dir/to/models>

If you have multiple GPUs, try torch.distributed.launch.

python -m torch.distributed.launch --nproc_per_node <NUM_GPUS> train.py \
    --data_dir=<dir/to/dataset> \
    --ckpt_dir=<dir/to/models>

Note that the training batch size will become <NUM_GPUS> times larger.
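
As a quick check of that arithmetic, assuming each process uses the per-GPU batch size configured in hparams.py (the value 32 below is made up for illustration and is not the repository's default):

```python
def effective_batch_size(per_gpu_batch, num_gpus):
    """Total number of samples consumed per training step across all processes."""
    return per_gpu_batch * num_gpus


# Hypothetical numbers: 32 samples per GPU on 4 GPUs.
total = effective_batch_size(32, 4)
print(total)
```

To keep the total unchanged when adding GPUs, divide the per-GPU batch size by the number of GPUs.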

For training using a pretrained model, run the following command.

python3 train.py \
    --data_dir=<dir/to/dataset> \
    --ckpt_dir=<dir/to/models> \
    --ckpt_pth=<pth/to/pretrained/model>

To use TensorBoard (optional), run the following command.

python3 train.py \
    --data_dir=<dir/to/dataset> \
    --ckpt_dir=<dir/to/models> \
    --log_dir=<dir/to/logs>

You can find alignment images and synthesized audio clips in TensorBoard during training. The text to synthesize can be set in hparams.py.

Inference

For synthesizing wav files, run the following command.

python3 inference.py \
    --ckpt_pth=<pth/to/model> \
    --img_pth=<pth/to/save/alignment> \
    --npy_pth=<pth/to/save/mel> \
    --wav_pth=<pth/to/save/wav> \
    --text=<text/to/synthesize>
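
To inspect the saved mel spectrogram afterwards, a sketch like the following may help. It assumes the file written to --npy_pth is a plain NumPy array saved with np.save; the axis order is not documented here, so check the printed shape rather than relying on it. To stay runnable on its own, the example first writes a dummy array to a temporary file; replace npy_path with your own path.

```python
import os
import tempfile

import numpy as np

# Dummy stand-in for a file produced via --npy_pth (80 x 120 is an arbitrary shape).
npy_path = os.path.join(tempfile.mkdtemp(), "mel.npy")
np.save(npy_path, np.zeros((80, 120), dtype=np.float32))

# Load the array and print basic statistics.
mel = np.load(npy_path)
print("shape:", mel.shape, "dtype:", mel.dtype)
print("min/max:", float(mel.min()), float(mel.max()))
```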

Pretrained Model

You can download pretrained models from Releases. The hyperparameters used for training are also in the directory. All the models were trained using 8 GPUs.

Vocoder

A vocoder is not implemented, but the model is compatible with WaveGlow and HiFi-GAN. Check the Colab demo for more information.

References

This project is heavily based on the works below.
