
BogiHsu / Tacotron2-PyTorch

Licence: MIT license
Yet another PyTorch implementation of Tacotron 2 with reduction factor and faster training speed.



Tacotron2-PyTorch

Yet another PyTorch implementation of Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. The project is heavily based on the works listed under References. I made some modifications to improve the speed and performance of both training and inference.
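
The project description mentions a reduction factor. As a rough illustration (not taken from this repository's code), a reduction factor r lets the decoder emit r mel frames per step, so the number of decoder steps drops to ceil(T / r) for T frames. The sketch below assumes the frame sequence is zero-padded to a multiple of r before grouping; the function names `decoder_steps` and `group_frames` are hypothetical.

```python
import math


def decoder_steps(n_frames, r):
    """Number of decoder steps needed to emit n_frames at r frames per step."""
    if r < 1:
        raise ValueError("reduction factor must be >= 1")
    return math.ceil(n_frames / r)


def group_frames(mel, r, pad_value=0.0):
    """Group mel frames (each a list of floats) into targets of r frames.

    Assumption: the sequence is padded with constant frames so that its
    length is a multiple of r. The repository may pad differently.
    """
    if not mel:
        return []
    n_mels = len(mel[0])
    padded = list(mel)
    while len(padded) % r != 0:
        padded.append([pad_value] * n_mels)
    # Concatenate r consecutive frames into one decoder target.
    return [sum(padded[i:i + r], []) for i in range(0, len(padded), r)]
```

With r = 2, an utterance of 801 frames needs 401 decoder steps instead of 801, which is where the training speed-up comes from.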

TODO

  • Add Colab demo.
  • Update README.
  • Upload pretrained models.
  • Compatible with WaveGlow and Hifi-GAN.

Requirements

  • Python >= 3.5.2
  • torch >= 1.0.0
  • numpy
  • scipy
  • pillow
  • inflect
  • librosa
  • Unidecode
  • matplotlib
  • tensorboardX

Preprocessing

Currently only LJ Speech is supported. You can modify hparams.py for different sampling rates. prep decides whether to preprocess all utterances before training or to preprocess them online. pth specifies the path where the preprocessed data is stored.
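
The difference between the two preprocessing modes can be sketched as follows. This is a minimal illustration, not the repository's code: the class `UtteranceStore`, the `extract` callable, and the pickle cache format are assumptions; only the names prep and pth come from the text above.

```python
import os
import pickle


class UtteranceStore:
    """Hypothetical dataset wrapper illustrating offline vs. online preprocessing."""

    def __init__(self, items, extract, prep=True, pth=None):
        self.items = list(items)   # e.g. (wav_path, text) pairs
        self.extract = extract     # callable turning one item into features
        self.cache = None
        if prep:
            if pth is not None and os.path.isfile(pth):
                # Reuse features stored by an earlier run.
                with open(pth, "rb") as f:
                    self.cache = pickle.load(f)
            else:
                # Preprocess every utterance once, before training starts.
                self.cache = [extract(item) for item in self.items]
                if pth is not None:
                    with open(pth, "wb") as f:
                        pickle.dump(self.cache, f)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        if self.cache is not None:
            return self.cache[idx]
        # Online mode: compute features each time an utterance is requested.
        return self.extract(self.items[idx])
```

Preprocessing up front trades start-up time and disk space for cheaper data loading in every epoch; online preprocessing starts immediately but repeats the feature extraction each time an utterance is read.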

Training

  1. To train Tacotron2, run the following command.
python3 train.py \
    --data_dir=<dir/to/dataset> \
    --ckpt_dir=<dir/to/models>
  2. If you have multiple GPUs, try torch.distributed.launch.
python -m torch.distributed.launch --nproc_per_node <NUM_GPUS> train.py \
    --data_dir=<dir/to/dataset> \
    --ckpt_dir=<dir/to/models>

Note that the effective training batch size becomes <NUM_GPUS> times larger.

  3. To train starting from a pretrained model, run the following command.
python3 train.py \
    --data_dir=<dir/to/dataset> \
    --ckpt_dir=<dir/to/models> \
    --ckpt_pth=<pth/to/pretrained/model>
  4. To log to Tensorboard (optional), run the following command.
python3 train.py \
    --data_dir=<dir/to/dataset> \
    --ckpt_dir=<dir/to/models> \
    --log_dir=<dir/to/logs>

You can then find alignment images and synthesized audio clips in Tensorboard during training. The text to synthesize can be set in hparams.py.
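
The training variants above differ only in a few flags. As a hedged sketch, the helper below assembles the same command lines from Python and computes the effective batch size for multi-GPU runs. `build_train_cmd` and `effective_batch_size` are hypothetical helpers, the paths in the example are placeholders, and the per-GPU batch size of 32 is an assumed value, not one read from hparams.py.

```python
import shlex


def build_train_cmd(data_dir, ckpt_dir, ckpt_pth=None, log_dir=None,
                    num_gpus=1, python="python3"):
    """Return the argv list for train.py, using only the flags shown above."""
    if num_gpus > 1:
        cmd = [python, "-m", "torch.distributed.launch",
               "--nproc_per_node", str(num_gpus), "train.py"]
    else:
        cmd = [python, "train.py"]
    cmd += ["--data_dir=" + data_dir, "--ckpt_dir=" + ckpt_dir]
    if ckpt_pth:
        cmd.append("--ckpt_pth=" + ckpt_pth)   # resume from a pretrained model
    if log_dir:
        cmd.append("--log_dir=" + log_dir)     # enable Tensorboard logging
    return cmd


def effective_batch_size(per_gpu_batch, num_gpus):
    """Each process consumes its own batch, so the totals add up."""
    return per_gpu_batch * num_gpus


# Placeholder paths and an assumed per-GPU batch size of 32.
cmd = build_train_cmd("data/LJSpeech-1.1", "ckpt", log_dir="logs", num_gpus=4)
print(shlex.join(cmd))
print(effective_batch_size(32, 4))
```

Because the effective batch size grows with the number of GPUs, the learning rate or the per-GPU batch size may need adjusting when moving between single-GPU and multi-GPU runs.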

Inference

  • For synthesizing wav files, run the following command.
python3 inference.py \
    --ckpt_pth=<pth/to/model> \
    --img_pth=<pth/to/save/alignment> \
    --npy_pth=<pth/to/save/mel> \
    --wav_pth=<pth/to/save/wav> \
    --text=<text/to/synthesize>
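
Text to synthesize usually contains spaces and punctuation, so it has to be quoted when passed on a shell command line. The sketch below builds the inference command as an argument list, which avoids shell quoting when it is handed to `subprocess.run`. The helper `build_inference_cmd` is hypothetical, and treating the three output paths as optional is an assumption about inference.py.

```python
import shlex


def build_inference_cmd(ckpt_pth, text, img_pth=None, npy_pth=None, wav_pth=None):
    """Return the argv list for inference.py, using only the flags shown above."""
    cmd = ["python3", "inference.py", "--ckpt_pth=" + ckpt_pth]
    outputs = (("--img_pth", img_pth), ("--npy_pth", npy_pth), ("--wav_pth", wav_pth))
    for flag, value in outputs:
        if value:
            cmd.append(flag + "=" + value)
    # The text is a single argument, however many spaces it contains.
    cmd.append("--text=" + text)
    return cmd


# Placeholder paths for illustration only.
cmd = build_inference_cmd("ckpt/model", "Hello, world.", wav_pth="out.wav")
# shlex.join adds the quoting needed to paste the command into a shell.
print(shlex.join(cmd))
```

Passing `cmd` directly to `subprocess.run(cmd, check=True)` runs the script without going through a shell at all.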

Pretrained Model

You can download pretrained models from Releases. The hyperparameters used for training are also in the directory. All the models were trained using 8 GPUs.

Vocoder

A vocoder is not implemented, but the model is compatible with WaveGlow and Hifi-GAN. Check the Colab demo for more information.

References

This project is heavily based on the works below.
