ttaoREtw / Tacotron Pytorch

License: MIT
A Pytorch Implementation of Tacotron: End-to-end Text-to-speech Deep-Learning Model

Programming Languages

python

Projects that are alternatives of or similar to Tacotron Pytorch

WaveGrad2
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Stars: ✭ 55 (-47.12%)
Mutual labels:  text-to-speech, end-to-end, speech-synthesis
Wavernn
WaveRNN Vocoder + TTS
Stars: ✭ 1,636 (+1473.08%)
Mutual labels:  speech-synthesis, tacotron, text-to-speech
Comprehensive-Tacotron2
PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single-, multi-speaker TTS and several techniques to enforce the robustness and efficiency of the model.
Stars: ✭ 22 (-78.85%)
Mutual labels:  text-to-speech, speech-synthesis, tacotron
Tacotron 2
DeepMind's Tacotron-2 Tensorflow implementation
Stars: ✭ 1,968 (+1792.31%)
Mutual labels:  speech-synthesis, tacotron, text-to-speech
Tacotron2
A PyTorch implementation of Tacotron2, an end-to-end text-to-speech(TTS) system described in "Natural TTS Synthesis By Conditioning Wavenet On Mel Spectrogram Predictions".
Stars: ✭ 43 (-58.65%)
Mutual labels:  speech-synthesis, tacotron, text-to-speech
ttslearn
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
Stars: ✭ 158 (+51.92%)
Mutual labels:  text-to-speech, speech-synthesis, seq2seq
Openseq2seq
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Stars: ✭ 1,378 (+1225%)
Mutual labels:  seq2seq, speech-synthesis, text-to-speech
Espeak
eSpeak NG is an open source speech synthesizer that supports 101 languages and accents.
Stars: ✭ 339 (+225.96%)
Mutual labels:  speech-synthesis, text-to-speech
Voice Builder
An opensource text-to-speech (TTS) voice building tool
Stars: ✭ 362 (+248.08%)
Mutual labels:  speech-synthesis, text-to-speech
Merlin
This is now the official location of the Merlin project.
Stars: ✭ 1,168 (+1023.08%)
Mutual labels:  speech-synthesis, text-to-speech
Rhvoice
a free and open source speech synthesizer for Russian and other languages
Stars: ✭ 750 (+621.15%)
Mutual labels:  speech-synthesis, text-to-speech
Tts
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Stars: ✭ 5,427 (+5118.27%)
Mutual labels:  tacotron, text-to-speech
Multilingual text to speech
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
Stars: ✭ 324 (+211.54%)
Mutual labels:  speech-synthesis, text-to-speech
Espnet
End-to-End Speech Processing Toolkit
Stars: ✭ 4,533 (+4258.65%)
Mutual labels:  speech-synthesis, end-to-end
Hifi Gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Stars: ✭ 325 (+212.5%)
Mutual labels:  speech-synthesis, text-to-speech
Parallelwavegan
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) with Pytorch
Stars: ✭ 682 (+555.77%)
Mutual labels:  speech-synthesis, text-to-speech
Cognitive Speech Tts
Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.
Stars: ✭ 312 (+200%)
Mutual labels:  speech-synthesis, text-to-speech
Tts
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Stars: ✭ 305 (+193.27%)
Mutual labels:  tacotron, text-to-speech
Espeak Ng
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
Stars: ✭ 799 (+668.27%)
Mutual labels:  speech-synthesis, text-to-speech
Lightspeech
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Stars: ✭ 31 (-70.19%)
Mutual labels:  speech-synthesis, text-to-speech

A Pytorch Implementation of Tacotron: End-to-end Text-to-speech Deep-Learning Model

An implementation of Google's Tacotron TTS system in PyTorch.

Updates

2018/09/15 => Fix RNN feeding bug.
2018/11/04 => Add attention mask and loss mask.
2019/05/17 => 2nd version updated.
2019/05/28 => Fix attention plot bug.
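
The 2018/11/04 entry mentions a loss mask. The general idea is sketched below in plain Python: padded frames at the end of shorter utterances are excluded from the loss so they do not dilute it. This is only an illustration under stated assumptions. The function names, the use of an L1 loss, and the one-value-per-frame layout are hypothetical and not taken from this repository's code.

```python
def length_mask(lengths, max_len):
    """Build a [batch][time] mask: 1.0 for real frames, 0.0 for padding."""
    return [[1.0 if t < n else 0.0 for t in range(max_len)] for n in lengths]


def masked_l1(pred, target, lengths):
    """Mean absolute error over valid (non-padded) frames only.

    pred and target are [batch][time] lists of floats. Real spectrogram
    frames are vectors; one float per frame keeps the sketch short.
    """
    max_len = len(pred[0])
    mask = length_mask(lengths, max_len)
    total = 0.0
    count = 0.0
    for p_seq, t_seq, m_seq in zip(pred, target, mask):
        for p, t, m in zip(p_seq, t_seq, m_seq):
            total += m * abs(p - t)  # padded frames contribute nothing
            count += m               # and are not counted in the mean
    return total / max(count, 1.0)


# Hypothetical batch: the first utterance has 2 real frames, the second has 3.
pred = [[1.0, 2.0, 9.0], [0.0, 0.0, 0.0]]
target = [[1.5, 2.0, 0.0], [1.0, 1.0, 1.0]]
loss = masked_l1(pred, target, lengths=[2, 3])
```

Without the mask, the large error on the padded third frame of the first utterance would be averaged into the loss.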

TODO

  • [ ] Add vocoder
  • [ ] Multispeaker version

Requirements

See used_packages.txt.

Usage

  • Data
    Download LJSpeech provided by keithito. It contains 13,100 short audio clips of a single speaker. The total length is approximately 24 hours.

  • Preprocessing

# Generate a directory 'training/' containing extracted features and a new meta file 'ljspeech_meta.txt'
$ python data/preprocess.py --output-dir training \
                            --data-dir <WHERE_YOU_PUT_YOUR_DATASET>/LJSpeech-1.1/wavs \
                            --old-meta <WHERE_YOU_PUT_YOUR_DATASET>/LJSpeech-1.1/metadata.csv \
                            --config config/config.yaml
  • Split dataset
# Generate 'meta_train.txt' and 'meta_test.txt' in 'training/'
$ python data/train_test_split.py --meta-all training/ljspeech_meta.txt \
                                  --ratio-test 0.1
  • Train
# Start training
$ python main.py --config config/config.yaml \
                 --checkpoint-dir <WHERE_TO_PUT_YOUR_CHECKPOINTS> 

# Continue training
$ python main.py --config config/config.yaml \
                 --checkpoint-dir <WHERE_TO_PUT_YOUR_CHECKPOINTS> \
                 --checkpoint-path <LAST_CHECKPOINT_PATH>
  • Examine the training process
# Scalars : loss curve 
# Audio   : validation wavs
# Images  : validation spectrograms & attentions
$ tensorboard --logdir log
  • Inference
# Generate synthesized speech 
$ python generate_speech.py --text "For example, Taiwan is a great place." \
                            --output <DESIRED_OUTPUT_PATH> \
                            --checkpoint-path <CHECKPOINT_PATH> \
                            --config config/config.yaml
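
For intuition about the "Split dataset" step above, here is a minimal sketch of what a split with `--ratio-test 0.1` amounts to: shuffle the metadata lines and hold out 10% of them for testing. It is not the code of `data/train_test_split.py`. The function name `split_meta`, the fixed seed, and the rounding rule are assumptions made for illustration.

```python
import random


def split_meta(lines, ratio_test=0.1, seed=0):
    """Shuffle metadata lines and split them into (train, test) lists."""
    lines = list(lines)                # copy so the caller's list is untouched
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    rng.shuffle(lines)
    n_test = int(round(len(lines) * ratio_test))
    return lines[n_test:], lines[:n_test]


# Hypothetical metadata: one line per utterance.
all_lines = ["utt%04d" % i for i in range(100)]
train_lines, test_lines = split_meta(all_lines, ratio_test=0.1)
```

With 100 input lines and a ratio of 0.1, this yields 90 training lines and 10 test lines with no overlap.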

Samples

All samples can be found here. They were generated after 102k training updates.

Checkpoint

The pretrained model can be downloaded from this link.

Alignment

A proper alignment appears after 10k update steps.

Differences from the original Tacotron

  1. Gradient clipping
  2. Noam-style learning rate decay (the schedule used in "Attention Is All You Need")
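
The two techniques listed above can be sketched in plain Python as follows. The Noam schedule here is written in a common rescaled form where the rate peaks at `init_lr` once `warmup_steps` is reached and then decays with the inverse square root of the step. The values `init_lr=0.002` and `warmup_steps=4000`, and both function names, are assumptions for illustration and are not read from this repository's config. In PyTorch, the usual counterparts are `torch.nn.utils.clip_grad_norm_` and a scheduler such as `torch.optim.lr_scheduler.LambdaLR`.

```python
import math


def noam_lr(step, init_lr=0.002, warmup_steps=4000):
    """Linear warmup for warmup_steps, then inverse-square-root decay."""
    step = max(step, 1)  # avoid 0 ** -0.5
    return init_lr * warmup_steps ** 0.5 * min(
        step * warmup_steps ** -1.5,  # warmup branch: grows linearly
        step ** -0.5,                 # decay branch: shrinks as 1/sqrt(step)
    )


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Rescale a flat list of gradients so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


# The rate is halved both halfway through warmup and at 4x the warmup length.
lr_warm, lr_peak, lr_late = noam_lr(2000), noam_lr(4000), noam_lr(16000)

# A gradient with norm 5 is scaled down to (approximately) norm 1.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Clipping bounds the size of each update, and the warmup phase keeps early steps small while the attention is still unaligned. Both are common ways to stabilize sequence-to-sequence training.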

Acknowledgements

This work is based on r9y9's implementation of Tacotron.

Reference

  • Tacotron: Towards End-to-End Speech Synthesis [link]