A PyTorch Implementation of Tacotron: End-to-End Text-to-Speech Deep Learning Model

An implementation of Google's Tacotron TTS system in PyTorch.

Updates

2018/09/15 => Fix RNN feeding bug.
2018/11/04 => Add attention mask and loss mask.
2019/05/17 => Update to the 2nd version.
2019/05/28 => Fix attention plot bug.

TODO

[ ] Add vocoder
[ ] Multispeaker version

Requirements

See used_packages.txt.

Usage

Data

Download LJSpeech provided by keithito. It contains 13,100 short audio clips of a single speaker. The total length is approximately 24 hours.

Preprocessing

# Generate a directory 'training/' containing extracted features and a new meta file 'ljspeech_meta.txt'
$ python data/preprocess.py --output-dir training \
                            --data-dir <WHERE_YOU_PUT_YOUR_DATASET>/LJSpeech-1.1/wavs \
                            --old-meta <WHERE_YOU_PUT_YOUR_DATASET>/LJSpeech-1.1/metadata.csv \
                            --config config/config.yaml
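
For orientation, LJSpeech's metadata.csv stores one clip per line as three pipe-separated fields: the clip ID, the raw transcription, and the normalized transcription. The sketch below reads that format. The function name is hypothetical and the code is not taken from data/preprocess.py, whose internals may differ.

```python
def parse_ljspeech_metadata(lines):
    """Yield (clip_id, normalized_text) pairs from LJSpeech metadata lines.

    Assumes the 'id|raw text|normalized text' layout of metadata.csv.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("|")
        clip_id = fields[0]
        text = fields[-1]  # the last field is the normalized transcription
        yield clip_id, text
```

The clip ID matches the wav file name without its extension, so each pair can be joined with the files under LJSpeech-1.1/wavs.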

Split dataset

# Generate 'meta_train.txt' and 'meta_test.txt' in 'training/'
$ python data/train_test_split.py --meta-all training/ljspeech_meta.txt \
                                  --ratio-test 0.1
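
As a rough illustration of what a 0.1 test ratio means, the sketch below splits a list of meta lines into train and test parts. It is an assumption about the behavior, not the code of data/train_test_split.py: the function name, the shuffling, and the fixed seed are all hypothetical.

```python
import random


def split_meta(lines, ratio_test=0.1, seed=0):
    """Return (train_lines, test_lines) after a seeded shuffle.

    ratio_test is the fraction of lines held out for testing.
    """
    lines = list(lines)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(lines)
    n_test = int(round(len(lines) * ratio_test))
    return lines[n_test:], lines[:n_test]
```

With the 13,100 LJSpeech clips and a ratio of 0.1, a split like this holds out 1,310 clips for testing and keeps 11,790 for training.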

Train

# Start training
$ python main.py --config config/config.yaml \
                 --checkpoint-dir <WHERE_TO_PUT_YOUR_CHECKPOINTS> 

# Continue training
$ python main.py --config config/config.yaml \
                 --checkpoint-dir <WHERE_TO_PUT_YOUR_CHECKPOINTS> \
                 --checkpoint-path <LAST_CHECKPOINT_PATH>

Examine the training process

# Scalars : loss curve 
# Audio   : validation wavs
# Images  : validation spectrograms & attentions
$ tensorboard --logdir log

Inference

# Generate synthesized speech 
$ python generate_speech.py --text "For example, Taiwan is a great place." \
                            --output <DESIRED_OUTPUT_PATH> \
                            --checkpoint-path <CHECKPOINT_PATH> \
                            --config config/config.yaml

Samples

All the samples can be found here. These samples were generated after 102k updates.

Checkpoint

The pretrained model can be downloaded from this link.

Alignment

A proper alignment appears after 10k update steps.

Differences from the original Tacotron

Gradient clipping
Noam-style learning rate decay (the schedule used in "Attention Is All You Need")
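
Both techniques can be sketched in plain Python. This is a minimal illustration, not this repository's code: the function names, the initial learning rate of 0.002, and the 4000 warmup steps are assumptions chosen for the example. The schedule uses a common rescaled form of the Noam rule, which rises linearly during warmup, peaks at the initial rate, and then decays with the inverse square root of the step. The clipping function rescales gradients so that their global L2 norm does not exceed a threshold.

```python
import math


def noam_learning_rate(step, init_lr=0.002, warmup_steps=4000):
    """Noam-style schedule: linear warmup, then inverse-square-root decay.

    init_lr and warmup_steps are illustrative values, not the repo's config.
    """
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    return init_lr * warmup_steps ** 0.5 * min(
        step * warmup_steps ** -1.5,  # warmup branch, grows linearly
        step ** -0.5,                 # decay branch, shrinks as 1/sqrt(step)
    )


def clip_by_global_norm(grads, max_norm):
    """Scale a flat list of gradient values so their L2 norm is <= max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)  # already within the threshold
    scale = max_norm / total_norm
    return [g * scale for g in grads]
```

In PyTorch, global-norm clipping is usually done with torch.nn.utils.clip_grad_norm_ between the backward pass and the optimizer step. Whether this repository uses that utility is not stated here.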

Acknowledgements

This work is based on r9y9's implementation of Tacotron.

Reference

Tacotron: Towards End-to-End Speech Synthesis
