
kaituoxu / Tacotron2

A PyTorch implementation of Tacotron2, an end-to-end text-to-speech (TTS) system described in "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions".

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Tacotron2

Wavernn
WaveRNN Vocoder + TTS
Stars: ✭ 1,636 (+3704.65%)
Mutual labels:  speech-synthesis, tacotron, text-to-speech
Tacotron Pytorch
A Pytorch Implementation of Tacotron: End-to-end Text-to-speech Deep-Learning Model
Stars: ✭ 104 (+141.86%)
Mutual labels:  speech-synthesis, tacotron, text-to-speech
Comprehensive-Tacotron2
PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single-, multi-speaker TTS and several techniques to enforce the robustness and efficiency of the model.
Stars: ✭ 22 (-48.84%)
Mutual labels:  text-to-speech, speech-synthesis, tacotron
Tacotron 2
DeepMind's Tacotron-2 Tensorflow implementation
Stars: ✭ 1,968 (+4476.74%)
Mutual labels:  speech-synthesis, tacotron, text-to-speech
Nnmnkwii
Library to build speech synthesis systems designed for easy and fast prototyping.
Stars: ✭ 308 (+616.28%)
Mutual labels:  speech-synthesis, text-to-speech
Glow Tts
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Stars: ✭ 284 (+560.47%)
Mutual labels:  speech-synthesis, text-to-speech
Wsay
Windows "say"
Stars: ✭ 36 (-16.28%)
Mutual labels:  speech-synthesis, text-to-speech
Tts
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Stars: ✭ 5,427 (+12520.93%)
Mutual labels:  tacotron, text-to-speech
Tacotron pytorch
Tacotron implementation in PyTorch
Stars: ✭ 12 (-72.09%)
Mutual labels:  speech-synthesis, tacotron
Hifi Gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Stars: ✭ 325 (+655.81%)
Mutual labels:  speech-synthesis, text-to-speech
Espeak Ng
eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
Stars: ✭ 799 (+1758.14%)
Mutual labels:  speech-synthesis, text-to-speech
Parakeet
PAddle PARAllel text-to-speech toolKIT (supporting WaveFlow, WaveNet, Transformer TTS and Tacotron2)
Stars: ✭ 279 (+548.84%)
Mutual labels:  speech-synthesis, text-to-speech
Tts
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Stars: ✭ 305 (+609.3%)
Mutual labels:  tacotron, text-to-speech
Cognitive Speech Tts
Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.
Stars: ✭ 312 (+625.58%)
Mutual labels:  speech-synthesis, text-to-speech
esp32-flite
Speech synthesis running on ESP32 based on Flite engine.
Stars: ✭ 28 (-34.88%)
Mutual labels:  text-to-speech, speech-synthesis
Multilingual text to speech
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
Stars: ✭ 324 (+653.49%)
Mutual labels:  speech-synthesis, text-to-speech
Voice Builder
An opensource text-to-speech (TTS) voice building tool
Stars: ✭ 362 (+741.86%)
Mutual labels:  speech-synthesis, text-to-speech
Lightspeech
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Stars: ✭ 31 (-27.91%)
Mutual labels:  speech-synthesis, text-to-speech
Rhvoice
a free and open source speech synthesizer for Russian and other languages
Stars: ✭ 750 (+1644.19%)
Mutual labels:  speech-synthesis, text-to-speech
leon
🧠 Leon is your open-source personal assistant.
Stars: ✭ 8,560 (+19806.98%)
Mutual labels:  text-to-speech, speech-synthesis

Tacotron 2

A PyTorch implementation of Tacotron2, described in Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions: an end-to-end text-to-speech (TTS) neural network architecture that directly converts a character sequence to speech.

Install

  • Python 3.6+ (Anaconda recommended)
  • PyTorch 0.4.1+
  • pip install -r requirements.txt
  • If you want to run egs/ljspeech/run.sh, download the LJ Speech dataset (freely available).

Usage

Quick start

$ cd egs/ljspeech
# Modify wav_dir to your LJ Speech dir
$ bash run.sh

That's all.

You can change a parameter with $ bash run.sh --parameter_name parameter_value, e.g., $ bash run.sh --stage 2. See the parameter names defined in egs/ljspeech/run.sh before the line . utils/parse_options.sh.
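The override mechanism comes from utils/parse_options.sh, a Kaldi-style option parser. The following is a minimal sketch of the idea only (not the actual script), assuming its usual convention: any variable defined before the parser runs can be overridden with --variable_name value.

```shell
# Variables defined before parsing act as defaults.
stage=1
batch_size=32

# Simulate: bash run.sh --stage 2 --batch_size 16
set -- --stage 2 --batch_size 16

# Kaldi-style parsing sketch: map "--name value" onto the variable $name.
while [ $# -gt 0 ]; do
  case "$1" in
    --*)
      name=$(printf '%s' "$1" | sed 's/^--//; s/-/_/g')
      eval "$name=\"\$2\""
      shift 2 ;;
    *) break ;;
  esac
done

echo "stage=$stage batch_size=$batch_size"
```

The real parse_options.sh also validates option names and supports --help; treat this only as an illustration of why parameters must be defined before the . utils/parse_options.sh line.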

Workflow

Workflow of egs/ljspeech/run.sh:

  • Stage 1: Training
  • Stage 2: Synthesising
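Kaldi-style run.sh scripts typically gate each stage with a comparison against a stage variable, so $ bash run.sh --stage 2 skips training and jumps straight to synthesis. How this run.sh gates its stages exactly is an assumption; the common pattern is:

```shell
stage=2   # e.g. set by: bash run.sh --stage 2

ran_training=no
ran_synthesis=no

# Each stage runs only if we start at or before it.
if [ "$stage" -le 1 ]; then
  ran_training=yes     # stage 1: train the model
fi
if [ "$stage" -le 2 ]; then
  ran_synthesis=yes    # stage 2: synthesize audio
fi

echo "training=$ran_training synthesis=$ran_synthesis"
```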

More detail

egs/ljspeech/run.sh provides example usage.

# Set PATH and PYTHONPATH
$ cd egs/ljspeech/; . ./path.sh
# Train:
$ train.py -h
# Synthesize audio:
$ synthesis.py -h

How to visualize loss?

If you want to visualize your loss, you can use visdom to do that:

  1. Open a new terminal on your remote server (tmux recommended) and run $ visdom
  2. Open a new terminal and run $ bash run.sh --visdom 1 --visdom_id "<any-string>" or $ train.py ... --visdom 1 --visdom_id "<any-string>"
  3. Open your browser and go to <your-remote-server-ip>:8097, e.g., 127.0.0.1:8097
  4. On the visdom page, choose <any-string> under Environment to see your loss curve

How to resume training?

$ bash run.sh --continue_from <model-path>

How to use multi-GPU?

Use a comma-separated list of GPU IDs, such as:

$ bash run.sh --id "0,1"
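A comma-separated list like this is commonly exported as CUDA_VISIBLE_DEVICES so that PyTorch only sees the selected devices; whether run.sh uses exactly this mechanism is an assumption, but the pattern looks like:

```shell
id="0,1"                            # value passed via --id
export CUDA_VISIBLE_DEVICES="$id"   # restrict which GPUs PyTorch can see
ngpu=$(printf '%s' "$id" | awk -F, '{print NF}')   # count the IDs
echo "Using $ngpu GPU(s): $id"
```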

How to solve out of memory?

  • If this happens during training, try reducing batch_size or using more GPUs: $ bash run.sh --batch_size <lower-value> or $ bash run.sh --id "0,1".

Reference and Resource

NOTE

This is a work in progress and any contribution is welcome (the dev branch is the main development branch).

The feature prediction network + Griffin-Lim is currently implemented to synthesize speech.

Attention alignment and synthesized spectrogram at 37k iterations (attn and spec images in the repository).
