
as-ideas / TransformerTTS

Licence: other
πŸ€–πŸ’¬ Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.

Projects that are alternatives to or similar to TransformerTTS

Transformer Tts
A Pytorch Implementation of "Neural Speech Synthesis with Transformer Network"
Stars: ✭ 418 (-32.25%)
Mutual labels:  text-to-speech, tts
Xzvoice
Free and open source text-to-speech software
Stars: ✭ 355 (-42.46%)
Mutual labels:  text-to-speech, tts
talkbot
Text-to-speech and translation bot for Discord
Stars: ✭ 27 (-95.62%)
Mutual labels:  text-to-speech, tts
Multilingual text to speech
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
Stars: ✭ 324 (-47.49%)
Mutual labels:  text-to-speech, tts
Voice Builder
An open-source text-to-speech (TTS) voice building tool
Stars: ✭ 362 (-41.33%)
Mutual labels:  text-to-speech, tts
editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech
Stars: ✭ 74 (-88.01%)
Mutual labels:  text-to-speech, tts
esp32-flite
Speech synthesis running on ESP32 based on Flite engine.
Stars: ✭ 28 (-95.46%)
Mutual labels:  text-to-speech, tts
sam
SAM: Software Automatic Mouth (Ported from https://github.com/vidarh/SAM)
Stars: ✭ 33 (-94.65%)
Mutual labels:  text-to-speech, tts
Tts
πŸΈπŸ’¬ - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Stars: ✭ 305 (-50.57%)
Mutual labels:  text-to-speech, tts
Parakeet
PAddle PARAllel text-to-speech toolKIT (supporting WaveFlow, WaveNet, Transformer TTS and Tacotron2)
Stars: ✭ 279 (-54.78%)
Mutual labels:  text-to-speech, tts
Tts
πŸ€– πŸ’¬ Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Stars: ✭ 5,427 (+779.58%)
Mutual labels:  text-to-speech, tts
Cognitive Speech Tts
Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.
Stars: ✭ 312 (-49.43%)
Mutual labels:  text-to-speech, tts
speak.awf
An Alfred 3 workflow that uses macOS's TTS (text-to-speech) feature to speak text aloud.
Stars: ✭ 29 (-95.3%)
Mutual labels:  text-to-speech, tts
Hifi Gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Stars: ✭ 325 (-47.33%)
Mutual labels:  text-to-speech, tts
Fre-GAN-pytorch
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Stars: ✭ 73 (-88.17%)
Mutual labels:  text-to-speech, tts
google-translate-tts
Node library for Google Translate TTS (Text-to-Speech) API
Stars: ✭ 23 (-96.27%)
Mutual labels:  text-to-speech, tts
persian-tts
πŸ”Š A simple human-based text-to-speech synthesiser and React Native app for the Persian language.
Stars: ✭ 18 (-97.08%)
Mutual labels:  text-to-speech, tts
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-91.57%)
Mutual labels:  text-to-speech, tts
Comprehensive-Tacotron2
PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single-, multi-speaker TTS and several techniques to enforce the robustness and efficiency of the model.
Stars: ✭ 22 (-96.43%)
Mutual labels:  text-to-speech, tts
Glow Tts
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Stars: ✭ 284 (-53.97%)
Mutual labels:  text-to-speech, tts



A Text-to-Speech Transformer in TensorFlow 2

Implementation of a non-autoregressive Transformer based neural network for Text-to-Speech (TTS).
This repo is based, among others, on the following papers:

  • Neural Speech Synthesis with Transformer Network
  • FastSpeech: Fast, Robust and Controllable Text to Speech
  • FastPitch: Parallel Text-to-speech with Pitch Prediction

Our pre-trained LJSpeech model is compatible with the pre-trained MelGAN vocoder (older versions are also available for WaveRNN).

For quick inference with these vocoders, check out the Vocoding branch.

Non-Autoregressive

Being non-autoregressive, this Transformer model is:

  • Robust: no repeated words or failed attention modes on challenging sentences.
  • Fast: with no autoregression, predictions take a fraction of the time.
  • Controllable: the speed and pitch of the generated utterance can be controlled (see the sketch below).
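A minimal sketch of inference-time control. The speed_regulator keyword below is a hypothetical parameter name used for illustration; check the actual predict signature in the repo:

from model.factory import tts_ljspeech

model, config = tts_ljspeech()
# Hypothetical keyword: values > 1 speed the utterance up, values < 1 slow it down.
out_fast = model.predict('Please, say something.', speed_regulator=1.3)
out_slow = model.predict('Please, say something.', speed_regulator=0.7)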

πŸ”ˆ Samples

Can be found here.

The spectrograms of these samples are converted to audio with the pre-trained MelGAN vocoder.

Try it out on Colab.

Updates

  • 11/20: Added pitch prediction. The autoregressive model is now specialized as an aligner, and Forward is the only TTS model. Changed model architectures. Discontinued WaveRNN support. Improved duration extraction with Dijkstra's algorithm.
  • 06/20: Added normalisation and pre-trained models compatible with the faster MelGAN vocoder.
  • 03/20: Vocoding branch.

Installation

Make sure you have:

  • Python >= 3.6

Install espeak as the phonemizer backend (on macOS use brew, as shown below):

sudo apt-get install espeak
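On macOS, the equivalent Homebrew command is:

brew install espeak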

Then install the rest with pip:

pip install -r requirements.txt

Read the individual scripts for more command line arguments.

Pre-Trained LJSpeech API

Use our pre-trained model (with Griffin-Lim) from the command line with

python predict_tts.py -t "Please, say something."

Or in a Python script

from data.audio import Audio
from model.factory import tts_ljspeech

model, config = tts_ljspeech()
audio = Audio(config)
out = model.predict('Please, say something.')

# Convert spectrogram to wav (with griffin lim)
wav = audio.reconstruct_waveform(out['mel'].numpy().T)
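To save the resulting waveform to disk you can use any audio I/O library, for example soundfile (not part of this repo's API); a minimal sketch, assuming the LJSpeech sampling rate of 22050 Hz:

import soundfile as sf

# LJSpeech audio is sampled at 22050 Hz; check your config if you trained on other data.
sf.write('output.wav', wav, 22050)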

Dataset

You can directly use LJSpeech to create the training dataset.
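For reference, the LJSpeech archive can be downloaded and unpacked with, for example:

wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xjf LJSpeech-1.1.tar.bz2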

Configuration

  • If training on LJSpeech, or if unsure, simply use config/session_paths.yaml to create MelGAN-compatible models.
    • Swap data_config.yaml for data_config_wavernn.yaml to create models compatible with WaveRNN.
  • EDIT PATHS: in config/session_paths.yaml, edit the paths to point to your dataset and log folders.

Custom dataset

Prepare a folder containing your metadata and wav files, for instance

|- dataset_folder/
|   |- metadata.csv
|   |- wavs/
|       |- file1.wav
|       |- ...

If metadata.csv has the format wav_file_name|transcription, you can use the ljspeech preprocessor in data/metadata_readers.py; otherwise, add your own reader to the same file.
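For illustration, a custom reader could look like the following minimal sketch; the exact signature and return type the repo expects may differ, so mirror the existing ljspeech reader in data/metadata_readers.py:

def my_custom_reader(metadata_path: str, column_sep: str = '|') -> dict:
    # Map each wav file name to its transcription.
    text_dict = {}
    with open(metadata_path, 'r', encoding='utf-8') as f:
        for line in f:
            file_name, transcription = line.strip().split(column_sep, 1)
            text_dict[file_name] = transcription
    return text_dict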

Make sure that:

  • the metadata reader function name is the same as the data_name field in session_paths.yaml.
  • the metadata file (it can be named anything) is specified under metadata_path in session_paths.yaml (see the sketch below).
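A hypothetical excerpt of config/session_paths.yaml. Only data_name and metadata_path are named in this README; the remaining keys are assumptions to show the shape of the file:

data_name: my_custom_reader        # must match the metadata reader function name
metadata_path: /path/to/dataset_folder/metadata.csv
# Assumed keys for the dataset and log folders mentioned above:
dataset_directory: /path/to/dataset_folder
log_directory: /path/to/logs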

Training

Change the --config argument based on the configuration of your choice.

Train Aligner Model

Create training dataset

python create_training_data.py --config config/session_paths.yaml

This will populate the training data directory (default transformer_tts_data.ljspeech).

Training

python train_aligner.py --config config/session_paths.yaml

Train TTS Model

Compute alignment dataset

First, use the aligner model to create the durations dataset:

python extract_durations.py --config config/session_paths.yaml

This will add the durations.<session name> folder, as well as the char-wise pitch folders, to the training data directory.

Training

python train_tts.py --config config/session_paths.yaml

Training & Model configuration

  • Training and model settings can be configured in <model>_config.yaml

Resume or restart training

  • To resume training, simply use the same configuration files.
  • To restart training, delete the weights and/or the logs from the logs folder with the training flag --reset_dir (both) or --reset_logs, --reset_weights (example below).
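For example, to wipe both weights and logs and restart a TTS training session:

python train_tts.py --config config/session_paths.yaml --reset_dir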

Monitor training

tensorboard --logdir /logs/directory/

Tensorboard Demo

Checkpoint to hdf5 weights [optional]

You can convert the checkpoint files to hdf5 model weights by running

python checkpoints_to_weights.py --config config/session_paths.yaml

Prediction

With training checkpoints

From the command line with

python predict_tts.py -t "Please, say something." --config config/session_paths.yaml

Or in a Python script

from utils.config_manager import Config
from data.audio import Audio

config_loader = Config(config_path='config/session_paths.yaml')
audio = Audio(config_loader.config)
model = config_loader.load_model() # optional: can specify checkpoint name
out = model.predict('Please, say something.')

# Convert spectrogram to wav (with griffin lim)
wav = audio.reconstruct_waveform(out['mel'].numpy().T)

With model weights

From the command line with

python predict_tts.py -t "Please, say something." -c config/session_paths.yaml -w path/to/model_weights.hdf5

Or in a Python script

from data.audio import Audio
from model.factory import tts_custom

model, config = tts_custom(config_path='path/to/config.yaml', 
                           weights_path='path/to/weights.hdf5')
audio = Audio(config)
out = model.predict('Please, say something.')

# Convert spectrogram to wav (with griffin lim)
wav = audio.reconstruct_waveform(out['mel'].numpy().T)

Model Weights

Model | Commit | Vocoder commit
ljspeech_tts_model (latest) | 0cd7d33 | aca5990
ljspeech_melgan_forward_model | 1c1cb03 | aca5990
ljspeech_melgan_autoregressive_model_v2 | 1c1cb03 | aca5990
ljspeech_wavernn_forward_model | 1c1cb03 | 3595219
ljspeech_wavernn_autoregressive_model_v2 | 1c1cb03 | 3595219
ljspeech_wavernn_forward_model | d9ccee6 | 3595219
ljspeech_wavernn_autoregressive_model_v2 | d9ccee6 | 3595219
ljspeech_wavernn_autoregressive_model_v1 | 2f3a1b5 | 3595219

Maintainers

Special thanks

MelGAN and WaveRNN: the data normalization code and the samples' vocoders come from these repos.

Erogol and the Mozilla TTS team for the lively exchange on the topic.

Copyright

See LICENSE for details.
