
sovaai / sova-tts-engine

License: Apache-2.0
Tacotron2 based engine for the SOVA-TTS project

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to, or similar to, sova-tts-engine

Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German; easy to adapt to other languages)
Stars: ✭ 2,382 (+3680.95%)
Mutual labels:  speech-synthesis, tacotron2
Comprehensive-Tacotron2
PyTorch implementation of Google's "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions". This implementation supports both single- and multi-speaker TTS and several techniques to improve the robustness and efficiency of the model.
Stars: ✭ 22 (-65.08%)
Mutual labels:  speech-synthesis, tacotron2
TensorVox
Desktop application for neural speech synthesis written in C++
Stars: ✭ 140 (+122.22%)
Mutual labels:  speech-synthesis, tacotron2
tacotron2
Pytorch implementation of "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions", ICASSP, 2018.
Stars: ✭ 17 (-73.02%)
Mutual labels:  speech-synthesis, tacotron2
Tacotron pytorch
PyTorch implementation of Tacotron speech synthesis model.
Stars: ✭ 242 (+284.13%)
Mutual labels:  speech-synthesis
Cyclegan Vc2
Voice Conversion by CycleGAN (voice cloning / voice conversion): CycleGAN-VC2
Stars: ✭ 158 (+150.79%)
Mutual labels:  speech-synthesis
Tacotron 2
Google's Tacotron 2 TensorFlow implementation
Stars: ✭ 1,968 (+3023.81%)
Mutual labels:  speech-synthesis
Wavenet vocoder
WaveNet vocoder
Stars: ✭ 1,926 (+2957.14%)
Mutual labels:  speech-synthesis
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+368.25%)
Mutual labels:  speech-synthesis
voder
An emulation of the Voder Speech Synthesizer.
Stars: ✭ 19 (-69.84%)
Mutual labels:  speech-synthesis
Tacotron
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
Stars: ✭ 2,581 (+3996.83%)
Mutual labels:  speech-synthesis
Naomi
The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
Stars: ✭ 171 (+171.43%)
Mutual labels:  speech-synthesis
Wavegrad
Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
Stars: ✭ 245 (+288.89%)
Mutual labels:  speech-synthesis
Vocgan
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Stars: ✭ 158 (+150.79%)
Mutual labels:  speech-synthesis
idear
🎙️ Handsfree Audio Development Interface
Stars: ✭ 84 (+33.33%)
Mutual labels:  speech-synthesis
Normit
Translations with speech synthesis in your terminal as a node package
Stars: ✭ 219 (+247.62%)
Mutual labels:  speech-synthesis
Neural Voice Cloning With Few Samples
Implementation of Neural Voice Cloning with Few Samples Research Paper by Baidu
Stars: ✭ 211 (+234.92%)
Mutual labels:  speech-synthesis
ttsflow
TensorFlow speech synthesis C++ inference for VoiceNet
Stars: ✭ 17 (-73.02%)
Mutual labels:  speech-synthesis
Universalvocoding
A PyTorch implementation of "Robust Universal Neural Vocoding"
Stars: ✭ 197 (+212.7%)
Mutual labels:  speech-synthesis
GlottDNN
GlottDNN vocoder and tools for training DNN excitation models
Stars: ✭ 30 (-52.38%)
Mutual labels:  speech-synthesis

Tacotron2

The Tacotron2 network is used as the main synthesis engine in the SOVA-TTS project. We took its implementation from NVIDIA, added various improvements described in the literature, and made the code more user-friendly.

Key differences:

  1. Added a GST module;
  2. Added a Mutual Information Estimator (based on the following article and repo);
  3. Added the option to include an attention loss in the training process, using diagonal or prealigned guidance (see the sketch after this list);
  4. Reworked parts of the code to improve usability;
  5. Other minor changes and additions.
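
To illustrate the diagonal guidance in item 3: a common form is the Tacotron-style guided attention loss, which penalizes attention mass far from the diagonal of the alignment matrix. The sketch below is a generic PyTorch version, not this engine's actual code; the function name and signature are made up for illustration:

```python
import torch

def diagonal_guided_attention_loss(alignments, text_lengths, mel_lengths, g=0.2):
    """Penalize attention mass far from the diagonal (guided attention).

    alignments: (batch, mel_steps, text_steps) soft attention weights.
    text_lengths, mel_lengths: per-utterance valid lengths.
    Hypothetical helper; the engine's actual implementation may differ.
    """
    B, T_mel, T_text = alignments.size()
    device = alignments.device
    # Time indices along the decoder (mel) and encoder (text) axes
    n = torch.arange(T_mel, device=device).float().unsqueeze(1)   # (T_mel, 1)
    t = torch.arange(T_text, device=device).float().unsqueeze(0)  # (1, T_text)
    loss = 0.0
    for b in range(B):
        N, T = float(mel_lengths[b]), float(text_lengths[b])
        # W[n, t] = 1 - exp(-(t/T - n/N)^2 / (2 g^2)): near zero on the diagonal
        W = 1.0 - torch.exp(-((t / T - n / N) ** 2) / (2 * g ** 2))
        loss = loss + (alignments[b, : int(N), : int(T)] * W[: int(N), : int(T)]).mean()
    return loss / B
```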

How to train a new model

First of all, you need to install all dependencies (listed in requirements.txt) and convert the dataset to the LJ Speech format, where each line contains the relative path to an audio file and its text, separated by the "|" sign, e.g.:

wavs/000000.wav|С трев+ожным ч+увством бер+усь я з+а пер+о.
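
For illustration, here is a minimal conversion sketch. It assumes a hypothetical source layout in which every .wav file has a sibling .txt transcript; the real conversion depends on how your dataset is stored:

```python
import os

AUDIO_DIR = "wavs"          # hypothetical folder with .wav files
OUT_FILE = "metadata.txt"   # one "relative/path.wav|text" line per utterance

with open(OUT_FILE, "w", encoding="utf-8") as out:
    for name in sorted(os.listdir(AUDIO_DIR)):
        if not name.endswith(".wav"):
            continue
        # Read the transcript stored next to the audio file
        txt_path = os.path.join(AUDIO_DIR, name.replace(".wav", ".txt"))
        with open(txt_path, encoding="utf-8") as f:
            text = f.read().strip()
        out.write(f"{AUDIO_DIR}/{name}|{text}\n")
```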

Then divide it into two files: the training list (90% of the data) and the validation list (10% of the data).
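A short script like the following can make the split (file names here are placeholders):

```python
import random

with open("metadata.txt", encoding="utf-8") as f:
    lines = f.readlines()

random.seed(42)  # fixed seed so the split is reproducible
random.shuffle(lines)

# 90% of the lines go to the training list, the rest to validation
split = int(len(lines) * 0.9)
with open("train.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[:split])
with open("val.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[split:])
```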

After that, configure the config file as needed (here you can find an explanation of the main fields of the config file), or just use the default one, filling in the parameters output_dir (where to save checkpoints), training_files (path to the training list), validation_files (path to the validation list), and audios_path (path to the audio folder, so that joining it with a relative path from the lists yields the full path to an audio file).
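
For reference, a hypothetical fragment of hparams.yaml with just these four fields filled in; the paths are placeholders, and the actual file contains many more parameters:

```yaml
output_dir: checkpoints/my_voice    # where to save checkpoints
training_files: data/train.txt      # path to the training list
validation_files: data/val.txt      # path to the validation list
audios_path: data/                  # audios_path + relative path = full audio path
```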

When everything is ready, launch the training process:

  • if you edited hparams.yaml inside the data folder: python train.py
  • if you are using another config file: python train.py -p path/to/hparams.yaml