
sovaai / sova-tts-engine

License: Apache-2.0
Tacotron2 based engine for the SOVA-TTS project

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to, or similar to, sova-tts-engine

Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German; easy to adapt to other languages)
Stars: ✭ 2,382 (+3680.95%)
Mutual labels:  speech-synthesis, tacotron2
Comprehensive-Tacotron2
PyTorch implementation of Google's "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions". This implementation supports both single- and multi-speaker TTS and several techniques to improve the robustness and efficiency of the model.
Stars: ✭ 22 (-65.08%)
Mutual labels:  speech-synthesis, tacotron2
TensorVox
Desktop application for neural speech synthesis written in C++
Stars: ✭ 140 (+122.22%)
Mutual labels:  speech-synthesis, tacotron2
tacotron2
Pytorch implementation of "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions", ICASSP, 2018.
Stars: ✭ 17 (-73.02%)
Mutual labels:  speech-synthesis, tacotron2
Tacotron pytorch
PyTorch implementation of Tacotron speech synthesis model.
Stars: ✭ 242 (+284.13%)
Mutual labels:  speech-synthesis
Cyclegan Vc2
Voice Conversion by CycleGAN (voice cloning / voice conversion): CycleGAN-VC2
Stars: ✭ 158 (+150.79%)
Mutual labels:  speech-synthesis
Tacotron 2
Google's Tacotron 2 TensorFlow implementation
Stars: ✭ 1,968 (+3023.81%)
Mutual labels:  speech-synthesis
Wavenet vocoder
WaveNet vocoder
Stars: ✭ 1,926 (+2957.14%)
Mutual labels:  speech-synthesis
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+368.25%)
Mutual labels:  speech-synthesis
voder
An emulation of the Voder Speech Synthesizer.
Stars: ✭ 19 (-69.84%)
Mutual labels:  speech-synthesis
Tacotron
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
Stars: ✭ 2,581 (+3996.83%)
Mutual labels:  speech-synthesis
Naomi
The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
Stars: ✭ 171 (+171.43%)
Mutual labels:  speech-synthesis
Wavegrad
Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
Stars: ✭ 245 (+288.89%)
Mutual labels:  speech-synthesis
Vocgan
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Stars: ✭ 158 (+150.79%)
Mutual labels:  speech-synthesis
idear
🎙️ Handsfree Audio Development Interface
Stars: ✭ 84 (+33.33%)
Mutual labels:  speech-synthesis
Normit
Translations with speech synthesis in your terminal as a node package
Stars: ✭ 219 (+247.62%)
Mutual labels:  speech-synthesis
Neural Voice Cloning With Few Samples
Implementation of Neural Voice Cloning with Few Samples Research Paper by Baidu
Stars: ✭ 211 (+234.92%)
Mutual labels:  speech-synthesis
ttsflow
TensorFlow speech synthesis C++ inference for VoiceNet
Stars: ✭ 17 (-73.02%)
Mutual labels:  speech-synthesis
Universalvocoding
A PyTorch implementation of "Robust Universal Neural Vocoding"
Stars: ✭ 197 (+212.7%)
Mutual labels:  speech-synthesis
GlottDNN
GlottDNN vocoder and tools for training DNN excitation models
Stars: ✭ 30 (-52.38%)
Mutual labels:  speech-synthesis

Tacotron2

The Tacotron2 network is used as the main synthesis engine in the SOVA-TTS project. We took its implementation from NVIDIA, added various improvements described in the literature, and made the code more user-friendly.

Key differences:

  1. Added a GST module;
  2. Added a Mutual Information Estimator (based on the following article and repo);
  3. Added the option to include an attention loss in the training process, using diagonal or prealigned guidance (see the sketch after this list);
  4. Reworked parts of the code to improve usability;
  5. Other minor changes and additions.
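
To illustrate the diagonal guidance in item 3: a common form is the Tacotron-style guided attention loss, which penalizes attention mass far from the diagonal of the alignment matrix. The sketch below is a generic PyTorch version, not this engine's actual code; the function name and signature are made up for illustration:

```python
import torch

def diagonal_guided_attention_loss(alignments, text_lengths, mel_lengths, g=0.2):
    """Penalize attention mass far from the diagonal (guided attention).

    alignments: (batch, mel_steps, text_steps) soft attention weights.
    text_lengths, mel_lengths: per-utterance valid lengths.
    Hypothetical helper; the engine's actual implementation may differ.
    """
    B, T_mel, T_text = alignments.size()
    device = alignments.device
    # Time indices along the decoder (mel) and encoder (text) axes
    n = torch.arange(T_mel, device=device).float().unsqueeze(1)   # (T_mel, 1)
    t = torch.arange(T_text, device=device).float().unsqueeze(0)  # (1, T_text)
    loss = 0.0
    for b in range(B):
        N, T = float(mel_lengths[b]), float(text_lengths[b])
        # W[n, t] = 1 - exp(-(t/T - n/N)^2 / (2 g^2)): near zero on the diagonal
        W = 1.0 - torch.exp(-((t / T - n / N) ** 2) / (2 * g ** 2))
        loss = loss + (alignments[b, : int(N), : int(T)] * W[: int(N), : int(T)]).mean()
    return loss / B
```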

How to train a new model

First of all, you need to install all dependencies (listed in requirements.txt) and convert the dataset to the LJ Speech format, where each line contains the relative path to an audio file and its text, separated by the "|" sign, e.g.:

wavs/000000.wav|С трев+ожным ч+увством бер+усь я з+а пер+о.
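
For illustration, here is a minimal conversion sketch. It assumes a hypothetical source layout in which every .wav file has a sibling .txt transcript; the real conversion depends on how your dataset is stored:

```python
import os

AUDIO_DIR = "wavs"          # hypothetical folder with .wav files
OUT_FILE = "metadata.txt"   # one "relative/path.wav|text" line per utterance

with open(OUT_FILE, "w", encoding="utf-8") as out:
    for name in sorted(os.listdir(AUDIO_DIR)):
        if not name.endswith(".wav"):
            continue
        # Read the transcript stored next to the audio file
        txt_path = os.path.join(AUDIO_DIR, name.replace(".wav", ".txt"))
        with open(txt_path, encoding="utf-8") as f:
            text = f.read().strip()
        out.write(f"{AUDIO_DIR}/{name}|{text}\n")
```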

Then divide it into two files: the training list (90% of the data) and the validation list (10% of the data).
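A short script like the following can make the split (file names here are placeholders):

```python
import random

with open("metadata.txt", encoding="utf-8") as f:
    lines = f.readlines()

random.seed(42)  # fixed seed so the split is reproducible
random.shuffle(lines)

# 90% of the lines go to the training list, the rest to validation
split = int(len(lines) * 0.9)
with open("train.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[:split])
with open("val.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[split:])
```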

After that, configure the config file as needed (here you can find an explanation of the main fields of the config file), or just use the default one, filling in the parameters output_dir (where to save checkpoints), training_files (path to the training list), validation_files (path to the validation list), and audios_path (path to the audio folder, so that joining it with a relative path from the lists yields the full path to an audio file).
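
For reference, a hypothetical fragment of hparams.yaml with just these four fields filled in; the paths are placeholders, and the actual file contains many more parameters:

```yaml
output_dir: checkpoints/my_voice    # where to save checkpoints
training_files: data/train.txt      # path to the training list
validation_files: data/val.txt      # path to the validation list
audios_path: data/                  # audios_path + relative path = full audio path
```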

When everything is ready, launch the training process:

  • if you edited hparams.yaml inside the data folder: python train.py
  • if you are using another config file: python train.py -p path/to/hparams.yaml