
tiberiu44 / TTS-Cube

Licence: apache-2.0
End-2-end speech synthesis with recurrent neural networks

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to TTS-Cube

Lightspeech
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Stars: ✭ 31 (-85.45%)
Mutual labels:  speech, text-to-speech
Durian
Implementation of "Duration Informed Attention Network for Multimodal Synthesis" (https://arxiv.org/pdf/1909.01700.pdf) paper.
Stars: ✭ 111 (-47.89%)
Mutual labels:  speech, text-to-speech
Wsay
Windows "say"
Stars: ✭ 36 (-83.1%)
Mutual labels:  speech, text-to-speech
Tts
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Stars: ✭ 305 (+43.19%)
Mutual labels:  speech, text-to-speech
Esp8266sam
Speech synthesis for ESP8266 using S.A.M. port
Stars: ✭ 199 (-6.57%)
Mutual labels:  speech, synthesis
Cboard
AAC communication system with text-to-speech for the browser
Stars: ✭ 437 (+105.16%)
Mutual labels:  speech, text-to-speech
Gtts
Python library and CLI tool to interface with Google Translate's text-to-speech API
Stars: ✭ 1,303 (+511.74%)
Mutual labels:  speech, text-to-speech
Fre-GAN-pytorch
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Stars: ✭ 73 (-65.73%)
Mutual labels:  text-to-speech, speech
Diffwave
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Stars: ✭ 139 (-34.74%)
Mutual labels:  speech, text-to-speech
Voc
A physical model of the human vocal tract using literate programming, based on Pink Trombone.
Stars: ✭ 129 (-39.44%)
Mutual labels:  speech, synthesis
Voice Builder
An opensource text-to-speech (TTS) voice building tool
Stars: ✭ 362 (+69.95%)
Mutual labels:  speech, text-to-speech
Aeneas
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Stars: ✭ 1,942 (+811.74%)
Mutual labels:  speech, text-to-speech
Tts
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Stars: ✭ 5,427 (+2447.89%)
Mutual labels:  speech, text-to-speech
Vad
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
Stars: ✭ 622 (+192.02%)
Mutual labels:  lstm, speech
editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech
Stars: ✭ 74 (-65.26%)
Mutual labels:  text-to-speech, speech
Watbot
An Android ChatBot powered by IBM Watson Services (Assistant V1, Text-to-Speech, and Speech-to-Text with Speaker Recognition) on IBM Cloud.
Stars: ✭ 64 (-69.95%)
Mutual labels:  speech, text-to-speech
ttslearn
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
Stars: ✭ 158 (-25.82%)
Mutual labels:  text-to-speech, speech
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-75.59%)
Mutual labels:  text-to-speech, speech
Tts
Text-to-Speech for Arduino
Stars: ✭ 118 (-44.6%)
Mutual labels:  speech, text-to-speech
Wavegrad
A fast, high-quality neural vocoder.
Stars: ✭ 138 (-35.21%)
Mutual labels:  speech, text-to-speech

Introduction

New: Interactive demo using Google Colaboratory can be found here

TTS-Cube is an end-2-end speech synthesis system that provides a full processing pipeline to train and deploy TTS models.

It is entirely based on neural networks, requires no pre-aligned data and can be trained to produce audio just by using character or phoneme sequences.

Markdown does not allow embedding of audio files. For a better experience check out the project's website.

For installation please follow these instructions. Training and usage examples can be found here. A notebook demo can be found here.

Output examples

Encoder outputs:

"Arată că interesul utilizatorilor de internet față de acțiuni ecologiste de genul Earth Hour este unul extrem de ridicat." encoder_output_1

"Pentru a contracara proiectul, Rusia a demarat un proiect concurent, South Stream, în care a încercat să atragă inclusiv o parte dintre partenerii Nabucco." encoder_output_2

Vocoder output (conditioned on gold-standard data)

Note: The mel-spectrum is computed with a frame shift of 12.5 ms. This means that Griffin-Lim reconstruction produces sloppy results at best, regardless of the number of iterations.
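
For illustration, here is a minimal, self-contained sketch (not TTS-Cube's actual feature-extraction code) of computing a mel-spectrogram with a 12.5 ms frame shift and reconstructing audio from it with Griffin-Lim; the sample rate, FFT size, mel-band count and file names are illustrative assumptions:

```python
# Minimal sketch (not TTS-Cube's feature extraction): a 12.5 ms frame-shift
# mel-spectrogram and a Griffin-Lim reconstruction of it.
import librosa
import soundfile as sf

sr = 16000                      # assumed sample rate; the project's may differ
hop_length = int(0.0125 * sr)   # 12.5 ms frame shift -> 200 samples at 16 kHz

wav, _ = librosa.load("sample.wav", sr=sr)          # "sample.wav" is a placeholder
mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_fft=1024,
                                     hop_length=hop_length, n_mels=80)

# Griffin-Lim works on a linear spectrogram, so invert the mel basis first.
# With such a coarse hop the phase estimate stays rough no matter how many
# iterations are used, which is why a neural vocoder is needed.
linear = librosa.feature.inverse.mel_to_stft(mel, sr=sr, n_fft=1024)
recon = librosa.griffinlim(linear, hop_length=hop_length, n_iter=60)
sf.write("reconstructed.wav", recon, sr)
```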

original        vocoder

original        vocoder

original        vocoder

End-to-end decoding

The encoder model is still converging, so the examples below are currently of low quality. We will update the files as soon as we have a stable encoder model.

synthesized         original (unseen)

synthesized         original (unseen)

synthesized         original (unseen)

synthesized         original (unseen)

Technical details

TTS-Cube is based on concepts described in Tacotron (1 and 2), Char2Wav and WaveRNN, but its architecture does not follow those recipes exactly:

  • It has a dual architecture, composed of (a) an Encoder module that converts sequences of characters or phonemes into a mel-log spectrogram and (b) an RNN-based Vocoder that is conditioned on the spectrogram to produce audio
  • The Encoder is similar to those proposed in Tacotron (Wang et al., 2017) and Char2Wav (Sotelo et al., 2017), but
    • has a lightweight architecture with just a two-layer bidirectional LSTM (BDLSTM) encoder and a two-layer LSTM decoder (a rough sketch of this layout follows this list)
    • uses the guided attention trick (Tachibana et al., 2017), which dramatically speeds up convergence of the attention module (in our experiments we were unable to reach an acceptable model without this trick)
    • does not employ any CNN/pre-net or post-net
    • uses a simple highway connection from the attention to the output of the decoder (which, we observed, forces the encoder to actually learn the mean values of the mel-log spectrum for particular phones/characters)
  • The initial vocoder was similar to WaveRNN (Kalchbrenner et al., 2018), but instead of modifying the RNN cells (as proposed in their paper), we used two coupled neural networks
  • We are now using ClariNet (Ping et al., 2018)
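
The sketch below is a rough, hypothetical PyTorch rendering of the encoder/decoder layout and the guided-attention loss described above. It is not TTS-Cube's actual code: the layer sizes, the dot-product attention, and the context-to-output residual path (standing in for the highway connection) are illustrative assumptions.

```python
# Hypothetical sketch of the lightweight encoder/decoder described above.
# NOT TTS-Cube's code: dimensions, attention and the residual path are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GuidedAttentionLoss(nn.Module):
    """Guided attention (Tachibana et al., 2017): penalize attention weights
    that stray far from the diagonal, which speeds up alignment learning."""

    def __init__(self, sigma=0.2):
        super().__init__()
        self.sigma = sigma

    def forward(self, attn):  # attn: (batch, dec_steps, enc_steps)
        b, t, n = attn.shape
        dec = torch.arange(t, device=attn.device).float().unsqueeze(1) / t
        enc = torch.arange(n, device=attn.device).float().unsqueeze(0) / n
        w = 1.0 - torch.exp(-((enc - dec) ** 2) / (2 * self.sigma ** 2))
        return (attn * w.unsqueeze(0)).mean()


class LightweightEncoderDecoder(nn.Module):
    def __init__(self, n_symbols, emb=128, enc_hidden=128,
                 dec_hidden=256, n_mels=80):
        super().__init__()
        self.embedding = nn.Embedding(n_symbols, emb)
        # two-layer bidirectional LSTM over characters/phonemes
        self.encoder = nn.LSTM(emb, enc_hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        # two-layer unidirectional LSTM decoder
        self.decoder = nn.LSTM(2 * enc_hidden, dec_hidden, num_layers=2,
                               batch_first=True)
        self.attn_query = nn.Linear(dec_hidden, 2 * enc_hidden)
        self.to_mel = nn.Linear(dec_hidden, n_mels)
        # residual path from the attention context straight to the output,
        # standing in for the "highway connection" mentioned above
        self.ctx_to_mel = nn.Linear(2 * enc_hidden, n_mels)

    def forward(self, symbols, n_dec_steps):
        enc_out, _ = self.encoder(self.embedding(symbols))      # (B, N, 2H)
        mels, attns = [], []
        dec_in = enc_out.new_zeros(symbols.size(0), 1, enc_out.size(-1))
        state = None
        for _ in range(n_dec_steps):
            dec_out, state = self.decoder(dec_in, state)         # (B, 1, D)
            scores = torch.bmm(self.attn_query(dec_out),
                               enc_out.transpose(1, 2))          # (B, 1, N)
            attn = F.softmax(scores, dim=-1)
            context = torch.bmm(attn, enc_out)                   # (B, 1, 2H)
            mel = self.to_mel(dec_out) + self.ctx_to_mel(context)
            mels.append(mel)
            attns.append(attn)
            dec_in = context                                     # feed context forward
        return torch.cat(mels, dim=1), torch.cat(attns, dim=1)
```

A real training loop would also use teacher forcing on previous mel frames and predict a stop condition; those details are omitted here for brevity.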

References

The Parallel WaveNet/ClariNet code is adapted from this ClariNet repo.
