Kyubyong / Dc_tts

Licence: apache-2.0
A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Dc_tts

simple-obs-stt
Speech-to-text and keyboard input captions for OBS.
Stars: ✭ 89 (-91.25%)
Mutual labels:  speech, tts, speech-to-text
Lingvo
Lingvo
Stars: ✭ 2,361 (+132.15%)
Mutual labels:  speech, speech-to-text, tts
Discordspeechbot
A speech-to-text bot for Discord with music commands and more, using NodeJS. Ideal for controlling your Discord server using voice commands; it can also be useful for hearing-impaired people.
Stars: ✭ 35 (-96.56%)
Mutual labels:  speech, speech-to-text
Tts
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Stars: ✭ 5,427 (+433.63%)
Mutual labels:  speech, tts
Wsay
Windows "say"
Stars: ✭ 36 (-96.46%)
Mutual labels:  speech, tts
Annyang
💬 Speech recognition for your site
Stars: ✭ 6,216 (+511.21%)
Mutual labels:  speech, speech-to-text
Css10
CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages
Stars: ✭ 302 (-70.3%)
Mutual labels:  speech, speech-to-text
Tts
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Stars: ✭ 305 (-70.01%)
Mutual labels:  speech, tts
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-94.89%)
Mutual labels:  speech, tts
Java Speech Api
The J.A.R.V.I.S. Speech API is designed to be simple and efficient, using the speech engines created by Google to provide functionality for parts of the API. Essentially, it is an API written in Java, including a recognizer, synthesizer, and a microphone capture utility. The project uses Google services for the synthesizer and recognizer. While this requires an Internet connection, it provides a complete, modern, and fully functional speech API in Java.
Stars: ✭ 490 (-51.82%)
Mutual labels:  speech, speech-to-text
Cboard
AAC communication system with text-to-speech for the browser
Stars: ✭ 437 (-57.03%)
Mutual labels:  speech, tts
Tacotron
Audio samples accompanying publications related to Tacotron, an end-to-end speech synthesis model.
Stars: ✭ 493 (-51.52%)
Mutual labels:  speech, tts
sova-asr
SOVA ASR (Automatic Speech Recognition)
Stars: ✭ 123 (-87.91%)
Mutual labels:  speech, speech-to-text
editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech
Stars: ✭ 74 (-92.72%)
Mutual labels:  speech, tts
Android Speech
Android speech recognition and text to speech made easy
Stars: ✭ 310 (-69.52%)
Mutual labels:  speech, tts
Fre-GAN-pytorch
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Stars: ✭ 73 (-92.82%)
Mutual labels:  speech, tts
Voice Builder
An opensource text-to-speech (TTS) voice building tool
Stars: ✭ 362 (-64.41%)
Mutual labels:  speech, tts
Nodejs Speech
Node.js client for Google Cloud Speech: Speech to text conversion powered by machine learning.
Stars: ✭ 545 (-46.41%)
Mutual labels:  speech, speech-to-text
ttslearn
ttslearn: Library for the book "Pythonで学ぶ音声合成" (Text-to-Speech with Python)
Stars: ✭ 158 (-84.46%)
Mutual labels:  speech, tts
bingspeech-api-client
Microsoft Bing Speech API client in node.js
Stars: ✭ 32 (-96.85%)
Mutual labels:  tts, speech-to-text

A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

I implement yet another text-to-speech model, DC-TTS, introduced in Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention. My goal, however, is not just to replicate the paper. Rather, I'd like to gain insights into various sound projects.

Requirements

  • NumPy >= 1.11.1
  • TensorFlow >= 1.3 (Note that the API of tf.contrib.layers.layer_norm has changed since 1.3; see the sketch after this list.)
  • librosa
  • tqdm
  • matplotlib
  • scipy
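
As a rough illustration of the layer_norm note above, here is a sketch (not code from this repo) of normalizing the channel axis of a (batch, time, channels) feature map with TF 1.x contrib; the API change referred to is presumably the addition of the begin_norm_axis / begin_params_axis arguments:

    # Sketch: layer normalization over the last (channel) axis with TF 1.x contrib.
    import tensorflow as tf

    def layer_norm_last_axis(inputs, scope="ln"):
        # begin_norm_axis=-1 normalizes only the channel dimension, which suits
        # the (batch, time, channels) tensors used by 1-D convolutional TTS nets.
        return tf.contrib.layers.layer_norm(
            inputs,
            center=True, scale=True,   # learn beta (offset) and gamma (gain)
            begin_norm_axis=-1,
            begin_params_axis=-1,
            scope=scope)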

Data

I train English models and a Korean model on four different speech datasets.

1. LJ Speech Dataset
2. Nick Offerman's Audiobooks
3. Kate Winslet's Audiobook
4. KSS Dataset

The LJ Speech Dataset has recently become a widely used benchmark for TTS because it is publicly available and contains 24 hours of reasonably good-quality samples. Nick's and Kate's audiobooks are additionally used to see whether the model can learn even from smaller, more variable speech data. They are 18 hours and 5 hours long, respectively. Finally, the KSS Dataset is a Korean single-speaker speech dataset of more than 12 hours.
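
For reference, a minimal sketch of reading the LJ Speech transcripts, assuming the standard LJSpeech layout (metadata.csv plus a wavs/ folder); the repo's own prepro.py, mentioned in the training steps below, may handle this differently:

    # Sketch: iterate over LJ Speech entries (id|transcription|normalized transcription).
    import codecs
    import os

    def load_lj(data_dir):
        meta = os.path.join(data_dir, "metadata.csv")
        with codecs.open(meta, "r", "utf-8") as f:
            for line in f:
                fname, _, normalized = line.strip().split("|")
                wav_path = os.path.join(data_dir, "wavs", fname + ".wav")
                yield wav_path, normalized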

Training

  • STEP 0. Download LJ Speech Dataset or prepare your own data.
  • STEP 1. Adjust hyperparameters in hyperparams.py. (If you want to do preprocessing, set prepro to True.)
  • STEP 2. Run python train.py 1 to train Text2Mel. (If you set prepro to True, run python prepro.py first.)
  • STEP 3. Run python train.py 2 to train SSRN.

You can run STEP 2 and STEP 3 at the same time if you have more than one GPU card.
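
For example, a hypothetical launcher that runs both jobs at once by pinning each to its own GPU via CUDA_VISIBLE_DEVICES (a sketch; the repo itself simply expects you to run the two commands above yourself):

    # Sketch: run Text2Mel and SSRN training side by side on two GPUs.
    import os
    import subprocess

    def launch(network, gpu):
        env = os.environ.copy()
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)   # pin this process to one GPU
        return subprocess.Popen(["python", "train.py", str(network)], env=env)

    if __name__ == "__main__":
        jobs = [launch(1, 0), launch(2, 1)]      # Text2Mel on GPU 0, SSRN on GPU 1
        for job in jobs:
            job.wait()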

Training Curves

Attention Plot

Sample Synthesis

I generate speech samples based on the Harvard Sentences, as the original paper does. The sentences are already included in the repo.

  • Run synthesize.py and check the files in samples.
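
A quick way to sanity-check the output (generic scipy usage, since scipy is already in the requirements; the samples folder name comes from the step above):

    # Sketch: list the synthesized files and their durations.
    import glob
    import os
    from scipy.io import wavfile

    for path in sorted(glob.glob(os.path.join("samples", "*.wav"))):
        sr, audio = wavfile.read(path)
        print("%s: %.2f s at %d Hz" % (os.path.basename(path), len(audio) / float(sr), sr))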

Generated Samples

Dataset | Samples (training steps)
LJ      | 50k, 200k, 310k, 800k
Nick    | 40k, 170k, 300k, 800k
Kate    | 40k, 160k, 300k, 800k
KSS     | 400k

Pretrained Model for LJ

Download this.

Notes

  • The paper didn't mention normalization, but without normalization I couldn't get it to work. So I added layer normalization.
  • The paper fixed the learning rate to 0.001, but it didn't work for me. So I decayed it.
  • I tried to train Text2Mel and SSRN simultaneously, but it didn't work. I guess separating those two networks mitigates the burden of training.
  • The authors claimed that the model can be trained within a day, but unfortunately I had no such luck. Still, it is obviously much faster than Tacotron, as it uses only convolution layers.
  • Thanks to the guided attention, the attention plot looks monotonic almost from the beginning. I guess it holds the alignment tight so it doesn't lose track (see the sketch after these notes).
  • The paper didn't mention dropout. I applied it, as I believe it helps with regularization.
  • Check also other TTS models such as Tacotron and Deep Voice 3.
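
For the guided attention note above, this is my reading of the penalty described in the paper (a NumPy sketch, not code taken from this repo): attention mass far from the diagonal is penalized, which is why the alignment becomes monotonic so early.

    # Sketch: guided attention penalty W[n, t] = 1 - exp(-((n/N - t/T)^2) / (2 g^2)).
    import numpy as np

    def guided_attention_weights(N, T, g=0.2):
        n = np.arange(N).reshape(-1, 1) / float(N)   # text positions, normalized
        t = np.arange(T).reshape(1, -1) / float(T)   # mel frames, normalized
        return 1.0 - np.exp(-((n - t) ** 2) / (2.0 * g ** 2))

    def guided_attention_loss(A, g=0.2):
        # A: attention matrix of shape (N, T); mean penalty over all positions.
        W = guided_attention_weights(*A.shape, g=g)
        return float(np.mean(A * W))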