
Kyubyong / Speaker_adapted_tts

Making a TTS model with 1 minute of speech samples within 10 minutes

Projects that are alternatives to or similar to Speaker_adapted_tts

simple-obs-stt
Speech-to-text and keyboard input captions for OBS.
Stars: ✭ 89 (-51.37%)
Mutual labels:  tts, speech-to-text
Lingvo
Lingvo
Stars: ✭ 2,361 (+1190.16%)
Mutual labels:  speech-to-text, tts
Dc tts
A TensorFlow Implementation of DC-TTS: yet another text-to-speech model
Stars: ✭ 1,017 (+455.74%)
Mutual labels:  speech-to-text, tts
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-71.04%)
Mutual labels:  tts, speech-to-text
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+359.56%)
Mutual labels:  tts, speech-to-text
bingspeech-api-client
Microsoft Bing Speech API client in node.js
Stars: ✭ 32 (-82.51%)
Mutual labels:  tts, speech-to-text
Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (-43.72%)
Mutual labels:  speech-to-text, tts
Awesome Speech Recognition Speech Synthesis Papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Stars: ✭ 2,085 (+1039.34%)
Mutual labels:  tts
Melnet
Implementation of "MelNet: A Generative Model for Audio in the Frequency Domain"
Stars: ✭ 161 (-12.02%)
Mutual labels:  tts
Speechrecognizerbutton
UIButton subclass with push to talk recording, speech recognition and Siri-style waveform view.
Stars: ✭ 144 (-21.31%)
Mutual labels:  speech-to-text
Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
Stars: ✭ 2,382 (+1201.64%)
Mutual labels:  tts
Speech To Text Russian
A project for Russian speech recognition based on pykaldi.
Stars: ✭ 151 (-17.49%)
Mutual labels:  speech-to-text
Hey Jetson
Deep Learning based Automatic Speech Recognition with attention for the Nvidia Jetson.
Stars: ✭ 161 (-12.02%)
Mutual labels:  speech-to-text
Zzz Retired openstt
RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:
Stars: ✭ 146 (-20.22%)
Mutual labels:  speech-to-text
Gst Tacotron
A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Stars: ✭ 175 (-4.37%)
Mutual labels:  tts
Tacotron
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
Stars: ✭ 1,756 (+859.56%)
Mutual labels:  tts
Google Tts
Google TTS (Text-To-Speech) for node.js
Stars: ✭ 180 (-1.64%)
Mutual labels:  tts
Naomi
The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
Stars: ✭ 171 (-6.56%)
Mutual labels:  speech-to-text
Jiwer
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Stars: ✭ 158 (-13.66%)
Mutual labels:  speech-to-text
Proctoring Ai
Creating a software for automatic monitoring in online proctoring
Stars: ✭ 155 (-15.3%)
Mutual labels:  speech-to-text

Making a TTS model with 1 minute of speech samples within 10 minutes

After seeing my implementations of Tacotron and DCTTS, many people have asked me "How large a speech dataset is needed for neural TTS?" or "Can you make a TTS model with X hour(s)/minute(s) of training data?" I'm fully aware of the importance of those questions. When you plan a service using TTS, you cannot always count on having lots of speech samples. I would like to give an answer. I really do. But unfortunately I have no answer. The only thing I know is that I could train a model successfully with five hours of speech samples I extracted from Kate Winslet's audiobook. I haven't tried less data than that. I could try it, but actually I have a better idea. Since I have a decent model trained on the LJ Speech Dataset for several days, why don't I use it? After all, we all have different voices, but the way we speak English is not totally different.
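The idea of reusing an already trained model as a seed can be illustrated with a toy example. The sketch below is not the TTS code from this project. It fits a one-variable linear model with plain gradient descent, and every name in it (`fit`, `mse`, `big`, `small`) is hypothetical. `big` stands in for a large corpus such as LJ Speech and `small` for a tiny target-speaker set whose parameters are close to, but not the same as, those of the large one. It compares a short run started from the seed weights with an equally short run started from zero.

```python
def fit(data, w, b, steps, lr=0.05):
    """Plain gradient descent on mean squared error for y = w * x + b."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def mse(data, w, b):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


xs = [i / 10 for i in range(-10, 11)]

# Toy stand-in for the large corpus: 21 points on y = 2.0x + 1.0.
big = [(x, 2.0 * x + 1.0) for x in xs]
# Toy stand-in for the small target set: 5 points on a nearby line.
small = [(x, 2.2 * x + 0.8) for x in xs[::5]]

# Long "pretraining" run on the large set gives the seed weights.
seed_w, seed_b = fit(big, 0.0, 0.0, steps=2000)

# Same short budget of 20 steps, two different starting points.
warm = fit(small, seed_w, seed_b, steps=20)  # fine-tune from the seed
cold = fit(small, 0.0, 0.0, steps=20)        # train from scratch

print("warm-start error:", mse(small, *warm))
print("cold-start error:", mse(small, *cold))
```

With the same number of update steps, the warm-started run ends with a much lower error on the small set, because it begins close to the solution. Whether this carries over to a given neural TTS model depends on how similar the new speaker's data is to the data the seed model was trained on.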

In the above two repos, I trained TTS models from scratch using all the speech samples of my two favorite celebrities, Nick Offerman and Kate Winslet. This time, I use only one minute of the speech samples. The following are the synthesized samples after 10 minutes of fine-tuning. Do you think they sound like them?
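One way to carve roughly one minute of speech out of a larger set of clips is sketched below. This is an assumption about how such a subset could be picked, not the procedure used in this project. The `pick_clips` helper, the `(file name, duration)` tuple layout, and the file names are all hypothetical.

```python
import random
from typing import List, Tuple

# (file name, duration in seconds); both fields are illustrative
Clip = Tuple[str, float]


def pick_clips(clips: List[Clip], budget_sec: float = 60.0, seed: int = 0) -> List[Clip]:
    """Shuffle the clips, then keep each one that still fits in the time budget."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    shuffled = clips[:]        # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)

    chosen: List[Clip] = []
    total = 0.0
    for name, duration in shuffled:
        if total + duration <= budget_sec:
            chosen.append((name, duration))
            total += duration
    return chosen


# Made-up clip list: 100 clips lasting between 2 and 8 seconds.
clips = [(f"clip_{i:03d}.wav", 2.0 + (i % 7)) for i in range(100)]
subset = pick_clips(clips)
total_sec = sum(duration for _, duration in subset)
print(len(subset), "clips,", total_sec, "seconds")
```

For real recordings, the durations could be read with Python's standard `wave` module (frame count divided by frame rate) before calling the helper.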

Additionally, I collected 10 speech samples of Modern Family celebrities from YouTube and generated their voices by training on those samples.

Check here to see the model details, source code and the pretrained model which served as a seed.
