
thorstenMueller / deep-learning-german-tts

License: CC0-1.0
Thorsten-Voice: A free-to-use, offline, high-quality German TTS voice should be available for every project without any licensing hassle.


Motivation for Thorsten-Voice project 🗣️ 💬

A free-to-use, offline, high-quality German TTS voice should be available for every project without any licensing hassle.


Some personal words before using Thorsten-Voice

I contribute my voice as a person who believes in a world where all people are equal, regardless of gender, sexual orientation, religion, skin color, or the geocoordinates of their birthplace. A global world where everybody is warmly welcome anywhere on this planet and where open and free knowledge and education are available to everyone. 🌍 (Thorsten Müller)

Please keep in mind that I am not a professional voice talent. I'm just a normal guy sharing his voice with the world.

Voice-Datasets

Voice datasets are listed on Zenodo:

Dataset                    DOI
Thorsten-21.02-neutral     10.5281/zenodo.5525342
Thorsten-21.06-emotional   10.5281/zenodo.5525023
Thorsten-22.05-neutral     soon to come

Thorsten-21.02-neutral

@dataset{muller_thorsten_2021_5525342,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {Thorsten-Voice - "Thorsten-21.02-neutral" Dataset},
  month        = feb,
  year         = 2021,
  note         = {{Please use it to make the world a better place for 
                   whole humankind.}},
  publisher    = {Zenodo},
  version      = {3.0},
  doi          = {10.5281/zenodo.5525342},
  url          = {https://doi.org/10.5281/zenodo.5525342}
}

🗣️ Listen to some audio recordings from this dataset here.

Dataset summary

  • Recorded by Thorsten Müller
  • Optimized by Dominik Kreutz
  • LJSpeech file and directory structure
  • 22,668 recorded phrases (WAV files)
  • More than 23 hours of pure audio
  • Sample rate: 22,050 Hz
  • Mono
  • Normalized to -24 dB
  • Phrase length (min/avg/max): 2 / 52 / 180 characters
  • No silence at beginning/end
  • Average spoken characters per second: 14
  • Sentences with a question mark: 2,780
  • Sentences with an exclamation mark: 1,840
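
Figures such as the phrase-length statistics above can be recomputed from the metadata file. The following is a minimal sketch, assuming the usual LJSpeech layout: a pipe-delimited `metadata.csv` whose first field is the file id and whose second field is the transcript. The ids and sentences in the sample are made up for illustration.

```python
import csv
import io
from statistics import mean


def phrase_length_stats(metadata_lines):
    """Return (min, avg, max) transcript length in characters.

    Assumes LJSpeech-style rows: file id, then transcript, separated by '|'.
    """
    # QUOTE_NONE: quotation marks inside a transcript are treated as plain text.
    reader = csv.reader(metadata_lines, delimiter="|", quoting=csv.QUOTE_NONE)
    lengths = [len(row[1]) for row in reader if len(row) >= 2]
    if not lengths:
        raise ValueError("no transcripts found")
    return min(lengths), mean(lengths), max(lengths)


# Tiny in-memory stand-in for metadata.csv (made-up ids and sentences).
sample = io.StringIO(
    "clip_0001|Guten Morgen.\n"
    "clip_0002|Wie geht es dir?\n"
)
shortest, average, longest = phrase_length_stats(sample)
print(shortest, average, longest)
```

For the real dataset, pass an open file instead of the sample, for example `open("metadata.csv", encoding="utf-8", newline="")`; the path depends on where you unpacked the archive.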

Dataset evolution

As described in the PDF document (evolution of the Thorsten dataset), this dataset consists of three recording phases.

  • Phase 1: Recorded with a cheap USB microphone (low quality)
  • Phase 2: Recorded with a good microphone (good quality)
  • Phase 3: Recorded with the same good microphone but longer phrases (> 100 chars) (good quality)

If you want to use only a subset of the dataset, the recording quality CSV file shows which files belong to which recording phase.
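
Selecting a subset by recording phase could look roughly like the sketch below. The column names `filename` and `phase` are hypothetical, as are the sample rows; check the header of the actual recording quality CSV file and adapt them.

```python
import csv
import io


def files_in_phases(csv_lines, wanted_phases, file_col="filename", phase_col="phase"):
    """Return the file names whose recording phase is in wanted_phases.

    The column names are hypothetical; adapt them to the real CSV header.
    """
    reader = csv.DictReader(csv_lines)
    wanted = {str(p) for p in wanted_phases}
    return [row[file_col] for row in reader if row[phase_col] in wanted]


# Made-up rows standing in for the recording quality CSV.
sample = io.StringIO(
    "filename,phase\n"
    "a.wav,1\n"
    "b.wav,2\n"
    "c.wav,3\n"
)
# Keep only the recordings made with the good microphone (phases 2 and 3).
good_quality = files_in_phases(sample, wanted_phases=[2, 3])
print(good_quality)
```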

Thorsten-21.06-emotional

@dataset{muller_thorsten_2021_5525023,
  author       = {Müller, Thorsten and
                  Kreutz, Dominik},
  title        = {{Thorsten-Voice - "Thorsten-21.06-emotional" 
                   Dataset}},
  month        = jun,
  year         = 2021,
  note         = {{Please use it to make the world a better place for 
                   whole humankind.}},
  publisher    = {Zenodo},
  version      = {2.0},
  doi          = {10.5281/zenodo.5525023},
  url          = {https://doi.org/10.5281/zenodo.5525023}
}

All emotional recordings were recorded by me, and I tried to feel and pronounce each emotion even if the phrase context does not match it. Example: I pronounced the sleepy recordings in the tone I have shortly before falling asleep.

Samples

Listen to the phrase "Mist, wieder nichts geschafft." (roughly: "Damn, got nothing done again.") in the following emotions.

Dataset summary

  • Recorded by Thorsten Müller
  • Optimized by Dominik Kreutz
  • 300 sentences × 8 emotions = 2,400 recordings
  • Mono
  • Sample rate: 22,050 Hz
  • Normalized to -24 dB
  • No silence at beginning/end
  • Sentence length: 59 to 148 characters
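
Before training, it can be worth confirming that the audio files really have the format listed above. A minimal sketch using Python's standard `wave` module follows; the function name is my own, and the demo builds a silent clip in memory so that it runs without the dataset.

```python
import io
import wave


def check_wav_format(source, expected_rate=22050, expected_channels=1):
    """Return True if the WAV header matches the expected rate and channel count.

    `source` can be a file path string or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return (
            wav.getframerate() == expected_rate
            and wav.getnchannels() == expected_channels
        )


# Build one second of silence in memory so the example runs without the dataset.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 16-bit samples (an assumption for this demo)
    wav.setframerate(22050)
    wav.writeframes(b"\x00\x00" * 22050)
buffer.seek(0)

print(check_wav_format(buffer))
```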

Thorsten-22.05-neutral

🗣️ Listen to some audio recordings from this dataset here.

Soon to come

TTS Models

Thorsten-21.04-Tacotron2-DCA

This TTS model was trained on the Thorsten-21.02-neutral dataset. The recommended trained Fullband-MelGAN vocoder can be downloaded here.

Run the model:

  • pip install TTS==0.5.0
  • tts-server --model_name tts_models/de/thorsten/tacotron2-DCA

Thorsten-22.05-VITS

Trained on the Thorsten-22.05-neutral dataset. Audio samples are available on the Thorsten-Voice website.

To run the TTS server, follow these steps:

  • pip install tts==0.7.1
  • tts-server --model_name tts_models/de/thorsten/vits
  • Open http://localhost:5002 in a browser and try it out
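
Besides the browser page, a running server can also be queried from a script. The sketch below assumes the server exposes a `/api/tts` endpoint that takes the sentence as a `text` query parameter and returns WAV audio, which is how I recall the Coqui TTS demo server working; verify this against the version you installed. The helper names are my own.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, host="localhost", port=5002):
    """Build the request URL for the server's /api/tts endpoint (assumed path)."""
    return f"http://{host}:{port}/api/tts?{urlencode({'text': text})}"


def synthesize_to_file(text, out_path, **server):
    """Fetch synthesized audio from a running tts-server and save it to out_path."""
    with urlopen(build_tts_url(text, **server)) as response:
        audio = response.read()
    with open(out_path, "wb") as out:
        out.write(audio)


# Only builds the URL; no server is contacted here.
print(build_tts_url("Hallo Welt"))
```

With the server from the steps above running, `synthesize_to_file("Hallo Welt", "hallo.wav")` should write the spoken sentence to `hallo.wav`.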

Thorsten-22.08-Tacotron2-DDC

Trained on the Thorsten-22.05-neutral dataset. Audio samples are available on the Thorsten-Voice website.

To run the TTS server, follow these steps:

  • pip install tts==0.8.0
  • tts-server --model_name tts_models/de/thorsten/tacotron2-DDC
  • Open http://localhost:5002 in a browser and try it out
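
If you only need a single audio file rather than a server, the same package also installs a `tts` command-line tool. The sketch below builds such a call from Python; the flags `--text`, `--model_name` and `--out_path` are the ones I recall from Coqui TTS, so confirm them with `tts --help` for your installed version. The helper name and output file name are my own.

```python
import shlex
import subprocess


def tts_command(text, out_path, model_name="tts_models/de/thorsten/tacotron2-DDC"):
    """Build the argument list for a one-off synthesis with the `tts` CLI."""
    return [
        "tts",
        "--text", text,
        "--model_name", model_name,
        "--out_path", out_path,
    ]


cmd = tts_command("Guten Tag!", "guten_tag.wav")
print(shlex.join(cmd))  # shell-safe rendering of the command

# Uncomment to run it (needs the TTS package installed; the model is
# downloaded on first use):
# subprocess.run(cmd, check=True)
```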

Other models

Silero

You can use free AGPL-licensed models trained on the Thorsten-21.02-neutral dataset via the silero-models project.

ZDisket

ZDisket made a tool called TensorVox for setting up a TTS environment on Windows and included a German TTS model trained by monatis. Thanks for sharing that 👍. See it in action on YouTube.

Public talks

I really want to bring the topic "Open Voice For An Open Future" to wider public attention.

  • I took part in a Linux User Group podcast about Mycroft AI and talked about my TTS efforts (May 2021).
  • I was invited by Yusuf from the Turkish TensorFlow community to talk about "How to make machines speak with your own voice". The talk was streamed live on YouTube and is available here. If you're interested in the slides shown, feel free to download my presentation here (June 2021).
  • I was invited as a speaker at VoiceLunch language & linguistics on 03.01.2022. Here are my slides (January 2022).

YouTube channel

In summer 2021 I started sharing my lessons learned and experiences with open voice tech, especially TTS, on my little YouTube channel. If you check out and like my videos, I'd be happy to welcome you as a subscriber and member of my little YouTube community.

Feel free to file an issue if you ...

  • Use my TTS voice in your project(s)
  • Want to share your trained "Thorsten" model
  • Become aware of any abuse of my voice

Thanks section

Cool projects

Cool people

Even more special people

Additionally, a big thank-you to my dear colleague, Sebastian Kraus, for supporting me with audio recording equipment and for being the creative mastermind behind the logo design.

And last but not least, I want to say a huge, huge thank you to a special guy who supported me on this journey as a partner right from the beginning: not just with nice words, but with his time, audio optimization know-how and, finally, GPU power.

Thank you so much, dear Dominik (@domcross) for being my partner on this journey.

Thorsten (Twitter: @ThorstenVoice)
