
alkhimey / esp32-flite

Licence: other
Speech synthesis running on ESP32 based on Flite engine.

Programming Languages

c

Projects that are alternatives to or similar to esp32-flite

LVCNet
LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation
Stars: ✭ 67 (+139.29%)
Mutual labels:  text-to-speech, tts, speech-synthesis
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (+85.71%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Cross-Speaker-Emotion-Transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Stars: ✭ 107 (+282.14%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Zero-Shot-TTS
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Stars: ✭ 33 (+17.86%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Fre-GAN-pytorch
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Stars: ✭ 73 (+160.71%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Parallel-Tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Stars: ✭ 149 (+432.14%)
Mutual labels:  text-to-speech, tts, speech-synthesis
talkie
Text-to-speech browser extension button. Select text on any web page, and have the computer read it out loud for you by simply clicking the Talkie button.
Stars: ✭ 43 (+53.57%)
Mutual labels:  text-to-speech, tts, speech-synthesis
vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Stars: ✭ 1,604 (+5628.57%)
Mutual labels:  text-to-speech, tts, speech-synthesis
leon
🧠 Leon is your open-source personal assistant.
Stars: ✭ 8,560 (+30471.43%)
Mutual labels:  text-to-speech, speech-synthesis, flite
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+2903.57%)
Mutual labels:  text-to-speech, tts, speech-synthesis
TensorVox
Desktop application for neural speech synthesis written in C++
Stars: ✭ 140 (+400%)
Mutual labels:  text-to-speech, tts, speech-synthesis
WaveGrad2
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Stars: ✭ 55 (+96.43%)
Mutual labels:  text-to-speech, tts, speech-synthesis
StyleSpeech
Official implementation of Meta-StyleSpeech and StyleSpeech
Stars: ✭ 161 (+475%)
Mutual labels:  text-to-speech, tts, speech-synthesis
ttslearn
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
Stars: ✭ 158 (+464.29%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Expressive-FastSpeech2
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
Stars: ✭ 139 (+396.43%)
Mutual labels:  text-to-speech, tts, speech-synthesis
VAENAR-TTS
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.
Stars: ✭ 66 (+135.71%)
Mutual labels:  text-to-speech, tts, speech-synthesis
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+953.57%)
Mutual labels:  text-to-speech, tts, speech-synthesis
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (+89.29%)
Mutual labels:  text-to-speech, tts, speech-synthesis
AdaSpeech
AdaSpeech: Adaptive Text to Speech for Custom Voice
Stars: ✭ 108 (+285.71%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Daft-Exprt
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis
Stars: ✭ 41 (+46.43%)
Mutual labels:  text-to-speech, tts, speech-synthesis

Flite on ESP32 Example

This project demonstrates speech synthesis on the ESP32. It performs the synthesis locally using the CMU Flite library, rather than offloading this task to cloud providers.

For this project, Flite 2.2 (commit hash e9880474) was ported to the esp-idf 3.2.2 framework as a set of reusable components that can be found in the "components" directory.

The cmu_us_kal voice is provided as an example. The other predefined voices that come with Flite are too large to fit into flash. New voices can be added as separate components, provided that they fit into flash.

Running the Example

The example runs a simple HTTP server that receives GET requests containing the text to be synthesized. The program synthesizes the text and sends the PCM data over I2S. On the receiving side I used a PCM5102 DAC, but any other I2S DAC might work as well. Additionally, it may be possible to route the I2S output to the ESP32's internal 8-bit DAC.
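As a rough illustration (not taken from the example's source), pushing a chunk of 16-bit PCM samples to the I2S peripheral with the ESP-IDF driver could look like the sketch below; the helper name and the use of I2S_NUM_0 are assumptions.

    #include <stdint.h>
    #include "freertos/FreeRTOS.h"
    #include "driver/i2s.h"

    /* Hypothetical helper: push one chunk of 16-bit mono PCM samples to the
     * I2S peripheral. i2s_write() blocks until the DMA buffers have accepted
     * all of the data. */
    static void play_pcm_chunk(const int16_t *samples, size_t num_samples)
    {
        size_t bytes_written = 0;
        i2s_write(I2S_NUM_0, samples, num_samples * sizeof(int16_t),
                  &bytes_written, portMAX_DELAY);
    }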

First, configure the project using make menuconfig. You need to set your Wi-Fi SSID and password, as well as the pins to use for I2S. I tested with BCK = 26, WS = 25, and DATA = 22.
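For reference, a minimal I2S initialization using those pins might look like the following sketch. This is an assumption about how the peripheral could be set up, not the example's actual code; the sample rate is a placeholder and should match the sample_rate field of the wave that Flite produces.

    #include "driver/i2s.h"

    /* Hedged sketch: install the I2S driver and assign the pins above. */
    static void i2s_setup(void)
    {
        i2s_config_t i2s_config = {
            .mode = I2S_MODE_MASTER | I2S_MODE_TX,
            .sample_rate = 8000,                     /* placeholder */
            .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
            .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
            .communication_format = I2S_COMM_FORMAT_I2S,
            .dma_buf_count = 4,
            .dma_buf_len = 256,
        };
        i2s_pin_config_t pin_config = {
            .bck_io_num = 26,                 /* BCK  */
            .ws_io_num = 25,                  /* WS   */
            .data_out_num = 22,               /* DATA */
            .data_in_num = I2S_PIN_NO_CHANGE,
        };
        i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
        i2s_set_pin(I2S_NUM_0, &pin_config);
    }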

Since the produced WAV data is stored as an array of PCM values allocated on the heap, enough heap space must be available. The space required depends on the length of the synthesized text, so using an ESP32-WROVER module, which has 4 MB of PSRAM, is advised. The PSRAM must be enabled in menuconfig. The option is a little hidden in the menus: Component config -> ESP32 Specific -> Support for external, SPI connected RAM -> SPI RAM Config. Once enabled, the PSRAM is added to the heap allocation pool.
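To verify at runtime that the external RAM actually ended up in the heap pool, a quick check along these lines can help (a hedged sketch using ESP-IDF's heap_caps API, not part of the example itself):

    #include "esp_heap_caps.h"
    #include "esp_log.h"

    /* Log how much SPI RAM is available in the heap pool.
     * If this prints 0, PSRAM was not enabled or not detected. */
    static void log_psram_size(void)
    {
        size_t free_spiram = heap_caps_get_free_size(MALLOC_CAP_SPIRAM);
        ESP_LOGI("mem", "Free PSRAM in heap: %u bytes", (unsigned) free_spiram);
    }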

To send text for the ESP32 to synthesize, issue an HTTP GET request to the /say path with a query parameter s. This can be done with a web browser: browse to http://<ip of esp device>/say?s=This is an example text. The query string is limited to approximately 256 characters, but this is an artificial limitation of the example program; the Flite library can synthesize much longer texts at once.
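To illustrate how such a request could be handled with the esp_http_server component, here is a hedged sketch; the handler name, buffer sizes, and the say_text() helper are assumptions, not the example's actual code:

    #include <string.h>
    #include "esp_http_server.h"

    /* Hypothetical helper that runs Flite on the given text and streams
     * the resulting PCM to I2S. */
    void say_text(const char *text);

    /* Sketch of a GET /say handler: extract the "s" query parameter and
     * hand it to the synthesis code. */
    static esp_err_t say_get_handler(httpd_req_t *req)
    {
        char query[256];
        char text[256];

        if (httpd_req_get_url_query_str(req, query, sizeof(query)) == ESP_OK &&
            httpd_query_key_value(query, "s", text, sizeof(text)) == ESP_OK) {
            say_text(text);
            return httpd_resp_send(req, "OK", strlen("OK"));
        }
        return httpd_resp_send_404(req);
    }

Such a handler would then be registered for the /say URI with httpd_register_uri_handler().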

The synthesized data is streamed in chunks, so playback can begin before Flite has finished processing all of the text. This reduces the delay for longer texts and gives a real-time feel, which is one of the advantages of running Flite locally rather than using a cloud service and downloading the synthesized data over Wi-Fi.

Adding to Your Project

  1. Copy the components into your project.

  2. Make sure your app partition is at least 2 MB, for example (a complete partitions.csv sketch is shown after this list):

     factory,  app,  factory, 0x10000, 0x2F0000,
    
  3. Configure with make menuconfig.

  4. Then use the following code:

     #include "flite.h"       /* Flite core API, from the ported components */
     #include "cst_audio.h"   /* cst_audio_streaming_info, CST_AUDIO_STREAM_CONT */

     cst_voice *register_cmu_us_kal(const char *voxdir);

     int i2s_stream_chunk(const cst_wave *w, int start, int size,
                          int last, cst_audio_streaming_info *asi)
     {
         /* Process the wave chunk here, i.e. w->samples[start .. start+size-1]:
          * for example send it to I2S, drive a DAC, or forward it via
          * Wi-Fi/Bluetooth/serial to another device. */
         return CST_AUDIO_STREAM_CONT;   /* keep streaming */
     }
     ...

     /* Initialization code */
     flite_init();
     cst_voice *v = register_cmu_us_kal(NULL);

     cst_audio_streaming_info *asi =
         cst_alloc(struct cst_audio_streaming_info_struct, 1);

     asi->min_buffsize = 256;
     asi->asc = i2s_stream_chunk;
     asi->userdata = NULL;

     feat_set(v->features, "streaming_info", audio_streaming_info_val(asi));

     /* Synthesis code */
     cst_wave *wav = flite_text_to_wave("Replace with your text", v);
     delete_wave(wav);
    
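As mentioned in step 2, a complete custom partition table with an enlarged factory partition could look like the hedged partitions.csv sketch below (the nvs and phy_init entries follow the ESP-IDF defaults); select the custom partition table CSV option under the Partition Table menu in menuconfig to use it.

    # Name,     Type, SubType,  Offset,   Size
    nvs,        data, nvs,      0x9000,   0x6000,
    phy_init,   data, phy,      0xf000,   0x1000,
    factory,    app,  factory,  0x10000,  0x2F0000,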

Use Case Ideas

  • Talking clock and calendar
  • Talking weather station
  • News reader
  • Mail or Twitter reader
  • Chat bot
  • Personal assistant
  • Talking toys
  • Educational games

Projects That Use Flite

If you have used Flite in your project, open a pull request with a link to the project and I will add it here.
