To make a smart speaker

Here is a collection of resources for making a smart speaker. ~~Hope we can make an open source one for daily use.~~ I believe we now have enough resources to build an open source smart speaker. Let's do it. Take a look at the progress of the Smart Speaker from Scratch project on Hackaday. The first hardware kit is available now.

A simplified flowchart of a smart speaker looks like this:

+---+   +----------------+   +---+   +---+   +---+
|Mic|-->|Audio Processing|-->|KWS|-->|STT|-->|NLU|
+---+   +----------------+   +---+   +---+   +-+-+
                                               |
                                               |
+-------+   +---+   +----------------------+   |
|Speaker|<--|TTS|<--|Knowledge/Skill/Action|<--+
+-------+   +---+   +----------------------+

Audio Processing includes Acoustic Echo Cancellation (AEC), Beamforming, Noise Suppression (NS), etc.
Keyword Spotting (KWS) detects a keyword (such as "OK Google" or "Hey Siri") to start a conversation.
Speech To Text (STT) converts the spoken request into text.
Natural Language Understanding (NLU) converts raw text into structured data.
Knowledge/Skill/Action - a knowledge base and plugins (Alexa Skills, Google Actions) that provide an answer.
Text To Speech (TTS) converts the answer back into speech.
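The flowchart and component list above can be sketched as a minimal Python loop. Every stage is a hypothetical callable standing in for a real engine; the class name `SmartSpeakerPipeline` and its parameters are illustrative, not an existing API. End-of-utterance detection is simplified to a fixed number of frames, where a real system would typically rely on voice activity detection.

```python
from typing import Callable, Iterable, List


class SmartSpeakerPipeline:
    """Chains hypothetical stage callables in the order of the flowchart."""

    def __init__(
        self,
        process: Callable[[bytes], bytes],             # Audio Processing (AEC, beamforming, NS)
        spot_keyword: Callable[[bytes], bool],         # KWS
        speech_to_text: Callable[[List[bytes]], str],  # STT
        understand: Callable[[str], dict],             # NLU
        act: Callable[[dict], str],                    # Knowledge/Skill/Action
        text_to_speech: Callable[[str], bytes],        # TTS
        utterance_frames: int = 3,                     # simplified endpointing
    ) -> None:
        self.process = process
        self.spot_keyword = spot_keyword
        self.speech_to_text = speech_to_text
        self.understand = understand
        self.act = act
        self.text_to_speech = text_to_speech
        self.utterance_frames = utterance_frames

    def run(self, mic_frames: Iterable[bytes]) -> List[bytes]:
        """Consume microphone frames and return the audio replies for the speaker."""
        replies: List[bytes] = []
        listening = False
        utterance: List[bytes] = []
        for raw in mic_frames:
            frame = self.process(raw)
            if not listening:
                # Stay idle until the wake word is heard.
                listening = self.spot_keyword(frame)
                continue
            utterance.append(frame)
            if len(utterance) == self.utterance_frames:
                text = self.speech_to_text(utterance)
                intent = self.understand(text)
                answer = self.act(intent)
                replies.append(self.text_to_speech(answer))
                # Go back to waiting for the next wake word.
                utterance, listening = [], False
        return replies


# Toy stages: "audio" frames are faked as short byte strings.
pipeline = SmartSpeakerPipeline(
    process=lambda f: f.strip(),
    spot_keyword=lambda f: f == b"hey",
    speech_to_text=lambda frames: b" ".join(frames).decode(),
    understand=lambda text: {"intent": "GetTime" if "time" in text else "Unknown"},
    act=lambda intent: "It is noon" if intent["intent"] == "GetTime" else "Sorry",
    text_to_speech=lambda reply: reply.encode(),
)
frames = [b"noise", b" hey ", b"what", b"is", b"time", b"noise"]
print(pipeline.run(frames))  # [b'It is noon']
```

Keeping each stage behind a plain callable makes it easy to swap one engine from the lists below for another without touching the rest of the loop.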

KWS + STT + NLU + Skill + TTS

Active open source projects

Snips ⭐️ - the first 100% on-device and private-by-design open-source Voice AI platform
Mycroft ⭐️ - a hackable open source voice assistant
SEPIA 🤖 - Highly customizable, open-source, cross-platform voice assistant and VUI framework (HTML + Java + x)
Kalliope - a framework that helps you create your own personal assistant, similar to Mycroft (both are written in Python)
dingdang robot - a 🇨🇳 voice interaction robot based on Jasper and built with Raspberry Pi

SDK

Amazon Alexa Voice Service - the most widely used voice assistant
Google Assistant SDK - it has the smartest brain. Its extensions, called Google Actions, can be created in a few steps with Dialogflow, and its Device Actions are well suited to smart home devices.
Baidu DuerOS
Snips
- Install Snips on Raspberry Pi 3, Linux, macOS, iOS and Android
SEPIA Installation, SEPIA with Porcupine + ReSpeaker

KWS

Mycroft Precise - A lightweight, simple-to-use, RNN wake word listener
Snowboy - DNN-based hotword and wake word detection toolkit
Honk - PyTorch reimplementation of Google's TensorFlow CNNs for keyword spotting
ML-KWS-For-MCU - perhaps the most promising option for resource-constrained devices such as ARM Cortex-M7 microcontrollers
Porcupine - Lightweight, cross-platform engine to build custom wake words in seconds
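The engines above differ in their models and APIs, but a detector generally has to turn a stream of per-frame keyword scores into discrete triggers. Below is a minimal sketch of one common post-processing scheme, assuming a hypothetical model that emits one score in [0, 1] per audio frame: average the last few scores, fire when the average crosses a threshold, then ignore a few frames so a single utterance does not trigger twice. The function name and default values are illustrative and not taken from any of the listed projects.

```python
from collections import deque
from typing import Deque, Iterable, List


def detect_wake_word(
    scores: Iterable[float],
    threshold: float = 0.5,
    window: int = 3,
    refractory: int = 5,
) -> List[int]:
    """Return the frame indices at which a wake word is reported.

    scores     -- per-frame keyword probabilities from a (hypothetical) KWS model
    threshold  -- the moving average must exceed this value to fire
    window     -- number of recent frames averaged to smooth out single spikes
    refractory -- frames ignored after a detection so one utterance fires once
    """
    recent: Deque[float] = deque(maxlen=window)
    detections: List[int] = []
    cooldown = 0
    for i, score in enumerate(scores):
        # Keep the window up to date even while detections are suppressed.
        recent.append(score)
        if cooldown > 0:
            cooldown -= 1
            continue
        if len(recent) == window and sum(recent) / window > threshold:
            detections.append(i)
            cooldown = refractory
    return detections


scores = [0.1, 0.2, 0.9, 0.95, 0.9, 0.8, 0.2, 0.1, 0.1, 0.9, 0.9, 0.9]
print(detect_wake_word(scores))  # [3, 10]
```

Raising `threshold` trades missed activations for fewer false alarms; Snowboy and Porcupine expose a similar trade-off through a sensitivity setting.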

STT

Mozilla DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture
Kaldi
wav2letter++ - a fast, open source speech processing toolkit from the Speech team at Facebook AI Research built to facilitate research in end-to-end models for speech recognition.
Zamia Speech - Open tools, data and models (Kaldi models and wav2letter++ models) for cloudless automatic speech recognition. It can run on a Raspberry Pi.
PocketSphinx - a lightweight speech recognition engine using HMM + GMM

NLU

Rasa NLU
- Rasa NLU for Chinese
Snips NLU - a Python library that parses sentences written in natural language and extracts structured information.
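To make "structured data" concrete: NLU output usually takes the form of an intent plus named slots. The toy, regex-based parser below only illustrates that shape. The intent names, slot names and patterns are hypothetical; real NLU libraries such as those above learn intents and slots from training examples, and this sketch reproduces neither their APIs nor their output formats.

```python
import re
from typing import Dict

# Hypothetical intents, each with a pattern whose named groups become slots.
PATTERNS = [
    ("SetTimer", re.compile(
        r"set (?:a )?timer for (?P<duration>\d+) (?P<unit>seconds?|minutes?|hours?)")),
    ("PlayMusic", re.compile(r"play (?P<song>.+?)(?: by (?P<artist>.+))?")),
]


def parse(text: str) -> Dict[str, object]:
    """Map a sentence to {"intent": ..., "slots": {...}}; intent is None if nothing matches."""
    normalized = text.lower().strip().rstrip(".!?")
    for intent, pattern in PATTERNS:
        # fullmatch avoids accidental hits inside longer words (e.g. "display").
        match = pattern.fullmatch(normalized)
        if match:
            slots = {name: value for name, value in match.groupdict().items()
                     if value is not None}
            return {"intent": intent, "slots": slots}
    return {"intent": None, "slots": {}}


print(parse("Set a timer for 10 minutes"))
# {'intent': 'SetTimer', 'slots': {'duration': '10', 'unit': 'minutes'}}
print(parse("Play Yesterday by The Beatles!"))
# {'intent': 'PlayMusic', 'slots': {'song': 'yesterday', 'artist': 'the beatles'}}
```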

TTS

Mozilla TTS - Deep learning for Text to Speech
Mimic - Mycroft's TTS engine, based on CMU's Flite (Festival Lite)
MaryTTS - an open-source, multilingual text-to-speech synthesis system written in pure Java
espeak-ng - an open source speech synthesizer that supports 99 languages and accents.
ekho - Chinese text-to-speech engine
WaveNet, Tacotron 2

Audio Processing

Acoustic Echo Cancellation
- SpeexDSP and its Python binding speexdsp-python
- EC - Echo Cancellation daemon based on SpeexDSP AEC for Raspberry Pi or other devices running Linux.
Direction Of Arrival (DOA) - the most widely used DOA algorithm is GCC-PHAT
- tdoa
- odas - ODAS stands for Open embeddeD Audition System. This is a library dedicated to performing sound source localization, tracking, separation and post-filtering. ODAS is coded entirely in C for portability, and is optimized to run easily on low-cost embedded hardware. ODAS is free and open source.
Beamforming
- BeamformIt - filter-and-sum beamforming
- CGMM Beamforming - a reference implementation
- MVDR Beamforming
- GSC Beamforming
Voice Activity Detection
- WebRTC VAD, py-webrtcvad
- DNN VAD
Noise Suppression
- NS of WebRTC audio processing, python-webrtc-audio-processing
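Since GCC-PHAT is named above as the most widely used DOA algorithm, here is a minimal pure-Python sketch of it for a single microphone pair: compute the cross-power spectrum of the two signals, keep only its phase (the PHAT weighting), transform back, and take the lag of the peak as the time difference of arrival. The lag is then converted to an angle under a far-field assumption. A naive O(n²) DFT keeps the example dependency-free; a practical implementation would use an FFT. The function names, the 343 m/s speed of sound, and the 16 kHz / 10 cm setup in the usage lines are assumptions for illustration.

```python
import cmath
import math
import random
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (the inverse includes the 1/n scale)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def gcc_phat(sig: Sequence[float], ref: Sequence[float], max_lag: int) -> int:
    """Return the delay of `sig` relative to `ref` in samples (positive = sig lags)."""
    # Zero-pad so that circular correlation equals linear correlation.
    n = 2 * max(len(sig), len(ref))
    spec_sig = dft([complex(v) for v in sig] + [0j] * (n - len(sig)))
    spec_ref = dft([complex(v) for v in ref] + [0j] * (n - len(ref)))
    # Cross-power spectrum, then PHAT weighting: keep the phase, drop the magnitude.
    cross = [s * r.conjugate() for s, r in zip(spec_sig, spec_ref)]
    phat = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    corr = [v.real for v in dft(phat, inverse=True)]
    # Negative lags wrap around to the end of the correlation array.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: corr[lag % n])


def doa_degrees(lag: int, fs: float, mic_distance: float, c: float = 343.0) -> float:
    """Far-field angle from broadside (fs in Hz, mic_distance in metres, c in m/s)."""
    ratio = (lag / fs) * c / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # clamp lags beyond the physically possible range
    return math.degrees(math.asin(ratio))


# Usage: a noise burst that reaches the second microphone 3 samples later.
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(64)]
sig = [0.0] * 3 + ref
lag = gcc_phat(sig, ref, max_lag=8)
print(lag)  # 3
print(round(doa_degrees(lag, fs=16000, mic_distance=0.1), 1))  # 40.0
```

Discarding the magnitude makes the correlation peak sharp regardless of the signal's spectrum. Arrays with more than two microphones typically combine several pairwise estimates.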

Audio I/O

PortAudio, PyAudio
libsoundio
ALSA
PulseAudio
PipeWire
