voice-engine / Make A Smart Speaker

A collection of resources to make a smart speaker

To make a smart speaker

Chinese version

This is a collection of resources for making a smart speaker. The goal is an open source smart speaker for daily use, and I believe enough resources already exist to build one. Let's do it. To follow the progress, take a look at the project named "smart speaker from scratch" on Hackaday. The first hardware kit is available now.

A simplified flowchart of a smart speaker:

+---+   +----------------+   +---+   +---+   +---+
|Mic|-->|Audio Processing|-->|KWS|-->|STT|-->|NLU|
+---+   +----------------+   +---+   +---+   +-+-+
                                               |
                                               |
+-------+   +---+   +----------------------+   |
|Speaker|<--|TTS|<--|Knowledge/Skill/Action|<--+
+-------+   +---+   +----------------------+

  • Audio Processing includes Acoustic Echo Cancellation (AEC), Beamforming, Noise Suppression (NS), etc.
  • Keyword Spotting (KWS) detects a keyword (such as OK Google, Hey Siri) to start a conversation.
  • Speech To Text (STT) converts the recorded speech into text.
  • Natural Language Understanding (NLU) converts raw text into structured data.
  • Knowledge/Skill/Action - Knowledge base and plugins (Alexa Skill, Google Action) to provide an answer.
  • Text To Speech (TTS) converts the answer text into speech.
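
The stages listed above can be sketched as a chain of small functions. This is only a toy illustration, not how any of the projects below implement it: the audio is replaced by a text transcript so the example runs without a microphone, and every name here (`WAKE_WORD`, `Intent`, `understand`, `run_skill`, `handle`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

WAKE_WORD = "hey speaker"  # hypothetical keyword, stands in for "OK Google" etc.


@dataclass
class Intent:
    """Structured data produced by the NLU stage."""
    name: str
    slots: dict = field(default_factory=dict)


def keyword_spotted(transcript: str) -> bool:
    """KWS stand-in: a real detector works on audio frames, not on text."""
    return transcript.lower().startswith(WAKE_WORD)


def speech_to_text(transcript: str) -> str:
    """STT stand-in: drop the wake word and return the command text."""
    return transcript[len(WAKE_WORD):].strip(" ,.!?").lower()


def understand(text: str) -> Intent:
    """NLU stand-in: turn raw text into structured data with simple rules."""
    if text.startswith("play "):
        return Intent("play_music", {"query": text[len("play "):]})
    if text == "stop":
        return Intent("stop")
    return Intent("unknown", {"text": text})


def run_skill(intent: Intent) -> str:
    """Knowledge/Skill/Action stand-in: map an intent to an answer."""
    if intent.name == "play_music":
        return f"Playing {intent.slots['query']}"
    if intent.name == "stop":
        return "Stopped"
    return f"Sorry, I did not understand: {intent.slots['text']}"


def text_to_speech(answer: str) -> str:
    """TTS stand-in: a real engine would return audio samples."""
    return f"<speech>{answer}</speech>"


def handle(utterance: str) -> Optional[str]:
    """Run one utterance through KWS -> STT -> NLU -> Skill -> TTS."""
    if not keyword_spotted(utterance):
        return None  # no wake word, so the conversation never starts
    text = speech_to_text(utterance)
    intent = understand(text)
    answer = run_skill(intent)
    return text_to_speech(answer)


print(handle("Hey speaker, play some jazz"))
```

Keeping each stage behind a plain function makes it easy to swap a stub for one of the real engines listed in the sections below.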

KWS + STT + NLU + Skill + TTS

Active open source projects

  • Snips ⭐️ - the first 100% on-device and private-by-design open-source Voice AI platform
  • Mycroft ⭐️ - a hackable open source voice assistant
  • SEPIA 🤖 - Highly customizable, open-source, cross-platform voice assistant and VUI framework (HTML + Java + x)
  • Kalliope - a framework that helps you create your own personal assistant, similar to Mycroft (both are written in Python)
  • dingdang robot - a 🇨🇳 voice interaction robot based on Jasper and built with Raspberry Pi

SDK

KWS

  • Mycroft Precise - A lightweight, simple-to-use, RNN wake word listener
  • Snowboy - DNN based hotword and wake word detection toolkit
  • Honk - PyTorch reimplementation of Google's TensorFlow CNNs for keyword spotting
  • ML-KWS-For-MCU - Possibly the most promising option for resource-constrained devices such as ARM Cortex-M7 microcontrollers
  • Porcupine - Lightweight, cross-platform engine to build custom wake words in seconds

STT

  • Mozilla DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture
  • Kaldi
  • wav2letter++ - a fast, open source speech processing toolkit from the Speech team at Facebook AI Research built to facilitate research in end-to-end models for speech recognition.
  • Zamia Speech - Open tools, data, models (Kaldi models and wav2letter++ models) for cloudless automatic speech recognition. It can run on a Raspberry Pi
  • PocketSphinx - a lightweight speech recognition engine using HMM + GMM

NLU

TTS

  • Mozilla TTS - Deep learning for Text to Speech
  • Mimic - Mycroft's TTS engine, based on CMU's Flite (Festival Lite)
  • MaryTTS - an open-source, multilingual text-to-speech synthesis system written in pure Java
  • espeak-ng - an open source speech synthesizer that supports 99 languages and accents.
  • ekho - Chinese text-to-speech engine
  • WaveNet, Tacotron 2

Audio Processing

  • Acoustic Echo Cancellation

    • SpeexDSP, its python binding speexdsp-python
    • EC - Echo Cancellation Daemon based on SpeexDSP AEC for Raspberry Pi or other devices running Linux.
  • Direction Of Arrival (DOA) - The most widely used DOA algorithm is GCC-PHAT

    • tdoa
    • odas - ODAS stands for Open embeddeD Audition System. This is a library dedicated to performing sound source localization, tracking, separation and post-filtering. ODAS is coded entirely in C, for more portability, and is optimized to run easily on low-cost embedded hardware. ODAS is free and open source.
  • Beamforming

  • Voice Activity Detection

  • Noise Suppression
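
To make the GCC-PHAT method mentioned under Direction Of Arrival more concrete, here is a minimal sketch in pure Python. It is not taken from tdoa or odas. It uses a naive O(n²) DFT so that it runs without third-party packages; a real implementation would use an FFT library. The function names (`dft`, `gcc_phat`, `delay_to_angle`) and the far-field two-microphone geometry are assumptions made for illustration.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (fine for short demo signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(sig, ref, max_shift):
    """Estimate the delay of `sig` relative to `ref`, in samples.

    A positive result means `sig` arrives later than `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    s = dft(list(sig) + [0.0] * (n - len(sig)))
    r = dft(list(ref) + [0.0] * (n - len(ref)))
    # Cross-power spectrum, normalised by its magnitude (the PHAT weighting)
    cross = [a * b.conjugate() for a, b in zip(s, r)]
    cross = [c / (abs(c) + 1e-12) for c in cross]
    cc = [v.real for v in dft(cross, inverse=True)]
    # Search lags in [-max_shift, max_shift]; negative lags wrap to the end
    lags = range(-max_shift, max_shift + 1)
    return max(lags, key=lambda lag: cc[lag % n])


def delay_to_angle(delay_samples, sample_rate, mic_distance, speed_of_sound=343.0):
    """Far-field angle in degrees (0 = broadside) for a two-microphone pair."""
    ratio = delay_samples / sample_rate * speed_of_sound / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # clamp against rounding and noise
    return math.degrees(math.asin(ratio))


# Synthetic check: the second "microphone" hears the same noise 3 samples later.
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(32)]
sig = [0.0] * 3 + ref
delay = gcc_phat(sig, ref, max_shift=8)
print(delay)  # → 3
print(delay_to_angle(delay, sample_rate=16000, mic_distance=0.1))
```

The PHAT weighting discards the magnitude of the cross-spectrum and keeps only its phase, which sharpens the correlation peak and is one reason the method copes reasonably well with reverberation.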

Audio I/O
