
zycv / awesome-keyword-spotting

License: MIT license
This repository is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection).

Projects that are alternatives of or similar to awesome-keyword-spotting

react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-64.67%)
Mutual labels:  speech-recognition, speech-processing
Formant Analyzer
iOS application for finding formants in spoken sounds
Stars: ✭ 43 (-71.33%)
Mutual labels:  speech-recognition, speech-processing
Awesome Diarization
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
Stars: ✭ 673 (+348.67%)
Mutual labels:  speech-recognition, speech-processing
Multi-Hotword Spotting
Won't it be cool to build a speech assistant like Alexa or Siri yourself without voice API and network connection?
Stars: ✭ 31 (-79.33%)
Mutual labels:  speech-recognition, keyword-spotting
Speechbrain.github.io
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems for speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many other tasks.
Stars: ✭ 242 (+61.33%)
Mutual labels:  speech-recognition, speech-processing
Speech-Command-Recognition-with-Capsule-Network
Speech command recognition with capsule network & various NNs / KWS on Google Speech Command Dataset.
Stars: ✭ 20 (-86.67%)
Mutual labels:  speech-recognition, keyword-spotting
Pncc
An implementation of Power Normalized Cepstral Coefficients: PNCC
Stars: ✭ 40 (-73.33%)
Mutual labels:  speech-recognition, speech-processing
Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Stars: ✭ 205 (+36.67%)
Mutual labels:  speech-recognition, speech-processing
Zzz Retired openstt
RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:
Stars: ✭ 146 (-2.67%)
Mutual labels:  speech-recognition, speech-processing
Nonautoreggenprogress
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)
Stars: ✭ 118 (-21.33%)
Mutual labels:  speech-recognition, speech-processing
scim
[WIP] Speech recognition toolbox written in Nim, based on Arraymancer.
Stars: ✭ 17 (-88.67%)
Mutual labels:  speech-recognition, speech-processing
multilingual kws
Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus
Stars: ✭ 122 (-18.67%)
Mutual labels:  speech-recognition, keyword-spotting
UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
Stars: ✭ 224 (+49.33%)
Mutual labels:  speech-recognition, speech-processing
Uspeech
Speech recognition toolkit for the Arduino
Stars: ✭ 448 (+198.67%)
Mutual labels:  speech-recognition, speech-processing
spokestack-ios
Spokestack: give your iOS app a voice interface!
Stars: ✭ 27 (-82%)
Mutual labels:  speech-recognition, speech-processing
Sincnet
SincNet is a neural architecture for efficiently processing raw audio samples.
Stars: ✭ 764 (+409.33%)
Mutual labels:  speech-recognition, speech-processing
QuantumSpeech-QCNN
IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition
Stars: ✭ 71 (-52.67%)
Mutual labels:  speech-recognition, speech-processing
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+460.67%)
Mutual labels:  speech-recognition, speech-processing
Keras Sincnet
Keras (tensorflow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)
Stars: ✭ 47 (-68.67%)
Mutual labels:  speech-recognition, speech-processing
UHV-OTS-Speech
A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
Stars: ✭ 94 (-37.33%)
Mutual labels:  speech-recognition, speech-processing

Awesome Keyword Spotting

Table of contents

Overview

In speech processing, keyword spotting deals with the identification of keywords in utterances. This repo is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection) papers.

Survey

Publications

2022

2021

2020

2019

2018

Others

OpenSource Code

Software

  1. WeKws (Production First and Production Ready End-to-End Keyword Spotting Toolkit)

    Small footprint keyword spotting (KWS), or specifically wake-up word (WuW) detection, is a typical and important module in Internet of Things (IoT) devices. It provides a way for users to control IoT devices with a hands-free experience. A WuW detection system usually runs locally and persistently on IoT devices, which requires low power consumption, few model parameters, low computational complexity, and the ability to detect predefined keywords in a streaming way, i.e., with low latency.
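
    A minimal sketch of the streaming detection loop described above (this is not WeKws's actual API; the model call is a hypothetical stand-in for any per-frame keyword classifier). Per-frame posteriors are smoothed over a short sliding window, and the detector fires as soon as the smoothed score crosses a threshold, which keeps latency low:

      import collections
      import numpy as np

      def stream_kws(frames, model, window=30, threshold=0.8):
          # Smooth per-frame keyword posteriors over a sliding window and
          # fire as soon as the running average crosses the threshold.
          history = collections.deque(maxlen=window)
          for t, frame in enumerate(frames):
              history.append(model(frame))  # hypothetical: P(keyword | frame)
              if len(history) == window and np.mean(history) > threshold:
                  return t  # frame index at which the keyword fired
          return -1  # no detection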

  2. Porcupine

    Porcupine is a highly-accurate and lightweight wake word engine. It enables building always-listening voice-enabled applications.
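
    A basic usage sketch with the pvporcupine Python bindings (the access key is a placeholder, and the exact signature may differ between SDK versions):

      import pvporcupine

      # Create a detector for a built-in keyword; feed it 16-bit PCM frames.
      porcupine = pvporcupine.create(access_key='YOUR_ACCESS_KEY',
                                     keywords=['porcupine'])
      try:
          pcm = [0] * porcupine.frame_length  # stand-in for one mic frame
          if porcupine.process(pcm) >= 0:     # returns keyword index, or -1
              print('wake word detected')
      finally:
          porcupine.delete()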

  3. Sonus

    Sonus lets you quickly and easily add a VUI (Voice User Interface) to any hardware or software project. Just like Alexa, Google Assistant, and Siri, Sonus is always listening offline for a customizable hotword. Once that hotword is detected, your speech is streamed to the cloud recognition service of your choice, and you get the results in real time.
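
    The detect-then-stream pattern Sonus implements, sketched in Python (Sonus itself is a Node.js library; detect_hotword and stream_to_cloud_asr are hypothetical placeholders, not its API):

      def listen(detect_hotword, stream_to_cloud_asr):
          # Hotword detection runs offline and persistently; only after a
          # trigger is audio streamed to the chosen cloud recognizer.
          while True:
              if detect_hotword():
                  for partial in stream_to_cloud_asr():
                      print(partial)  # realtime transcription results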

  4. Picovoice

    Picovoice is the end-to-end platform for building voice products on your terms. Unlike Alexa and Google services, Picovoice runs entirely on-device while being more accurate.

  5. Mycroft Precise

    A lightweight, simple-to-use, RNN wake word listener.

    Precise is a wake word listener. The software monitors an audio stream (usually a microphone), and when it recognizes a specific phrase, it triggers an event. For example, at Mycroft AI the team has trained Precise to recognize the phrase "Hey, Mycroft". When the software recognizes this phrase, it puts the rest of Mycroft's software into command mode and waits for a command from the person using the device. Mycroft Precise is fully open source and can be trained to recognize anything from a name to a cough.
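
    A usage sketch based on the precise_runner Python package (the engine and model paths are placeholders, and the API may differ across Precise versions):

      from precise_runner import PreciseEngine, PreciseRunner

      # Engine binary plus a trained model; the callback fires on detection.
      engine = PreciseEngine('precise-engine', 'hey-mycroft.pb')
      runner = PreciseRunner(engine, on_activation=lambda: print('wake word!'))
      runner.start()  # listens to the microphone stream in the background
      # Keep the process alive; detections invoke on_activation asynchronously.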

Datasets

  1. Speech Commands

    • Homepage: Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition

    • Description: An audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Its primary goal is to provide a way to build and test small models that detect when a single word is spoken, from a set of ten target words, with as few false positives as possible from background noise or unrelated speech. Note that in the train and validation sets, the label "unknown" is much more prevalent than the labels of the target words or background noise. One difference from the release version is the handling of silent segments. While in the test set the silence segments are regular 1-second files, in the training set they are provided as long recordings under the "_background_noise_" folder. Here we split this background noise into 1-second clips, and also keep one of the files for the validation set (a minimal splitting sketch follows this item).

    • Download:
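
    A minimal sketch of the background-noise splitting described above, assuming the soundfile package and the dataset's "_background_noise_" folder layout:

      import glob, os
      import soundfile as sf

      def split_noise(src_dir, dst_dir, clip_seconds=1.0):
          # Cut each long background-noise recording into 1-second clips.
          os.makedirs(dst_dir, exist_ok=True)
          for path in glob.glob(os.path.join(src_dir, '*.wav')):
              audio, sr = sf.read(path)
              clip_len = int(sr * clip_seconds)
              base = os.path.splitext(os.path.basename(path))[0]
              for i in range(len(audio) // clip_len):
                  clip = audio[i * clip_len:(i + 1) * clip_len]
                  sf.write(os.path.join(dst_dir, f'{base}_{i:04d}.wav'),
                           clip, sr)

      split_noise('speech_commands/_background_noise_', 'noise_clips')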

  2. Mobvoi Hotwords

    • Homepage: Region Proposal Network Based Small-Footprint Keyword Spotting

    • Description:

      • MobvoiHotwords is a corpus of wake-up words collected from a commercial Mobvoi smart speaker. It consists of keyword and non-keyword utterances.
      • For the keyword data, utterances containing either 'Hi Xiaowen' or 'Nihao Wenwen' are collected. For each keyword, there are about 36k utterances. All keyword data is collected from 788 subjects, ages 3-65, at different distances from the smart speaker (1, 3 and 5 meters). Different noises (typical home-environment noises like music and TV) with varying sound pressure levels are played in the background during the collection. The keyword data is identical to the keyword data used in the paper linked above.
    • Download: MobvoiHotwords

  3. HI-MIA

    • Homepage: A far-field text-dependent speaker verification database for AISHELL Speaker Verification Challenge 2019

    • Description:

      • The data is used in the AISHELL Speaker Verification Challenge 2019. It is extracted from a larger database called AISHELL-WakeUp-1.
      • The contents are the wake-up words "Hi, Mia" in both Chinese and English. The data was collected in a real home environment using microphone arrays and a Hi-Fi microphone. The collection process and the development of a baseline system are described in the paper linked above. The data used in the challenge is extracted from one Hi-Fi microphone and 16-channel circular microphone arrays at 1/3/5 meters, and the contents are the Chinese wake-up words. The whole set is divided into train (254 people), dev (42 people), and test (44 people) subsets. The test subset is provided with paired target/non-target answers to evaluate verification results.
    • Download: HI-MIA

Challenge

  1. AutoSpeech 2020 Challenge

    In this challenge, we further propose the Automated Speech (AutoSpeech) competition, which aims at producing automated solutions for speech-related tasks. This challenge is restricted to multi-label classification problems, which come from different speech classification domains. The provided solutions are expected to discover various kinds of paralinguistic speech attribute information, such as speaker, language, and emotion, when only raw data (speech features) and meta information are provided. There are two kinds of datasets, corresponding to the public and private leaderboards respectively. Five public datasets (without labels in the testing part) are provided to the participants for developing AutoSpeech solutions. Afterward, solutions will be evaluated on private datasets without human intervention. The results on these private datasets determine the final ranking.

    Official Code: AutoSpeech

  2. The 2020 Personalized Voice Trigger Challenge (PVTC2020)

    Recently, personalized voice trigger, or wake-up word detection, has been gaining popularity among speech researchers and developers. Conventionally, wake-up word detection and speaker verification are carried out separately in a pipeline, where a wake-up word detection system generates a successful trigger, followed by a speaker verification system that performs identity authentication. In such a case, the wake-up word detection system and the speaker verification system are optimized separately rather than through an overall joint optimization with a unified goal. As a consequence, their respective network parameters and extracted information are not effectively shared and jointly utilized. Generally, the wake-up word detection system needs to run all the time, but the speaker verification network is relatively large and may not meet the computing-resource requirements of embedded devices. A joint-learning or multi-task-learning network might either be very light at a small scale as a single always-on system, or run at a much larger scale at the second stage, after a successful wake-up by the first-stage voice trigger (a hypothetical sketch of such a joint network follows this item).

    Paper: The DKU System Description for The Interspeech 2021 Auto-KWS Challenge

    Official Code: PVTC2020
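
    A hypothetical PyTorch sketch of the joint architecture the challenge motivates (layer sizes are illustrative and this is not the DKU system): a shared encoder feeds both a small wake-up-word head and a speaker-embedding head, so the two tasks share parameters and extracted features.

      import torch.nn as nn

      class JointWuwSv(nn.Module):
          def __init__(self, n_mels=40, emb_dim=128):
              super().__init__()
              self.encoder = nn.GRU(n_mels, 64, batch_first=True)  # shared
              self.kws_head = nn.Linear(64, 2)        # keyword / non-keyword
              self.spk_head = nn.Linear(64, emb_dim)  # speaker embedding

          def forward(self, feats):  # feats: (batch, time, n_mels)
              h, _ = self.encoder(feats)
              last = h[:, -1]        # hidden state at the last frame
              return self.kws_head(last), self.spk_head(last)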

Leaderboard

  1. Keyword Spotting on Google Speech Commands

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].