JaesungBae / Speech-Command-Recognition-with-Capsule-Network

Licence: other
Speech command recognition with capsule network & various NNs / KWS on Google Speech Command Dataset.

End-to-End Speech Command Recognition with Capsule Network

INTERSPEECH 2018 paper: link

We apply the capsule network to capture the spatial relationships and pose information of speech spectrogram features along both the frequency and time axes. On a dataset of one-second speech commands, our proposed end-to-end SR system with capsule networks achieves better results than baseline CNN models on both clean and noise-added test sets.

  • 20 JAN 2019: Other baseline keyword spotting (KWS) models are also provided in the CNN code.

Getting Started

The code is written for Python 2 (2.7.12).

Prerequisites

You should be able to import the following libraries:

tqdm, numpy(1.14.1), termcolor, scipy, sklearn, scikits
tensorflow(1.6.0), keras(2.1.4)

pip install tqdm
pip install numpy
pip install termcolor
pip install scipy
pip install sklearn
pip install scikit-learn
pip install tensorflow-gpu==1.6.0
pip install keras==2.1.4

Speech Feature Generation

Dataset

We use the 'Google Speech Command Dataset'. For details, see the blog and Download Link.

  • Download the dataset from the link above and unzip it. (In our case, we unzip it into a folder named 'Google_Speech_Command'.)

Adding noise

To add noise to the original dataset, we use MATLAB with voicebox, a MATLAB library. We ran the MATLAB code on a local Windows machine and uploaded the results to a Linux server.

  1. Unzip the downloaded Google Speech Command dataset.

  2. Create a new folder named 'Google_Speech_Command' and move the command folders into it. The folder structure will then look like this:

speech_commands_v0.01.tar
|-- [_background_noise_]
|-- Google_Speech_Command
|   |-- bed
|   |-- bird
 :      :
|   '-- zero
|-- testing_list
|-- validation_list
'-- etc.
  3. Change 'data_path' in the MATLAB script noise_wave_generate.m and run it. It will create a new folder and save the noise-added audio files there.
  4. You can also change 'SNR' in the script to generate noisy audio files at other signal-to-noise ratios.
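
For readers without MATLAB, the idea behind the 'SNR' setting can be sketched in Python. This is a minimal NumPy illustration, not a port of noise_wave_generate.m: the function name `mix_at_snr`, the synthetic signals, and the use of plain mean power are all assumptions, and the MATLAB script may measure signal levels differently.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat the noise if it is shorter than the speech, then cut it to length.
    if len(noise) < len(speech):
        repeats = int(np.ceil(len(speech) / float(len(noise))))
        noise = np.tile(noise, repeats)
    noise = noise[:len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # SNR_dB = 10 * log10(speech_power / (gain^2 * noise_power)), solved for gain.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


# Synthetic one-second signals at 16 kHz stand in for real recordings.
rng = np.random.RandomState(0)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000.0)
noise = rng.randn(8000)
noisy = mix_at_snr(speech, noise, snr_db=5)
```

A lower `snr_db` gives a larger noise gain, so the command becomes harder to recognize.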

Feature Generation

Extract speech features from the raw audio files and save them as .npy files. Adjust the '--noise_name' argument to match the noise type you generated.

cd core
python feature_generation.py
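
As a rough picture of what such a step involves, the sketch below computes log mel filterbank (fbank) features with NumPy and saves them as a .npy file. It is a generic illustration rather than the contents of feature_generation.py: the function names, the 25 ms window, 10 ms hop, 40 filters, 512-point FFT, and the output file name are all assumptions.

```python
import os
import tempfile

import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising slope
            fbank[i - 1, k] = (k - left) / float(center - left)
        for k in range(center, right):     # falling slope
            fbank[i - 1, k] = (right - k) / float(right - center)
    return fbank


def log_fbank(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
              n_filters=40, n_fft=512):
    """Return log mel filterbank energies with shape (n_frames, n_filters)."""
    signal = np.asarray(signal, dtype=np.float64)
    frame_size = int(round(frame_len * sample_rate))
    hop = int(round(frame_step * sample_rate))
    n_frames = 1 + (len(signal) - frame_size) // hop

    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(frame_size)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_size)

    # Power spectrum of each frame, then mel filterbank energies.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = np.dot(power, mel_filterbank(n_filters, n_fft, sample_rate).T)
    return np.log(np.maximum(energies, 1e-10))


# One second of synthetic audio at 16 kHz stands in for a real command clip.
rng = np.random.RandomState(0)
waveform = rng.randn(16000)
features = log_fbank(waveform)  # shape (98, 40) with the settings above

# Save and reload the features as a .npy file (hypothetical file name).
out_path = os.path.join(tempfile.mkdtemp(), "example.npy")
np.save(out_path, features)
restored = np.load(out_path)
```

Each one-second clip becomes a small time-by-frequency array, which is the kind of spectrogram-like input that a CNN or capsule network can treat as a 2-D image.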

Data folder structure

feature_saved
|-- TEST
|   |-- fbank
|   |   |-- clean
|   |   '-- [noise names]_SNR5
|   '-- label
|-- TRAIN
|   |-- fbank
|   |   |-- clean
|   |   '-- [noise names]_SNR5
|   '-- label
'-- VALID
    |-- fbank
    |   |-- clean
    |   '-- [noise names]_SNR5
    '-- label
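
The layout above can be navigated with a small helper like the following. It is only a sketch: the function names and the noise name 'white_noise' are hypothetical, and the repository's own code may build these paths differently.

```python
import os

SPLITS = ("TRAIN", "VALID", "TEST")


def noisy_condition(noise_name, snr_db):
    """Folder name for a noise-added condition, following '[noise names]_SNR5'."""
    return "%s_SNR%d" % (noise_name, snr_db)


def feature_dir(root, split, condition="clean"):
    """Directory holding fbank features for one split and one condition."""
    if split not in SPLITS:
        raise ValueError("unknown split: %s" % split)
    return os.path.join(root, split, "fbank", condition)


def label_dir(root, split):
    """Directory holding the labels for one split."""
    if split not in SPLITS:
        raise ValueError("unknown split: %s" % split)
    return os.path.join(root, split, "label")


clean_train = feature_dir("feature_saved", "TRAIN")
# 'white_noise' is a made-up noise name used only for illustration.
noisy_test = feature_dir("feature_saved", "TEST", noisy_condition("white_noise", 5))
test_labels = label_dir("feature_saved", "TEST")
```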

Training & Testing

To train or test a model, go into the 'CNN' or 'CapsNet' folder and run the code. You can switch between modes with the '--is_training' argument.

Training

cd CapsNet
python main.py -m=CapsNet --is_training='TRAIN' -ex='0320_digitvec4' -d=0 --kernel=19 --primary_channel=32  --primary_veclen=4 --digit_veclen=4

Testing

Note that you should set the '--keep' argument to the epoch number that you want to test.

cd CapsNet
python main.py -m=CapsNet --is_training='TEST' -ex='0320_digitvec4' -d=0 --kernel=19 --primary_channel=32  --primary_veclen=4 --digit_veclen=4 --SNR=5 --keep=?
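
The '--primary_veclen' and '--digit_veclen' flags in the commands above presumably set the dimensionality of each capsule's output vector. In a capsule network, the length of a capsule's output vector is read as the probability that its class is present, and the squash nonlinearity from the original capsule network paper (Sabour et al., 2017) keeps that length below 1. The NumPy sketch below illustrates the idea; it is not taken from this repository, and the example values are made up.

```python
import numpy as np


def squash(vectors, axis=-1, eps=1e-9):
    """Shrink each vector so its length lies in [0, 1) while keeping its direction."""
    squared_norm = np.sum(np.square(vectors), axis=axis, keepdims=True)
    scale = squared_norm / (1.0 + squared_norm) / np.sqrt(squared_norm + eps)
    return scale * vectors


# Three output capsules with 4-dimensional vectors (made-up values).
raw = np.array([[0.0, 0.0, 0.0, 0.0],
                [0.1, 0.0, 0.0, 0.0],
                [3.0, 4.0, 0.0, 0.0]])
squashed = squash(raw)

# The capsule with the longest output vector gives the predicted class.
lengths = np.linalg.norm(squashed, axis=-1)
predicted = int(np.argmax(lengths))
```

Short vectors are pushed toward zero length and long vectors toward unit length, so the lengths can be compared across classes.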

KWS models based on various neural networks

KWS models based on several kinds of neural networks (NNs) are also provided in CNN/model.py.

1. Deep Neural Network (DNN) based KWS model from

  • G. Chen, C. Parada, and G. Heigold, “Small-footprint keyword spotting using deep neural networks.” in ICASSP, vol. 14. Citeseer, 2014, pp. 4087–4091.
Include 'ref_2014icassp_dnn' in ex_name to use the DNN model. For example:
```
python main.py --model='CNN' --ex_name='ref_2014icassp_dnn512' --is_training='TRAIN' --model_size_info 512 512 512
```

2. CNN based KWS model from

  • T. N. Sainath and C. Parada, “Convolutional neural networks for small-footprint keyword spotting,” in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
Include 'ref_2015is_cnn' in ex_name to use the CNN model. For example:
```
python main.py --model='CNN' --ex_name='ref_2015is_cnn' --is_training='TRAIN' --model_size_info 21 8 94 1 1 2 3 6 4 94 1 1 1 1 32
```

3. Long Short-Term Memory (LSTM) based KWS model from

  • M. Sun, A. Raju, G. Tucker, S. Panchapagesan, G. Fu, A. Mandal, S. Matsoukas, N. Strom, and S. Vitaladevuni, “Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting,” in Spoken Language Technology Workshop (SLT), 2016 IEEE. IEEE, 2016, pp. 474–480.
Include 'ref_rnn' in ex_name to use the LSTM model. For example:
```
python main.py --model='CNN' --ex_name='ref_rnn_lstm' --is_training='TRAIN' --model_size_info 64 32 0
```

4. Convolutional Recurrent Neural Network (CRNN) based KWS model from

  • S. O. Arik, M. Kliegl, R. Child, J. Hestness, A. Gibiansky, C. Fougner, R. Prenger, and A. Coates, “Convolutional recurrent neural networks for small-footprint keyword spotting,” arXiv preprint arXiv:1703.05390, 2017.
Include 'ref_crnn' in ex_name to use the CRNN model. For example:
```
python main.py --model='CNN' --ex_name=ref_crnn --is_training='TRAIN' --model_size_info 32 20 5 8 2 2 32 1 64
```

References

The preprocessing source code is from https://github.com/zzw922cn/Automatic_Speech_Recognition.

The base capsule network Keras source code is from https://github.com/XifengGuo/CapsNet-Keras.

Authors

Jaesung Bae - Korea Advanced Institute of Science and Technology (KAIST)

contact: [email protected]
