douglas125 / Speechcmdrecognition

License: MIT
A neural attention model for speech command recognition

Speech Command Recognition

A Keras implementation of a neural attention model for speech command recognition

This repository presents a recurrent attention model designed to identify keywords in short segments of audio. It has been tested using the Google Speech Command Datasets (v1 and v2). For a complete description of the architecture, please refer to our paper.

Our main contributions are:

  • A small footprint model (201K trainable parameters) that outperforms convolutional architectures for speech command recognition (AKA keyword spotting);
  • A SpeechGenerator.py script that enables loading WAV files saved in .npy format from disk (like a Keras image generator, but for audio files);
  • Attention outputs that make the model explainable (i.e., it is possible to identify what part of the audio was important to reach a conclusion).

Attention Model

A common problem with deep learning models is that they behave as "black boxes": it is very difficult to explain why the model reaches a certain decision. Attention is a powerful tool for making deep neural network models explainable. The picture below shows that the transition from phoneme /a/ to phoneme /i/ is the part of the audio that the model weighted most heavily when deciding (correctly) that the word is "right". Please refer to our paper for the confusion matrix and more attention plots.

Attention for word Right
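To make the idea of reading attention weights concrete, here is a minimal NumPy sketch of dot-product attention over time frames. It is an illustration only, not the repository's Keras code: the function names (softmax, attention_pool), the random features, and the choice of the last frame as query are all assumptions made for the example. The point is that the softmax weights sum to 1 over time, so the largest weight marks the frame that contributes most to the pooled vector.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()

def attention_pool(features, query):
    """Dot-product attention (illustrative helper, not the repo's API).

    features: array of shape (T, D), one feature vector per time frame.
    query:    array of shape (D,).
    Returns the weighted sum of frames and the per-frame weights.
    """
    scores = features @ query      # (T,) similarity of each frame to the query
    weights = softmax(scores)      # (T,) non-negative, sums to 1
    context = weights @ features   # (D,) attention-weighted summary
    return context, weights

# Stand-in for the output of a recurrent layer: 50 frames, 8 features each.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 8))
query = features[-1]               # assumption: use the last frame as the query

context, weights = attention_pool(features, query)
top_frame = int(np.argmax(weights))  # frame the model "looked at" the most
```

Plotting weights against time (for example on top of the waveform) gives a picture of the same kind as the attention plot above.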

How to use this code

The Demo notebook is preconfigured with a set of tasks: ['12cmd', 'leftright', '35word', '20cmd']. Each task name indicates which set of commands the model should recognize. When loading the Google Speech Dataset, the user should also select which version to download and use by adjusting the following line:

gscInfo, nCategs = SpeechDownloader.PrepareGoogleSpeechCmd(version=1, task = '35word')

If you want a pretrained model, model-attRNN.h5 contains pretrained weights for task 35word, version=2.
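Because the task name and dataset version are plain arguments, a typo only shows up once the download starts. The following sketch checks both values first. The helper check_gsc_config is hypothetical and not part of the repository; the allowed values are taken from the task list and the two dataset versions mentioned above.

```python
# Values taken from this README; the helper itself is a hypothetical convenience.
TASKS = ('12cmd', 'leftright', '35word', '20cmd')
VERSIONS = (1, 2)

def check_gsc_config(version, task):
    """Validate the arguments before passing them to the downloader."""
    if version not in VERSIONS:
        raise ValueError(f"version must be one of {VERSIONS}, got {version!r}")
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}, got {task!r}")
    return {'version': version, 'task': task}

# Matches the configuration of the pretrained weights in model-attRNN.h5.
config = check_gsc_config(version=2, task='35word')

# In the Demo notebook this would then be used as:
# gscInfo, nCategs = SpeechDownloader.PrepareGoogleSpeechCmd(**config)
```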

Cloning this repository

  • Download or clone this repository;
  • Open the Demo notebook;
  • Choose how many words should be recognized and the Google Speech Dataset version to use;
  • Run training and tests.

Using Google Colab

Google Colaboratory is an amazing tool for experimentation using a Jupyter Notebook environment with GPUs.

  • Open Colab: https://colab.research.google.com/ ;
  • Download the notebook Speech_Recog_Demo.ipynb, upload it to Colab, then open it;
  • Enable GPU acceleration in menu Edit -> Notebook settings;
  • Set useColab = True;
  • Choose how many words should be recognized and the Google Speech Dataset version to use;
  • Run training and tests.

Train with your own data

If you want to train with your own data:

  • Use the WAV2Numpy function in audioUtils.py to save your WAV files in numpy format. This speeds up loading considerably;
  • Create a list_IDs array containing the paths to all the numpy files and a labels array with corresponding labels (already converted to integers);
  • Instantiate a SpeechGenerator.py SpeechGen class;
  • Create your own Keras model for audio classification or use one provided in SpeechModels.py;
  • Train the model.
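The steps above can be sketched end to end with NumPy and the standard library. This is a stand-in under stated assumptions, not the repository's code: wav_to_npy, write_tone, and NpyBatchGenerator are hypothetical names that only mimic what WAV2Numpy and SpeechGen are described as doing, the input is assumed to be 16-bit mono PCM, and the synthetic tones replace real recordings. The generator exposes `__len__` and `__getitem__`, the same interface a Keras Sequence uses.

```python
import os
import tempfile
import wave

import numpy as np

def wav_to_npy(wav_path, npy_path):
    """Save a 16-bit mono PCM WAV as a float32 .npy array scaled to [-1, 1)."""
    with wave.open(wav_path, 'rb') as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError('this sketch only handles 16-bit mono WAV files')
        pcm = np.frombuffer(wf.readframes(wf.getnframes()), dtype='<i2')
    np.save(npy_path, pcm.astype(np.float32) / 32768.0)

class NpyBatchGenerator:
    """Yields (X, y) batches from .npy files, zero-padding or cropping to dim."""

    def __init__(self, list_IDs, labels, batch_size=2, dim=16000):
        self.list_IDs, self.labels = list(list_IDs), list(labels)
        self.batch_size, self.dim = batch_size, dim

    def __len__(self):
        # Number of full batches per epoch.
        return len(self.list_IDs) // self.batch_size

    def __getitem__(self, index):
        lo, hi = index * self.batch_size, (index + 1) * self.batch_size
        paths = self.list_IDs[lo:hi]
        X = np.zeros((len(paths), self.dim), dtype=np.float32)
        y = np.array(self.labels[lo:hi], dtype=np.int64)
        for i, path in enumerate(paths):
            audio = np.load(path)[:self.dim]   # crop long clips
            X[i, :len(audio)] = audio          # short clips stay zero-padded
        return X, y

def write_tone(path, freq, rate=16000, seconds=0.5):
    """Write a synthetic sine tone so the example runs without real data."""
    t = np.arange(int(rate * seconds)) / rate
    pcm = (0.5 * np.sin(2 * np.pi * freq * t) * 32767).astype('<i2')
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())

tmp_dir = tempfile.mkdtemp()
word_to_int = {'left': 0, 'right': 1}   # labels already converted to integers
list_IDs, labels = [], []
for word, freq in (('left', 220.0), ('right', 440.0)):
    for k in range(2):
        wav_path = os.path.join(tmp_dir, f'{word}_{k}.wav')
        write_tone(wav_path, freq)
        npy_path = os.path.splitext(wav_path)[0] + '.npy'
        wav_to_npy(wav_path, npy_path)
        list_IDs.append(npy_path)
        labels.append(word_to_int[word])

gen = NpyBatchGenerator(list_IDs, labels, batch_size=2, dim=16000)
X, y = gen[0]   # first batch: two 'left' clips, padded to 16000 samples
```

With real data, list_IDs and labels would be built the same way from your own files, and the batches would be fed to your own Keras model or one from SpeechModels.py.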

Final Words

We would like to thank Google for making such a great speech dataset available for public use, for making Colab available, and for hosting the Kaggle competition TensorFlow Speech Recognition Challenge.

If you find this code useful, please cite our work:

@ARTICLE{2018arXiv180808929C,
  author = {{Coimbra de Andrade}, D. and {Leo}, S. and {Loesener Da Silva Viana}, M. and {Bernkopf}, C.},
  title = "{A neural attention model for speech command recognition}",
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint = {1808.08929},
  keywords = {Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound},
  year = 2018,
  month = aug,
  adsurl = {http://adsabs.harvard.edu/abs/2018arXiv180808929C},
  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}