
douglas125 / Speechcmdrecognition

Licence: mit

A neural attention model for speech command recognition

Labels

jupyter-notebook


Speech Command Recognition

A Keras implementation of a neural attention model for speech command recognition

This repository presents a recurrent attention model designed to identify keywords in short segments of audio. It has been tested on the Google Speech Commands datasets (v1 and v2). For a complete description of the architecture, please refer to our paper (arXiv:1808.08929).

Our main contributions are:

A small footprint model (201K trainable parameters) that outperforms convolutional architectures for speech command recognition (AKA keyword spotting);
A SpeechGenerator.py script that enables loading WAV files saved in .npy format from disk (like a Keras image generator, but for audio files);
Attention outputs that make the model explainable (i.e., it is possible to identify what part of the audio was important to reach a conclusion).

Attention Model

A common problem with deep learning models is that they behave as "black boxes": it is very difficult to explain why a model reaches a given decision. Attention is a powerful tool for making deep neural network models explainable: the picture below demonstrates that the transition from phoneme /a/ to phoneme /i/ is the part of the audio the model relied on most to decide (correctly) that the word is "right". Please refer to our paper for the confusion matrix and more attention plots.
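To make the idea concrete, here is a minimal sketch of dot-product attention in plain Python. It is not the architecture from the paper; the function names `softmax` and `attend`, the query vector, and the toy feature values are all assumptions chosen for illustration. It only shows why attention weights can be read as "which time steps mattered": they are positive, sum to 1, and the largest weight marks the frame that contributes most to the summary vector.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(frames, query):
    """Dot-product attention over a sequence of feature vectors.

    frames: list of per-time-step feature vectors (hypothetical values)
    query:  vector of the same length, used to score each frame
    Returns (weights, context); context is the weighted sum of the frames.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(len(query))
    ]
    return weights, context

# Toy input: four time steps with two features each (made-up numbers).
frames = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5], [0.3, 0.2]]
query = [1.0, 1.0]
weights, context = attend(frames, query)

# Index of the time step that received the largest attention weight.
most_relevant = max(range(len(weights)), key=weights.__getitem__)
```

Plotting `weights` against time is the kind of view an attention plot gives: a peak over the frames that drove the decision.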

How to use this code

The Demo notebook is preconfigured with a set of tasks: ['12cmd', 'leftright', '35word', '20cmd']. Each of these refers to how many commands the model should recognize. When loading the Google Speech Dataset, also select which version to download and use by adjusting the following line:

gscInfo, nCategs = SpeechDownloader.PrepareGoogleSpeechCmd(version=1, task = '35word')

If you want a pretrained model, model-attRNN.h5 contains pre-trained weights for task 35word, version=2.

Cloning this repository

Download or clone this repository;
Open the Demo notebook;
Choose how many words should be recognized and the Google Speech Dataset version to use;
Run training and tests.

Using Google Colab

Google Colaboratory is an amazing tool for experimentation using a Jupyter Notebook environment with GPUs.

Open Colab: https://colab.research.google.com/ ;
Download the notebook Speech_Recog_Demo.ipynb, upload it to Colab, then open it;
Enable GPU acceleration in menu Edit -> Notebook settings;
Set useColab = True;
Choose how many words should be recognized and the Google Speech Dataset version to use;
Run training and tests.

Train with your own data

If you want to train with your own data:

Use the WAV2Numpy function in audioUtils.py to save your WAV files in numpy format. This speeds up loading considerably;
Create a list_IDs array containing the paths to all the numpy files and a labels array with corresponding labels (already converted to integers);
Instantiate a SpeechGenerator.py SpeechGen class;
Create your own Keras model for audio classification or use one provided in SpeechModels.py;
Train the model.
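The second step above (building list_IDs and integer labels) can be sketched as follows. This is only an illustration under stated assumptions: the helper name `build_file_index` is hypothetical, and it assumes the converted files are laid out as `root/<word>/<clip>.npy`, with one folder per word. It returns the labels as a list parallel to `list_IDs`; check SpeechGenerator.py for the exact container SpeechGen expects before passing them in.

```python
from pathlib import Path

def build_file_index(root):
    """Collect .npy paths under root and assign one integer label per folder.

    Assumes a hypothetical layout root/<word>/<clip>.npy (one folder per word).
    Returns (list_IDs, labels, classes):
      list_IDs - file paths as strings
      labels   - integer label for each path, in the same order
      classes  - sorted folder names; classes[i] is the word for label i
    """
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_int = {name: i for i, name in enumerate(classes)}

    list_IDs = []
    labels = []
    for name in classes:
        # Sort the files so the index is reproducible between runs.
        for path in sorted((root / name).glob("*.npy")):
            list_IDs.append(str(path))
            labels.append(class_to_int[name])
    return list_IDs, labels, classes
```

For example, `list_IDs, labels, classes = build_file_index("my_dataset_npy")` (a made-up folder name) yields the two collections described in the list above, plus the label-to-word lookup needed to interpret predictions.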

Final Words

We would like to thank Google for making such a great speech dataset available for public use, for making Colab available, and for hosting the Kaggle competition TensorFlow Speech Recognition Challenge.

If you find this code useful, please cite our work:

@ARTICLE{2018arXiv180808929C,
  author = {{Coimbra de Andrade}, D. and {Leo}, S. and {Loesener Da Silva Viana}, M. and {Bernkopf}, C.},
  title = "{A neural attention model for speech command recognition}",
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint = {1808.08929},
  keywords = {Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound},
  year = 2018,
  month = aug,
  adsurl = {http://adsabs.harvard.edu/abs/2018arXiv180808929C},
  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
