
subho406 / TF-Speech-Recognition-Challenge-Solution

License: GPL-3.0
Source code of the model used in the TensorFlow Speech Recognition Challenge (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge). The solution ranked in the top 5% on the private leaderboard.

Programming Languages

  • Jupyter Notebook
  • Python
  • Shell

Projects that are alternatives of or similar to TF-Speech-Recognition-Challenge-Solution

Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Stars: ✭ 2,097 (+3515.52%)
Mutual labels:  speech, recurrent-neural-networks, speech-recognition
handson-ml
도서 "핸즈온 머신러닝"의 예제와 연습문제를 담은 주피터 노트북입니다.
Stars: ✭ 285 (+391.38%)
Mutual labels:  scikit-learn, recurrent-neural-networks, ensemble-learning
pycobra
python library implementing ensemble methods for regression, classification and visualisation tools including Voronoi tesselations.
Stars: ✭ 111 (+91.38%)
Mutual labels:  scikit-learn, ensemble-learning
audio noise clustering
An experiment with a variety of clustering (and clustering-like) techniques to reduce noise in an audio speech recording. Results: https://dodiku.github.io/audio_noise_clustering/results/
Stars: ✭ 24 (-58.62%)
Mutual labels:  scikit-learn, speech
python-machine-learning-book-2nd-edition
Code repository for the book "Machine Learning Textbook with Python, scikit-learn, and TensorFlow" (Python Machine Learning, 2nd Edition).
Stars: ✭ 60 (+3.45%)
Mutual labels:  scikit-learn, recurrent-neural-networks
Edgedict
Working online speech recognition based on RNN Transducer. (A trained model is available in the releases.)
Stars: ✭ 205 (+253.45%)
Mutual labels:  speech, speech-recognition
Speechbrain.github.io
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems for speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many other tasks.
Stars: ✭ 242 (+317.24%)
Mutual labels:  speech, speech-recognition
Machine-learning-toolkits-with-python
Machine learning toolkits with Python
Stars: ✭ 31 (-46.55%)
Mutual labels:  scikit-learn, ensemble-learning
Libfaceid
libfaceid is a research framework for prototyping of face recognition solutions. It seamlessly integrates multiple detection, recognition and liveness models w/ speech synthesis and speech recognition.
Stars: ✭ 354 (+510.34%)
Mutual labels:  scikit-learn, speech-recognition
Xcessiv
A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.
Stars: ✭ 1,255 (+2063.79%)
Mutual labels:  scikit-learn, ensemble-learning
idear
🎙️ Handsfree Audio Development Interface
Stars: ✭ 84 (+44.83%)
Mutual labels:  speech, speech-recognition
Lingvo
Lingvo
Stars: ✭ 2,361 (+3970.69%)
Mutual labels:  speech, speech-recognition
Speechtotext Websockets Javascript
SDK & Sample to do speech recognition using websockets in Javascript
Stars: ✭ 191 (+229.31%)
Mutual labels:  speech, speech-recognition
imbalanced-ensemble
Class-imbalanced / long-tailed ensemble learning in Python. Modular, flexible, and extensible.
Stars: ✭ 199 (+243.1%)
Mutual labels:  scikit-learn, ensemble-learning
End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (+201.72%)
Mutual labels:  speech, speech-recognition
Stacking
Stacked Generalization (Ensemble Learning)
Stars: ✭ 173 (+198.28%)
Mutual labels:  scikit-learn, ensemble-learning
Allosaurus
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
Stars: ✭ 135 (+132.76%)
Mutual labels:  speech, speech-recognition
Tacotron asr
Speech Recognition Using Tacotron
Stars: ✭ 165 (+184.48%)
Mutual labels:  speech, speech-recognition
Autogluon
AutoGluon: AutoML for Text, Image, and Tabular Data
Stars: ✭ 3,920 (+6658.62%)
Mutual labels:  scikit-learn, ensemble-learning
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+2513.79%)
Mutual labels:  scikit-learn, ensemble-learning

TF Speech Recognition Challenge

The TensorFlow Speech Recognition Challenge was a Kaggle competition organised by Google Brain in which participants used the Speech Commands Dataset to build an algorithm that understands simple spoken commands. https://www.kaggle.com/c/tensorflow-speech-recognition-challenge

This solution achieved a rank of 63 on the private leaderboard (top 5%).

Project Structure

  • data
    • raw
      • train (Training audio files)
      • test (Test audio files used for evaluation)
  • libs
    • classification (All scripts used for training and evaluation)
  • notebooks
  • scripts (Executable scripts)
  • models (Pretrained Models)

Requirements

  1. TensorFlow 1.4
  2. librosa
  3. scikit-learn
  4. Python 3.x

Running

Download the Speech Commands Dataset and extract it into the train folder (data/raw/train). Test audio can be placed in the data/test/audio folder.
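
For reference, the snippet below is a minimal sketch of one way to fetch and unpack the dataset with Python. The archive URL points to version 0.01 of the Speech Commands Dataset and is an assumption for this sketch, not something the repository itself pins down.

import os
import tarfile
import urllib.request

# Assumed archive location for Speech Commands Dataset v0.01 (the version used in the competition).
DATASET_URL = "http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz"
ARCHIVE = "speech_commands_v0.01.tar.gz"
TRAIN_DIR = "data/raw/train"   # training audio location per the Project Structure section

os.makedirs(TRAIN_DIR, exist_ok=True)

# Download the archive once, then extract the per-word WAV folders into the train directory.
if not os.path.exists(ARCHIVE):
    urllib.request.urlretrieve(DATASET_URL, ARCHIVE)
with tarfile.open(ARCHIVE, "r:gz") as tar:
    tar.extractall(TRAIN_DIR)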

The notebooks can be run individually using Jupyter. To run the scripts from the command line, edit the notebooks using Jupyter and run:

./script/execute_notebook.py

and select the notebook to run. The results are stored in results/notebook_name.log

P0 Predict Test WAV.ipynb can be used to predict audio files using a trained GraphDef model.
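
For illustration, here is a rough sketch of what such a prediction step can look like with a frozen GraphDef in TensorFlow 1.x. The file paths and tensor names below (frozen_graph.pb, wav_data:0, labels_softmax:0) are assumptions for this sketch, not necessarily the identifiers the notebook uses.

import tensorflow as tf

GRAPH_PATH = "models/frozen_graph.pb"        # hypothetical exported GraphDef
WAV_PATH = "data/test/audio/clip_0001.wav"   # hypothetical test clip
INPUT_TENSOR = "wav_data:0"                  # assumed input tensor name
OUTPUT_TENSOR = "labels_softmax:0"           # assumed output tensor name

# Load the frozen GraphDef and import it into the default graph.
with tf.gfile.GFile(GRAPH_PATH, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name="")

with tf.Session() as sess:
    with open(WAV_PATH, "rb") as wav_file:
        wav_data = wav_file.read()
    # Feed the raw WAV bytes and take the most probable class.
    probs = sess.run(OUTPUT_TENSOR, feed_dict={INPUT_TENSOR: wav_data})
    print("predicted class index:", probs.argmax())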

Architecture

Models used

  1. A variant of Convolutional LSTM (https://arxiv.org/pdf/1610.00277.pdf)
  2. LSTM-L (https://arxiv.org/pdf/1711.07128.pdf)
  3. C-RNN (https://arxiv.org/pdf/1711.07128.pdf)
  4. GRU-L (https://arxiv.org/pdf/1711.07128.pdf)
  5. Resnet
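
To make the list above concrete, the following is a minimal sketch of an LSTM classifier over MFCC frames in TensorFlow 1.x, in the spirit of the LSTM-L model. The layer sizes and tensor names are illustrative and are not the values used in this project.

import tensorflow as tf

# Illustrative hyperparameters (not this project's actual settings).
NUM_FRAMES, NUM_MFCC, NUM_CLASSES, NUM_UNITS = 98, 40, 12, 256

# MFCC feature matrix for a batch of one-second clips: [batch, time, coefficients].
mfcc = tf.placeholder(tf.float32, [None, NUM_FRAMES, NUM_MFCC], name="mfcc")

# Single LSTM layer; the final hidden state summarises the whole utterance.
cell = tf.nn.rnn_cell.LSTMCell(NUM_UNITS)
outputs, state = tf.nn.dynamic_rnn(cell, mfcc, dtype=tf.float32)

# Project the final hidden state to class scores for the command words.
logits = tf.layers.dense(state.h, NUM_CLASSES, name="logits")
predictions = tf.nn.softmax(logits, name="labels_softmax")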

Training

The model was trained using a GCP instance with the following specifications:

  • NVIDIA Tesla P100 X 1
  • 16 GB RAM
  • 35 GB SSD

Most of the models converged within 30k steps. Pseudo-labelling on the test data was used to improve model performance.
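
The pseudo-labelling step can be summarised by the following sketch; model, train_x/train_y, and test_x stand in for this project's actual estimators and data pipeline, and the confidence threshold is an illustrative choice.

import numpy as np

def pseudo_label(model, train_x, train_y, test_x, threshold=0.95):
    # Predict class probabilities for the unlabelled test clips.
    probs = model.predict_proba(test_x)           # shape: [num_test, num_classes]
    confident = probs.max(axis=1) >= threshold    # keep only confident predictions
    pseudo_x = test_x[confident]
    pseudo_y = probs[confident].argmax(axis=1)
    # Retrain on the original labels plus the pseudo-labelled test clips.
    model.fit(np.concatenate([train_x, pseudo_x]),
              np.concatenate([train_y, pseudo_y]))
    return model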

Prediction

The final model was an ensemble of 13 models. Weighted averaging and stacking were used to generate the final predictions.
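
As a sketch of those two ensembling steps: model_probs below is assumed to be a list of per-model class-probability arrays of shape [num_clips, num_classes], and the logistic-regression meta-model is an illustrative choice rather than the stacker actually used here.

import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_average(model_probs, weights):
    # Normalise the weights and blend the per-model probability arrays.
    weights = np.asarray(weights, dtype=float)
    stacked = np.stack(model_probs)               # [num_models, num_clips, num_classes]
    return np.tensordot(weights / weights.sum(), stacked, axes=1)

def stack_predict(train_probs, train_labels, test_probs):
    # Train a meta-model on the concatenated base-model probabilities (stacking).
    meta_features = np.concatenate(train_probs, axis=1)
    meta = LogisticRegression().fit(meta_features, train_labels)
    return meta.predict(np.concatenate(test_probs, axis=1))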

Acknowledgements

  1. ML-KWS-for-MCU (https://github.com/ARM-software/ML-KWS-for-MCU)
  2. Very Deep Convolutional Neural Network for Robust Speech Recognition (https://arxiv.org/pdf/1610.00277.pdf)
  3. Speech Commands Dataset (https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html)

If you like this project or have any queries, don't hesitate to send an email to [email protected]
