mostafaelaraby / Tensorflow-Keyword-Spotting

Licence: Apache-2.0 license

Keyword spotting using various architectures, such as a convolutional VGGNet, a 1D convolutional network, and CTC.

Labels

tensorflow speech-recognition

TensorFlow Keyword Spotting

This project trains a model that spots a set of specific keywords taken from an input list of classes. It uses several techniques, such as wav data augmentation (time shift, addition of background noise, speed change, and stretching of the input frequency), together with several types of models: a baseline Conv network, VGGNET, and a CTC model.

Prerequisites

You will need Python 2.7/3.0 and TensorFlow 1.4.

Getting Started

You will need to write a YAML configuration file modeled on the configs for the different models in the example_config folder.

Parameters

general_params

seed: used to randomize the input data and input batches; also used in the augmentation, to randomly select the augmentation technique and its value for each wav
unknown_percentage: percentage of unknowns in the training set (keywords that are not in the classes set)
silence_percentage: percentage of wavs that contain no keyword to be spotted, only background noise
validation_percentage: percentage of the data used as the validation set
testing_percentage: percentage of the data used as the test set, when no testing list is used and mode = 'test'
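As a rough illustration of how validation_percentage and testing_percentage can drive a stable split, here is a minimal sketch modeled on the hashing approach of the TensorFlow speech commands example. Whether this project partitions files in exactly this way is an assumption; the function name which_set and the constant MAX_ITEMS_PER_CLASS are borrowed from that example, not taken from this repository.

```python
import hashlib
import os
import re

# Assumption: large constant used only to spread hash values over [0, 100].
MAX_ITEMS_PER_CLASS = 2 ** 27 - 1


def which_set(filename, validation_percentage, testing_percentage):
    """Assign a wav file to 'training', 'validation' or 'testing'."""
    base_name = os.path.basename(filename)
    # Ignore anything after '_nohash_' so recordings from the same speaker
    # always land in the same partition.
    hash_name = re.sub(r"_nohash_.*$", "", base_name)
    # Hashing the name keeps the assignment stable across runs.
    digest = hashlib.sha1(hash_name.encode("utf-8")).hexdigest()
    percentage_hash = (int(digest, 16) % (MAX_ITEMS_PER_CLASS + 1)) * (
        100.0 / MAX_ITEMS_PER_CLASS)
    if percentage_hash < validation_percentage:
        return "validation"
    if percentage_hash < validation_percentage + testing_percentage:
        return "testing"
    return "training"
```

Because the decision depends only on the file name, adding new wavs later does not move existing files between sets.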

Paths

Specifies the data paths. The dataset must follow the format of the Speech Commands dataset: each folder contains a set of wav files, and the folder name specifies the keyword. You also need to specify the model path, a tmp directory for logs, the path of the test set that needs prediction, and the name of the background noise folder.

classes

The keywords to be spotted in the provided dataset. Any other keyword available in the dataset is treated as unknown.
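The folder-name convention and the unknown rule can be sketched as follows. This is only an illustration: the label strings _silence_ and _unknown_ and both function names are hypothetical (the strings follow the TensorFlow speech commands example), and the project may order or name its labels differently.

```python
import os

# Hypothetical label names, borrowed from the TensorFlow example.
SILENCE_LABEL = "_silence_"
UNKNOWN_LABEL = "_unknown_"


def build_label_index(classes):
    """Map each label to an integer id, reserving ids for silence and unknown."""
    labels = [SILENCE_LABEL, UNKNOWN_LABEL] + list(classes)
    return {label: index for index, label in enumerate(labels)}


def label_for_wav(wav_path, classes):
    """Use the parent folder name as the keyword; other keywords become unknown."""
    keyword = os.path.basename(os.path.dirname(wav_path))
    return keyword if keyword in classes else UNKNOWN_LABEL
```

For example, with classes set to yes and no, a file under data/cat/ would be labeled as unknown, while a file under data/yes/ keeps the label yes.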

wav_reading_params

Specifies the sampling rate of the input wav files, the time shift in milliseconds, the training clip duration in milliseconds, the window size in milliseconds, the window stride in milliseconds, the fingerprint type (mfcc, mel, or log_mel), and a ctc flag that indicates whether CTC is used.
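To see how these values interact, the sketch below computes how many analysis frames a clip yields for a given window size and stride. It follows the arithmetic used in the TensorFlow speech commands example; the function and argument names are hypothetical and are not the keys of this project's configuration.

```python
def frame_count(sample_rate, clip_duration_ms, window_size_ms, window_stride_ms):
    """Number of spectrogram frames produced for one fixed-length clip."""
    # Convert every duration from milliseconds to a number of samples.
    clip_samples = int(sample_rate * clip_duration_ms / 1000)
    window_samples = int(sample_rate * window_size_ms / 1000)
    stride_samples = int(sample_rate * window_stride_ms / 1000)
    if stride_samples <= 0:
        raise ValueError("window stride must cover at least one sample")
    remaining = clip_samples - window_samples
    if remaining < 0:
        # The clip is shorter than a single window, so no frame fits.
        return 0
    # One frame for the first window, plus one per full stride after it.
    return 1 + remaining // stride_samples


print(frame_count(16000, 1000, 30, 10))  # prints 98
```

With a 16 kHz sampling rate, a 1000 ms clip, a 30 ms window, and a 10 ms stride, the clip is split into 98 frames, each of which becomes one row of the 2D fingerprint.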

model

Specifies the model to use:
Baseline, as in http://www.isca-speech.org/archive/interspeech_2015/papers/i15_1478.pdf
VGGNET, working on 2D fingerprints
LSTM, for CTC models

model_params

loss: crossentropy, or ctc for CTC models
optimizer: can be SGD, Adam, Adagrad, or Momentum
training_steps: number of training steps for each stage, given as a comma-separated string, for example 1000,2000
learning_rate: comma-separated list of learning rates, one for each entry in training_steps; for example, with training_steps 1000,2000, the value 0.01,0.001 trains up to step 1K with a learning rate of 0.01 and from step 1K to step 3K with a learning rate of 0.001
save_eval_step_interval: interval, in training steps, at which a checkpoint is saved and the model is evaluated
dropout: keep probability for dropout
batch_size: size of a training batch
rnd_mini_batches: True to use random batches, like the ones used in https://github.com/tensorflow/tensorflow/blob/57b32eabca4597241120cb4aba8308a431853c30/tensorflow/examples/speech_commands/input_data.py#L398; False to ensure iterating over the whole dataset
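The relationship between training_steps and learning_rate can be sketched as a small lookup. This is a minimal illustration of the staged schedule described above, not the project's actual code; the function names parse_schedule and learning_rate_at are hypothetical.

```python
def parse_schedule(training_steps, learning_rate):
    """Parse the two comma-separated strings into matching lists."""
    steps = [int(value) for value in training_steps.split(",")]
    rates = [float(value) for value in learning_rate.split(",")]
    if len(steps) != len(rates):
        raise ValueError("training_steps and learning_rate need the same length")
    return steps, rates


def learning_rate_at(step, steps, rates):
    """Return the learning rate for a 1-based training step."""
    boundary = 0
    for stage_steps, rate in zip(steps, rates):
        # Each stage ends at the cumulative sum of the stage lengths so far.
        boundary += stage_steps
        if step <= boundary:
            return rate
    raise ValueError("step is beyond the end of the schedule")


steps, rates = parse_schedule("1000,2000", "0.01,0.001")
print(learning_rate_at(1000, steps, rates))  # prints 0.01
print(learning_rate_at(1001, steps, rates))  # prints 0.001
```

The total number of training steps is the sum of the stage lengths, 3000 in this example.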

augmentation

Ops: the augmentation operations to use (speed and stretch)
Percentage of augmentation for each available class
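One way to picture the speed operation and the augmentation percentage is the sketch below, which resamples a clip by linear interpolation and applies the change to only a share of the clips. This is an assumption about how such an op could work, not the project's implementation: the function names and the 0.9 to 1.1 factor range are hypothetical. Note that plain resampling changes pitch together with duration, so it illustrates speed rather than a pitch-preserving stretch.

```python
import random


def change_speed(samples, factor):
    """Resample by linear interpolation; a factor above 1 shortens the clip."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if len(samples) < 2:
        return list(samples)
    new_length = max(1, int(round(len(samples) / factor)))
    last = len(samples) - 1
    output = []
    for i in range(new_length):
        # Position of the i-th output sample on the original time axis.
        position = min(i * factor, last)
        left = int(position)
        right = min(left + 1, last)
        fraction = position - left
        output.append(samples[left] * (1.0 - fraction) + samples[right] * fraction)
    return output


def maybe_augment(samples, percentage, rng, min_factor=0.9, max_factor=1.1):
    """Augment roughly `percentage` percent of clips, using a seeded generator."""
    if rng.random() * 100.0 >= percentage:
        return list(samples)
    return change_speed(samples, rng.uniform(min_factor, max_factor))


rng = random.Random(1234)  # a fixed seed keeps the augmentation reproducible
augmented = maybe_augment([0.0, 0.5, 1.0, 0.5, 0.0], 50, rng)
```

Passing a random.Random instance built from the configured seed keeps the choice of augmented clips, and of the factor applied to each one, reproducible between runs.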

mode

Use test to test your models, and train to train using all the data.

Results

Using the Speech Commands test set data:

baseline.yml config: test accuracy 90.4%
vggnet.yml config: test accuracy 92.3%

TODO

Language model for CTC
Add Resnet model

References

CTC Tensorflow https://github.com/philipperemy/tensorflow-ctc-speech-recognition
Speech Recognition Tutorial by tensorflow https://www.tensorflow.org/versions/master/tutorials/audio_recognition
