dspavankumar / Keras Kaldi

Licence: gpl-3.0
Keras Interface for Kaldi ASR

Why these Routines?

This code interfaces Kaldi tools for speech recognition with Keras tools for deep learning. Keras simplifies the implementation of recent deep learning models, provides a common front end to the two popular backends, Theano and TensorFlow, and has a growing user base. Kaldi, one of the best tools for ASR, thus needs an interface with Keras, and here is one. The code works directly on Kaldi-style data and alignment directories to build and test deep learning models in Keras.

Features

  1. Trains DNNs from a Kaldi GMM system

  2. Works with standard Kaldi data and alignment directories

  3. Supports mini-batch training

  4. Supports LSTMs, maxout and dropout training

  5. Easily extendable to other deep learning implementations in Keras

  6. Decodes test utterances in Kaldi style

Dependencies

  1. Python 3.4+

  2. Keras with TensorFlow/Theano backend

  3. Kaldi

Using the Code

Train a GMM system in Kaldi. Place steps_kt and run_kt.sh in the working directory. Configure and run run_kt.sh. To train LSTMs, run run_kt_LSTM.sh.

Code Components

  1. train.py is the Keras training script. DNN structure (type of network, activations, number of hidden layers and nodes) can be configured in this script. train_LSTM.py trains LSTMs.

  2. dataGenerator.py provides an object that reads Kaldi data and alignment directories in batches and retrieves mini-batches for training. dataGenSequences.py retrieves 3D mini-batches for LSTM training.

  3. nnet-forward.py passes test features through the trained DNNs and outputs log probabilities (log of DNN outputs) in Kaldi format. nnet-forward-seq.py passes 3D arrays to LSTMs and outputs log probabilities.

  4. kaldiIO.py reads and writes Kaldi-type binary features.

  5. decode.py is the decoding script. decode_seq.py is the script for LSTMs.

  6. align.sh is the alignment script.

  7. compute_priors.py computes priors.

  8. saveModelNnet3.sh and saveModelNnet3Raw.py convert the trained feedforward DNNs into Kaldi's nnet3 format. They currently have limited functionality.
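
For readers unfamiliar with the Kaldi-type binary features mentioned in item 4, the following is a minimal sketch of the uncompressed binary float-matrix layout commonly written by Kaldi tools. It is not the code in kaldiIO.py. The function names write_kaldi_matrix and read_kaldi_matrix are hypothetical, and the sketch assumes little-endian, single-precision matrices only (no compressed or double-precision data).

```python
import io
import struct


def write_kaldi_matrix(stream, utt_id, rows):
    """Write one utterance as a binary single-precision Kaldi matrix entry."""
    num_rows = len(rows)
    num_cols = len(rows[0]) if rows else 0
    stream.write(utt_id.encode("ascii") + b" ")            # key, then a space
    stream.write(b"\0B")                                   # binary-mode marker
    stream.write(b"FM ")                                   # float-matrix token
    stream.write(b"\x04" + struct.pack("<i", num_rows))    # size byte + int32
    stream.write(b"\x04" + struct.pack("<i", num_cols))
    for row in rows:
        stream.write(struct.pack("<%df" % num_cols, *row))  # row-major floats


def _read_int32(stream):
    """Read a size byte (expected to be 4) followed by a little-endian int32."""
    if stream.read(1) != b"\x04":
        raise ValueError("unexpected integer size")
    return struct.unpack("<i", stream.read(4))[0]


def read_kaldi_matrix(stream):
    """Read the next entry; return (utt_id, rows), or None at end of stream."""
    key = bytearray()
    while True:
        ch = stream.read(1)
        if ch == b"":
            if key:
                raise ValueError("stream ended inside a key")
            return None
        if ch == b" ":
            break
        key += ch
    if stream.read(2) != b"\0B":
        raise ValueError("not a binary Kaldi entry")
    if stream.read(3) != b"FM ":
        raise ValueError("only uncompressed float matrices are handled")
    num_rows = _read_int32(stream)
    num_cols = _read_int32(stream)
    rows = []
    for _ in range(num_rows):
        data = stream.read(4 * num_cols)
        rows.append(list(struct.unpack("<%df" % num_cols, data)))
    return key.decode("ascii"), rows


# Round trip through an in-memory buffer instead of a real .ark file.
buffer = io.BytesIO()
write_kaldi_matrix(buffer, "utt1", [[0.5, 1.0], [-2.25, 4.0]])
buffer.seek(0)
utt_id, feats = read_kaldi_matrix(buffer)
```

In practice the stream would usually be a pipe from a Kaldi tool or a file opened in binary mode, and the rows would be held in a NumPy array rather than in Python lists.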

Training Schedule

The script uses stochastic gradient descent with a momentum of 0.5. It starts with a learning rate of 0.1 and keeps it for a minimum of 5 iterations. Once the validation loss reduces by less than 0.002 between successive iterations, the learning rate is halved, and it continues to be halved after each epoch, 18 times.
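
The schedule can be illustrated with a small stand-alone sketch. This is one reading of the description above, not the code in train.py: the function name learning_rate_schedule and its parameters are hypothetical, and it assumes that the 18 halvings include the first one and that training stops after the epoch run at the last halved rate.

```python
def learning_rate_schedule(val_losses, initial_lr=0.1, min_epochs=5,
                           threshold=0.002, num_halvings=18):
    """Return the learning rate used in each epoch.

    val_losses[i] is the validation loss observed after epoch i + 1.
    """
    rates = []
    lr = initial_lr
    halving = False        # becomes True once the loss stops improving enough
    halvings_done = 0
    prev_loss = None
    for epoch, loss in enumerate(val_losses, start=1):
        rates.append(lr)   # this epoch was trained with the current rate
        if not halving:
            small_gain = (prev_loss is not None
                          and (prev_loss - loss) < threshold)
            # Halving may only start once the minimum number of epochs is done.
            if epoch >= min_epochs and small_gain:
                halving = True
        if halving:
            if halvings_done == num_halvings:
                break      # all halvings used up (assumed stopping point)
            lr /= 2.0
            halvings_done += 1
        prev_loss = loss
    return rates


# The loss flattens out after epoch 5, so halving starts after epoch 6.
example = learning_rate_schedule(
    [2.0, 1.5, 1.2, 1.0, 0.9, 0.8995, 0.8990, 0.8985])
```

With these example losses, the first six epochs run at 0.1 and the following two at 0.05 and 0.025.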

Results on TIMIT Phone Recognition

The TIMIT database at a sampling rate of 8 kHz was used to train monophone, triphone (300 pdfs), LDA+MLLT (500 pdfs), DNN and LSTM models. Phone error rates are as follows:

  1. Monophone: 34.25%

  2. Triphone: 30.44%

  3. LDA+MLLT: 27.03%

  4. DNN (3 hidden layers of 1024 nodes, ReLU activations): 23.71%

  5. LSTM (3 hidden layers of 256 nodes, Tanh activations, LDA+MLLT alignments): 23.02%
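
As a reminder of what these figures measure, an error rate is the number of substitutions, deletions and insertions needed to turn the recognised sequence into the reference, divided by the reference length. The sketch below shows that computation only. It is not the scoring used for the numbers above, which would normally come from Kaldi's own scoring scripts; the function names and the toy phone sequences are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))       # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        cur = [i]                          # distance to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(pairs):
    """Percentage error rate pooled over (reference, hypothesis) pairs."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return 100.0 * errors / total


# Toy example: one substitution (ae -> ah) and one deleted final "sil".
reference = "sil k ae t sil".split()
hypothesis = "sil k ah t".split()
per = error_rate([(reference, hypothesis)])
```

Here two errors against five reference phones give an error rate of 40%. The same function applies to word error rates when the tokens are words.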

Results on WSJ Corpus

The WSJ SI-284 corpus at a sampling rate of 8 kHz was used to train monophone, triphone (1000 pdfs), DNN and LSTM models. Word error rates are as follows:

  1. Monophone - dev93: 37.76%, eval92: 27.95%

  2. Triphone - dev93: 23.78%, eval92: 16.37%

  3. DNN (3 hidden layers of 1024 nodes, ReLU activations) - dev93: 13.50%, eval92: 9.16%

  4. LSTM (3 hidden layers of 256 nodes, ReLU activations) - dev93: 13.25%, eval92: 9.16%

Notes

  1. If using ReLU activations in LSTMs, use the TensorFlow backend.

  2. Initialise TensorFlow with the correct GPU memory fraction.

Contributors

D S Pavan Kumar

dspavankumar [at] gmail [dot] com

Acknowledgements

Thanks to Dan Povey, Ram Sundaram, Naresh Kumar, Tejas Godambe and Veera Raghavendra for suggesting improvements and debugging.

License

GNU GPL v3
