
yao-matrix / deepSpeech2

License: BSD-3-Clause
End-to-end speech recognition using TensorFlow

Programming Languages

  • Python
  • Shell

Projects that are alternatives to, or similar to, deepSpeech2

URT
Fast Unit Root Tests and OLS regression in C++ with wrappers for R and Python
Stars: ✭ 70 (+45.83%)
Mutual labels:  mkl
mkl fft
NumPy-based Python interface to Intel (R) MKL FFT functionality
Stars: ✭ 52 (+8.33%)
Mutual labels:  mkl
intel-mkl-src
Redistribute Intel MKL as a crate
Stars: ✭ 52 (+8.33%)
Mutual labels:  mkl
MKLSparse.jl
Make available to Julia the sparse functionality in MKL
Stars: ✭ 42 (-12.5%)
Mutual labels:  mkl
daany
Daany - .NET DAta ANalYtics library with implementations of DataFrame, time series decompositions, and linear algebra routines (BLAS and LAPACK).
Stars: ✭ 49 (+2.08%)
Mutual labels:  mkl
Neon
Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
Stars: ✭ 3,855 (+7931.25%)
Mutual labels:  mkl
Arch-Data-Science
Archlinux PKGBUILDs for Data Science, Machine Learning, Deep Learning, NLP and Computer Vision
Stars: ✭ 92 (+91.67%)
Mutual labels:  mkl
monolish
monolish: MONOlithic LInear equation Solvers for Highly-parallel architecture
Stars: ✭ 166 (+245.83%)
Mutual labels:  mkl
sparse dot
Python wrapper for Intel Math Kernel Library (MKL) matrix multiplication
Stars: ✭ 38 (-20.83%)
Mutual labels:  mkl
DynAdjust
Least squares adjustment software
Stars: ✭ 43 (-10.42%)
Mutual labels:  mkl
dpnp
NumPy drop-in replacement for Intel(R) XPUs
Stars: ✭ 42 (-12.5%)
Mutual labels:  mkl
MLab
The "cloud" behind "alchemists in the cloud" (a cloud platform for deep learning practitioners)
Stars: ✭ 54 (+12.5%)
Mutual labels:  mkl
mkl-service
Python hooks for Intel(R) Math Kernel Library runtime control settings.
Stars: ✭ 45 (-6.25%)
Mutual labels:  mkl
deepspeech
A PyTorch implementation of DeepSpeech and DeepSpeech2.
Stars: ✭ 45 (-6.25%)
Mutual labels:  deepspeech2

TensorFlow implementation of DeepSpeech2

End-to-end speech recognition using TensorFlow

This repository contains TensorFlow code for an end-to-end speech recognition engine implementing Baidu's DeepSpeech2 model on IA (Intel Architecture). This work is based on the code developed by Ford (https://github.com/fordDeepDSP/deepSpeech), with many changes made to fit our solution.

This software is released under a BSD license. The license does not apply to TensorFlow, which is available under the Apache 2.0 license, or to the third-party prerequisites listed below, which are available under their own respective licenses.

Pre-requisites

  • TensorFlow - version 1.1.0 or 1.2.0
  • Python - version 2.7
  • python-Levenshtein - to compute the Character Error Rate (CER; see the sketch after this list)
  • python_speech_features - to generate MFCC features
  • PySoundFile - to read FLAC files
  • scipy - helper functions for windowing
  • tqdm - for displaying a progress bar
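
To sanity-check the prerequisites, here is a minimal sketch (not code from this repository) showing how they fit together: PySoundFile reads a FLAC file, python_speech_features computes MFCC features, and python-Levenshtein gives the edit distance behind the CER. The file path is a placeholder; point it at any extracted LibriSpeech utterance.

import soundfile as sf
from python_speech_features import mfcc
import Levenshtein

def compute_cer(hypothesis, reference):
    # CER = Levenshtein edit distance / length of the reference transcript
    return Levenshtein.distance(hypothesis, reference) / float(len(reference))

# Placeholder path; use any extracted LibriSpeech FLAC file here.
audio, sample_rate = sf.read('sample.flac')
features = mfcc(audio, samplerate=sample_rate)   # default: 13 cepstral coefficients per frame
print(features.shape)                            # (num_frames, 13)
print(compute_cer('helo world', 'hello world'))  # 1 edit / 11 chars ~= 0.09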

Getting started

Step 1: Install all dependencies.

$ yum install libsndfile
$ pip install python-Levenshtein
$ pip install python_speech_features
$ pip install PySoundFile
$ pip install scipy
$ pip install tqdm

# Install TensorFlow 1.2.0:
$ pip install 'tensorflow==1.2.0'

# [GPU ONLY] Update ~/.bashrc to reflect the CUDA paths.
1. Add these lines to ~/.bashrc:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
2. Install the TF GPU package:
$ pip install --upgrade 'tensorflow-gpu==1.2.0'

Step 2: Clone this git repo.

$ git clone https://github.com/yao-matrix/deepSpeech2.git
$ cd deepSpeech2

Preprocessing the data

Step 1: Download and unpack the LibriSpeech data

Inside the cloned repo, run:

$ mkdir -p data/librispeech
$ cd data/librispeech
$ wget http://www.openslr.org/resources/12/train-clean-100.tar.gz
$ wget http://www.openslr.org/resources/12/dev-clean.tar.gz
$ wget http://www.openslr.org/resources/12/test-clean.tar.gz
$ mkdir audio
$ cd audio
$ tar xvzf ../train-clean-100.tar.gz LibriSpeech/train-clean-100 --strip-components=1
$ tar xvzf ../dev-clean.tar.gz LibriSpeech/dev-clean  --strip-components=1
$ tar xvzf ../test-clean.tar.gz LibriSpeech/test-clean  --strip-components=1
# delete audio files that are too short
$ rm -f LibriSpeech/train-clean-100/1578/6379/1578-6379-0029.flac
$ rm -f LibriSpeech/train-clean-100/460/172359/460-172359-0090.flac

Step 2: Run this command to preprocess the audio and generate TFRecord files.

The computed MFCC features will be stored in TFRecord files inside data/librispeech/processed/.

$ cd ./src
$ python preprocess_LibriSpeech.py
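
The exact TFRecord schema is defined in preprocess_LibriSpeech.py; the sketch below only illustrates the general pattern of serializing one utterance's MFCC features and transcript into a tf.train.Example (the field names and output path here are illustrative assumptions, not the repository's actual schema).

import numpy as np
import tensorflow as tf

def make_example(features, transcript):
    # features: float32 array of shape (num_frames, num_coefficients)
    return tf.train.Example(features=tf.train.Features(feature={
        'features': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[features.tobytes()])),
        'shape': tf.train.Feature(
            int64_list=tf.train.Int64List(value=list(features.shape))),
        'transcript': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[transcript.encode('utf-8')])),
    }))

writer = tf.python_io.TFRecordWriter('example.tfrecords')
writer.write(make_example(np.zeros((100, 13), dtype=np.float32),
                          'hello world').SerializeToString())
writer.close()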

Training a model w/ dummy data

$ cd ./src
$ vim ./train.sh
# set dummy=1 in train.sh
$ ./train.sh

Training a model w/ real data

# To continue training from a saved checkpoint file
$ cd ./src
$ vim ./train.sh
# set dummy=0 in train.sh
$ ./train.sh

The script train.sh trains on utterances in sorted order during the first epoch and then resumes training on shuffled utterances. Note that during the first epoch the cost will increase, and later steps will take longer to train, because the utterances are presented to the network in sorted order (shortest first, so later utterances are longer).
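
This shortest-first curriculum is the "SortaGrad" trick from the DeepSpeech2 paper. A tiny sketch of the idea (illustrative only; the real ordering is handled by train.sh and the input pipeline):

import random

def epoch_order(utterances, epoch):
    # utterances: list of (duration_in_seconds, path) tuples
    if epoch == 0:
        return sorted(utterances)  # shortest-first for the first epoch
    shuffled = list(utterances)
    random.shuffle(shuffled)       # uniformly shuffled afterwards
    return shuffled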

Monitoring training

Since the training data is fed through a shuffled queue, checking the validation loss requires setting up a separate graph in a different session. This graph is fed the validation data to compute predictions. The deepSpeech_test.py script initializes the graph from a previously saved checkpoint file and computes the CER on the eval_data every 5 minutes by default. It saves the computed CER values in the models/librispeech/eval folder. By pointing tensorboard's logdir at models/librispeech, you can monitor validation CER and training loss during training.

$ cd ./src
$ ./validation.sh
$ tensorboard --logdir PATH_TO_SUMMARY
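
The evaluation loop itself lives in deepSpeech_test.py; the sketch below only illustrates the checkpoint-polling pattern described above (the checkpoint directory and the stand-in variable are assumptions for illustration, not the repository's actual names).

import time
import tensorflow as tf

CHECKPOINT_DIR = 'models/librispeech/train'  # illustrative path
EVAL_INTERVAL_SECS = 300                     # the 5-minute default noted above

# Stand-in variable so the Saver has something to restore; the real
# evaluation graph is rebuilt by deepSpeech_test.py from the model code.
global_step = tf.Variable(0, name='global_step', trainable=False)
saver = tf.train.Saver()

with tf.Session() as sess:
    while True:
        ckpt = tf.train.latest_checkpoint(CHECKPOINT_DIR)
        if ckpt is not None:
            saver.restore(sess, ckpt)  # load the latest training weights
            # ...feed validation batches, compute CER, write summaries...
        time.sleep(EVAL_INTERVAL_SECS)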

Testing a model

$ cd ./src
$ ./test.sh

Thanks

Thanks to Aswathy for helping refine the README.
