
NervanaSystems / Deepspeech

Licence: apache-2.0
DeepSpeech neon implementation

Projects that are alternatives of or similar to Deepspeech

Pytorch Spynet
a reimplementation of Optical Flow Estimation using a Spatial Pyramid Network in PyTorch
Stars: ✭ 190 (-14.8%)
Mutual labels:  cuda
Cunn
Stars: ✭ 205 (-8.07%)
Mutual labels:  cuda
Tigre
TIGRE: Tomographic Iterative GPU-based Reconstruction Toolbox
Stars: ✭ 215 (-3.59%)
Mutual labels:  cuda
Timemory
Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template API is essentially a framework to creating tools: it is designed to provide a unifying interface for recording various performance measurements alongside data logging and interfaces to other tools.
Stars: ✭ 192 (-13.9%)
Mutual labels:  cuda
Pine
🌲 Aimbot powered by real-time object detection with neural networks, GPU accelerated with Nvidia. Optimized for use with CS:GO.
Stars: ✭ 202 (-9.42%)
Mutual labels:  cuda
Hip
HIP: C++ Heterogeneous-Compute Interface for Portability
Stars: ✭ 2,609 (+1069.96%)
Mutual labels:  cuda
Nvidia Docker
Build and run Docker containers leveraging NVIDIA GPUs
Stars: ✭ 13,961 (+6160.54%)
Mutual labels:  cuda
Softmax Splatting
an implementation of softmax splatting for differentiable forward warping using PyTorch
Stars: ✭ 218 (-2.24%)
Mutual labels:  cuda
Oneflow
OneFlow is a performance-centered and open-source deep learning framework.
Stars: ✭ 2,868 (+1186.1%)
Mutual labels:  cuda
Genomeworks
SDK for GPU accelerated genome assembly and analysis
Stars: ✭ 215 (-3.59%)
Mutual labels:  cuda
Viseron
Self-hosted NVR with object detection
Stars: ✭ 192 (-13.9%)
Mutual labels:  cuda
Simplegpuhashtable
A simple GPU hash table implemented in CUDA using lock free techniques
Stars: ✭ 198 (-11.21%)
Mutual labels:  cuda
Bohrium
Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX
Stars: ✭ 209 (-6.28%)
Mutual labels:  cuda
Ck Caffe
Collective Knowledge workflow for Caffe to automate installation across diverse platforms and to collaboratively evaluate and optimize Caffe-based workloads across diverse hardware, software and data sets (compilers, libraries, tools, models, inputs):
Stars: ✭ 192 (-13.9%)
Mutual labels:  cuda
Nicehashquickminer
Super simple & easy Windows 10 cryptocurrency miner made by NiceHash.
Stars: ✭ 211 (-5.38%)
Mutual labels:  cuda
Macos Egpu Cuda Guide
Set up CUDA for machine learning (and gaming) on macOS using a NVIDIA eGPU
Stars: ✭ 187 (-16.14%)
Mutual labels:  cuda
Amgx
Distributed multigrid linear solver library on GPU
Stars: ✭ 207 (-7.17%)
Mutual labels:  cuda
Pedestrian alignment
TCSVT2018 Pedestrian Alignment Network for Large-scale Person Re-identification
Stars: ✭ 223 (+0%)
Mutual labels:  cuda
Relion
Image-processing software for cryo-electron microscopy
Stars: ✭ 219 (-1.79%)
Mutual labels:  cuda
Haste
Haste: a fast, simple, and open RNN library
Stars: ✭ 214 (-4.04%)
Mutual labels:  cuda

Implementation of Deep Speech 2 in neon

This repository contains an implementation of Baidu SVAIL's Deep Speech 2 model in neon. Much of the model is readily available in mainline neon; to also support the CTC cost function, we have included a neon-compatible wrapper for Baidu's Warp-CTC.

Deep Speech 2 models are computationally intensive and can require long training times. Even with near-perfect GPU utilization, training on a dataset large enough to yield respectable performance can take up to a week. Please keep this in mind when exploring this repo.

We have used this code to train models on both the Wall Street Journal (81 hours) and Librispeech (1000 hours) datasets. The WSJ dataset is available only through the LDC; Librispeech, however, can be acquired freely from the LibriSpeech corpus (http://www.openslr.org/12).

The model presented here uses a basic argmax-based decoder:

  • Choose the most probable character in each frame
  • Collapse the resulting string according to CTC's rules: first collapse repeated characters, then remove blank characters (see the sketch below).
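
For concreteness, here is a minimal Python sketch of such a greedy decoder. It is a standalone illustration only; the index_to_char mapping and blank_index argument are assumptions for the example, not the repo's actual alphabet handling.

import numpy as np

def greedy_ctc_decode(probs, index_to_char, blank_index=0):
    """Greedy (argmax) CTC decoding of a (num_frames, num_symbols) matrix
    of per-frame symbol probabilities."""
    best_path = np.argmax(probs, axis=1)   # most probable symbol in each frame
    # Collapse consecutive repeats first...
    collapsed = [best_path[0]]
    for sym in best_path[1:]:
        if sym != collapsed[-1]:
            collapsed.append(sym)
    # ...then drop the blank symbol.
    return "".join(index_to_char[s] for s in collapsed if s != blank_index)

# Hypothetical usage: index 0 is the CTC blank, the rest are characters.
# index_to_char = {0: "", 1: " ", 2: "a", 3: "b", ...}
# transcript = greedy_ctc_decode(frame_probs, index_to_char, blank_index=0)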

After decoding, you might expect outputs like this when trained on WSJ data:

Ground truth:  united presidential is a life insurance company
Model output:  younited presidentiol is a lefe in surance company

Ground truth:  that was certainly true last week
Model output:  that was sertainly true last week

Ground truth:  we're not ready to say we're in technical default a spokesman said
Model output:  we're now ready to say we're intechnical default a spokesman said

Or outputs like this when trained on Librispeech (see "Decoding and evaluating a trained model"):

Ground truth:  this had some effect in calming him
Model output:  this had some offectind calming him

Ground truth:  he went in and examined his letters but there was nothing from carrie
Model output:  he went in an examined his letters but there was nothing from carry

Ground truth:  the design was different but the thing was clearly the same
Model output:  the design was differampat that thing was clarly the same

Getting Started

  1. neon 2.3.0 and the aeon dataloader (v1.0.0) must both be installed.

  2. Clone the repo: git clone https://github.com/NervanaSystems/deepspeech.git && cd deepspeech.

  3. Within a neon virtualenv, run pip install -r requirements.txt.

  4. Run make to build warp-ctc.

Training a model

1. Prepare a manifest file for your dataset.

How to do this depends on the specifics of the dataset.

Example: Librispeech recipe

A recipe for ingesting Librispeech data is provided in data/ingest_librispeech.py. Note that Librispeech provides distinct datasets for training and validation, and each set must be ingested separately. We also have to work around the quirky way the Librispeech data is distributed: after unpacking the archives, we need to arrange their contents in a consistent directory layout (a rough sketch of what the ingest produces appears at the end of this step).

To be more precise, Librispeech data is distributed in zipped tar files, e.g. train-clean-100.tar.gz for training and dev-clean.tar.gz for validation. Upon unpacking, each archive creates a directory named LibriSpeech, so trying to unpack both files together in the same directory is a bad idea. To get around this, try something like:

$ mkdir librispeech && cd librispeech
$ wget http://www.openslr.org/resources/12/train-clean-100.tar.gz
$ wget http://www.openslr.org/resources/12/dev-clean.tar.gz
$ tar xvzf dev-clean.tar.gz LibriSpeech/dev-clean  --strip-components=1
$ tar xvzf train-clean-100.tar.gz LibriSpeech/train-clean-100  --strip-components=1

Follow the above prescription and you will have the training data in the subdirectory librispeech/train-clean-100 and the validation data in the subdirectory librispeech/dev-clean. To ingest the data, run the python script on the directory where you've unpacked the clean training data, followed by the paths where you want the script to write the transcripts and the training manifest for that dataset:

$ python data/ingest_librispeech.py <absolute path to train-clean-100 directory> <absolute path to directory to write transcripts to> <absolute path to where to write training manifest to>

For example, if the train-clean-100 directory is located at /usr/local/data/librispeech/train-clean-100, run:

$ python data/ingest_librispeech.py  /usr/local/data/librispeech/train-clean-100  /usr/local/data/librispeech/train-clean-100/transcripts_dir  /usr/local/data/librispeech/train-clean-100/train-manifest.csv

which would create a training manifest file named train-manifest.csv. Similarly, if the dev-clean directory is located at /usr/local/data/librispeech/dev-clean, run:

$ python data/ingest_librispeech.py  /usr/local/data/librispeech/dev-clean  /usr/local/data/librispeech/dev-clean/transcripts_dir  /usr/local/data/librispeech/train-clean-100/val-manifest.csv

To train on the full 1000 hours, execute the same commands for the 360 hour and 500 hour training datasets as well. The manifest files can then be concatenated with a simple:

$ cat /path/to/100_hour_manifest.csv /path/to/360_hour_manifest.csv /path/to/500_hour_manifest.csv > /path/to/1000_hour_manifest.csv
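
For orientation, here is a hedged Python sketch of roughly what the ingest does: split each LibriSpeech *.trans.txt file into per-utterance transcript files and record audio/transcript path pairs in a manifest CSV. The directory layout and the two-column manifest format are assumptions made for this illustration; data/ingest_librispeech.py is the authoritative reference.

import csv
import glob
import os

def ingest_librispeech_sketch(data_dir, transcript_dir, manifest_path):
    """Rough sketch only; see data/ingest_librispeech.py for the real logic."""
    os.makedirs(transcript_dir, exist_ok=True)
    rows = []
    # Each chapter directory ships one *.trans.txt file containing
    # "<utterance-id> <TRANSCRIPT>" lines, one per utterance.
    for trans_file in glob.glob(os.path.join(data_dir, "*", "*", "*.trans.txt")):
        chapter_dir = os.path.dirname(trans_file)
        with open(trans_file) as f:
            for line in f:
                utt_id, text = line.strip().split(" ", 1)
                audio_path = os.path.join(chapter_dir, utt_id + ".flac")
                txt_path = os.path.join(transcript_dir, utt_id + ".txt")
                with open(txt_path, "w") as out:
                    out.write(text.lower() + "\n")
                rows.append((audio_path, txt_path))
    # Assumed manifest layout: one "audio path, transcript path" row per utterance.
    with open(manifest_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)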

2a. Train a new model

$ python train.py --manifest train:<training manifest> --manifest val:<validation manifest> -e <num_epochs> -z <batch_size> -s </path/to/model_output.pkl> [-b <backend>] 

where <training manifest> is the path to the training manifest file produced by the ingest (for the example above, /usr/local/data/librispeech/train-clean-100/train-manifest.csv) and <validation manifest> is the path to the validation manifest file.

2b. Continue training after pause on a previous model

For a previously-trained model that wasn't trained for the full time needed, it's possible to resume training by passing the --model_file </path/to/pre-trained_model> argument to train.py. For example, you could continue training a pre-trained model from our Model Zoo. That particular model was trained for 16 epochs on 1000 hours of speech data from the Librispeech corpus, attaining a Character Error Rate (CER) of 14% without using a language model. You could continue training it for, say, an additional 4 epochs by calling:

$ python train.py --manifest train:<training manifest> --manifest val:<validation manifest> -e20  -z <batch_size> -s </path/to/model_output.prm> --model_file </path/to/pre-trained_model> [-b <backend>] 

which will save a new model to model_output.prm.

Decoding and evaluating a trained model

After you have a trained model, it's easy to evaluate its performance on any given dataset. Simply create a manifest file and then call:

$ python evaluate.py --manifest val:/path/to/manifest.csv --model_file /path/to/saved_model.prm

replacing the file paths as needed. The script prints the Character Error Rate (CER) by default. To print the Word Error Rate (WER) instead, include the argument --use_wer.
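
For reference, CER is the edit (Levenshtein) distance between the predicted and ground-truth character sequences divided by the length of the ground truth; WER is the same computation over words. The following is a minimal, self-contained sketch of that metric, not the repo's own scoring code.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def cer(reference, hypothesis):
    return float(edit_distance(reference, hypothesis)) / len(reference)

def wer(reference, hypothesis):
    return float(edit_distance(reference.split(), hypothesis.split())) / len(reference.split())

# e.g. cer("that was certainly true", "that was sertainly true") -> 1/23, about 0.04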

For example, to evaluate our pre-trained model from the Model Zoo, follow these steps:

  1. Download some test data from the Librispeech ASR corpus and prepare a manifest file for it, following the prescription provided above.

  2. Download the pre-trained DS2 model from our Model Zoo.

  3. Subject the pre-trained model and the manifest file for the test data to the evaluate.py script, as described above.

  4. Optionally, inspect the transcripts produced by the trained model by appending the argument --inference_file <name_of_file_to_save_results_to.pkl> to the evaluate.py command. This dumps the model's transcripts, together with the corresponding "ground truth" transcripts, to a pickle file (a sketch of reading it follows this list).
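
A hedged sketch of inspecting that pickle file is shown below. The assumption that it holds paired (model transcript, ground-truth transcript) entries is ours for illustration only; check evaluate.py for the structure it actually dumps.

import pickle

# Hypothetical file name and structure; adjust to what evaluate.py writes.
with open("inference_results.pkl", "rb") as f:
    results = pickle.load(f)

for model_transcript, ground_truth in results:
    print("model:", model_transcript)
    print("truth:", ground_truth)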
