samsungsds-rnd / deepspeech.mxnet

License: Apache-2.0
An MXNet implementation of Baidu's DeepSpeech architecture

Programming Languages

python
shell

Projects that are alternatives of or similar to deepspeech.mxnet

sova-asr
SOVA ASR (Automatic Speech Recognition)
Stars: ✭ 123 (+50%)
Mutual labels:  speech, speech-recognition, speech-to-text, stt
simple-obs-stt
Speech-to-text and keyboard input captions for OBS.
Stars: ✭ 89 (+8.54%)
Mutual labels:  speech, speech-recognition, speech-to-text, stt
Audio Pretrained Model
A collection of Audio and Speech pre-trained models.
Stars: ✭ 61 (-25.61%)
Mutual labels:  mxnet, speech-recognition, speech-to-text
anycontrol
Voice control for your websites and applications
Stars: ✭ 53 (-35.37%)
Mutual labels:  speech, speech-recognition, speech-to-text
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+925.61%)
Mutual labels:  speech-recognition, speech-to-text, stt
Edgedict
Working online speech recognition based on RNN Transducer. ( Trained model release available in release )
Stars: ✭ 205 (+150%)
Mutual labels:  speech, speech-recognition, speech-to-text
Kerasdeepspeech
A Keras CTC implementation of Baidu's DeepSpeech for model experimentation
Stars: ✭ 245 (+198.78%)
Mutual labels:  speech, baidu, speech-to-text
kaldi ag training
Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.
Stars: ✭ 14 (-82.93%)
Mutual labels:  speech, speech-recognition, speech-to-text
Kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+13498.78%)
Mutual labels:  speech, speech-recognition, speech-to-text
ASR-Audio-Data-Links
A list of publically available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+118.29%)
Mutual labels:  speech, speech-recognition, speech-to-text
wav2vec2-live
A live speech recognition using Facebooks wav2vec 2.0 model.
Stars: ✭ 205 (+150%)
Mutual labels:  speech, speech-recognition, speech-to-text
opensource-voice-tools
A repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-74.39%)
Mutual labels:  speech, speech-recognition, stt
Lingvo
Lingvo
Stars: ✭ 2,361 (+2779.27%)
Mutual labels:  speech, speech-recognition, speech-to-text
Tacotron asr
Speech Recognition Using Tacotron
Stars: ✭ 165 (+101.22%)
Mutual labels:  speech, speech-recognition, speech-to-text
Speechbrain.github.io
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.
Stars: ✭ 242 (+195.12%)
Mutual labels:  speech, speech-recognition, speech-to-text
Asr audio data links
A list of publically available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 128 (+56.1%)
Mutual labels:  speech, speech-recognition, speech-to-text
speech-recognition-evaluation
Evaluate results from ASR/Speech-to-Text quickly
Stars: ✭ 25 (-69.51%)
Mutual labels:  speech-recognition, speech-to-text, stt
Openasr
A pytorch based end2end speech recognition system.
Stars: ✭ 69 (-15.85%)
Mutual labels:  speech, speech-recognition, speech-to-text
Deepspeech
A PaddlePaddle implementation of ASR.
Stars: ✭ 1,219 (+1386.59%)
Mutual labels:  speech, speech-recognition, speech-to-text
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+331.71%)
Mutual labels:  speech-recognition, speech-to-text, stt

deepSpeech.mxnet: Rich Speech Example

This example, based on Baidu's DeepSpeech2, helps you build Speech-To-Text (STT) models at scale using

  • CNNs, fully connected networks, (Bi-) RNNs, (Bi-) LSTMs, and (Bi-) GRUs for network layers,
  • batch normalization and dropout for training efficiency,
  • and Warp CTC for loss calculation.

To build your own STT models, all you need to do is edit a configuration file rather than the actual code.


Motivation

This example is intended to guide people who want to build practical STT models with MXNet. With the rich functionality and convenience described above, you can build your own speech recognition models more easily than with previous examples.


Environments

  • MXNet version: 0.9.5+
  • GPU memory size: 2.4GB+
  • Install tensorboard for logging and soundfile for reading audio:
pip install tensorboard
pip install soundfile
  • Warp CTC: Follow this instruction to install Baidu's Warp CTC.
  • We strongly recommend that you first test with a small network model.

How it works

Preparing data

Input data are described in a JSON file, Libri_sample.json, as follows.

{"duration": 2.9450625, "text": "and sharing her house which was near by", "key": "./Libri_sample/3830-12531-0030.wav"}
{"duration": 3.94, "text": "we were able to impart the information that we wanted", "key": "./Libri_sample/3830-12529-0005.wav"}

You can download the two wave files above from this link. Put them under /path/to/yourproject/Libri_sample/.
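
For illustration, the following minimal Python sketch reads such a line-delimited JSON descriptor and checks each referenced wave file with soundfile (installed above). This is only a sketch of the data format; the example's own data loading is handled by main.py.

import json
import soundfile as sf  # installed via `pip install soundfile`

# Each line of the descriptor is a standalone JSON object with
# "duration" (seconds), "text" (transcription), and "key" (wave file path).
with open('Libri_sample.json') as f:
    for line in f:
        sample = json.loads(line)
        audio, sample_rate = sf.read(sample['key'])
        print(sample['key'], len(audio) / float(sample_rate), sample['text'])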

Setting the configuration file

[Notice] The included configuration file "default.cfg" describes DeepSpeech2 with slight changes. You can test the original DeepSpeech2 ("deepspeech.cfg") with a few line changes to the cfg file:


[common]
...
learning_rate = 0.0003
# constant learning rate annealing by factor
learning_rate_annealing = 1.1
optimizer = sgd
...
is_bi_graphemes = True
...
[arch]
...
num_rnn_layer = 7
num_hidden_rnn_list = [1760, 1760, 1760, 1760, 1760, 1760, 1760]
num_hidden_proj = 0
num_rear_fc_layers = 1
num_hidden_rear_fc_list = [1760]
act_type_rear_fc_list = ["relu"]
...
[train]
...
learning_rate = 0.0003
# constant learning rate annealing by factor
learning_rate_annealing = 1.1
optimizer = sgd
...
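
If you want to sanity-check a configuration before training, the cfg file can be read with Python's standard configparser. This is only an illustrative sketch (main.py does its own parsing); the option names used here are taken from the excerpt above.

import configparser
import json

cfg = configparser.ConfigParser()
cfg.read('default.cfg')

# Scalar options are read per section.
optimizer = cfg.get('train', 'optimizer')
learning_rate = cfg.getfloat('train', 'learning_rate')

# List-valued options such as num_hidden_rnn_list appear as literals
# in the file and can be decoded with json.
hidden_sizes = json.loads(cfg.get('arch', 'num_hidden_rnn_list'))

print(optimizer, learning_rate, hidden_sizes)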

Run the example

Train

cd /path/to/your/project/
mkdir checkpoints
mkdir log
python main.py --configfile default.cfg

Checkpoints of the model will be saved at every n-th epoch.

Load

You can (re)train saved models by loading their checkpoints (numbered starting from 0). To do this, you need to modify only two lines of the file "default.cfg".

...
[common]
# mode can be one of the followings - train, predict, load
mode = load
...
model_file = 'file name of your model saved'
...
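
If you just want to inspect what a checkpoint contains, independently of the loading done by main.py, MXNet checkpoints are normally stored as a symbol/params pair and can be read as below. The prefix 'checkpoints/deepspeech' and the epoch number are assumptions; use the file name of your saved model.

import mxnet as mx

# Hypothetical checkpoint prefix and epoch number.
sym, arg_params, aux_params = mx.model.load_checkpoint('checkpoints/deepspeech', 10)
print(sym.list_arguments()[:5])       # first few symbol arguments
print(sorted(arg_params.keys())[:5])  # first few saved parameter names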

Predict

You can predict (or test) audios by specifying the mode, model, and test data in the file "default.cfg".

...
[common]
# mode can be one of the followings - train, predict, load
mode = predict
...
model_file = 'file name of your model to be tested'
...
[data]
...
test_json = 'a json file describing test audios'
...

Run the following line after making all the modifications explained above.
python main.py --configfile default.cfg

Train and test your own models

Train and test your own models by preparing two files.

  1. A new configuration file, e.g., custom.cfg, corresponding to the file 'default.cfg'. The new file should specify the items under the '[arch]' section of the original file.
  2. A new implementation file, e.g., arch_custom.py, corresponding to the file 'arch_deepspeech.py'. The new file should implement two functions, prepare_data() and arch(), which build the networks described in the new configuration file (see the sketch after this list).
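
As a starting point, a skeleton of such an implementation file might look like the sketch below. This is illustrative only: the exact signatures and return values of prepare_data() and arch() must mirror those in arch_deepspeech.py, and the tiny fully connected network shown here is a placeholder rather than a useful STT architecture.

# arch_custom.py -- illustrative skeleton only; mirror arch_deepspeech.py
# for the real signatures and return values.
import mxnet as mx

def prepare_data(args):
    # Return whatever initial states the training loop expects
    # (empty here because this toy network has no recurrent layers).
    init_states = []
    return init_states

def arch(args):
    # Build the network symbol from the options given in custom.cfg.
    data = mx.sym.Variable('data')
    net = mx.sym.FullyConnected(data=data, num_hidden=1024, name='fc1')
    net = mx.sym.Activation(data=net, act_type='relu', name='relu1')
    net = mx.sym.FullyConnected(data=net, num_hidden=1024, name='fc2')
    return net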

Run the following line after preparing the files.

python main.py --configfile custom.cfg --archfile arch_custom

Furthermore

You can prepare the full LibriSpeech dataset by following the instructions at https://github.com/baidu-research/ba-dls-deepspeech. Replace Baidu's flac_to_wav.sh script with the flac_to_wav.sh in this repository to avoid a bug:

git clone https://github.com/baidu-research/ba-dls-deepspeech
cd ba-dls-deepspeech
./download.sh
cp -f /path/to/example/flac_to_wav.sh ./
./flac_to_wav.sh
python create_desc_json.py /path/to/ba-dls-deepspeech/LibriSpeech/train-clean-100 train_corpus.json
python create_desc_json.py /path/to/ba-dls-deepspeech/LibriSpeech/dev-clean validation_corpus.json
python create_desc_json.py /path/to/ba-dls-deepspeech/LibriSpeech/test-clean test_corpus.json
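
The resulting *_corpus.json files follow the same line-delimited format as Libri_sample.json shown earlier. If you need to build such a descriptor for your own wave files, a minimal sketch (create_desc_json.py already does this for LibriSpeech; the directory layout and transcript mapping here are assumptions) could look like:

import json
import os
import soundfile as sf

def write_corpus_json(wav_dir, transcripts, out_path):
    # transcripts: dict mapping wave file name -> transcription text
    with open(out_path, 'w') as out:
        for name, text in transcripts.items():
            path = os.path.join(wav_dir, name)
            audio, sample_rate = sf.read(path)
            duration = len(audio) / float(sample_rate)
            out.write(json.dumps({'duration': duration,
                                  'text': text,
                                  'key': path}) + '\n')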