HawkAaron / RNN-Transducer
MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
Stars: ✭ 114
Programming Languages
python
139335 projects - #7 most used programming language
Projects that are alternatives of or similar to Rnn Transducer
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (+66.67%)
Mutual labels: speech-recognition, asr, end-to-end
Espresso
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit
Stars: ✭ 808 (+608.77%)
Mutual labels: speech-recognition, asr, end-to-end
End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (+53.51%)
Mutual labels: speech-recognition, asr, end-to-end
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+300%)
Mutual labels: end-to-end, speech-recognition, asr
End-to-End-Mandarin-ASR
End-to-end speech recognition on AISHELL dataset.
Stars: ✭ 20 (-82.46%)
Mutual labels: end-to-end, speech-recognition, asr
Tensorflow end2end speech recognition
End-to-End speech recognition implementation base on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305 (+167.54%)
Mutual labels: speech-recognition, asr, end-to-end
kosr
Korean speech recognition based on the Transformer architecture.
Stars: ✭ 25 (-78.07%)
Mutual labels: end-to-end, speech-recognition, asr
E2e Asr
PyTorch Implementations for End-to-End Automatic Speech Recognition
Stars: ✭ 106 (-7.02%)
Mutual labels: speech-recognition, asr, end-to-end
Sincnet
SincNet is a neural architecture for efficiently processing raw audio samples.
Stars: ✭ 764 (+570.18%)
Mutual labels: speech-recognition, asr
Keras Sincnet
Keras (tensorflow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)
Stars: ✭ 47 (-58.77%)
Mutual labels: speech-recognition, asr
Bigcidian
Pronunciation lexicon covering both English and Chinese languages for Automatic Speech Recognition.
Stars: ✭ 99 (-13.16%)
Mutual labels: speech-recognition, asr
Vosk Api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Stars: ✭ 1,357 (+1090.35%)
Mutual labels: speech-recognition, asr
Syn Speech
Syn.Speech is a flexible speaker independent continuous speech recognition engine for Mono and .NET framework
Stars: ✭ 57 (-50%)
Mutual labels: speech-recognition, asr
Eesen
The official repository of the Eesen project
Stars: ✭ 738 (+547.37%)
Mutual labels: speech-recognition, asr
Openasr
A pytorch based end2end speech recognition system.
Stars: ✭ 69 (-39.47%)
Mutual labels: speech-recognition, asr
Wav2letter
Facebook AI Research's Automatic Speech Recognition Toolkit
Stars: ✭ 5,907 (+5081.58%)
Mutual labels: speech-recognition, end-to-end
Audio Pretrained Model
A collection of Audio and Speech pre-trained models.
Stars: ✭ 61 (-46.49%)
Mutual labels: speech-recognition, mxnet
Deepspeechrecognition
A Chinese deep speech recognition system, including a deep-learning-based acoustic model and a deep-learning-based language model.
Stars: ✭ 1,421 (+1146.49%)
Mutual labels: speech-recognition, asr
Speech Transformer Tf2.0
transformer for ASR-systerm (via tensorflow2.0)
Stars: ✭ 90 (-21.05%)
Mutual labels: speech-recognition, end-to-end
End-to-End Speech Recognition using RNN-Transducer
File description
- eval.py: RNN-T joint model decoding
- model.py: RNN-T model, containing the acoustic / phoneme models
- model2012.py: RNN-T model following Graves 2012
- seq2seq/*: seq2seq with attention
- rnnt_np.py: RNN-T loss implementation on MXNet, supporting both the Symbol and Gluon APIs; follows the PyTorch implementation
- DataLoader.py: data processing
- train.py: RNN-T training script; can be initialized from the CTC and PM models
- train_ctc.py: CTC training script
- train_att.py: attention training script
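The RNN-T loss in rnnt_np.py is a forward-backward computation over a T-by-(U+1) lattice. A minimal sketch of the forward (alpha) recursion in NumPy, illustrating the idea only; the function name, shapes, and API here are assumptions, not the repository's actual interface:

```python
import numpy as np

def rnnt_alpha(log_probs, labels, blank=0):
    """Forward recursion of the RNN-T loss for one utterance.

    log_probs: (T, U+1, V) joint-network log-probabilities,
    labels: length-U target label sequence.
    Returns -log P(labels | input).
    """
    T, U1, _ = log_probs.shape
    U = U1 - 1
    alpha = np.full((T, U + 1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # emit label u-1 at frame t, coming from (t, u-1)
            emit = -np.inf if u == 0 else alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]]
            # emit blank at frame t-1, coming from (t-1, u)
            shift = -np.inf if t == 0 else alpha[t - 1, u] + log_probs[t - 1, u, blank]
            alpha[t, u] = np.logaddexp(emit, shift)
    # terminate with a final blank at lattice node (T-1, U)
    return -(alpha[T - 1, U] + log_probs[T - 1, U, blank])
```

The production code additionally computes the backward (beta) recursion to obtain gradients; this sketch only shows the loss value.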
Directory description
- conf: kaldi feature extraction config
Reference Paper
- RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
- RNNT joint (Graves 2013): Speech Recognition with Deep Recurrent Neural Networks
- E2E criterion comparison (Baidu 2017): Exploring Neural Transducers for End-to-End Speech Recognition
- Seq2Seq-Attention: Attention-Based Models for Speech Recognition
Run
- Compile the RNN-T loss: follow the instructions here to compile MXNet with the RNN-T loss.
- Extract features: link the Kaldi TIMIT example dirs (local, steps, utils), then execute run.sh to extract 40-dim fbank features and run feature_transform.sh to get the 123-dim features described in Graves 2013.
- Train the RNN-T model:
  python train.py --lr 1e-3 --bi --dropout .5 --out exp/rnnt_bi_lr1e-3 --schedule
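The 123-dim features follow Graves 2013: 40 fbank coefficients plus energy (41 dims), extended with first- and second-order deltas. A hedged sketch of the delta computation in NumPy, using the standard regression formula with a window of 2 (an illustration, not Kaldi's exact add-deltas implementation):

```python
import numpy as np

def deltas(feat, window=2):
    """feat: (T, D). Standard delta regression over +/- window frames."""
    T, D = feat.shape
    denom = 2 * sum(n * n for n in range(1, window + 1))
    # edge-pad in time so boundary frames get valid context
    padded = np.pad(feat, ((window, window), (0, 0)), mode="edge")
    d = np.zeros_like(feat)
    for n in range(1, window + 1):
        d += n * (padded[window + n:window + n + T] - padded[window - n:window - n + T])
    return d / denom

frames = np.random.randn(100, 41)  # 40 fbank + energy per frame
# static + delta + delta-delta -> (100, 123)
full = np.hstack([frames, deltas(frames), deltas(deltas(frames))])
```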
Evaluation
By default, decoding is only supported for the RNN-T model.
- Greedy decoding:
python eval.py <path to best model parameters> --bi
- Beam search:
python eval.py <path to best model parameters> --bi --beam <beam size>
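Greedy RNN-T decoding differs from CTC: at each encoder frame, the decoder keeps emitting the argmax label (advancing the prediction network each time) until the joint network outputs blank. A minimal sketch with stand-in encoder/prediction/joint callables; names and signatures here are assumptions, not the eval.py API:

```python
import numpy as np

def greedy_decode(enc_out, predict, joint, blank=0, max_symbols=10):
    """enc_out: (T, H) encoder outputs. predict(y) returns the prediction-net
    state after consuming label sequence y; joint(f, g) returns a score
    vector over the vocabulary."""
    hyp = []
    g = predict(hyp)
    for f in enc_out:
        for _ in range(max_symbols):  # cap symbols emitted per frame
            k = int(np.argmax(joint(f, g)))
            if k == blank:
                break                 # move on to the next frame
            hyp.append(k)
            g = predict(hyp)          # advance the prediction network
    return hyp
```

Beam search keeps several such hypotheses alive and merges prefixes, which is why it is slower (see the TODO on acceleration below).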
Results
| Model      | Decoding | PER (%) |
|------------|----------|---------|
| CTC        | greedy   | 20.36   |
| CTC        | beam 100 | 20.03   |
| Transducer | greedy   | 20.74   |
| Transducer | beam 40  | 19.84   |
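PER in the results above is the Levenshtein distance between reference and hypothesis phone sequences, normalized by reference length. A small self-contained sketch of that metric (not the repository's own scoring code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance with a single rolling row."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = dp[0]
        dp[0] = i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution or match
            prev = cur
    return dp[-1]

def per(ref, hyp):
    """Phone error rate in percent."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```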
Requirements
- Python 3.6
- MXNet 1.1.0
- numpy 1.14
TODO
- beam search acceleration
- Seq2Seq with attention