HawkAaron / Rnn Transducer

MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks


End-to-End Speech Recognition using RNN-Transducer

File description

  • eval.py: decoding with the RNNT joint model
  • model.py: RNNT model, containing the acoustic and phoneme models
  • model2012.py: RNNT model following Graves2012
  • seq2seq/*: seq2seq with attention
  • rnnt_np.py: RNNT loss function implemented for MXNet, supporting both the Symbol and Gluon APIs; based on the PyTorch implementation
  • DataLoader.py: data processing
  • train.py: RNNT training script; the model can be initialized from a CTC model and a PM model
  • train_ctc.py: CTC training script
  • train_att.py: attention training script

Directory description

  • conf: kaldi feature extraction config

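Reference Paper

  • Graves2012: Sequence Transduction with Recurrent Neural Networks
  • Graves2013: Speech Recognition with Deep Recurrent Neural Networks
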
Run

  • Compile the RNNT loss: follow the instructions here to compile MXNet with the RNNT loss.

  • Extract features: link the Kaldi TIMIT example directories (local, steps, utils), execute run.sh to extract 40-dim fbank features, then run feature_transform.sh to get the 123-dim features described in Graves2013.

  • Train RNNT model:

python train.py --lr 1e-3 --bi --dropout .5 --out exp/rnnt_bi_lr1e-3 --schedule
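
For the feature extraction step above, Graves2013 describes 123-dim inputs as 40 filter-bank coefficients plus energy, together with their first and second temporal derivatives (41 × 3 = 123). The sketch below shows one way to append such derivatives with NumPy. It assumes a regression window of 2 and edge padding, and it is not a description of what feature_transform.sh does; the names `delta` and `add_deltas` are hypothetical.

```python
import numpy as np

def delta(feat, window=2):
    """Regression-based temporal derivative of a (T, D) feature matrix."""
    T = feat.shape[0]
    # Repeat the first and last frames so every frame has full context
    padded = np.pad(feat, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = np.zeros(feat.shape, dtype=np.float64)
    for n in range(1, window + 1):
        ahead = padded[window + n: window + n + T]
        behind = padded[window - n: window - n + T]
        out += n * (ahead - behind)
    return out / denom

def add_deltas(static):
    """Stack static features with first and second derivatives: D -> 3 * D."""
    d1 = delta(static)
    d2 = delta(d1)
    return np.concatenate([static, d1, d2], axis=1)

# 41 static dims (assumed: 40 fbank + energy) become 123 dims
features = add_deltas(np.random.randn(50, 41))
print(features.shape)
```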

Evaluation

By default, evaluation supports only the RNNT model.

  • Greedy decoding:
python eval.py <path to best model parameters> --bi
  • Beam search:
python eval.py <path to best model parameters> --bi --beam <beam size>
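
To illustrate what greedy decoding means for a transducer, here is a minimal sketch under stated assumptions. The callables `predict_step` and `joint` are hypothetical stand-ins for the prediction network and joint network; they are not the interfaces used in eval.py. The `max_symbols` cap per frame is also an assumption, added to guarantee termination.

```python
import numpy as np

def greedy_decode(enc_out, predict_step, joint, blank=0, max_symbols=10):
    """Greedy transducer decoding.

    enc_out:      iterable of per-frame encoder outputs.
    predict_step: callable (label_or_None, state) -> (pred_out, new_state);
                  label None means "start of sequence" (hypothetical API).
    joint:        callable (enc_t, pred_out) -> scores over the vocabulary.
    """
    hyp = []
    pred_out, state = predict_step(None, None)
    for enc_t in enc_out:
        # Emit labels at this frame until blank is the best choice
        for _ in range(max_symbols):
            k = int(np.argmax(joint(enc_t, pred_out)))
            if k == blank:
                break  # blank: move on to the next frame
            hyp.append(k)
            # Advance the prediction network only on non-blank output
            pred_out, state = predict_step(k, state)
    return hyp
```

Beam search follows the same frame-by-frame structure but keeps several partial hypotheses and their scores instead of a single best label at each step.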

Results

  • CTC

    Decode      PER (%)
    greedy      20.36
    beam 100    20.03

  • Transducer

    Decode      PER (%)
    greedy      20.74
    beam 40     19.84
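
PER in the results above is the phoneme error rate. As a rough sketch of how such a figure is commonly computed (the exact scoring used for these results is not stated here, so this is an assumption), it is the total edit distance between reference and hypothesis phoneme sequences divided by the total number of reference phonemes. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (sub/ins/del cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def phoneme_error_rate(refs, hyps):
    """PER in percent over a list of reference/hypothesis sequence pairs."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total

# One substitution (ae -> ah) and one deletion (final sil) over 5 phonemes
per = phoneme_error_rate([["sil", "k", "ae", "t", "sil"]],
                         [["sil", "k", "ah", "t"]])
```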

Requirements

  • Python 3.6
  • MXNet 1.1.0
  • numpy 1.14

TODO

  • beam search acceleration
  • Seq2Seq with attention