HawkAaron / RNN-Transducer
MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
Stars: ✭ 114
Programming Languages
python
139335 projects - #7 most used programming language
Projects that are alternatives of or similar to Rnn Transducer
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (+66.67%)
Mutual labels: speech-recognition, asr, end-to-end
Espresso
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit
Stars: ✭ 808 (+608.77%)
Mutual labels: speech-recognition, asr, end-to-end
End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (+53.51%)
Mutual labels: speech-recognition, asr, end-to-end
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+300%)
Mutual labels: end-to-end, speech-recognition, asr
End-to-End-Mandarin-ASR
End-to-end speech recognition on AISHELL dataset.
Stars: ✭ 20 (-82.46%)
Mutual labels: end-to-end, speech-recognition, asr
Tensorflow end2end speech recognition
End-to-End speech recognition implementation base on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305 (+167.54%)
Mutual labels: speech-recognition, asr, end-to-end
kosr
Korean speech recognition based on the Transformer architecture.
Stars: ✭ 25 (-78.07%)
Mutual labels: end-to-end, speech-recognition, asr
E2e Asr
PyTorch Implementations for End-to-End Automatic Speech Recognition
Stars: ✭ 106 (-7.02%)
Mutual labels: speech-recognition, asr, end-to-end
Sincnet
SincNet is a neural architecture for efficiently processing raw audio samples.
Stars: ✭ 764 (+570.18%)
Mutual labels: speech-recognition, asr
Keras Sincnet
Keras (tensorflow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)
Stars: ✭ 47 (-58.77%)
Mutual labels: speech-recognition, asr
Bigcidian
Pronunciation lexicon covering both English and Chinese languages for Automatic Speech Recognition.
Stars: ✭ 99 (-13.16%)
Mutual labels: speech-recognition, asr
Vosk Api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Stars: ✭ 1,357 (+1090.35%)
Mutual labels: speech-recognition, asr
Syn Speech
Syn.Speech is a flexible speaker independent continuous speech recognition engine for Mono and .NET framework
Stars: ✭ 57 (-50%)
Mutual labels: speech-recognition, asr
Eesen
The official repository of the Eesen project
Stars: ✭ 738 (+547.37%)
Mutual labels: speech-recognition, asr
Openasr
A pytorch based end2end speech recognition system.
Stars: ✭ 69 (-39.47%)
Mutual labels: speech-recognition, asr
Wav2letter
Facebook AI Research's Automatic Speech Recognition Toolkit
Stars: ✭ 5,907 (+5081.58%)
Mutual labels: speech-recognition, end-to-end
Audio Pretrained Model
A collection of Audio and Speech pre-trained models.
Stars: ✭ 61 (-46.49%)
Mutual labels: speech-recognition, mxnet
Deepspeechrecognition
A Chinese deep speech recognition system, including a deep-learning-based acoustic model and a deep-learning-based language model.
Stars: ✭ 1,421 (+1146.49%)
Mutual labels: speech-recognition, asr
Speech Transformer Tf2.0
transformer for ASR-systerm (via tensorflow2.0)
Stars: ✭ 90 (-21.05%)
Mutual labels: speech-recognition, end-to-end
End-to-End Speech Recognition using RNN-Transducer
File description
- eval.py: RNN-T joint model decoding
- model.py: RNN-T model, containing the acoustic / phoneme models
- model2012.py: RNN-T model following Graves 2012
- seq2seq/*: seq2seq with attention
- rnnt_np.py: RNN-T loss implementation on MXNet, supporting both the Symbol and Gluon APIs; follows the PyTorch implementation
- DataLoader.py: data processing
- train.py: RNN-T training script; can be initialized from the CTC and PM models
- train_ctc.py: CTC training script
- train_att.py: attention training script
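The RNN-T loss in rnnt_np.py is a forward-backward computation over a T-by-(U+1) lattice. A minimal sketch of the forward (alpha) recursion in NumPy, illustrating the idea only; the function name, shapes, and API here are assumptions, not the repository's actual interface:

```python
import numpy as np

def rnnt_alpha(log_probs, labels, blank=0):
    """Forward recursion of the RNN-T loss for one utterance.

    log_probs: (T, U+1, V) joint-network log-probabilities,
    labels: length-U target label sequence.
    Returns -log P(labels | input).
    """
    T, U1, _ = log_probs.shape
    U = U1 - 1
    alpha = np.full((T, U + 1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # emit label u-1 at frame t, coming from (t, u-1)
            emit = -np.inf if u == 0 else alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]]
            # emit blank at frame t-1, coming from (t-1, u)
            shift = -np.inf if t == 0 else alpha[t - 1, u] + log_probs[t - 1, u, blank]
            alpha[t, u] = np.logaddexp(emit, shift)
    # terminate with a final blank at lattice node (T-1, U)
    return -(alpha[T - 1, U] + log_probs[T - 1, U, blank])
```

The production code additionally computes the backward (beta) recursion to obtain gradients; this sketch only shows the loss value.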
Directory description
- conf: kaldi feature extraction config
Reference Paper
- RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
- RNNT joint (Graves 2013): Speech Recognition with Deep Recurrent Neural Networks
- E2E criterion comparison (Baidu 2017): Exploring Neural Transducers for End-to-End Speech Recognition
- Seq2Seq-Attention: Attention-Based Models for Speech Recognition
Run
- Compile the RNN-T loss: follow the instructions here to compile MXNet with the RNN-T loss.
- Extract features: link the Kaldi TIMIT example dirs (local, steps, utils), then execute run.sh to extract 40-dim fbank features and run feature_transform.sh to get the 123-dim features described in Graves 2013.
- Train the RNN-T model:
  python train.py --lr 1e-3 --bi --dropout .5 --out exp/rnnt_bi_lr1e-3 --schedule
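The 123-dim features follow Graves 2013: 40 fbank coefficients plus energy (41 dims), extended with first- and second-order deltas. A hedged sketch of the delta computation in NumPy, using the standard regression formula with a window of 2 (an illustration, not Kaldi's exact add-deltas implementation):

```python
import numpy as np

def deltas(feat, window=2):
    """feat: (T, D). Standard delta regression over +/- window frames."""
    T, D = feat.shape
    denom = 2 * sum(n * n for n in range(1, window + 1))
    # edge-pad in time so boundary frames get valid context
    padded = np.pad(feat, ((window, window), (0, 0)), mode="edge")
    d = np.zeros_like(feat)
    for n in range(1, window + 1):
        d += n * (padded[window + n:window + n + T] - padded[window - n:window - n + T])
    return d / denom

frames = np.random.randn(100, 41)  # 40 fbank + energy per frame
# static + delta + delta-delta -> (100, 123)
full = np.hstack([frames, deltas(frames), deltas(deltas(frames))])
```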
Evaluation
By default, decoding is only supported for the RNN-T model.
- Greedy decoding:
python eval.py <path to best model parameters> --bi
- Beam search:
python eval.py <path to best model parameters> --bi --beam <beam size>
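Greedy RNN-T decoding differs from CTC: at each encoder frame, the decoder keeps emitting the argmax label (advancing the prediction network each time) until the joint network outputs blank. A minimal sketch with stand-in encoder/prediction/joint callables; names and signatures here are assumptions, not the eval.py API:

```python
import numpy as np

def greedy_decode(enc_out, predict, joint, blank=0, max_symbols=10):
    """enc_out: (T, H) encoder outputs. predict(y) returns the prediction-net
    state after consuming label sequence y; joint(f, g) returns a score
    vector over the vocabulary."""
    hyp = []
    g = predict(hyp)
    for f in enc_out:
        for _ in range(max_symbols):  # cap symbols emitted per frame
            k = int(np.argmax(joint(f, g)))
            if k == blank:
                break                 # move on to the next frame
            hyp.append(k)
            g = predict(hyp)          # advance the prediction network
    return hyp
```

Beam search keeps several such hypotheses alive and merges prefixes, which is why it is slower (see the TODO on acceleration below).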
Results
| Model      | Decoding | PER (%) |
|------------|----------|---------|
| CTC        | greedy   | 20.36   |
| CTC        | beam 100 | 20.03   |
| Transducer | greedy   | 20.74   |
| Transducer | beam 40  | 19.84   |
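PER in the results above is the Levenshtein distance between reference and hypothesis phone sequences, normalized by reference length. A small self-contained sketch of that metric (not the repository's own scoring code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance with a single rolling row."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = dp[0]
        dp[0] = i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution or match
            prev = cur
    return dp[-1]

def per(ref, hyp):
    """Phone error rate in percent."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```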
Requirements
- Python 3.6
- MXNet 1.1.0
- numpy 1.14
TODO
- beam search acceleration
- Seq2Seq with attention