HawkAaron / E2e Asr
PyTorch Implementations for End-to-End Automatic Speech Recognition
Stars: ✭ 106
Projects that are alternatives to or similar to E2e Asr
End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (+65.09%)
Mutual labels: speech-recognition, asr, end-to-end
Espresso
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit
Stars: ✭ 808 (+662.26%)
Mutual labels: speech-recognition, asr, end-to-end
Rnn Transducer
MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
Stars: ✭ 114 (+7.55%)
Mutual labels: speech-recognition, asr, end-to-end
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (+79.25%)
Mutual labels: speech-recognition, asr, end-to-end
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+330.19%)
Mutual labels: end-to-end, speech-recognition, asr
kosr
Korean speech recognition based on transformer (트랜스포머 기반 한국어 음성 인식)
Stars: ✭ 25 (-76.42%)
Mutual labels: end-to-end, speech-recognition, asr
End-to-End-Mandarin-ASR
End-to-end speech recognition on AISHELL dataset.
Stars: ✭ 20 (-81.13%)
Mutual labels: end-to-end, speech-recognition, asr
Tensorflow end2end speech recognition
End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305 (+187.74%)
Mutual labels: speech-recognition, asr, end-to-end
Wav2letter
Facebook AI Research's Automatic Speech Recognition Toolkit
Stars: ✭ 5,907 (+5472.64%)
Mutual labels: speech-recognition, end-to-end
Eesen
The official repository of the Eesen project
Stars: ✭ 738 (+596.23%)
Mutual labels: speech-recognition, asr
Mongolian Speech Recognition
Mongolian speech recognition with PyTorch
Stars: ✭ 97 (-8.49%)
Mutual labels: speech-recognition, asr
Libreasr
💬 An On-Premises, Streaming Speech Recognition System
Stars: ✭ 633 (+497.17%)
Mutual labels: speech-recognition, asr
Ktspeechcrawler
Automatically constructing corpus for automatic speech recognition from YouTube videos
Stars: ✭ 92 (-13.21%)
Mutual labels: speech-recognition, asr
Wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Stars: ✭ 617 (+482.08%)
Mutual labels: speech-recognition, asr
Bigcidian
Pronunciation lexicon covering both English and Chinese languages for Automatic Speech Recognition.
Stars: ✭ 99 (-6.6%)
Mutual labels: speech-recognition, asr
Speech Transformer
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Stars: ✭ 565 (+433.02%)
Mutual labels: asr, end-to-end
Sincnet
SincNet is a neural architecture for efficiently processing raw audio samples.
Stars: ✭ 764 (+620.75%)
Mutual labels: speech-recognition, asr
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+1295.28%)
Mutual labels: speech-recognition, asr
Vosk Api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Stars: ✭ 1,357 (+1180.19%)
Mutual labels: speech-recognition, asr
Graves 2013 experiments
File description
- model.py: RNN-T joint model (Graves 2013)
- model2012.py: RNN-T model from Graves 2012
- train_rnnt.py: RNN-T training script
- train_ctc.py: CTC acoustic model training script
- eval.py: RNN-T and CTC decoding
- DataLoader.py: Kaldi feature loader
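As a point of reference for the CTC decoding listed above, the simplest strategy is best-path (greedy) decoding: take the most likely label per frame, collapse consecutive repeats, then drop blanks. The sketch below is illustrative only. The function name `ctc_greedy_decode` and the choice of index 0 for the blank label are assumptions, and eval.py may use a different strategy such as beam search.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, blank=0):
    """Best-path CTC decoding over per-frame label scores.

    frame_scores: sequence of per-frame score lists, one score per label.
    blank: index of the CTC blank label (assumed to be 0 here).
    """
    # Most likely label at each frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    # Collapse runs of the same label, then remove blanks.
    collapsed = [label for label, _ in groupby(best_path)]
    return [label for label in collapsed if label != blank]


# Toy scores over 3 labels (0 = blank); per-frame argmax is 0 1 1 0 1 2 2 0.
scores = [
    [0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1], [0.1, 0.2, 0.7], [0.0, 0.1, 0.9], [0.8, 0.1, 0.1],
]
print(ctc_greedy_decode(scores))  # [1, 1, 2]
```

The blank between the two runs of label 1 is what keeps them as two separate output symbols instead of one.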
Run
- Extract features
Link the Kaldi TIMIT example dirs (local, steps, utils).
Execute run.sh to extract 40-dim fbank features.
Run feature_transform.sh to get the 123-dim features described in Graves 2013.
- Train CTC acoustic model
python train_ctc.py --lr 1e-3 --bi --dropout 0.5 --out exp/ctc_bi_lr1e-3 --schedule
- Train RNNT joint model
python train_rnnt.py --lr 4e-4 --bi --dropout 0.5 --out exp/rnnt_bi_lr4e-4 --schedule
- Decode
python eval.py <path to best model> [--ctc] --bi
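The 123-dim features from the extraction step follow Graves 2013: 40 filter-bank coefficients plus an energy term, together with their first and second temporal derivatives (41 × 3 = 123). The sketch below shows one common way to append such derivatives with NumPy, using the standard regression-delta formula. The function name `add_deltas` and the window size are assumptions, and feature_transform.sh may compute the derivatives differently.

```python
import numpy as np


def add_deltas(static, window=2):
    """Append first- and second-order deltas to a (frames, dims) array."""

    def delta(feat):
        # Regression delta: sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2),
        # with the first and last frames repeated as padding.
        num_frames = feat.shape[0]
        padded = np.pad(feat, ((window, window), (0, 0)), mode="edge")
        denom = 2 * sum(n * n for n in range(1, window + 1))
        out = np.zeros_like(feat, dtype=float)
        for n in range(1, window + 1):
            out += n * (padded[window + n: window + n + num_frames]
                        - padded[window - n: window - n + num_frames])
        return out / denom

    d1 = delta(static)
    d2 = delta(d1)
    return np.concatenate([static, d1, d2], axis=1)


# Hypothetical input: 100 frames of 40 fbank coefficients + 1 energy term.
static = np.random.randn(100, 41)
features = add_deltas(static)
print(features.shape)  # (100, 123)
```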
Results
| Model | PER (%) |
|---|---|
| CTC | 21.38 |
| RNN-T | 20.59 |
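Assuming PER here denotes phoneme error rate, as is standard for TIMIT, it is the total edit distance between the decoded and reference phoneme sequences divided by the total number of reference phonemes. A minimal sketch of that computation follows. The function names are illustrative and are not taken from eval.py.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """PER in percent over a corpus of (reference, hypothesis) pairs."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


refs = [["sil", "k", "ae", "t", "sil"]]
hyps = [["sil", "k", "ah", "t"]]
print(phoneme_error_rate(refs, hyps))  # 40.0 (one substitution, one deletion)
```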
Requirements
- Python 3.6
- PyTorch >= 0.4
- numpy 1.14
- warp-transducer
Reference
- RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
- RNNT joint (Graves 2013): Speech Recognition with Deep Recurrent Neural Networks
- [PyTorch End-to-End Models for ASR](https://github.com/awni/speech)
- [A Fast Sequence Transducer GPU Implementation with PyTorch Bindings](https://github.com/HawkAaron/warp-transducer/tree/add_network_accelerate)