hirofumi0810 / Neural_sp
License: apache-2.0
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408
Projects that are alternatives of or similar to Neural sp
Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+32.84%)
Mutual labels: speech-recognition, asr, sequence-to-sequence, ctc, transformer
Lingvo
Lingvo
Stars: ✭ 2,361 (+478.68%)
Mutual labels: speech-recognition, seq2seq, speech, language-model, asr
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+262.5%)
Mutual labels: speech-recognition, seq2seq, speech, asr, sequence-to-sequence
Tensorflow end2end speech recognition
End-to-End speech recognition implementation base on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305 (-25.25%)
Mutual labels: speech-recognition, attention-mechanism, asr, ctc
Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+737.75%)
Mutual labels: attention, seq2seq, sequence-to-sequence, transformer
Awesome Speech Recognition Speech Synthesis Papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Stars: ✭ 2,085 (+411.03%)
Mutual labels: attention-mechanism, seq2seq, language-model, speech-recognition
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+142.65%)
Mutual labels: attention-mechanism, seq2seq, sequence-to-sequence, transformer
Openasr
A pytorch based end2end speech recognition system.
Stars: ✭ 69 (-83.09%)
Mutual labels: speech-recognition, speech, asr, transformer
Openseq2seq
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Stars: ✭ 1,378 (+237.75%)
Mutual labels: speech-recognition, seq2seq, language-model, sequence-to-sequence
End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (-57.11%)
Mutual labels: speech-recognition, speech, asr, transformer
Pytorch Asr
ASR with PyTorch
Stars: ✭ 124 (-69.61%)
Mutual labels: speech-recognition, speech, asr, ctc
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (-53.43%)
Mutual labels: speech-recognition, seq2seq, asr, transformer
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+13562.25%)
Mutual labels: language-model, transformer, speech-recognition, seq2seq
Pykaldi
A Python wrapper for Kaldi
Stars: ✭ 756 (+85.29%)
Mutual labels: speech-recognition, speech, language-model, asr
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+11.76%)
Mutual labels: transformer, speech-recognition, seq2seq, asr
torch-asg
Auto Segmentation Criterion (ASG) implemented in pytorch
Stars: ✭ 42 (-89.71%)
Mutual labels: speech, seq2seq, asr, ctc
A-Persona-Based-Neural-Conversation-Model
No description or website provided.
Stars: ✭ 22 (-94.61%)
Mutual labels: seq2seq, sequence-to-sequence, attention-mechanism
transformer
A PyTorch Implementation of "Attention Is All You Need"
Stars: ✭ 28 (-93.14%)
Mutual labels: transformer, seq2seq, attention
Nlp Tutorials
Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com
Stars: ✭ 394 (-3.43%)
Mutual labels: attention, seq2seq, transformer
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Stars: ✭ 2,384 (+484.31%)
Mutual labels: transformer, speech-recognition, asr
NeuralSP: Neural network based Speech Processing
How to install
```shell
# Set paths to CUDA and NCCL
CUDAROOT=/usr/local/cuda
NCCL_ROOT=/usr/local/nccl
export CPATH=$NCCL_ROOT/include:$CPATH
export LD_LIBRARY_PATH=$NCCL_ROOT/lib/:$CUDAROOT/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=$NCCL_ROOT/lib/:$LIBRARY_PATH
export CUDA_HOME=$CUDAROOT
export CUDA_PATH=$CUDAROOT
export CPATH=$CUDA_PATH/include:$CPATH  # for warp-rnnt

# Install miniconda, Python libraries, and other tools
cd tools
make KALDI=/path/to/kaldi
```
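As a quick illustration of how the exports above compose, here is a toy run using the same placeholder paths (`/opt/existing/lib` is a made-up stand-in for whatever `LD_LIBRARY_PATH` already contained on your machine):

```shell
# Toy illustration with the placeholder paths from above; /opt/existing/lib
# stands in for a pre-existing LD_LIBRARY_PATH entry.
CUDAROOT=/usr/local/cuda
NCCL_ROOT=/usr/local/nccl
LD_LIBRARY_PATH=/opt/existing/lib
export LD_LIBRARY_PATH=$NCCL_ROOT/lib/:$CUDAROOT/lib64:$LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"  # /usr/local/nccl/lib/:/usr/local/cuda/lib64:/opt/existing/lib
```

The NCCL and CUDA library directories are prepended, so they take precedence over any system-wide copies.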
Key features
Corpus
- ASR
  - AISHELL-1
  - CSJ
  - Librispeech
  - Switchboard (+ Fisher)
  - TEDLIUM2/TEDLIUM3
  - TIMIT
  - WSJ
- LM
  - Penn Tree Bank
  - WikiText2
Front-end
Encoder
- RNN encoder
- Transformer encoder [link]
- Conformer encoder [link]
- Time-depth separable (TDS) convolution encoder [link] [link]
- Gated CNN encoder (GLU) [link]
Connectionist Temporal Classification (CTC) decoder
- Beam search
- Shallow fusion
- Forced alignment
RNN-Transducer (RNN-T) decoder [link]
- Beam search
- Shallow fusion
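As a toy illustration of CTC decoding, here is a minimal best-path (greedy) sketch: take the argmax label per frame, collapse repeats, then drop blanks. This is not NeuralSP's beam-search implementation, and the blank id of 0 is an assumption.

```python
# Toy best-path CTC decoding (illustrative sketch, not NeuralSP's API).
BLANK = 0  # assumed blank label id

def ctc_greedy_decode(frame_argmax):
    """frame_argmax: list of per-frame argmax label ids (ints)."""
    out = []
    prev = None
    for label in frame_argmax:
        # keep a label only when it differs from the previous frame
        # (collapses repeats) and is not the blank symbol
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return out

# frames 'a a - b b - - a' (blank = '-') collapse to 'a b a'
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 0, 1]))  # [1, 2, 1]
```

Beam search keeps multiple label histories and sums probabilities over alignments instead of committing to the per-frame argmax.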
Attention-based decoder
- RNN decoder
- Attention type
- location-based
- content-based
- dot-product
- GMM attention
- Streaming RNN decoder specific
- Transformer decoder [link]
- Streaming Transformer decoder specific
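The attention types listed above all reduce to scoring encoder states against a decoder query and mixing the states by the normalized scores. A minimal dot-product sketch with toy vectors (not the repository's API):

```python
import math

# Minimal dot-product attention over a sequence of encoder states
# (toy shapes and values; illustrative only).
def dot_product_attention(query, keys, values):
    # one similarity score per encoder state
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # softmax the scores into attention weights
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # context vector: weighted sum of the value vectors
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

weights, context = dot_product_attention(
    [1.0, 0.0],                # decoder query
    [[1.0, 0.0], [0.0, 1.0]],  # encoder keys
    [[1.0], [0.0]],            # encoder values
)
print([round(w, 3) for w in weights])  # [0.731, 0.269]
```

Location-based and content-based variants change only how the scores are computed; the softmax and weighted sum stay the same.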
Language model (LM)
- RNNLM (recurrent neural network language model)
- Gated convolutional LM [link]
- Transformer LM
- Transformer-XL LM [link]
- Adaptive softmax [link]
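These language models are evaluated by perplexity (PPL, as reported in the LM Performance tables below), the exponential of the average per-token negative log-likelihood. A minimal sketch:

```python
import math

# Perplexity sketch (illustrative): exp of the average per-token
# negative log-likelihood under the model.
def perplexity(token_probs):
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model assigning probability 1/4 to every token has PPL 4,
# i.e. it is as uncertain as a uniform choice over 4 tokens.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # 4.0
```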
Output units
- Phoneme
- Grapheme
- Wordpiece (BPE, sentencepiece)
- Word
- Word-char mix
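As a toy sketch of wordpiece-style segmentation, here is a greedy longest-match over a made-up vocabulary. (The repo itself uses BPE via sentencepiece, which learns merges from data rather than applying a fixed match like this.)

```python
# Greedy longest-match segmentation into subword pieces
# (illustrative only; vocabulary below is invented).
def wordpiece_segment(word, vocab):
    pieces, start = [], 0
    while start < len(word):
        # try the longest remaining substring first
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
        else:
            pieces.append(word[start])  # back off to a single character
            start += 1
    return pieces

vocab = {"speech", "less", "spe", "ech"}
print(wordpiece_segment("speechless", vocab))  # ['speech', 'less']
```

Subword units like this sit between graphemes and words, keeping the output vocabulary small while avoiding out-of-vocabulary words.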
Multi-task learning (MTL)
Multi-task learning (MTL) with different units is supported to alleviate data sparseness.
- Hybrid CTC/attention [link]
- Hierarchical Attention (e.g., word attention + character attention) [link]
- Hierarchical CTC (e.g., word CTC + character CTC) [link]
- Hierarchical CTC+Attention (e.g., word attention + character CTC) [link]
- Forward-backward attention [link]
- LM objective
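The hybrid CTC/attention objective above is typically a weighted interpolation of the two losses; a minimal sketch (the weight value here is illustrative, not the repo's default):

```python
# Sketch of the hybrid CTC/attention objective (illustrative weight):
# total loss = w * L_ctc + (1 - w) * L_attention
def hybrid_loss(loss_ctc, loss_att, ctc_weight=0.3):
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att

print(round(hybrid_loss(2.0, 1.0, ctc_weight=0.3), 3))  # 1.3
```

The CTC branch enforces monotonic alignment while the attention branch handles flexible dependencies; the weight trades the two off.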
ASR Performance
AISHELL-1 (CER)
Model | dev | test |
---|---|---|
Transformer | 5.0 | 5.4 |
Conformer | 4.7 | 5.2 |
Streaming MMA | 5.5 | 6.1 |
CSJ (WER)
Model | eval1 | eval2 | eval3 |
---|---|---|---|
BLSTM LAS | 6.5 | 5.1 | 5.6 |
LC-BLSTM MoChA | 7.4 | 5.6 | 6.4 |
Switchboard 300h (WER)
Model | SWB | CH |
---|---|---|
BLSTM LAS | 9.1 | 18.8 |
Switchboard+Fisher 2000h (WER)
Model | SWB | CH |
---|---|---|
BLSTM LAS | 7.8 | 13.8 |
Librispeech (WER)
Model | dev-clean | dev-other | test-clean | test-other |
---|---|---|---|---|
BLSTM LAS | 2.5 | 7.2 | 2.6 | 7.5 |
BLSTM RNN-T | 2.9 | 8.5 | 3.2 | 9.0 |
Transformer | 2.1 | 5.3 | 2.4 | 5.7 |
UniLSTM RNN-T | 3.7 | 11.7 | 4.0 | 11.6 |
UniLSTM MoChA | 4.1 | 11.0 | 4.2 | 11.2 |
LC-BLSTM RNN-T | 3.3 | 9.8 | 3.5 | 10.2 |
LC-BLSTM MoChA | 3.3 | 8.8 | 3.5 | 9.1 |
Streaming MMA | 2.5 | 6.9 | 2.7 | 7.1 |
TEDLIUM2 (WER)
Model | dev | test |
---|---|---|
BLSTM LAS | 8.1 | 7.5 |
LC-BLSTM RNN-T | 8.9 | 8.5 |
LC-BLSTM MoChA | 10.6 | 8.6 |
UniLSTM RNN-T | 11.6 | 11.7 |
UniLSTM MoChA | 13.6 | 11.6 |
WSJ (WER)
Model | test_dev93 | test_eval92 |
---|---|---|
BLSTM LAS | 8.8 | 6.2 |
LM Performance
Penn Tree Bank (PPL)
Model | valid | test |
---|---|---|
RNNLM | 87.99 | 86.06 |
+ cache=100 | 79.58 | 79.12 |
+ cache=500 | 77.36 | 76.94 |
WikiText2 (PPL)
Model | valid | test |
---|---|---|
RNNLM | 104.53 | 98.73 |
+ cache=100 | 90.86 | 85.87 |
+ cache=2000 | 76.10 | 72.77 |
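The `+ cache` rows above interpolate the base LM with a cache of recently seen tokens, which is why larger caches lower the perplexity. A simplified unigram-count sketch (a real neural cache weights history entries by hidden-state similarity; `lam` and `cache_size` here are illustrative):

```python
from collections import Counter

# Simplified cache interpolation (illustrative, not the repo's code):
# p(w) = (1 - lam) * p_lm(w) + lam * p_cache(w), where p_cache is a
# unigram distribution over the last `cache_size` tokens.
def cached_prob(word, p_lm, history, cache_size=100, lam=0.2):
    cache = Counter(history[-cache_size:])
    total = sum(cache.values())
    p_cache = cache[word] / total if total else 0.0
    return (1.0 - lam) * p_lm + lam * p_cache

# A word that appeared twice in the recent history gets boosted
# well above its base probability of 0.01.
print(round(cached_prob("foo", 0.01, ["foo", "bar", "foo", "baz"]), 3))  # 0.108
```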
Reference
- https://github.com/kaldi-asr/kaldi
- https://github.com/espnet/espnet
- https://github.com/awni/speech
- https://github.com/HawkAaron/E2E-ASR