
hirofumi0810 / Neural_sp

License: Apache-2.0
End-to-end ASR/LM implementation with PyTorch

Programming Languages

python

Projects that are alternatives to or similar to Neural_sp

Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+32.84%)
Mutual labels:  speech-recognition, asr, sequence-to-sequence, ctc, transformer
Lingvo
Lingvo
Stars: ✭ 2,361 (+478.68%)
Mutual labels:  speech-recognition, seq2seq, speech, language-model, asr
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+262.5%)
Mutual labels:  speech-recognition, seq2seq, speech, asr, sequence-to-sequence
Tensorflow end2end speech recognition
End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305 (-25.25%)
Mutual labels:  speech-recognition, attention-mechanism, asr, ctc
Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+737.75%)
Mutual labels:  attention, seq2seq, sequence-to-sequence, transformer
Awesome Speech Recognition Speech Synthesis Papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Stars: ✭ 2,085 (+411.03%)
Mutual labels:  attention-mechanism, seq2seq, language-model, speech-recognition
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+142.65%)
Mutual labels:  attention-mechanism, seq2seq, sequence-to-sequence, transformer
Openasr
A pytorch based end2end speech recognition system.
Stars: ✭ 69 (-83.09%)
Mutual labels:  speech-recognition, speech, asr, transformer
Openseq2seq
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Stars: ✭ 1,378 (+237.75%)
Mutual labels:  speech-recognition, seq2seq, language-model, sequence-to-sequence
End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (-57.11%)
Mutual labels:  speech-recognition, speech, asr, transformer
Pytorch Asr
ASR with PyTorch
Stars: ✭ 124 (-69.61%)
Mutual labels:  speech-recognition, speech, asr, ctc
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (-53.43%)
Mutual labels:  speech-recognition, seq2seq, asr, transformer
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+13562.25%)
Mutual labels:  language-model, transformer, speech-recognition, seq2seq
Pykaldi
A Python wrapper for Kaldi
Stars: ✭ 756 (+85.29%)
Mutual labels:  speech-recognition, speech, language-model, asr
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+11.76%)
Mutual labels:  transformer, speech-recognition, seq2seq, asr
torch-asg
Auto Segmentation Criterion (ASG) implemented in pytorch
Stars: ✭ 42 (-89.71%)
Mutual labels:  speech, seq2seq, asr, ctc
A-Persona-Based-Neural-Conversation-Model
No description or website provided.
Stars: ✭ 22 (-94.61%)
Mutual labels:  seq2seq, sequence-to-sequence, attention-mechanism
transformer
A PyTorch Implementation of "Attention Is All You Need"
Stars: ✭ 28 (-93.14%)
Mutual labels:  transformer, seq2seq, attention
Nlp Tutorials
Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com
Stars: ✭ 394 (-3.43%)
Mutual labels:  attention, seq2seq, transformer
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Stars: ✭ 2,384 (+484.31%)
Mutual labels:  transformer, speech-recognition, asr


NeuralSP: Neural network based Speech Processing

How to install

# Set path to CUDA, NCCL
CUDAROOT=/usr/local/cuda
NCCL_ROOT=/usr/local/nccl

export CPATH=$NCCL_ROOT/include:$CPATH
export LD_LIBRARY_PATH=$NCCL_ROOT/lib/:$CUDAROOT/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=$NCCL_ROOT/lib/:$LIBRARY_PATH
export CUDA_HOME=$CUDAROOT
export CUDA_PATH=$CUDAROOT
export CPATH=$CUDA_PATH/include:$CPATH  # for warp-rnnt

# Install miniconda, python libraries, and other tools
cd tools
make KALDI=/path/to/kaldi
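
Once the build finishes, a quick way to confirm that PyTorch can see the CUDA/NCCL installation is a short Python check. This is a minimal sanity-check sketch, not part of the toolkit itself:

# check_env.py -- illustrative sanity check for the CUDA/NCCL setup
import torch
import torch.distributed as dist

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())          # requires a visible GPU matching CUDAROOT
print("cuDNN version:", torch.backends.cudnn.version())
print("NCCL backend available:", dist.is_nccl_available())   # needed for multi-GPU training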

Key features

Corpus

  • ASR

    • AISHELL-1
    • CSJ
    • Librispeech
    • Switchboard (+ Fisher)
    • TEDLIUM2/TEDLIUM3
    • TIMIT
    • WSJ
  • LM

    • Penn Tree Bank
    • WikiText2

Front-end

  • Frame stacking
  • Sequence summary network [link]
  • SpecAugment [link] (see the sketch after this list)
  • Adaptive SpecAugment [link]
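
SpecAugment masks random frequency bins and time steps of the input features during training. Below is a minimal PyTorch sketch of that masking idea; the mask widths F and T, the number of masks, and the (time, feature) tensor layout are illustrative assumptions rather than neural_sp's actual implementation:

import torch

def spec_augment(x, num_freq_masks=2, F=27, num_time_masks=2, T=100):
    """Apply SpecAugment-style masking to a (time, feature) log-mel tensor in place."""
    n_frames, n_feats = x.shape
    for _ in range(num_freq_masks):              # frequency masking
        f = torch.randint(0, F + 1, (1,)).item()
        f0 = torch.randint(0, max(n_feats - f, 1), (1,)).item()
        x[:, f0:f0 + f] = 0.0
    for _ in range(num_time_masks):              # time masking
        t = torch.randint(0, T + 1, (1,)).item()
        t0 = torch.randint(0, max(n_frames - t, 1), (1,)).item()
        x[t0:t0 + t, :] = 0.0
    return x

feats = torch.randn(500, 80)                     # 500 frames of 80-dim log-mel features
feats = spec_augment(feats)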

Encoder

  • RNN encoder
    • (CNN-)BLSTM, (CNN-)LSTM, (CNN-)BLGRU, (CNN-)LGRU (a CNN-BLSTM sketch follows this list)
    • Latency-controlled BRNN [link]
    • Random state passing (RSP) [link]
  • Transformer encoder [link]
    • Chunk hopping mechanism [link]
    • Relative positional encoding [link]
    • Causal mask
  • Conformer encoder [link]
  • Time-depth separable (TDS) convolution encoder [link] [link]
  • Gated CNN encoder (GLU) [link]
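
As a rough illustration of the CNN-BLSTM style encoders listed above: a 2-D convolutional front-end that subsamples the input by 4x in time, followed by a bidirectional LSTM. Layer sizes and kernel settings are illustrative assumptions, not neural_sp's actual module:

import torch
import torch.nn as nn

class CNNBLSTMEncoder(nn.Module):
    """Toy CNN front-end (4x time subsampling) followed by a BLSTM."""
    def __init__(self, n_feats=80, n_units=320, n_layers=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        conv_out = 32 * ((n_feats + 3) // 4)     # channels x reduced feature dim
        self.rnn = nn.LSTM(conv_out, n_units, n_layers,
                           batch_first=True, bidirectional=True)

    def forward(self, x):                        # x: (batch, time, n_feats)
        h = self.conv(x.unsqueeze(1))            # (batch, 32, time/4, n_feats/4)
        b, c, t, f = h.shape
        h = h.transpose(1, 2).reshape(b, t, c * f)
        out, _ = self.rnn(h)                     # (batch, time/4, 2 * n_units)
        return out

enc = CNNBLSTMEncoder()
print(enc(torch.randn(2, 200, 80)).shape)        # torch.Size([2, 50, 640])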

Connectionist Temporal Classification (CTC) decoder

  • Beam search
  • Shallow fusion
  • Forced alignment
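
A minimal sketch of the CTC objective and greedy (best-path) decoding using PyTorch's built-in nn.CTCLoss; the vocabulary size, sequence lengths, and blank index 0 are illustrative assumptions:

import torch
import torch.nn as nn

vocab_size, blank = 30, 0
ctc_loss = nn.CTCLoss(blank=blank, zero_infinity=True)

log_probs = torch.randn(50, 2, vocab_size).log_softmax(-1)   # (time, batch, vocab)
targets = torch.randint(1, vocab_size, (2, 10))              # label indices, no blanks
input_lens = torch.full((2,), 50, dtype=torch.long)
target_lens = torch.full((2,), 10, dtype=torch.long)
loss = ctc_loss(log_probs, targets, input_lens, target_lens)

# Greedy decoding: argmax per frame, collapse repeats, drop blanks.
best_path = log_probs.argmax(-1).transpose(0, 1)             # (batch, time)
hyps = []
for seq in best_path:
    collapsed = torch.unique_consecutive(seq)
    hyps.append([t.item() for t in collapsed if t != blank])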

RNN-Transducer (RNN-T) decoder [link]

  • Beam search
  • Shallow fusion
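
The RNN-T joint network combines encoder states and prediction-network states over every (time, label) pair before projecting to the vocabulary. A minimal broadcast-add sketch of that joint; all dimensions are illustrative assumptions:

import torch
import torch.nn as nn

class JointNetwork(nn.Module):
    """Toy RNN-T joint: combine encoder (B,T,H) and prediction (B,U,H) states."""
    def __init__(self, enc_dim=320, pred_dim=320, joint_dim=256, vocab_size=30):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, joint_dim)
        self.w_pred = nn.Linear(pred_dim, joint_dim)
        self.out = nn.Linear(joint_dim, vocab_size)

    def forward(self, enc, pred):
        # enc: (B, T, enc_dim), pred: (B, U, pred_dim) -> logits: (B, T, U, vocab)
        joint = torch.tanh(self.w_enc(enc).unsqueeze(2) + self.w_pred(pred).unsqueeze(1))
        return self.out(joint)

logits = JointNetwork()(torch.randn(2, 50, 320), torch.randn(2, 11, 320))
print(logits.shape)   # torch.Size([2, 50, 11, 30])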

Attention-based decoder

  • RNN decoder
    • Shallow fusion
    • Cold fusion [link]
    • Deep fusion [link]
    • Forward-backward attention decoding [link]
    • Ensemble decoding
  • Attention type
    • location-based
    • content-based
    • dot-product (sketched after this section)
    • GMM attention
  • Streaming RNN decoder specific
    • Hard monotonic attention [link]
    • Monotonic chunkwise attention (MoChA) [link]
    • Delay constrained training (DeCoT) [link]
    • Minimum latency training (MinLT) [link]
    • CTC-synchronous training (CTC-ST) [link]
  • Transformer decoder [link]
  • Streaming Transformer decoder specific
    • Monotonic Multihead Attention [link] [link]
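
Of the attention types listed above, dot-product attention is the simplest to sketch: the decoder query scores each encoder frame, and the softmax-normalized scores weight a context vector. A minimal single-head sketch with illustrative shapes (not neural_sp's actual attention module):

import torch
import torch.nn.functional as F

def dot_product_attention(query, keys, values, mask=None):
    """Single-head scaled dot-product attention over encoder states."""
    scores = query @ keys.transpose(-2, -1) / keys.size(-1) ** 0.5   # (B, 1, T)
    if mask is not None:
        scores = scores.masked_fill(~mask, float('-inf'))
    weights = F.softmax(scores, dim=-1)        # attention distribution over frames
    return weights @ values, weights           # context vector and alignments

enc = torch.randn(2, 50, 320)                  # encoder outputs (B, T, H)
dec_state = torch.randn(2, 1, 320)             # current decoder query (B, 1, H)
context, align = dot_product_attention(dec_state, enc, enc)
print(context.shape, align.shape)              # (2, 1, 320) (2, 1, 50)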

Language model (LM)

  • RNNLM (recurrent neural network language model)
  • Gated convolutional LM [link]
  • Transformer LM
  • Transformer-XL LM [link]
  • Adaptive softmax [link]
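
PyTorch ships an adaptive softmax module, nn.AdaptiveLogSoftmaxWithLoss, that can sit on top of an RNNLM or Transformer LM output layer to speed up training with large vocabularies. A minimal sketch; the hidden size, vocabulary size, and cutoffs are illustrative:

import torch
import torch.nn as nn

hidden_dim, vocab_size = 512, 50000
adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(
    hidden_dim, vocab_size, cutoffs=[2000, 10000], div_value=4.0)

hidden = torch.randn(32, hidden_dim)                   # LM hidden states (flattened batch*time)
targets = torch.randint(0, vocab_size, (32,))          # next-token ids
out = adaptive_softmax(hidden, targets)
print(out.loss)                                        # NLL averaged over the batch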

Output units

  • Phoneme
  • Grapheme
  • Wordpiece (BPE, sentencepiece; see the example after this list)
  • Word
  • Word-char mix
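
Wordpiece units are typically produced with the sentencepiece library: train a BPE model on the training transcripts, then encode text into subword pieces or ids. The file names and vocabulary size below are illustrative assumptions:

import sentencepiece as spm

# Train a BPE model on a plain-text transcript file (path and vocab size are illustrative).
spm.SentencePieceTrainer.train(
    input='train_transcripts.txt', model_prefix='bpe1k',
    vocab_size=1000, model_type='bpe')

sp = spm.SentencePieceProcessor(model_file='bpe1k.model')
pieces = sp.encode('how to recognize speech', out_type=str)   # subword strings
ids = sp.encode('how to recognize speech', out_type=int)      # subword ids
text = sp.decode(ids)                                         # round-trips to the original string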

Multi-task learning (MTL)

Multi-task learning (MTL) with different units is supported to alleviate data sparseness.

  • Hybrid CTC/attention [link] (see the sketch after this list)
  • Hierarchical Attention (e.g., word attention + character attention) [link]
  • Hierarchical CTC (e.g., word CTC + character CTC) [link]
  • Hierarchical CTC+Attention (e.g., word attention + character CTC) [link]
  • Forward-backward attention [link]
  • LM objective
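
The hybrid CTC/attention objective simply interpolates the two sub-losses with a fixed weight. A one-line sketch of that interpolation; the weight value and the placeholder loss tensors are illustrative:

import torch

loss_ctc = torch.tensor(42.0)    # placeholder standing in for the CTC loss
loss_att = torch.tensor(30.0)    # placeholder standing in for the attention decoder loss
ctc_weight = 0.3                 # illustrative interpolation weight
loss = ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att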

ASR Performance

AISHELL-1 (CER)

Model dev test
Transformer 5.0 5.4
Conformer 4.7 5.2
Streaming MMA 5.5 6.1

CSJ (WER)

Model eval1 eval2 eval3
BLSTM LAS 6.5 5.1 5.6
LC-BLSTM MoChA 7.4 5.6 6.4

Switchboard 300h (WER)

Model SWB CH
BLSTM LAS 9.1 18.8

Switchboard+Fisher 2000h (WER)

Model SWB CH
BLSTM LAS 7.8 13.8

Librispeech (WER)

Model dev-clean dev-other test-clean test-other
BLSTM LAS 2.5 7.2 2.6 7.5
BLSTM RNN-T 2.9 8.5 3.2 9.0
Transformer 2.1 5.3 2.4 5.7
UniLSTM RNN-T 3.7 11.7 4.0 11.6
UniLSTM MoChA 4.1 11.0 4.2 11.2
LC-BLSTM RNN-T 3.3 9.8 3.5 10.2
LC-BLSTM MoChA 3.3 8.8 3.5 9.1
Streaming MMA 2.5 6.9 2.7 7.1

TEDLIUM2 (WER)

Model dev test
BLSTM LAS 8.1 7.5
LC-BLSTM RNN-T 8.9 8.5
LC-BLSTM MoChA 10.6 8.6
UniLSTM RNN-T 11.6 11.7
UniLSTM MoChA 13.6 11.6

WSJ (WER)

Model test_dev93 test_eval92
BLSTM LAS 8.8 6.2

LM Performance

Penn Tree Bank (PPL)

Model valid test
RNNLM 87.99 86.06
+ cache=100 79.58 79.12
+ cache=500 77.36 76.94

WikiText2 (PPL)

Model valid test
RNNLM 104.53 98.73
+ cache=100 90.86 85.87
+ cache=2000 76.10 72.77

Reference

Dependency
