hirofumi0810 / Neural_sp
License: apache-2.0
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408
Projects that are alternatives of or similar to Neural sp
Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+32.84%)
Mutual labels: speech-recognition, asr, sequence-to-sequence, ctc, transformer
Lingvo
Lingvo
Stars: ✭ 2,361 (+478.68%)
Mutual labels: speech-recognition, seq2seq, speech, language-model, asr
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+262.5%)
Mutual labels: speech-recognition, seq2seq, speech, asr, sequence-to-sequence
Tensorflow end2end speech recognition
End-to-End speech recognition implementation base on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305 (-25.25%)
Mutual labels: speech-recognition, attention-mechanism, asr, ctc
Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+737.75%)
Mutual labels: attention, seq2seq, sequence-to-sequence, transformer
Awesome Speech Recognition Speech Synthesis Papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Stars: ✭ 2,085 (+411.03%)
Mutual labels: attention-mechanism, seq2seq, language-model, speech-recognition
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+142.65%)
Mutual labels: attention-mechanism, seq2seq, sequence-to-sequence, transformer
Openasr
A pytorch based end2end speech recognition system.
Stars: ✭ 69 (-83.09%)
Mutual labels: speech-recognition, speech, asr, transformer
Openseq2seq
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Stars: ✭ 1,378 (+237.75%)
Mutual labels: speech-recognition, seq2seq, language-model, sequence-to-sequence
End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (-57.11%)
Mutual labels: speech-recognition, speech, asr, transformer
Pytorch Asr
ASR with PyTorch
Stars: ✭ 124 (-69.61%)
Mutual labels: speech-recognition, speech, asr, ctc
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (-53.43%)
Mutual labels: speech-recognition, seq2seq, asr, transformer
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+13562.25%)
Mutual labels: language-model, transformer, speech-recognition, seq2seq
Pykaldi
A Python wrapper for Kaldi
Stars: ✭ 756 (+85.29%)
Mutual labels: speech-recognition, speech, language-model, asr
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+11.76%)
Mutual labels: transformer, speech-recognition, seq2seq, asr
torch-asg
Auto Segmentation Criterion (ASG) implemented in pytorch
Stars: ✭ 42 (-89.71%)
Mutual labels: speech, seq2seq, asr, ctc
A-Persona-Based-Neural-Conversation-Model
No description or website provided.
Stars: ✭ 22 (-94.61%)
Mutual labels: seq2seq, sequence-to-sequence, attention-mechanism
transformer
A PyTorch Implementation of "Attention Is All You Need"
Stars: ✭ 28 (-93.14%)
Mutual labels: transformer, seq2seq, attention
Nlp Tutorials
Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com
Stars: ✭ 394 (-3.43%)
Mutual labels: attention, seq2seq, transformer
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Stars: ✭ 2,384 (+484.31%)
Mutual labels: transformer, speech-recognition, asr
NeuralSP: Neural network based Speech Processing
How to install
```shell
# Set paths to CUDA and NCCL
CUDAROOT=/usr/local/cuda
NCCL_ROOT=/usr/local/nccl
export CPATH=$NCCL_ROOT/include:$CPATH
export LD_LIBRARY_PATH=$NCCL_ROOT/lib/:$CUDAROOT/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=$NCCL_ROOT/lib/:$LIBRARY_PATH
export CUDA_HOME=$CUDAROOT
export CUDA_PATH=$CUDAROOT
export CPATH=$CUDA_PATH/include:$CPATH  # for warp-rnnt

# Install miniconda, Python libraries, and other tools
cd tools
make KALDI=/path/to/kaldi
```
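As a quick illustration of how the exports above compose, here is a toy run using the same placeholder paths (`/opt/existing/lib` is a made-up stand-in for whatever `LD_LIBRARY_PATH` already contained on your machine):

```shell
# Toy illustration with the placeholder paths from above; /opt/existing/lib
# stands in for a pre-existing LD_LIBRARY_PATH entry.
CUDAROOT=/usr/local/cuda
NCCL_ROOT=/usr/local/nccl
LD_LIBRARY_PATH=/opt/existing/lib
export LD_LIBRARY_PATH=$NCCL_ROOT/lib/:$CUDAROOT/lib64:$LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"  # /usr/local/nccl/lib/:/usr/local/cuda/lib64:/opt/existing/lib
```

The NCCL and CUDA library directories are prepended, so they take precedence over any system-wide copies.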
Key features
Corpus
- ASR
  - AISHELL-1
  - CSJ
  - Librispeech
  - Switchboard (+ Fisher)
  - TEDLIUM2/TEDLIUM3
  - TIMIT
  - WSJ
- LM
  - Penn Tree Bank
  - WikiText2
Front-end
Encoder
- RNN encoder
- Transformer encoder [link]
- Conformer encoder [link]
- Time-depth separable (TDS) convolution encoder [link] [link]
- Gated CNN encoder (GLU) [link]
Connectionist Temporal Classification (CTC) decoder
- Beam search
- Shallow fusion
- Forced alignment
RNN-Transducer (RNN-T) decoder [link]
- Beam search
- Shallow fusion
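As a toy illustration of CTC decoding, here is a minimal best-path (greedy) sketch: take the argmax label per frame, collapse repeats, then drop blanks. This is not NeuralSP's beam-search implementation, and the blank id of 0 is an assumption.

```python
# Toy best-path CTC decoding (illustrative sketch, not NeuralSP's API).
BLANK = 0  # assumed blank label id

def ctc_greedy_decode(frame_argmax):
    """frame_argmax: list of per-frame argmax label ids (ints)."""
    out = []
    prev = None
    for label in frame_argmax:
        # keep a label only when it differs from the previous frame
        # (collapses repeats) and is not the blank symbol
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return out

# frames 'a a - b b - - a' (blank = '-') collapse to 'a b a'
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 0, 1]))  # [1, 2, 1]
```

Beam search keeps multiple label histories and sums probabilities over alignments instead of committing to the per-frame argmax.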
Attention-based decoder
- RNN decoder
- Attention type
- location-based
- content-based
- dot-product
- GMM attention
- Streaming RNN decoder specific
- Transformer decoder [link]
- Streaming Transformer decoder specific
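The attention types listed above all reduce to scoring encoder states against a decoder query and mixing the states by the normalized scores. A minimal dot-product sketch with toy vectors (not the repository's API):

```python
import math

# Minimal dot-product attention over a sequence of encoder states
# (toy shapes and values; illustrative only).
def dot_product_attention(query, keys, values):
    # one similarity score per encoder state
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # softmax the scores into attention weights
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # context vector: weighted sum of the value vectors
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

weights, context = dot_product_attention(
    [1.0, 0.0],                # decoder query
    [[1.0, 0.0], [0.0, 1.0]],  # encoder keys
    [[1.0], [0.0]],            # encoder values
)
print([round(w, 3) for w in weights])  # [0.731, 0.269]
```

Location-based and content-based variants change only how the scores are computed; the softmax and weighted sum stay the same.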
Language model (LM)
- RNNLM (recurrent neural network language model)
- Gated convolutional LM [link]
- Transformer LM
- Transformer-XL LM [link]
- Adaptive softmax [link]
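These language models are evaluated by perplexity (PPL, as reported in the LM Performance tables below), the exponential of the average per-token negative log-likelihood. A minimal sketch:

```python
import math

# Perplexity sketch (illustrative): exp of the average per-token
# negative log-likelihood under the model.
def perplexity(token_probs):
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model assigning probability 1/4 to every token has PPL 4,
# i.e. it is as uncertain as a uniform choice over 4 tokens.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # 4.0
```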
Output units
- Phoneme
- Grapheme
- Wordpiece (BPE, sentencepiece)
- Word
- Word-char mix
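As a toy sketch of wordpiece-style segmentation, here is a greedy longest-match over a made-up vocabulary. (The repo itself uses BPE via sentencepiece, which learns merges from data rather than applying a fixed match like this.)

```python
# Greedy longest-match segmentation into subword pieces
# (illustrative only; vocabulary below is invented).
def wordpiece_segment(word, vocab):
    pieces, start = [], 0
    while start < len(word):
        # try the longest remaining substring first
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
        else:
            pieces.append(word[start])  # back off to a single character
            start += 1
    return pieces

vocab = {"speech", "less", "spe", "ech"}
print(wordpiece_segment("speechless", vocab))  # ['speech', 'less']
```

Subword units like this sit between graphemes and words, keeping the output vocabulary small while avoiding out-of-vocabulary words.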
Multi-task learning (MTL)
Multi-task learning (MTL) with different units is supported to alleviate data sparseness.
- Hybrid CTC/attention [link]
- Hierarchical Attention (e.g., word attention + character attention) [link]
- Hierarchical CTC (e.g., word CTC + character CTC) [link]
- Hierarchical CTC+Attention (e.g., word attention + character CTC) [link]
- Forward-backward attention [link]
- LM objective
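The hybrid CTC/attention objective above is typically a weighted interpolation of the two losses; a minimal sketch (the weight value here is illustrative, not the repo's default):

```python
# Sketch of the hybrid CTC/attention objective (illustrative weight):
# total loss = w * L_ctc + (1 - w) * L_attention
def hybrid_loss(loss_ctc, loss_att, ctc_weight=0.3):
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att

print(round(hybrid_loss(2.0, 1.0, ctc_weight=0.3), 3))  # 1.3
```

The CTC branch enforces monotonic alignment while the attention branch handles flexible dependencies; the weight trades the two off.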
ASR Performance
AISHELL-1 (CER)
Model | dev | test |
---|---|---|
Transformer | 5.0 | 5.4 |
Conformer | 4.7 | 5.2 |
Streaming MMA | 5.5 | 6.1 |
CSJ (WER)
Model | eval1 | eval2 | eval3 |
---|---|---|---|
BLSTM LAS | 6.5 | 5.1 | 5.6 |
LC-BLSTM MoChA | 7.4 | 5.6 | 6.4 |
Switchboard 300h (WER)
Model | SWB | CH |
---|---|---|
BLSTM LAS | 9.1 | 18.8 |
Switchboard+Fisher 2000h (WER)
Model | SWB | CH |
---|---|---|
BLSTM LAS | 7.8 | 13.8 |
Librispeech (WER)
Model | dev-clean | dev-other | test-clean | test-other |
---|---|---|---|---|
BLSTM LAS | 2.5 | 7.2 | 2.6 | 7.5 |
BLSTM RNN-T | 2.9 | 8.5 | 3.2 | 9.0 |
Transformer | 2.1 | 5.3 | 2.4 | 5.7 |
UniLSTM RNN-T | 3.7 | 11.7 | 4.0 | 11.6 |
UniLSTM MoChA | 4.1 | 11.0 | 4.2 | 11.2 |
LC-BLSTM RNN-T | 3.3 | 9.8 | 3.5 | 10.2 |
LC-BLSTM MoChA | 3.3 | 8.8 | 3.5 | 9.1 |
Streaming MMA | 2.5 | 6.9 | 2.7 | 7.1 |
TEDLIUM2 (WER)
Model | dev | test |
---|---|---|
BLSTM LAS | 8.1 | 7.5 |
LC-BLSTM RNN-T | 8.9 | 8.5 |
LC-BLSTM MoChA | 10.6 | 8.6 |
UniLSTM RNN-T | 11.6 | 11.7 |
UniLSTM MoChA | 13.6 | 11.6 |
WSJ (WER)
Model | test_dev93 | test_eval92 |
---|---|---|
BLSTM LAS | 8.8 | 6.2 |
LM Performance
Penn Tree Bank (PPL)
Model | valid | test |
---|---|---|
RNNLM | 87.99 | 86.06 |
+ cache=100 | 79.58 | 79.12 |
+ cache=500 | 77.36 | 76.94 |
WikiText2 (PPL)
Model | valid | test |
---|---|---|
RNNLM | 104.53 | 98.73 |
+ cache=100 | 90.86 | 85.87 |
+ cache=2000 | 76.10 | 72.77 |
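The `+ cache` rows above interpolate the base LM with a cache of recently seen tokens, which is why larger caches lower the perplexity. A simplified unigram-count sketch (a real neural cache weights history entries by hidden-state similarity; `lam` and `cache_size` here are illustrative):

```python
from collections import Counter

# Simplified cache interpolation (illustrative, not the repo's code):
# p(w) = (1 - lam) * p_lm(w) + lam * p_cache(w), where p_cache is a
# unigram distribution over the last `cache_size` tokens.
def cached_prob(word, p_lm, history, cache_size=100, lam=0.2):
    cache = Counter(history[-cache_size:])
    total = sum(cache.values())
    p_cache = cache[word] / total if total else 0.0
    return (1.0 - lam) * p_lm + lam * p_cache

# A word that appeared twice in the recent history gets boosted
# well above its base probability of 0.01.
print(round(cached_prob("foo", 0.01, ["foo", "bar", "foo", "baz"]), 3))  # 0.108
```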
Reference
- https://github.com/kaldi-asr/kaldi
- https://github.com/espnet/espnet
- https://github.com/awni/speech
- https://github.com/HawkAaron/E2E-ASR