
qute012 / kosr

Licence: other
Korean speech recognition based on Transformer

Programming Languages

python

Projects that are alternatives of or similar to kosr

kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+1724%)
Mutual labels:  end-to-end, transformer, speech-recognition, asr, ksponspeech
Transformer-Transducer
PyTorch implementation of "Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss" (ICASSP 2020)
Stars: ✭ 61 (+144%)
Mutual labels:  end-to-end, transformer, speech-recognition, transformer-transducer
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (+660%)
Mutual labels:  end-to-end, transformer, speech-recognition, asr
End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (+600%)
Mutual labels:  end-to-end, transformer, speech-recognition, asr
Wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Stars: ✭ 617 (+2368%)
Mutual labels:  transformer, speech-recognition, asr
End-to-End-Mandarin-ASR
End-to-end speech recognition on AISHELL dataset.
Stars: ✭ 20 (-20%)
Mutual labels:  end-to-end, speech-recognition, asr
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Stars: ✭ 2,384 (+9436%)
Mutual labels:  transformer, speech-recognition, asr
Tensorflow end2end speech recognition
End-to-End speech recognition implementation base on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305 (+1120%)
Mutual labels:  end-to-end, speech-recognition, asr
speech-transformer
Transformer implementation specialized in speech recognition tasks, using PyTorch.
Stars: ✭ 40 (+60%)
Mutual labels:  end-to-end, transformer, asr
Speech Transformer Tf2.0
Transformer for ASR systems (via TensorFlow 2.0)
Stars: ✭ 90 (+260%)
Mutual labels:  end-to-end, transformer, speech-recognition
Espresso
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit
Stars: ✭ 808 (+3132%)
Mutual labels:  end-to-end, speech-recognition, asr
E2e Asr
PyTorch Implementations for End-to-End Automatic Speech Recognition
Stars: ✭ 106 (+324%)
Mutual labels:  end-to-end, speech-recognition, asr
Athena
An open-source implementation of a sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+2068%)
Mutual labels:  transformer, speech-recognition, asr
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+1532%)
Mutual labels:  transformer, speech-recognition, asr
Openasr
A pytorch based end2end speech recognition system.
Stars: ✭ 69 (+176%)
Mutual labels:  transformer, speech-recognition, asr
Speech Transformer
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Stars: ✭ 565 (+2160%)
Mutual labels:  end-to-end, transformer, asr
Rnn Transducer
MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
Stars: ✭ 114 (+356%)
Mutual labels:  end-to-end, speech-recognition, asr
PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-16%)
Mutual labels:  speech-recognition, asr
rustfst
Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
Stars: ✭ 104 (+316%)
Mutual labels:  speech-recognition, asr
ctc-asr
End-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.
Stars: ✭ 112 (+348%)
Mutual labels:  speech-recognition, asr

Korean Online Speech Recognition

KOSR provides Transformer-based model implementations for end-to-end Korean speech recognition. You can train it on the KsponSpeech dataset, which was preprocessed by referring to here.

This project includes the models below.

Update

Preparation

You can download the dataset at AI-Hub. Before getting started, the directory structure should be prepared as shown below. Text normalization follows ESPnet's KsponSpeech recipe; the transcripts are provided simply as .trn files.

root
└─ KsponSpeech_01
└─ KsponSpeech_02
└─ KsponSpeech_03
└─ KsponSpeech_04
└─ KsponSpeech_05
└─ KsponSpeech_eval
└─ scripts
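The .trn transcript files pair each audio path with its normalized text. A minimal parsing sketch, assuming an ESPnet-style `path :: transcript` line format (the separator and layout are assumptions, not verified against this repository):

```python
def parse_trn(path):
    """Parse a .trn transcript file into (audio_path, text) pairs.

    Assumed line format (not verified against this repo):
        KsponSpeech_01/.../KsponSpeech_000001.pcm :: transcript text
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            audio, _, text = line.partition(" :: ")
            pairs.append((audio, text))
    return pairs
```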

Environment

Training the Transformer and joint CTC models requires the following: python >= 3.6, pytorch >= 1.7.0, and torchaudio >= 0.7.0.

pip install torch==1.7.0+cu101 torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

If you want to train the Transformer-Transducer, follow the directions below. warp-transducer requires gcc-5/g++-5 and the CUDA environment variables exported below. This path is not tested yet.

CUDA_HOME settings

export CUDA_HOME=$HOME/tools/cuda-9.0 # change to your path
export CUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME
export LD_LIBRARY_PATH="$CUDA_HOME/extras/CUPTI/lib64:$LD_LIBRARY_PATH"
export LIBRARY_PATH=$CUDA_HOME/lib64:$LIBRARY_PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export CFLAGS="-I$CUDA_HOME/include $CFLAGS"

Install gcc-5/g++-5 and update alternatives

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-5 g++-5
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 1
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 1

Usage

Before training, you should already have the AI-Hub dataset. Check the configuration files in the conf directory and set the batch size to fit your GPU memory. If you want to use a custom configuration, pass the conf option (default: config/ksponspeech_transducer_base.yaml).

python train.py [--conf config-path]
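A hypothetical sketch of how train.py's command-line options might be wired with argparse; the flag names and default path come from the text above, everything else is an assumption:

```python
import argparse

def build_parser():
    """Build the command-line parser (hypothetical sketch, not the repo's actual code)."""
    parser = argparse.ArgumentParser(description="Train a KOSR model.")
    # Default configuration path taken from the project description.
    parser.add_argument("--conf", default="config/ksponspeech_transducer_base.yaml",
                        help="path to a YAML model/training configuration")
    parser.add_argument("--load_model", default=None,
                        help="checkpoint path to resume training from")
    return parser
```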

A checkpoint directory will be created automatically during training, and saved models can be found there. If you want to resume training, use the load_model option.

python train.py --conf model-configuration --load_model saved-model-path
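The save/resume bookkeeping can be sketched with the standard library. The real project would save PyTorch model and optimizer state dicts via torch.save, but the pattern is the same; file name and fields here are assumptions:

```python
import json
import os

def save_checkpoint(path, epoch, step):
    """Write a small JSON checkpoint. (The actual project would also
    persist model/optimizer state dicts with torch.save.)"""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"epoch": epoch, "step": step}, f)

def load_checkpoint(path):
    """Return the saved training state, or None when starting fresh."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```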

Transformer-ls

python train.py --conf conf/ksponspeech_transformer_base.yaml

Transformer jointed CTC

python train.py --conf conf/ksponspeech_transformer_joint_ctc_base.yaml

Results

The paper used a 3-gram language model. You can build n-gram models using KenLM.
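As an illustration only, the raw material KenLM smooths into probabilities is plain n-gram counts. A toy trigram counter (not a substitute for KenLM, which adds smoothing and backoff):

```python
from collections import Counter

def count_ngrams(tokens, n=3):
    """Count n-grams (trigrams by default) over a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

counts = count_ngrams("the cat sat on the mat".split())
```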

| Data | Model | CER | WER | Preprocessing |
|------|-------|-----|-----|---------------|
| Eval-Clean | Transformer (β=6) | 14% | 32% | Filter Bank + SpecAugment |
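CER and WER are edit distances normalized by reference length (characters for CER, whitespace tokens for WER). A minimal generic sketch, not the project's own scoring code:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (classic DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (r != h)))     # substitution
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: edits over reference character count."""
    return edit_distance(list(ref), list(hyp)) / max(len(ref), 1)

def wer(ref, hyp):
    """Word error rate: edits over reference token count."""
    r, h = ref.split(), hyp.split()
    return edit_distance(r, h) / max(len(r), 1)
```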

Author

Email: [email protected]

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].