by2101 / Openasr

Licence: apache-2.0
A pytorch based end2end speech recognition system.

OpenASR

A PyTorch-based end-to-end speech recognition system. The main architecture is Speech-Transformer.

Chinese documentation (中文说明)

Features

  1. Minimal Dependency. The system does not depend on external software for feature extraction or decoding. Users only need to install the PyTorch deep learning framework.
  2. Good Performance. The system includes advanced techniques such as Label Smoothing, SpecAug, and LST (Learn Spelling from Teachers), and achieves good performance on AISHELL-1. The baseline CER on the AISHELL-1 test set is 6.6%, which is better than ESPNet.
  3. Modular Design. The system is divided into several modules, such as trainer, metric, schedule, and models, which makes it easy to extend and add features.
  4. End2End. Feature extraction and tokenization are performed online, and the system processes wave files directly, so the procedure is much simplified.
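
As an illustration of the Label Smoothing technique named in the feature list, here is a minimal sketch in plain Python. It is not taken from the OpenASR source: the function names, the default smoothing value of 0.1, and the choice to spread the smoothing mass uniformly over the non-target tokens are all assumptions made for this example.

```python
import math


def smoothed_targets(target_index, vocab_size, smoothing=0.1):
    """Return a label-smoothed target distribution over the vocabulary."""
    # The true token keeps 1 - smoothing; the rest is shared uniformly
    # by the other vocab_size - 1 tokens (one common variant, assumed here).
    off_value = smoothing / (vocab_size - 1)
    dist = [off_value] * vocab_size
    dist[target_index] = 1.0 - smoothing
    return dist


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_smoothing_loss(logits, target_index, smoothing=0.1):
    """Cross-entropy between the smoothed targets and the model distribution."""
    log_probs = log_softmax(logits)
    targets = smoothed_targets(target_index, len(logits), smoothing)
    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

With smoothing set to 0 the loss reduces to ordinary cross-entropy. A positive value penalizes over-confident predictions on the target token.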

Dependency

  • python >= 3.6
  • pytorch >= 1.1
  • pyyaml >= 5.1
  • tensorflow and tensorboardX for visualization. (If you do not need to visualize the results, you can set TENSORBOARD_LOGGING to 0 in src/utils.py.)

Usage

We use a KALDI-style example organization. The example directory includes top-level shell scripts, a data directory, and an exp directory. We provide an AISHELL-1 example at ROOT/egs/aishell1/s5.

Data Preparation

The data preparation script is prep_data.sh. It automatically downloads the AISHELL-1 dataset and formats it into a KALDI-style data directory. It then generates JSON files and a grapheme vocabulary. You can set corpusdir to choose where the dataset is stored.

bash prep_data.sh

When the script finishes, the data directory and the exp directory will have been created.
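
To make the grapheme vocabulary step more concrete, the sketch below builds a character-level vocabulary from KALDI-style `text` lines (an utterance id followed by the transcript). It is only an illustration and not the code in prep_data.sh: the function name, the special tokens, and the frequency-based ordering are assumptions.

```python
from collections import Counter


def build_grapheme_vocab(text_lines, specials=("<pad>", "<unk>", "<sos>", "<eos>")):
    """Map each grapheme (character) in KALDI-style transcripts to an integer id."""
    counts = Counter()
    for line in text_lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) < 2:
            continue  # utterance id with an empty transcript
        # Drop all whitespace so only the graphemes themselves are counted.
        transcript = "".join(parts[1].split())
        counts.update(transcript)
    # Most frequent graphemes first; ties are broken by the character itself
    # so that the ordering is deterministic.
    graphemes = sorted(counts, key=lambda g: (-counts[g], g))
    return {token: idx for idx, token in enumerate(list(specials) + graphemes)}


if __name__ == "__main__":
    # The utterance ids and transcripts below are made up for the example.
    vocab = build_grapheme_vocab(["utt1 你好 世界", "utt2 你 好 吗"])
    print(len(vocab), vocab["你"])
```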

Train Models

We use YAML files for parameter configuration and provide three examples.

config_base.yaml  # baseline ASR system
config_lm_lstm.yaml  # LSTM language model
config_lst.yaml  # training ASR with LST

Run the train.sh script to train the baseline system.

bash train.sh

Model Averaging

Average checkpoints to improve performance.

bash avg.sh
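
The idea behind checkpoint averaging can be sketched as an element-wise mean over the saved parameters. The example below uses plain Python lists in place of tensors and is not the implementation behind avg.sh. The function name and the flat-list parameter layout are assumptions.

```python
def average_checkpoints(state_dicts):
    """Element-wise mean of parameter dictionaries that share the same keys."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    n = len(state_dicts)
    averaged = {}
    for key in state_dicts[0]:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(sd[key] for sd in state_dicts))
        averaged[key] = [sum(values) / n for values in columns]
    return averaged


if __name__ == "__main__":
    # Two toy "checkpoints" with a single parameter vector each.
    ckpts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(average_checkpoints(ckpts))
```

Averaging the last few checkpoints of a run tends to smooth out noise from the final optimization steps, which is presumably why it helps here.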

Decoding and Scoring

Run the decode_test.sh script to decode the test set, then score the result.

bash decode_test.sh
bash score.sh data/test/text exp/exp1/decode_test_avg-last10
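
For reference, the character error rate (CER) reported by the scoring step is the character-level edit distance between hypothesis and reference, divided by the reference length. The sketch below shows that calculation in plain Python. It is not a replacement for SCTK, which score.sh relies on, and the function names are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate in percent, ignoring whitespace."""
    ref_chars = "".join(ref.split())
    hyp_chars = "".join(hyp.split())
    if not ref_chars:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

For example, one substituted character in a four-character reference gives a CER of 25%.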

Visualization

We provide tensorboardX-based visualization. The event files are stored in $expdir/log. You can use tensorboard to visualize the training procedure.

tensorboard --logdir=$expdir --bind_all

Then you can view the training procedure in a browser (http://localhost:6006).

Examples:

per token loss in batch

encoder attention

encoder-decoder attention

Acknowledgement

This system is implemented with PyTorch. We use the wave-reading code from SciPy and the SCTK software for scoring. Thanks to Dan Povey's team and their KALDI software, from which I learned ASR concepts and the example organization. Thanks also to the Google Lingvo team, from whom I learned the modular design.

Bib

@article{bai2019learn,
  title={Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition},
  author={Bai, Ye and Yi, Jiangyan and Tao, Jianhua and Tian, Zhengkun and Wen, Zhengqi},
  year={2019}
}

References

Dong, Linhao, Shuang Xu, and Bo Xu. "Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition." 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018.

Zhou, Shiyu, et al. "Syllable-based sequence-to-sequence speech recognition with the transformer in mandarin chinese." arXiv preprint arXiv:1804.10752 (2018).