
dangvansam98 / demo_vietasr

Licence: other
Vietnamese Speech Recognition

Programming Languages

C++, Python, Makefile, Cuda, Shell, Jupyter Notebook

Projects that are alternatives to or similar to demo_vietasr

leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+1509.09%)
Mutual labels:  speech-recognition, automatic-speech-recognition, speech-to-text, stt, asr
sova-asr
SOVA ASR (Automatic Speech Recognition)
Stars: ✭ 123 (+459.09%)
Mutual labels:  speech-recognition, automatic-speech-recognition, speech-to-text, stt, asr
kaldi-long-audio-alignment
Long audio alignment using Kaldi
Stars: ✭ 21 (-4.55%)
Mutual labels:  speech-recognition, automatic-speech-recognition, speech-to-text, asr
speech-recognition-evaluation
Evaluate results from ASR/Speech-to-Text quickly
Stars: ✭ 25 (+13.64%)
Mutual labels:  speech-recognition, speech-to-text, stt, asr
opensource-voice-tools
A repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-4.55%)
Mutual labels:  speech-recognition, stt, asr
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (+140.91%)
Mutual labels:  speech-recognition, speech-to-text, asr
simple-obs-stt
Speech-to-text and keyboard input captions for OBS.
Stars: ✭ 89 (+304.55%)
Mutual labels:  speech-recognition, speech-to-text, stt
speech-recognition
SDKs and docs for Skit's speech to text service
Stars: ✭ 20 (-9.09%)
Mutual labels:  speech-recognition, speech-to-text, asr
megs
A merged version of multiple open-source German speech datasets.
Stars: ✭ 21 (-4.55%)
Mutual labels:  speech-recognition, speech-to-text, asr
PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-4.55%)
Mutual labels:  speech-recognition, speech-to-text, asr
vosk-asterisk
Speech Recognition in Asterisk with Vosk Server
Stars: ✭ 52 (+136.36%)
Mutual labels:  speech-recognition, speech-to-text, asr
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+713.64%)
Mutual labels:  speech-recognition, speech-to-text, asr
wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (+831.82%)
Mutual labels:  speech-recognition, speech-to-text, asr
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+3722.73%)
Mutual labels:  speech-recognition, speech-to-text, stt
spokestack-ios
Spokestack: give your iOS app a voice interface!
Stars: ✭ 27 (+22.73%)
Mutual labels:  speech-recognition, speech-to-text, asr
scripty
Speech to text bot for Discord using Mozilla's DeepSpeech
Stars: ✭ 14 (-36.36%)
Mutual labels:  speech-recognition, speech-to-text, stt
Lingvo
Lingvo
Stars: ✭ 2,361 (+10631.82%)
Mutual labels:  speech-recognition, speech-to-text, asr
Edgedict
Working online speech recognition based on RNN Transducer (trained model available in the releases).
Stars: ✭ 205 (+831.82%)
Mutual labels:  speech-recognition, speech-to-text, asr
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Stars: ✭ 2,384 (+10736.36%)
Mutual labels:  speech-recognition, automatic-speech-recognition, asr
deep avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
Stars: ✭ 104 (+372.73%)
Mutual labels:  speech-recognition, automatic-speech-recognition, speech-to-text

VietASR (NVIDIA NeMo Toolkit)

Some experiments with NeMo.

Result

  • Model: QuartzNet, a smaller version of the Jasper model.
  • The table lists the character error rate (CER) and word error rate (WER), with and without a language model (LM), on the major Vietnamese ASR test sets.
Task           CER (%)   WER (%)   WER with LM (%)
VIVOS (test)    6.80     18.02     15.72
VLSP2018        6.87     16.26     N/A
VLSP2020 T1    14.73     30.96     N/A
VLSP2020 T2    41.67     69.15     N/A
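Both metrics in the table are conventionally computed as the Levenshtein (edit) distance between the reference and the hypothesis, divided by the reference length: over words for WER, over characters for CER. The repository's own scoring script is not shown here, so the following is only a minimal pure-Python sketch of that standard definition; the function names are hypothetical, and the Unicode NFC normalization step is an assumption chosen so that a Vietnamese letter with diacritics (e.g. "ệ") counts as a single character.

```python
import unicodedata
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + insertions + deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a fraction (multiply by 100 for %). Assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction; spaces count as characters (an assumption)."""
    # Hypothetical normalization step: NFC composes base letter + diacritics into one code point.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    return edit_distance(list(ref), list(hyp)) / len(ref)


if __name__ == "__main__":
    ref = "xin chào việt nam"
    hyp = "xin chao việt nam"  # one word wrong, one character wrong
    print(f"WER = {100 * wer(ref, hyp):.2f}%  CER = {100 * cer(ref, hyp):.2f}%")
```

With one wrong word out of four and one wrong character out of seventeen, this reports a WER of 25% and a CER of about 5.88%, which is why CER is consistently lower than WER in the table above.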

The model was trained on ~500 hours of Vietnamese speech collected from YouTube, radio, call-center recordings (8 kHz), text-to-speech data, and some public datasets (VLSP, VIVOS, FPT). It is a very small model (13M parameters), which makes inference fast.

Installation

  • ds-ctcdecoder and KenLM, for LM decoding:
    pip install ds-ctcdecoder
  • Other Python libraries: torch, numpy, librosa, flask, flask_socketio, requests, ...
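QuartzNet is a CTC model, so without an LM (the plain WER column in the results) its per-frame outputs can be decoded greedily: take the most likely token in each frame, merge consecutive repeats, then drop the CTC blank. The LM packages above replace this step with a beam search that also scores candidates with a KenLM model; this sketch does not use them and does not reproduce their API. The function name, the toy vocabulary, and placing the blank at the last index are illustrative assumptions, not taken from this repository.

```python
from typing import List, Sequence


def ctc_greedy_decode(frames: Sequence[Sequence[float]], vocab: List[str], blank_id: int) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, remove blanks.

    frames   -- per-frame scores (log-probabilities or logits), shape [T][V]
    vocab    -- token string for each index (hypothetical character vocabulary)
    blank_id -- index of the CTC blank symbol
    """
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frames]
    tokens = []
    prev = None
    for idx in best_path:
        # A token is emitted only when it differs from the previous frame's token,
        # so "a a" collapses to "a", while "a <blank> a" still yields "aa".
        if idx != prev and idx != blank_id:
            tokens.append(vocab[idx])
        prev = idx
    return "".join(tokens)


if __name__ == "__main__":
    vocab = [" ", "a", "b", "<blank>"]  # toy vocabulary; blank assumed last
    path = [1, 1, 3, 1, 2, 2, 0, 3]     # frame-level argmax indices
    one_hot = [[1.0 if i == k else 0.0 for i in range(len(vocab))] for k in path]
    print(repr(ctc_greedy_decode(one_hot, vocab, blank_id=3)))
```

The blank between the two "a" frames is what lets CTC emit a genuinely doubled letter; without it the repeats would be merged into one.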

Run Demo

TODO

  • Conformer Model
  • Transformer LM instead of KenLM
  • Data augmentation: speed, noise, pitch shift, time shift, ...
  • FastAPI
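The augmentations on the TODO list are not implemented in the repository yet. As a rough illustration only, here is a minimal NumPy sketch of three of them on a raw waveform: additive noise scaled to a target signal-to-noise ratio, a circular time shift, and speed perturbation by linear-interpolation resampling. All function names are hypothetical; the circular shift and the interpolation without anti-aliasing are simplifying assumptions, and pitch shifting is left out because it needs a proper time-stretch algorithm.

```python
import numpy as np


def add_noise(signal, noise, snr_db: float) -> np.ndarray:
    """Mix in noise so that 10*log10(P_signal / P_noise) equals snr_db."""
    signal = np.asarray(signal, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[: len(signal)]
    sig_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so its power is sig_power / 10^(snr_db / 10).
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise


def time_shift(signal, shift: int) -> np.ndarray:
    """Circularly shift the waveform by `shift` samples (wrap-around is an assumption)."""
    return np.roll(np.asarray(signal), shift)


def speed_perturb(signal, factor: float) -> np.ndarray:
    """Resample by `factor` (>1 = faster and shorter); changes tempo and pitch together."""
    signal = np.asarray(signal, dtype=np.float64)
    positions = np.arange(0, len(signal), factor)  # fractional read positions
    return np.interp(positions, np.arange(len(signal)), signal)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.sin(np.linspace(0, 100, 16000))  # 1 s of a toy tone at an assumed 16 kHz
    noisy = add_noise(clean, rng.standard_normal(16000), snr_db=10.0)
    faster = speed_perturb(clean, 1.1)
    print(noisy.shape, faster.shape)
```

Speed perturbation is commonly applied with factors such as 0.9 and 1.1 to triple the effective training data; a production version would use a band-limited resampler (for example from librosa, which is already a dependency) rather than linear interpolation.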