dangvansam98 / demo_vietasr

Licence: other

Vietnamese Speech Recognition

Labels

speech-recognition automatic-speech-recognition speech-to-text stt asr vietnamese-nlp ctc-loss vietnamese-language ctc-decode vietnamese-speech-recognition

VietASR (NVIDIA NeMo ToolKit)

⚡ Some experiments with NeMo ⚡

Result

Model: QuartzNet, a smaller version of the Jasper model.
The table lists the character error rate (CER) and the word error rate (WER) on major Vietnamese ASR tasks, with and without a language model (LM).

Task	CER (%)	WER (%)	+LM WER (%)
VIVOS (TEST)	6.80	18.02	15.72
VLSP2018	6.87	16.26	N/A
VLSP2020 T1	14.73	30.96	N/A
VLSP2020 T2	41.67	69.15	N/A
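
For readers unfamiliar with the two metrics in the table, here is a minimal sketch of how CER and WER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length. The function names (`edit_distance`, `wer`, `cer`) and the choice to ignore spaces in CER are assumptions made for illustration; this is not the evaluation script used to produce the numbers above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, relative to the reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (assumption: spaces are ignored)."""
    ref_chars = reference.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)


if __name__ == "__main__":
    reference = "xin chào việt nam"
    hypothesis = "xin chào viêt nam"
    print(f"WER: {wer(reference, hypothesis):.2f}%")
    print(f"CER: {cer(reference, hypothesis):.2f}%")
```

Because a single wrong character makes the whole word wrong, WER is usually much higher than CER, which matches the pattern in the table.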

The model was trained on ~500 hours of Vietnamese speech collected from YouTube, radio, call-center recordings (8k), text-to-speech data, and some public datasets (vlsp, vivos, fpt). It is a very small model (13M parameters), which makes inference fast ⚡

Installation

ctcdecoder and kenlm for LM decoding:
pip install ds-ctcdecoder
Other Python libraries: torch, numpy, librosa, flask, flask_socketio, requests, ...
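
For intuition about what a CTC decoder does before a language model is involved, the sketch below shows greedy (best-path) CTC decoding: take the most likely symbol per frame, collapse repeats, then drop blanks. It is a dependency-free illustration, not the ds-ctcdecoder API and not necessarily the decoder this demo uses. The function name `ctc_greedy_decode`, the toy vocabulary, and the blank index are all assumptions.

```python
def ctc_greedy_decode(frame_scores, vocab, blank_id):
    """Greedy CTC decoding.

    frame_scores: list of per-frame score lists, one score per symbol
                  (vocabulary entries plus the blank).
    vocab:        list mapping symbol index to a character.
    blank_id:     index of the CTC blank symbol (hypothetical; it depends
                  on how the acoustic model was trained).
    """
    decoded = []
    previous = None
    for scores in frame_scores:
        # Pick the best-scoring symbol for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the symbol changes and is not the blank.
        if best != previous and best != blank_id:
            decoded.append(vocab[best])
        previous = best
    return "".join(decoded)


if __name__ == "__main__":
    vocab = ["a", "b", " "]  # toy vocabulary, for illustration only
    blank_id = 3
    # Frames whose best symbols are: a, a, <blank>, a, b, b
    frames = [
        [0.9, 0.0, 0.0, 0.1],
        [0.8, 0.1, 0.0, 0.1],
        [0.1, 0.0, 0.0, 0.9],
        [0.7, 0.1, 0.0, 0.2],
        [0.1, 0.8, 0.0, 0.1],
        [0.2, 0.7, 0.0, 0.1],
    ]
    print(ctc_greedy_decode(frames, vocab, blank_id))
```

An LM-based beam search instead keeps several candidate transcripts per frame and rescores them with the language model, which is the role of ctcdecoder and kenlm here.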

Run Demo

Vietnamese Model (pretrained): python flask_upload_record_vn.py
Video demo on YouTube: https://youtu.be/P3mhEngL1us
English Model (pretrained): python flask_upload_record_en.py

TODO

Conformer Model
Transformer LM instead of kenlm
Data augmentation: speed, noise, pitch shift, time shift, ...
FastAPI
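
As a rough illustration of the augmentation item in the list above, here is a dependency-free sketch of three of the listed transforms applied to a waveform stored as a list of floats. The function names and parameters are hypothetical and none of this is implemented in the project. Pitch shift is left out because it needs a proper time-stretching step, and a real pipeline would more likely use numpy or librosa.

```python
import math
import random


def time_shift(samples, shift):
    """Circularly shift the waveform by `shift` samples (positive = delay)."""
    if not samples:
        return []
    shift %= len(samples)
    return samples[-shift:] + samples[:-shift] if shift else list(samples)


def add_noise(samples, snr_db, rng=random):
    """Add white Gaussian noise at roughly the requested SNR (in dB)."""
    if not samples:
        return []
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def change_speed(samples, rate):
    """Resample by linear interpolation; rate > 1 speeds the audio up.

    Note: this simple resampling also changes the pitch.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not samples:
        return []
    n_out = max(1, int(len(samples) / rate))
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    # One second of a 440 Hz tone at a 16 kHz sampling rate (toy input).
    wave = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    augmented = change_speed(add_noise(time_shift(wave, 800), snr_db=20), rate=1.1)
    print(len(wave), len(augmented))
```

Each function returns a new list, so the transforms can be chained in any order and applied with random parameters per training example.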
