srvk / Eesen

Licence: apache-2.0

The official repository of the Eesen project

Labels

tensorflow speech-recognition speech-to-text asr kaldi ctc

Projects that are alternatives to or similar to Eesen

kaldi-long-audio-alignment

Long audio alignment using Kaldi

Stars: ✭ 21 (-97.15%)

Mutual labels: speech-recognition, speech-to-text, kaldi, asr

Tensorflow end2end speech recognition

End-to-end speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)

Stars: ✭ 305 (-58.67%)

Mutual labels: speech-recognition, speech-to-text, asr, ctc

Pytorch Asr

ASR with PyTorch

Stars: ✭ 124 (-83.2%)

Mutual labels: speech-recognition, asr, kaldi, ctc

Vosk Api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Stars: ✭ 1,357 (+83.88%)

Mutual labels: speech-recognition, speech-to-text, asr, kaldi

Speech To Text Russian

A project for Russian speech recognition based on pykaldi.

Stars: ✭ 151 (-79.54%)

Mutual labels: speech-recognition, speech-to-text, asr, kaldi

Neural sp

End-to-end ASR/LM implementation with PyTorch

Stars: ✭ 408 (-44.72%)

Mutual labels: speech-recognition, asr, ctc

Tensorflowasr

⚡️ TensorFlowASR: Almost state-of-the-art automatic speech recognition in TensorFlow 2. Supports languages that can use characters or subwords

Stars: ✭ 400 (-45.8%)

Mutual labels: speech-recognition, speech-to-text, ctc

Asrt speechrecognition

A Deep-Learning-Based Chinese Speech Recognition System

Stars: ✭ 4,943 (+569.78%)

Mutual labels: speech-recognition, speech-to-text, ctc

sova-asr

SOVA ASR (Automatic Speech Recognition)

Stars: ✭ 123 (-83.33%)

Mutual labels: speech-recognition, speech-to-text, asr

speech-recognition

SDKs and docs for Skit's speech to text service

Stars: ✭ 20 (-97.29%)

Mutual labels: speech-recognition, speech-to-text, asr

demo vietasr

Vietnamese Speech Recognition

Stars: ✭ 22 (-97.02%)

Mutual labels: speech-recognition, speech-to-text, asr

vosk-model-ru-adaptation

No description or website provided.

Stars: ✭ 19 (-97.43%)

Mutual labels: speech-recognition, kaldi, asr

vosk-asterisk

Speech Recognition in Asterisk with Vosk Server

Stars: ✭ 52 (-92.95%)

Mutual labels: speech-recognition, speech-to-text, asr

Silero Models

Silero Models: pre-trained STT models and benchmarks made embarrassingly simple

Stars: ✭ 522 (-29.27%)

Mutual labels: speech-recognition, speech-to-text, asr

spokestack-ios

Spokestack: give your iOS app a voice interface!

Stars: ✭ 27 (-96.34%)

Mutual labels: speech-recognition, speech-to-text, asr

speech-to-text

mixlingual speech recognition system; hybrid (GMM+NNet) model; Kaldi + Keras

Stars: ✭ 61 (-91.73%)

Mutual labels: speech-recognition, speech-to-text, kaldi

Vosk Server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries

Stars: ✭ 277 (-62.47%)

Mutual labels: speech-recognition, asr, kaldi

Zamia Speech

Open tools and data for cloudless automatic speech recognition

Stars: ✭ 374 (-49.32%)

Mutual labels: speech-recognition, asr, kaldi

Athena

An open-source implementation of a sequence-to-sequence based speech processing engine

Stars: ✭ 542 (-26.56%)

Mutual labels: speech-recognition, asr, ctc

kaldi ag training

Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.

Stars: ✭ 14 (-98.1%)

Mutual labels: speech-recognition, speech-to-text, kaldi

Eesen

Eesen aims to simplify the existing complicated, expertise-intensive ASR pipeline into a straightforward sequence learning problem. Acoustic modeling in Eesen involves training a single recurrent neural network (RNN) to model the mapping from speech to text. Eesen drops the following elements that the existing ASR pipeline requires:

Hidden Markov models (HMMs)
Gaussian mixture models (GMMs)
Decision trees and phonetic questions
Dictionary, if characters are used as the modeling units
...

Eesen was created by Yajie Miao with inspiration from the Kaldi toolkit. Thank you, Yajie!

Key Components

Eesen contains 4 key components to enable end-to-end ASR:

Acoustic Model -- Bi-directional RNNs with LSTM units.
Training -- Connectionist temporal classification (CTC) as the training objective.
WFST Decoding -- A principled decoding approach based on Weighted Finite-State Transducers (WFSTs), or
RNN-LM Decoding -- Decoding based on (character) RNN language models, available when using TensorFlow (currently on its own branch).
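
To make the CTC training objective listed above more concrete, here is a minimal sketch of the standard CTC forward recursion in plain Python. It is not Eesen's implementation (Eesen computes this on the GPU); the function names, the choice of index 0 for the blank symbol, and the list-of-lists input layout are assumptions made for illustration only.

```python
import math


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under CTC.

    log_probs: list of T frames; each frame is a list of per-symbol
               log-probabilities (for example, the RNN's log-softmax output).
    labels:    target label indices, without blanks.
    blank:     index of the blank symbol (assumed to be 0 here).
    """
    # Extended target with blanks interleaved: [b, l1, b, l2, ..., b].
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S, T = len(ext), len(log_probs)

    # A path may start in the leading blank or in the first label.
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [-math.inf] * S
        for s in range(S):
            terms = [alpha[s]]                 # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])     # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # A valid path ends in the last label or in the trailing blank.
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total
```

With two frames, two symbols (blank and one label) and uniform frame probabilities of 0.5, three of the four possible paths collapse to the single label, so the function returns -log(0.75). Training minimizes this quantity summed over utterances.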

Highlights of Eesen

The WFST-based decoding approach can incorporate lexicons and language models into CTC decoding in an effective and efficient way.
The RNN-LM decoding approach does not require a fixed lexicon.
GPU implementation of LSTM model training and CTC learning, now also available using TensorFlow.
Multiple utterances are processed in parallel for training speed-up.
Fully-fledged example setups to demonstrate end-to-end system building, with both phonemes and characters as labels, following Kaldi recipes and conventions.
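
Any CTC decoder has to map frame-level network outputs to a label sequence by merging repeated labels and dropping blanks. The sketch below shows that rule in its simplest form, greedy best-path decoding. This is a deliberately simpler stand-in and not Eesen's WFST or RNN-LM decoder: it uses no lexicon or language model. The function name and the blank index of 0 are assumptions for illustration.

```python
def best_path_decode(log_probs, blank=0):
    """Greedy CTC decoding: take the best symbol per frame, then collapse.

    log_probs: list of T frames; each frame is a list of per-symbol scores.
    blank:     index of the blank symbol (assumed to be 0 here).
    """
    # Pick the highest-scoring symbol index in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in log_probs]

    decoded = []
    prev = None
    for symbol in best:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank; a blank between repeats keeps both.
        if symbol != prev and symbol != blank:
            decoded.append(symbol)
        prev = symbol
    return decoded
```

For a frame-wise best sequence of 1, 1, blank, 1, 2, 2, blank, the function returns [1, 1, 2]: the blank between the two runs of 1 preserves the repeated label, while the adjacent duplicates are merged.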

Experimental Results

Refer to RESULTS under each example setup.

References

For more information, please refer to the following paper(s):

Yajie Miao, Mohammad Gowayyed, and Florian Metze, "EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding," in Proc. Automatic Speech Recognition and Understanding Workshop (ASRU), Scottsdale, AZ, USA, December 2015. IEEE.

Stars: ✭ 738