srvk / Eesen

Licence: apache-2.0
The official repository of the Eesen project

Eesen

Eesen aims to simplify the existing complicated, expertise-intensive ASR pipeline into a straightforward sequence learning problem. Acoustic modeling in Eesen involves training a single recurrent neural network (RNN) to model the mapping from speech to text. Eesen drops the following elements that the conventional ASR pipeline requires:

  • Hidden Markov models (HMMs)
  • Gaussian mixture models (GMMs)
  • Decision trees and phonetic questions
  • Dictionary, if characters are used as the modeling units
  • ...
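
As an illustration of why a pronunciation dictionary becomes optional with character units, the sketch below maps transcripts directly to integer label sequences. The function names `build_char_units` and `encode` are hypothetical and not part of Eesen; reserving index 0 for the CTC blank symbol is an assumption made for this example.

```python
def build_char_units(transcripts):
    """Collect every character seen in the transcripts and assign it an id.

    Index 0 is left unused here so it can serve as the CTC blank
    (an assumption of this sketch, not a documented Eesen convention).
    """
    chars = sorted({c for text in transcripts for c in text})
    return {c: i + 1 for i, c in enumerate(chars)}


def encode(text, units):
    """Turn a transcript into the label sequence the network is trained on."""
    return [units[c] for c in text]


units = build_char_units(["a cab"])
labels = encode("cab", units)
```

With phonemes as units, the same step would need a lexicon to look up each word's pronunciation; with characters, the transcript itself is the label sequence.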

Eesen was created by Yajie Miao with inspiration from the Kaldi toolkit. Thank you, Yajie!

Key Components

Eesen contains four key components to enable end-to-end ASR:

  • Acoustic Model -- Bi-directional RNNs with LSTM units.
  • Training -- Connectionist temporal classification (CTC) as the training objective.
  • WFST Decoding -- A principled decoding approach based on Weighted Finite-State Transducers (WFSTs), or
  • RNN-LM Decoding -- Decoding based on (character) RNN language models, when using TensorFlow (currently on its own branch).
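
To make the CTC training objective more concrete, here is a minimal pure-Python sketch of the standard CTC forward algorithm. It is not Eesen's implementation: the names `ctc_collapse` and `ctc_neg_log_likelihood` are illustrative, the blank is assumed to be symbol 0, and the code works with raw probabilities for readability, whereas practical implementations typically work in log space to avoid underflow on long utterances.

```python
import math


def ctc_collapse(path, blank=0):
    """Map a frame-level path to labels: merge repeats, then drop blanks."""
    out, prev = [], None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return out


def ctc_neg_log_likelihood(probs, labels, blank=0):
    """Negative log-probability of `labels` summed over all valid alignments.

    probs[t][k] is the network's posterior for symbol k at frame t.
    labels is the target sequence without blanks.
    """
    # Interleave blanks: [b, l1, b, l2, ..., b]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    num_states, num_frames = len(ext), len(probs)

    # alpha[s]: total probability of all partial paths ending in state s
    alpha = [0.0] * num_states
    alpha[0] = probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, num_frames):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end in the last label or the trailing blank
    likelihood = alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else math.inf


# Three frames, symbols {0: blank, 1, 2}; toy posteriors for illustration only
toy_probs = [[0.2, 0.5, 0.3], [0.3, 0.3, 0.4], [0.1, 0.2, 0.7]]
loss = ctc_neg_log_likelihood(toy_probs, [1, 2])
```

Minimizing this quantity over the training set is what lets the network learn from unsegmented transcripts: no frame-level alignment is supplied, because the sum runs over every alignment that collapses to the target.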

Highlights of Eesen

  • The WFST-based decoding approach can incorporate lexicons and language models into CTC decoding in an effective and efficient way.
  • The RNN-LM decoding approach does not require a fixed lexicon.
  • GPU implementation of LSTM model training and CTC learning, now also available using TensorFlow.
  • Multiple utterances are processed in parallel for training speed-up.
  • Fully-fledged example setups to demonstrate end-to-end system building, with both phonemes and characters as labels, following Kaldi recipes and conventions.
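
Processing multiple utterances in parallel generally means grouping variable-length utterances into padded batches. The sketch below shows one common way to do this; it is an assumption for illustration, not a description of how Eesen batches data internally. The name `make_batches` and the returned dictionary layout are hypothetical.

```python
def make_batches(utterances, batch_size, pad_value=0.0):
    """Group (utt_id, frames) pairs into padded batches.

    frames is a list of equal-sized feature vectors. Sorting by length
    first keeps utterances of similar duration together, which reduces
    the amount of padding each batch needs.
    """
    ordered = sorted(utterances, key=lambda u: len(u[1]), reverse=True)
    batches = []
    for start in range(0, len(ordered), batch_size):
        chunk = ordered[start:start + batch_size]
        max_len = len(chunk[0][1])  # longest utterance comes first
        dim = len(chunk[0][1][0]) if max_len else 0
        ids, features, lengths = [], [], []
        for utt_id, frames in chunk:
            padding = [[pad_value] * dim for _ in range(max_len - len(frames))]
            ids.append(utt_id)
            features.append(list(frames) + padding)
            # True lengths let the loss ignore the padded frames
            lengths.append(len(frames))
        batches.append({"ids": ids, "features": features, "lengths": lengths})
    return batches


utts = [
    ("a", [[1.0, 1.0]] * 3),
    ("b", [[2.0, 2.0]]),
    ("c", [[3.0, 3.0]] * 2),
]
batches = make_batches(utts, batch_size=2)
```

Keeping the true lengths alongside the padded features matters for CTC, since padded frames should not contribute to the objective.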

Experimental Results

Refer to RESULTS under each example setup.

References

For more information, please refer to the following paper(s):

Yajie Miao, Mohammad Gowayyed, and Florian Metze, "EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding," in Proc. Automatic Speech Recognition and Understanding Workshop (ASRU), Scottsdale, AZ, USA, December 2015. IEEE.
