
jimbozhang / Kaldi Gop

Licence: other
Computes the GMM-based Goodness of Pronunciation (GOP). Based on Kaldi.


kaldi-gop

This project computes GMM-based GOP (Goodness of Pronunciation) using Kaldi.

Notes about the DNN-based implementation

This implementation is GMM-based. For a DNN-based implementation, see the recipe in Kaldi's official repository:

https://github.com/kaldi-asr/kaldi/tree/master/egs/gop_speechocean762

GOP-DNN is expected to perform considerably better than GOP-GMM.

How to build

./build.sh

Run the example

cd egs/gop-compute
./run.sh

Theory

For conventional GMM-HMM systems, GOP was first proposed by Witt et al. (2000). It is defined as the duration-normalised log posterior of the canonical phone:

$$ GOP(p)=\frac{1}{t_e-t_s+1} \log p(p|\mathbf o) $$

where $\mathbf o$ denotes the input observations, $p$ is the canonical phone, and $t_s$ and $t_e$ are the start and end frame indices of the phone segment.

Assuming equal phone priors, $p(q_i)\approx p(q_j)$ for any $q_i, q_j$, Bayes' rule gives:

$$ \log p(p|\mathbf o)=\log\frac{p(\mathbf o|p)p(p)}{\sum_{q\in Q} p(\mathbf o|q)p(q)} \approx\log\frac{p(\mathbf o|p)}{\sum_{q\in Q} p(\mathbf o|q)} $$

where $Q$ is the whole phone set.

The numerator is computed from the forced-alignment result. The denominator is computed by Viterbi decoding with an unconstrained phone loop, which in effect replaces the sum over $Q$ with its largest term.
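
As a rough illustration of the formula, the sketch below computes the GOP of one phone segment from per-phone segment log-likelihoods. It is not this project's code: the function names, the `loglikes` dictionary, and the numbers are all made up for the example. In practice the numerator would come from forced alignment and the denominator from phone-loop decoding. Setting `use_max=False` evaluates the full sum over $Q$ instead of the Viterbi-style maximum.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gop(canonical, loglikes, t_start, t_end, use_max=True):
    """Duration-normalised GOP for one phone segment.

    loglikes is a hypothetical input mapping each phone q in Q to
    log p(o|q) accumulated over frames t_start..t_end (inclusive).
    Equal phone priors are assumed, so the priors cancel.
    """
    n_frames = t_end - t_start + 1
    numerator = loglikes[canonical]
    if use_max:
        # Viterbi-style approximation: keep only the best competing phone.
        denominator = max(loglikes.values())
    else:
        # Full sum over the phone set, computed in the log domain.
        denominator = log_sum_exp(list(loglikes.values()))
    return (numerator - denominator) / n_frames


# Made-up segment log-likelihoods for a three-phone set.
loglikes = {"AH": -42.0, "AE": -40.0, "IH": -47.5}
score_max = gop("AH", loglikes, t_start=10, t_end=19)
score_sum = gop("AH", loglikes, t_start=10, t_end=19, use_max=False)
print(score_max, score_sum)
```

With the maximum approximation the score is at most 0, and it equals 0 exactly when the canonical phone is also the best-scoring phone for the segment. More negative values indicate a poorer match.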
