
jzlianglu / Pykaldi2

License: MIT
Yet another speech toolkit based on Kaldi and PyTorch

Programming language: Python

Projects that are alternatives of or similar to Pykaldi2

All of the projects below share the kaldi label; star deltas are relative to Pykaldi2.

Espresso - Espresso: A Fast End-to-End Neural Speech Recognition Toolkit. Stars: ✭ 808 (+411.39%)
Factorized Tdnn - PyTorch implementation of the Factorized TDNN (TDNN-F) from "Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks" and Kaldi. Stars: ✭ 98 (-37.97%)
Pytorch Asr - ASR with PyTorch. Stars: ✭ 124 (-21.52%)
Theano Kaldi Rnn - THEANO-KALDI-RNNs implements various Recurrent Neural Networks (RNNs) for RNN-HMM speech recognition; the Theano code is coupled with the Kaldi decoder. Stars: ✭ 31 (-80.38%)
Ivector Xvector - Extract x-vectors and i-vectors under Kaldi. Stars: ✭ 67 (-57.59%)
Elpis - 🙊 WIP software for creating speech recognition models. Stars: ✭ 101 (-36.08%)
Eesen - The official repository of the Eesen project. Stars: ✭ 738 (+367.09%)
Eend - End-to-End Neural Diarization. Stars: ✭ 153 (-3.16%)
Plda - An LDA/PLDA estimator using Kaldi in Python for speaker verification tasks. Stars: ✭ 85 (-46.2%)
Tf Kaldi Speaker - Neural speaker recognition/verification system based on Kaldi and TensorFlow. Stars: ✭ 117 (-25.95%)
Voxceleb Ivector - VoxCeleb1 i-vector based speaker recognition system. Stars: ✭ 36 (-77.22%)
Dragonfire - The open-source virtual assistant for Ubuntu-based Linux distributions. Stars: ✭ 1,120 (+608.86%)
Vosk Api - Offline speech recognition API for Android, iOS, Raspberry Pi and servers, with Python, Java, C# and Node bindings. Stars: ✭ 1,357 (+758.86%)
Kaldi Io - C++ Kaldi I/O library (static and dynamic). Stars: ✭ 22 (-86.08%)
Kaldi - kaldi-asr/kaldi is the official location of the Kaldi project. Stars: ✭ 11,151 (+6957.59%)
Pykaldi - A Python wrapper for Kaldi. Stars: ✭ 756 (+378.48%)
Pytorch Kaldi Neural Speaker Embeddings - A lightweight neural speaker embedding extractor based on Kaldi and PyTorch. Stars: ✭ 99 (-37.34%)
Py Kaldi Asr - Simple wrappers around kaldi-asr intended to make using Kaldi's (online) decoders as convenient as possible. Stars: ✭ 156 (-1.27%)
Speech To Text Russian - A project for Russian speech recognition based on pykaldi. Stars: ✭ 151 (-4.43%)
Kaldi Gop - Computes the GMM-based Goodness of Pronunciation (GOP); based on Kaldi. Stars: ✭ 104 (-34.18%)

pykaldi2

PyKaldi2 is a speech toolkit built on Kaldi and PyTorch. It relies on PyKaldi, the Python wrapper of Kaldi, to access Kaldi functionality. The key features of PyKaldi2 are on-the-fly lattice generation for lattice-based sequence training, on-the-fly data simulation, and on-the-fly alignment generation. A beta-version lattice-free MMI (LF-MMI) training script is also provided.
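To give a flavor of the PyKaldi layer that PyKaldi2 builds on, here is a minimal sketch of reading a Kaldi alignment archive from Python using PyKaldi's table readers. The archive path is an assumption for illustration; substitute your own GMM alignment directory.

  from kaldi.util.table import SequentialIntVectorReader

  # The path below is an assumption; point it at your own alignment archive.
  with SequentialIntVectorReader("ark:gunzip -c exp/tri4b_ali/ali.1.gz |") as reader:
      for utt_id, alignment in reader:
          print(utt_id, len(alignment))  # one transition-id per frame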

How to install

PyKaldi2 runs on top of the Horovod and PyKaldi libraries. A Dockerfile is provided to customize the environment. To use the repo, follow the three steps below.

  1. Clone the repo:
  git clone https://github.com/jzlianglu/pykaldi2.git
  2. Build the docker image:
  docker build -t horovod-pykaldi docker

Alternatively, you can pull the prebuilt docker image:

  docker pull pykaldi2docker/horovod-pykaldi:torch1.2
  3. Start the docker container, for example:
  NV_GPU=0,1,2,3 nvidia-docker run -v `pwd`:`pwd` -w `pwd` --shm-size=32G -i -t horovod-pykaldi

If you want to run multi-GPU jobs with Horovod on a single machine, the command looks like:

  horovodrun -np 4 -H localhost:4 sh run_ce.sh 

Please refer to the Horovod documentation for running cross-machine distributed jobs.
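For cross-machine runs, Horovod's launcher takes a comma-separated host list. A sketch, assuming two 4-GPU machines reachable as host1 and host2 (the host names are placeholders):

  horovodrun -np 8 -H host1:4,host2:4 sh run_ce.sh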

Training speed

We measured the training speed of PyKaldi2 on the Librispeech dataset with Tesla V100 GPUs. We used BLSTM acoustic models with 3 hidden layers and 512 hidden units per layer. The table below shows the training speed in our experiments. iRTF is the inverted real-time factor, i.e., the number of hours of data we can process per hour of compute. A minibatch has the shape batch-size x seq-len x feat-dim. For CE training the seq-len is 80, i.e., we cut each utterance into chunks of 80 frames. For MMI training the sequence length is variable, so it is denoted as *.

model   loss   bs x len    #GPUs   iRTF
BLSTM   CE     64 x 80     1       190
BLSTM   CE     64 x 80     4       220
BLSTM   CE     256 x 80    4       520
BLSTM   CE     1024 x 80   16      1356
BLSTM   MMI    1 x *       1       11.6
BLSTM   MMI    4 x *       1       16.7
BLSTM   MMI    4 x *       4       34.5
BLSTM   MMI    16 x *      4       50.4
BLSTM   MMI    64 x *      16      176
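To make the CE chunking concrete, here is a minimal sketch of cutting a (num_frames, feat_dim) utterance into 80-frame chunks with PyTorch. The helper name is ours, not part of the PyKaldi2 API, and it simply drops leftover frames, whereas a real dataloader might pad instead.

  import torch

  def chunk_utterance(feats, chunk_len=80):
      # Cut a (num_frames, feat_dim) utterance into fixed-length chunks.
      # Leftover frames are dropped here; a real dataloader might pad instead.
      num_frames, feat_dim = feats.shape
      num_chunks = num_frames // chunk_len
      return feats[:num_chunks * chunk_len].reshape(num_chunks, chunk_len, feat_dim)

  chunks = chunk_utterance(torch.randn(500, 40))
  print(chunks.shape)  # torch.Size([6, 80, 40])

Reading the table, an iRTF of 190 means one V100 processes about 190 hours of audio per hour of compute, so one pass over the 960-hour Librispeech training set takes roughly 960 / 190 ≈ 5 hours.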

Example

To use PyKaldi2, you need to run the Kaldi speech toolkit up to the end of the GMM training stages: PyKaldi2 relies on the alignments and the denominator graph from the GMM system for CE and SE training. An example Librispeech system is given in the example directory.
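For orientation, a minimal sketch of the Kaldi side (the recipe path and model directory names are assumptions; consult the example directory for the actual setup):

  # Run the standard Kaldi Librispeech recipe through the GMM stages
  cd kaldi/egs/librispeech/s5 && ./run.sh
  # Generate alignments from a trained GMM model for CE training
  steps/align_fmllr.sh --nj 20 data/train_clean_100 data/lang exp/tri4b exp/tri4b_ali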

Future work

Currently, the toolkit is still at an early stage, and we are still improving it. The directions we are looking at include

  1. A more efficient dataloader, to support large-scale datasets.
  2. More efficient distributed training. Horovod shows sub-linear speedup in cross-machine distributed mode, which could be improved.
  3. Lattice-free MMI, the state-of-the-art approach in Kaldi.
  4. Joint frontend and backend optimization.
  5. Support for more neural network models.

If you are interested in contributing to this line of research, please contact Liang Lu (an email address is provided in the arXiv paper).

Disclaimer

This is not an official Microsoft product.

Reference

Liang Lu, Xiong Xiao, Zhuo Chen, Yifan Gong, "PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch", arXiv preprint, 2019.
