
jzlianglu / Pykaldi2

License: MIT
Yet another speech toolkit based on Kaldi and PyTorch

Programming language: Python

Projects that are alternatives of or similar to Pykaldi2

All of the projects below share the kaldi label; star deltas are relative to Pykaldi2.

Espresso - Espresso: A Fast End-to-End Neural Speech Recognition Toolkit. Stars: ✭ 808 (+411.39%)
Factorized Tdnn - PyTorch implementation of the Factorized TDNN (TDNN-F) from "Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks" and Kaldi. Stars: ✭ 98 (-37.97%)
Pytorch Asr - ASR with PyTorch. Stars: ✭ 124 (-21.52%)
Theano Kaldi Rnn - THEANO-KALDI-RNNs implements various Recurrent Neural Networks (RNNs) for RNN-HMM speech recognition; the Theano code is coupled with the Kaldi decoder. Stars: ✭ 31 (-80.38%)
Ivector Xvector - Extract x-vectors and i-vectors under Kaldi. Stars: ✭ 67 (-57.59%)
Elpis - 🙊 WIP software for creating speech recognition models. Stars: ✭ 101 (-36.08%)
Eesen - The official repository of the Eesen project. Stars: ✭ 738 (+367.09%)
Eend - End-to-End Neural Diarization. Stars: ✭ 153 (-3.16%)
Plda - An LDA/PLDA estimator using Kaldi in Python for speaker verification tasks. Stars: ✭ 85 (-46.2%)
Tf Kaldi Speaker - Neural speaker recognition/verification system based on Kaldi and TensorFlow. Stars: ✭ 117 (-25.95%)
Voxceleb Ivector - VoxCeleb1 i-vector based speaker recognition system. Stars: ✭ 36 (-77.22%)
Dragonfire - The open-source virtual assistant for Ubuntu-based Linux distributions. Stars: ✭ 1,120 (+608.86%)
Vosk Api - Offline speech recognition API for Android, iOS, Raspberry Pi and servers, with Python, Java, C# and Node bindings. Stars: ✭ 1,357 (+758.86%)
Kaldi Io - C++ Kaldi I/O library (static and dynamic). Stars: ✭ 22 (-86.08%)
Kaldi - kaldi-asr/kaldi is the official location of the Kaldi project. Stars: ✭ 11,151 (+6957.59%)
Pykaldi - A Python wrapper for Kaldi. Stars: ✭ 756 (+378.48%)
Pytorch Kaldi Neural Speaker Embeddings - A lightweight neural speaker embedding extractor based on Kaldi and PyTorch. Stars: ✭ 99 (-37.34%)
Py Kaldi Asr - Simple wrappers around kaldi-asr intended to make using Kaldi's (online) decoders as convenient as possible. Stars: ✭ 156 (-1.27%)
Speech To Text Russian - A project for Russian speech recognition based on pykaldi. Stars: ✭ 151 (-4.43%)
Kaldi Gop - Computes the GMM-based Goodness of Pronunciation (GOP); based on Kaldi. Stars: ✭ 104 (-34.18%)

pykaldi2

PyKaldi2 is a speech toolkit built on Kaldi and PyTorch. It relies on PyKaldi, the Python wrapper of Kaldi, to access Kaldi functionality. The key features of PyKaldi2 are on-the-fly lattice generation for lattice-based sequence training, on-the-fly data simulation, and on-the-fly alignment generation. A beta-version lattice-free MMI (LF-MMI) training script is also provided.
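To give a flavor of the PyKaldi layer that PyKaldi2 builds on, here is a minimal sketch of reading a Kaldi alignment archive from Python using PyKaldi's table readers. The archive path is an assumption for illustration; substitute your own GMM alignment directory.

  from kaldi.util.table import SequentialIntVectorReader

  # The path below is an assumption; point it at your own alignment archive.
  with SequentialIntVectorReader("ark:gunzip -c exp/tri4b_ali/ali.1.gz |") as reader:
      for utt_id, alignment in reader:
          print(utt_id, len(alignment))  # one transition-id per frame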

How to install

PyKaldi2 runs on top of the Horovod and PyKaldi libraries. A Dockerfile is provided to customize the environment. To use the repo, follow the three steps below.

  1. Clone the repo:
  git clone https://github.com/jzlianglu/pykaldi2.git
  2. Build the docker image:
  docker build -t horovod-pykaldi docker

Alternatively, you can pull the prebuilt docker image:

  docker pull pykaldi2docker/horovod-pykaldi:torch1.2
  3. Start the docker container, for example:
  NV_GPU=0,1,2,3 nvidia-docker run -v `pwd`:`pwd` -w `pwd` --shm-size=32G -i -t horovod-pykaldi

If you want to run multi-GPU jobs with Horovod on a single machine, the command looks like:

  horovodrun -np 4 -H localhost:4 sh run_ce.sh 

Please refer to the Horovod documentation for running cross-machine distributed jobs.
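For cross-machine runs, Horovod's launcher takes a comma-separated host list. A sketch, assuming two 4-GPU machines reachable as host1 and host2 (the host names are placeholders):

  horovodrun -np 8 -H host1:4,host2:4 sh run_ce.sh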

Training speed

We measured the training speed of PyKaldi2 on the Librispeech dataset with Tesla V100 GPUs. We used BLSTM acoustic models with 3 hidden layers and 512 hidden units per layer. The table below shows the training speed in our experiments. iRTF is the inverted real-time factor, i.e., the number of hours of data we can process per hour of compute. A minibatch has the shape batch-size x seq-len x feat-dim. For CE training the seq-len is 80, i.e., we cut each utterance into chunks of 80 frames. For MMI training the sequence length is variable, so it is denoted as *.

model   loss   bs x len    #GPUs   iRTF
BLSTM   CE     64 x 80     1       190
BLSTM   CE     64 x 80     4       220
BLSTM   CE     256 x 80    4       520
BLSTM   CE     1024 x 80   16      1356
BLSTM   MMI    1 x *       1       11.6
BLSTM   MMI    4 x *       1       16.7
BLSTM   MMI    4 x *       4       34.5
BLSTM   MMI    16 x *      4       50.4
BLSTM   MMI    64 x *      16      176
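To make the CE chunking concrete, here is a minimal sketch of cutting a (num_frames, feat_dim) utterance into 80-frame chunks with PyTorch. The helper name is ours, not part of the PyKaldi2 API, and it simply drops leftover frames, whereas a real dataloader might pad instead.

  import torch

  def chunk_utterance(feats, chunk_len=80):
      # Cut a (num_frames, feat_dim) utterance into fixed-length chunks.
      # Leftover frames are dropped here; a real dataloader might pad instead.
      num_frames, feat_dim = feats.shape
      num_chunks = num_frames // chunk_len
      return feats[:num_chunks * chunk_len].reshape(num_chunks, chunk_len, feat_dim)

  chunks = chunk_utterance(torch.randn(500, 40))
  print(chunks.shape)  # torch.Size([6, 80, 40])

Reading the table, an iRTF of 190 means one V100 processes about 190 hours of audio per hour of compute, so one pass over the 960-hour Librispeech training set takes roughly 960 / 190 ≈ 5 hours.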

Example

To use PyKaldi2, you need to run the Kaldi speech toolkit up to the end of the GMM training stages: PyKaldi2 relies on the alignments and the denominator graph from the GMM system for CE and SE training. An example Librispeech system is given in the example directory.
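For orientation, a minimal sketch of the Kaldi side (the recipe path and model directory names are assumptions; consult the example directory for the actual setup):

  # Run the standard Kaldi Librispeech recipe through the GMM stages
  cd kaldi/egs/librispeech/s5 && ./run.sh
  # Generate alignments from a trained GMM model for CE training
  steps/align_fmllr.sh --nj 20 data/train_clean_100 data/lang exp/tri4b exp/tri4b_ali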

Future work

Currently, the toolkit is still at an early stage, and we are still improving it. The directions we are looking at include

  1. A more efficient dataloader, to support large-scale datasets.
  2. More efficient distributed training. Horovod shows sub-linear speedup in cross-machine distributed mode, which could be improved.
  3. Lattice-free MMI, the state-of-the-art approach in Kaldi.
  4. Joint frontend and backend optimization.
  5. Support for more neural network models.

If you are interested in contributing to this line of research, please contact Liang Lu (an email address is provided in the arXiv paper).

Disclaimer

This is not an official Microsoft product.

Reference

Liang Lu, Xiong Xiao, Zhuo Chen, Yifan Gong, "PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch", arXiv preprint, 2019.
