License: BSD-3-Clause

pytorch-kaldi-neural-speaker-embeddings

A lightweight neural speaker embedding extraction toolkit based on Kaldi and PyTorch.
The repository serves as a starting point for users to reproduce and experiment with several recent advances in the speaker recognition literature. Kaldi is used for pre-processing and post-processing, and PyTorch is used for training the neural speaker embeddings. Note that this repo is not meant to track the state of the art in speaker recognition; most likely the models will be considered outdated in a few months (or sooner :().

This repository contains a PyTorch+Kaldi pipeline to reproduce the core results of the papers cited below:

With some modifications, you can easily adapt the pipeline to related tasks:

To go further, take a look at our recent work on multi-speaker text-to-speech, where the same speaker embeddings are employed to model speaker characteristics in a text-to-speech system.

Lastly, kindly cite our paper(s) if you find this repository useful. Cite both if you are kind enough!

@article{villalba2019state,
  title={State-of-the-art speaker recognition with neural network embeddings in {NIST SRE18} and {Speakers in the Wild} evaluations},
  author={Villalba, Jes{\'u}s and Chen, Nanxin and Snyder, David and Garcia-Romero, Daniel and McCree, Alan and Sell, Gregory and Borgstrom, Jonas and Garc{\'\i}a-Perera, Leibny Paola and Richardson, Fred and Dehak, R{\'e}da and others},
  journal={Computer Speech \& Language},
  pages={101026},
  year={2019},
  publisher={Elsevier}
}
@article{cooper2019zero,
  title={Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings},
  author={Cooper, Erica and Lai, Cheng-I and Yasuda, Yusuke and Fang, Fuming and Wang, Xin and Chen, Nanxin and Yamagishi, Junichi},
  journal={arXiv preprint arXiv:1910.10838},
  year={2019}
}

One should also check out the very nicely written TensorFlow version by Yi Lu.

Overview

Neural speaker embeddings: Encoder --> Pooling --> Classification
LDE pooling method illustration:
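As a concrete illustration, the following NumPy sketch shows the core LDE computation: each frame is softly assigned to a learnable dictionary of components, and the weighted residuals are aggregated per component into a fixed-size utterance representation. This follows the general LDE formulation; the function name, the per-component scale parameterization, and the stabilizing epsilon are illustrative assumptions, not the repository's exact code.

```python
import numpy as np

def lde_pooling(frames, centers, scales):
    """Learnable Dictionary Encoding pooling (inference-time sketch).

    frames:  (T, D) frame-level features from the encoder
    centers: (C, D) learnable dictionary components
    scales:  (C,)   learnable positive smoothing factors
    Returns a (C * D,) utterance-level embedding.
    """
    # residuals r[t, c] = x_t - mu_c  -> shape (T, C, D)
    resid = frames[:, None, :] - centers[None, :, :]
    # soft assignment of each frame to each dictionary component
    logits = -scales[None, :] * np.sum(resid ** 2, axis=2)   # (T, C)
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)                        # rows sum to 1
    # aggregate weighted residuals per component, normalized by total weight
    num = np.einsum('tc,tcd->cd', w, resid)                  # (C, D)
    den = w.sum(axis=0)[:, None] + 1e-8
    return (num / den).reshape(-1)
```

With, say, C=64 components and D=128-dim frames, the pooled output is 64*128-dimensional before the segment-level layers project it down to the embedding size.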

Requirements

pip install -r requirements.txt

Please also download and properly set up Kaldi. If you get stuck at this step, this repository is likely not for you.

Getting Started

The bash script pipeline.sh contains the 12-stage speaker recognition pipeline, including feature extraction, neural model training, and decoding/evaluation. A more detailed description of each stage can be found in pipeline.sh itself. To get started, simply run: ./pipeline.sh

Datasets

The models are trained on VoxCeleb I+II, which is free to download (the trial lists are available there as well). One can easily adapt pipeline.sh for different datasets.

Pre-Trained Models

Due to YouTube's privacy policy, unfortunately I am not allowed to upload pre-trained models for VoxCeleb I+II.

Benchmarking Speaker Verification EERs

| Embedding name | Dimension | Normalization | Pooling | Train objective | EER | DCFmin (0.01) |
|---|---|---|---|---|---|---|
| i-vectors | 400 | no | mean | EM | 5.329 | 0.493 |
| x-vectors | 512 | no | mean, std | Softmax | 3.298 | 0.343 |
| x-vectorsN | 512 | yes | mean, std | Softmax | 3.213 | 0.342 |
| LDE-1 | 512 | no | mean | Softmax | 3.415 | 0.366 |
| LDE-1N | 512 | yes | mean | Softmax | 3.446 | 0.365 |
| LDE-2 | 512 | no | mean | ASoftmax (m=2) | 3.674 | 0.364 |
| LDE-2N | 512 | yes | mean | ASoftmax (m=2) | 3.664 | 0.386 |
| LDE-3 | 512 | no | mean | ASoftmax (m=3) | 3.033 | 0.314 |
| LDE-3N | 512 | yes | mean | ASoftmax (m=3) | 3.171 | 0.327 |
| LDE-4 | 512 | no | mean | ASoftmax (m=4) | 3.112 | 0.315 |
| LDE-4N | 512 | yes | mean | ASoftmax (m=4) | 3.271 | 0.327 |
| LDE-5 | 256 | no | mean | ASoftmax (m=2) | 3.287 | 0.343 |
| LDE-5N | 256 | yes | mean | ASoftmax (m=2) | 3.367 | 0.351 |
| LDE-6 | 200 | no | mean | ASoftmax (m=2) | 3.266 | 0.396 |
| LDE-6N | 200 | yes | mean | ASoftmax (m=2) | 3.266 | 0.396 |
| LDE-7 | 512 | no | mean, std | ASoftmax (m=2) | 3.091 | 0.303 |
| LDE-7N | 512 | yes | mean, std | ASoftmax (m=2) | 3.171 | 0.328 |
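The "mean" vs. "mean, std" entries in the pooling column refer to temporal statistics pooling, which collapses variable-length frame sequences into a fixed-size vector. A minimal NumPy sketch (the function name and signature are assumptions for illustration, not the repository's API):

```python
import numpy as np

def stats_pooling(frames, use_std=True):
    """Temporal statistics pooling over frame-level features.

    frames: (T, D) encoder outputs for one utterance.
    Returns (D,) for mean-only pooling, or (2*D,) when the
    per-dimension standard deviation is appended (as in the
    x-vector and LDE-7 rows above).
    """
    mean = frames.mean(axis=0)
    if not use_std:
        return mean
    std = frames.std(axis=0)   # per-dimension standard deviation over time
    return np.concatenate([mean, std])
```

Appending the standard deviation roughly doubles the statistics fed to the segment-level layers, which is consistent with the small EER gains the "mean, std" rows show over mean-only pooling.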

Using Speaker Embeddings for Tacotron2 Speaker Adaptation

Speaker Embedding Space Visualization (clustered by speaker)

i-vectors (baseline)

LDE

Benchmarking TTS MOS scores

| Embedding name | Naturalness (dev) | Naturalness (test) | Similarity (dev) | Similarity (test) |
|---|---|---|---|---|
| vocoded | 3.41 | 3.55 | 2.79 | 2.82 |
| x-vectorsN | 3.19 | 3.19 | 1.86 | 2.37 |
| LDE-1 | 3.16 | 3.21 | 2.05 | 2.34 |
| LDE-1N | 3.13 | 3.46 | 1.97 | 2.45 |
| LDE-2 | 3.28 | 3.35 | 2.00 | 2.37 |
| LDE-2N | 3.19 | 3.33 | 2.00 | 2.35 |
| LDE-3 | 3.24 | 3.48 | 1.88 | 2.46 |
| LDE-3N | 3.16 | 3.33 | 2.00 | 2.37 |
| LDE-4 | 3.10 | 3.29 | 2.00 | 2.31 |
| LDE-4N | 3.20 | 3.29 | 1.98 | 2.39 |
| LDE-5 | 3.26 | 3.40 | 1.99 | 2.45 |
| LDE-5N | 3.07 | 3.37 | 2.02 | 2.41 |
| LDE-6 | 3.25 | 3.33 | 1.95 | 2.43 |
| LDE-6N | 3.29 | 3.23 | 1.94 | 2.39 |
| LDE-7 | 3.03 | 3.18 | 1.86 | 2.28 |
| LDE-7N | 3.02 | 3.24 | 2.02 | 2.42 |

Credits

Base code written by Nanxin Chen, Johns Hopkins University
Experiments done by Cheng-I Lai, MIT
