KaldiBasedSpeakerVerification
========================================
Author: Qianhui Wan
Version: 1.0.0
Date   : 2018-01-23

Prerequisite
------------
1. Kaldi 5.3, as well as ATLAS and OpenFst, which are required by Kaldi.
https://github.com/kaldi-asr/kaldi

2. libfvad, a voice activity detection (VAD) library based on WebRTC's VAD engine.
https://github.com/dpirch/libfvad

Installation
------------
1. Install Kaldi 5.3:
git clone https://github.com/kaldi-asr/kaldi.git kaldi --origin upstream
cd kaldi

2. Install Kaldi's required libraries:
cd to kaldi/tools and follow the INSTALL instructions there.

3. Compile and finish Kaldi install:
cd to kaldi/src and follow the INSTALL instructions there.

4. Install libfvad:
git clone https://github.com/dpirch/libfvad
cd libfvad
./bootstrap
./configure
make
make install (may require sudo)

5. Install KaldiBasedSpeakerVerification

cd KaldiBasedSpeakerVerification/src
* Edit the makefile to provide the correct locations for this project and the libraries.
make 
(This outputs three executables under /src: enroll, identifySpeaker and extractFeatures.)


Project file structure (under KaldiBasedSpeakerVerification folder)
----------------------------------
/examples
 contains enroll and test examples, along with example data

/examples/iv
 contains i-vector features extracted during enrollment (may be empty before enrolling speakers; must contain two files before testing)
 
/examples/mat
 contains background model data; must contain six files.
 
/scripts
 contains scripts mainly used to create background model.
 
/src
 contains code for 3 applications: creating a background model, enrolling speakers and speaker identification.



Main applications
-------------------------------------------------
/src/enroll.cpp
 This program extracts speech features from one speaker and creates or updates that speaker's model.
 Usage: enroll speakerId wavefile
 The output should look like:
 Not registered speaker: speakerId. Created a new spkid
 or
 Found registered speaker: speakerId. Updated speaker model

 The wavefile should be in .wav format.

 This will create/update two files in /iv: train_iv.ark and train_num_utts.ark.

/src/identifySpeaker.cpp
 This program processes a given audio clip and outputs a speaker identification every ~3.2 seconds.
 Usage: identifySpeaker wavefile
 The output should look like:
 Family member detected! Speaker: 225
 Family member detected! Speaker: 225
 Stranger detected!
 Family member detected! Speaker: 227
 Family member detected! Speaker: 227
 ...

 It also outputs a probability score for each segment, which can be used to adjust the decision threshold for different audio conditions.


Examples
-------------------------------------------------
After installing all required applications, you can run the following examples to verify your installation.

1. Make sure there are three folders in /examples:
  /example_data
  /iv
  /mat (due to GitHub's file size limit, final.ie was split into several parts. To reassemble, run: cat iepart* > final.ie)

2. Run ./test1Enroll.sh
This will enroll all speech files in /example_data/enroll.
The output should look like:

The total active speech is 1.61 seconds.
No registered speaker: 174. Create a new spkid
Done.
The total active speech is 15 seconds.
Found registered speaker: 174. Update speaker model
Done.
The total active speech is 0.88 seconds.
No registered speaker: 84. Create a new spkid
Done.
The total active speech is 3.47 seconds.
Found registered speaker: 84. Update speaker model
Done.

3. Run ./test1Test.sh
This will test the speech file /example_data/test/84/84-121550-0030.wav against all registered speakers.
The output should look like:

Effective speech length: 2.605s.No family member detected.		(score: 4.97931)
Effective speech length: 5.685s.Family member detected! Speaker: 84	(score: 33.7779)
Speech data is finished!
Done.


*Note:
There will also be Kaldi log output which looks like:
LOG ([5.3.96~1-7ee7]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.3.96~1-7ee7]:ComputeDerivedVars():ivector-extractor.cc:204) Done.

This indicates that one audio segment has been processed; it can be suppressed by adjusting the Kaldi verbosity level.

Background Model Training
-------------------------------------
/src/extractFeatures
 The program extracts 20-dimensional MFCCs (with energy), appends deltas and double deltas, and applies CMVN.
 Usage: extractFeatures wav.scp ark,scp:feat.ark,feat.scp
 Input: wav.scp, a text list of speech file names and paths
 Output: feat.ark and feat.scp, in the same format as Kaldi's.
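 As a minimal sketch of the wav.scp format (the utterance IDs and file paths below are made-up examples, not data shipped with the project):

```shell
# Each wav.scp line holds an identifier followed by the path to a .wav file.
# The IDs and paths here are hypothetical examples.
cat > wav.scp <<'EOF'
174-0001 /data/enroll/174/174-0001.wav
84-0001 /data/enroll/84/84-0001.wav
EOF

# Features would then be extracted with (assuming extractFeatures is built):
# ./extractFeatures wav.scp ark,scp:feat.ark,feat.scp
```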

/scripts/data_prep.sh
 usage: data_prep.sh path_to_speech path_to_info
 prepares useful text files for later processing; please refer to data_prep.sh for details

/scripts/utt2spk_to_spk2utt.pl
 usage: utt2spk_to_spk2utt.pl utt2spk > spk2utt 
 creates the spk2utt file from a given utt2spk file
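 For illustration, utt2spk lists one "utterance-id speaker-id" pair per line, and spk2utt inverts that mapping (one speaker followed by all of its utterances). The repository ships the perl script above; the awk one-liner below is only a sketch of the same transformation, using made-up IDs:

```shell
# utt2spk: one "utterance-id speaker-id" pair per line (hypothetical IDs).
cat > utt2spk <<'EOF'
174-0001 174
174-0002 174
84-0001 84
EOF

# Sketch of what utt2spk_to_spk2utt.pl does: group all utterance IDs
# under their speaker ID, one speaker per output line.
awk '{utts[$2] = utts[$2] " " $1} END {for (s in utts) print s utts[s]}' utt2spk | sort > spk2utt
cat spk2utt
# 174 174-0001 174-0002
# 84 84-0001
```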

/scripts/train_ubm.sh
 usage: train_ubm.sh path_to_feat path_to_mat
 output: final.dubm, final.ubm
 please refer to train_ubm.sh for details
 
/scripts/train_ivextractor.sh
 usage: train_ivextractor.sh path_to_feat path_to_mat
 output: final.ie
 please refer to train_ivextractor.sh for details
 
/scripts/train_comp_plda.sh
 usage: train_comp_plda.sh path_to_feat path_to_mat
 output: final.plda, transform.mat, mean_vec
 please refer to train_comp_plda.sh for details

The following folders will be created while the scripts run: 
 /dev_data
 contains development dataset speech information, MFCC features and i-vectors 
 
 /mat
 contains all trained models:
 final.dubm, final.ubm, final.ie, final.plda, transform.mat, mean_vec

Note: The whole process can take several hours (e.g. 5 to 6 hours on CentOS running in VirtualBox).
Note: All scripts must be edited manually to set the paths (as in the examples); this can be avoided by adding all paths to environment variables.
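As a sketch, those per-script edits could be replaced by exporting the paths once per shell session; the variable names below are hypothetical, and the scripts would still need to be edited to read them:

```shell
# Hypothetical variable names -- the scripts must be changed to reference them.
export KALDI_ROOT="$HOME/kaldi"
export LIBFVAD_ROOT="$HOME/libfvad"
export KBSV_ROOT="$HOME/KaldiBasedSpeakerVerification"
echo "Kaldi at: $KALDI_ROOT"
```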
