Aurora11111 / speaker-recognition-pytorch

License: BSD-3-Clause
Speaker recognition / voiceprint recognition

Projects that are alternatives of or similar to speaker-recognition-pytorch

Speaker-Identification
A program for automatic speaker identification using deep learning techniques.
Stars: ✭ 84 (+71.43%)
Mutual labels:  speaker-recognition
Speaker-Recognition
This repo contains my attempt to create a Speaker Recognition and Verification system using SideKit-1.3.1
Stars: ✭ 94 (+91.84%)
Mutual labels:  speaker-recognition
Huawei-Challenge-Speaker-Identification
Trained speaker embedding deep learning models and evaluation pipelines in PyTorch and TensorFlow for speaker recognition.
Stars: ✭ 34 (-30.61%)
Mutual labels:  speaker-recognition
VoiceprintRecognition-Keras
A voiceprint recognition model implemented with Keras.
Stars: ✭ 70 (+42.86%)
Mutual labels:  speaker-recognition
AutoSpeech
[InterSpeech 2020] "AutoSpeech: Neural Architecture Search for Speaker Recognition" by Shaojin Ding*, Tianlong Chen*, Xinyu Gong, Weiwei Zha, Zhangyang Wang
Stars: ✭ 195 (+297.96%)
Mutual labels:  speaker-recognition
Piwho
Speaker recognition library based on MARF for raspberry pi and other SBCs.
Stars: ✭ 50 (+2.04%)
Mutual labels:  speaker-recognition
FreeSR
A free library for speaker recognition (verification), implemented with ncnn.
Stars: ✭ 21 (-57.14%)
Mutual labels:  speaker-recognition
D-TDNN
PyTorch implementation of Densely Connected Time Delay Neural Network
Stars: ✭ 60 (+22.45%)
Mutual labels:  speaker-recognition
bob
Bob is a free signal-processing and machine learning toolbox originally developed by the Biometrics group at Idiap Research Institute, in Switzerland. - Mirrored from https://gitlab.idiap.ch/bob/bob
Stars: ✭ 38 (-22.45%)
Mutual labels:  speaker-recognition
wavenet-classifier
Keras Implementation of Deepmind's WaveNet for Supervised Learning Tasks
Stars: ✭ 54 (+10.2%)
Mutual labels:  speaker-recognition
VoiceprintRecognition-PaddlePaddle
Voiceprint recognition implemented with PaddlePaddle.
Stars: ✭ 57 (+16.33%)
Mutual labels:  speaker-recognition
dropclass speaker
DropClass and DropAdapt - repository for the paper accepted to Speaker Odyssey 2020
Stars: ✭ 20 (-59.18%)
Mutual labels:  speaker-recognition
deepaudio-speaker
neural network based speaker embedder
Stars: ✭ 19 (-61.22%)
Mutual labels:  speaker-recognition
speaker recognition
speaker recognition using keras
Stars: ✭ 34 (-30.61%)
Mutual labels:  speaker-recognition
meta-embeddings
Meta-embeddings are a probabilistic generalization of embeddings in machine learning.
Stars: ✭ 22 (-55.1%)
Mutual labels:  speaker-recognition
kaldi-timit-sre-ivector
Develop speaker recognition model based on i-vector using TIMIT database
Stars: ✭ 17 (-65.31%)
Mutual labels:  speaker-recognition
GE2E-Loss
Pytorch implementation of Generalized End-to-End Loss for speaker verification
Stars: ✭ 72 (+46.94%)
Mutual labels:  speaker-recognition
speaker-recognition-papers
Share some recent speaker recognition papers and their implementations.
Stars: ✭ 92 (+87.76%)
Mutual labels:  speaker-recognition
AESRC2020
a deep accent recognition network
Stars: ✭ 35 (-28.57%)
Mutual labels:  speaker-recognition
Voiceprint-recognition-Speaker-recognition
It is a complete project of voiceprint recognition or speaker recognition.
Stars: ✭ 82 (+67.35%)
Mutual labels:  speaker-recognition

speaker recognition

PyTorch implementation of the speech embedding net and loss described in https://arxiv.org/pdf/1710.10467.pdf.

Also contains code to create embeddings suitable as input for the speaker diarization model found at https://github.com/google/uis-rnn.
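The embedding net in the GE2E paper is a stacked LSTM followed by a linear projection, with the last frame's output L2-normalized into a fixed-dimensional d-vector. A minimal PyTorch sketch; the layer sizes here are illustrative assumptions, not necessarily this repo's hyperparameters:

```python
import torch
import torch.nn as nn

class SpeechEmbedder(nn.Module):
    """Sketch of the LSTM d-vector embedder from the GE2E paper
    (https://arxiv.org/pdf/1710.10467.pdf). Sizes are illustrative."""
    def __init__(self, n_mels=40, hidden=768, proj=256):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, num_layers=3, batch_first=True)
        self.projection = nn.Linear(hidden, proj)

    def forward(self, x):                        # x: (batch, frames, n_mels)
        out, _ = self.lstm(x)
        emb = self.projection(out[:, -1, :])     # hidden state of last frame
        return emb / emb.norm(dim=1, keepdim=True)  # L2-normalized d-vector

net = SpeechEmbedder()
dvec = net(torch.randn(4, 180, 40))
print(dvec.shape)   # torch.Size([4, 256])
```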

[Figure: training loss]

The TIMIT speech corpus was used to train the model. It is available at https://catalog.ldc.upenn.edu/LDC93S1 or at https://github.com/philipperemy/timit.

Dependencies

  • PyTorch 0.4.1
  • python 3.5+
  • numpy 1.15.4
  • librosa 0.6.1

The Python WebRTC VAD found at https://github.com/wiseman/py-webrtcvad is required to run dvector_create.py, but not to train the neural network.
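py-webrtcvad accepts only 16-bit mono PCM at 8, 16, 32, or 48 kHz, in frames of exactly 10, 20, or 30 ms, so audio must be chopped into fixed-size frames before each one is passed to `webrtcvad.Vad().is_speech(frame, sample_rate)`. An illustrative helper (the function name is hypothetical, not from this repo):

```python
# Split raw 16-bit PCM bytes into VAD-sized frames. Each returned frame is the
# right length to hand to webrtcvad.Vad().is_speech(frame, sample_rate).
def vad_frames(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 30):
    frame_bytes = int(sample_rate * frame_ms / 1000) * 2   # 2 bytes per sample
    return [pcm[i:i + frame_bytes]
            for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes)]

# One second of silence at 16 kHz -> 33 full 30 ms frames of 480 samples each.
frames = vad_frames(b"\x00\x00" * 16000)
print(len(frames), len(frames[0]))   # 33 960
```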

Preprocessing

Set the following config.yaml key to a glob pattern matching all .WAV files in your downloaded TIMIT dataset. The TIMIT .WAV files must be converted to the standard RIFF format for the dvector_create.py script, but not for training the neural network.

unprocessed_data: './TIMIT/*/*/*/*.wav'

Run the preprocessing script:

./data_preprocess.py 

Two folders will be created, train_tisv and test_tisv, containing .npy files of numpy ndarrays of speaker utterances, with a 90%/10% training/testing split.
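The split can be pictured as follows. This is an illustrative sketch with random stand-in features and a hypothetical helper name, not the repo's actual preprocessing code:

```python
import numpy as np
import os
import tempfile

# Sketch of the 90/10 split: each speaker's utterance features (stand-ins for
# log-mel spectrograms here) are stacked into one ndarray and written to
# train_tisv/ or test_tisv/ as a .npy file.
def split_and_save(speakers, out_dir, train_frac=0.9):
    train_dir = os.path.join(out_dir, "train_tisv")
    test_dir = os.path.join(out_dir, "test_tisv")
    os.makedirs(train_dir)
    os.makedirs(test_dir)
    n_train = int(len(speakers) * train_frac)
    for i, utts in enumerate(speakers):
        dest = train_dir if i < n_train else test_dir
        np.save(os.path.join(dest, f"speaker{i}.npy"), np.stack(utts))
    return n_train

with tempfile.TemporaryDirectory() as d:
    # 10 fake speakers, 4 utterances each, 180 frames x 40 mel bins
    speakers = [[np.zeros((180, 40))] * 4 for _ in range(10)]
    n_train = split_and_save(speakers, d)
    print(n_train, len(os.listdir(os.path.join(d, "train_tisv"))))  # 9 9
```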

GE2E-loss model training

To train the speaker verification model, run:

./train_speech_embedder.py 

with the following config.yaml key set to true:

training: !!bool "true"

For testing, set the key to:

training: !!bool "false"

The log file and checkpoint save locations are controlled by the following values:

log_file: './speech_id_checkpoint/Stats'
checkpoint_dir: './speech_id_checkpoint'

Only text-independent speaker verification (TI-SV) is implemented.
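The GE2E loss scores every utterance embedding against every speaker centroid and applies a softmax over speakers. A simplified PyTorch sketch; it omits the paper's refinement of excluding an utterance from its own centroid, and the scale w and bias b would normally be learnable parameters:

```python
import torch
import torch.nn.functional as F

# Simplified GE2E softmax loss (after Eq. 6 of arXiv:1710.10467).
def ge2e_loss(emb, w=10.0, b=-5.0):
    # emb: (n_speakers, n_utts, dim), L2-normalized embeddings
    N, M, D = emb.shape
    centroids = F.normalize(emb.mean(dim=1), dim=1)       # (N, D)
    sim = w * emb.reshape(N * M, D) @ centroids.t() + b   # (N*M, N) scores
    target = torch.arange(N).repeat_interleave(M)         # true speaker ids
    return F.cross_entropy(sim, target)

emb = F.normalize(torch.randn(4, 5, 256), dim=2)
loss = ge2e_loss(emb)
print(loss.item())
```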

Performance

EER across 10 epochs: 0.0377
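For reference, the equal error rate is the operating point where the false-accept and false-reject rates coincide. A rough sketch with synthetic scores, not this repo's evaluation code:

```python
import numpy as np

# Sweep candidate thresholds and return the rate where false accepts
# (impostors above threshold) and false rejects (genuine pairs below
# threshold) are closest to equal.
def eer(genuine, impostor):
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best, best_gap = 1.0, np.inf
    for t in thresholds:
        frr = np.mean(genuine < t)      # genuine pairs rejected
        far = np.mean(impostor >= t)    # impostor pairs accepted
        if abs(far - frr) < best_gap:
            best, best_gap = (far + frr) / 2, abs(far - frr)
    return best

g = np.array([0.9, 0.8, 0.85, 0.7, 0.4])   # same-speaker scores
i = np.array([0.3, 0.5, 0.2, 0.6, 0.1])    # cross-speaker scores
print(eer(g, i))
```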

D vector embedding creation

After training and testing the model, run dvector.py to create data.pkl.

The file can be loaded and used to train the triplet-loss model.

triplet-loss model training

After creating the d-vectors, we train a model with the triplet loss described in https://arxiv.org/pdf/1705.02304.pdf. Run train.py.
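Deep Speaker's triplet loss pushes an anchor d-vector closer to a positive (same speaker) than to a negative (different speaker) by a cosine-similarity margin. A minimal sketch of that loss, not this repo's train.py:

```python
import torch
import torch.nn.functional as F

# Cosine-similarity triplet loss in the style of Deep Speaker
# (arXiv:1705.02304): penalize triplets where the anchor-negative
# similarity is not at least `margin` below the anchor-positive one.
def triplet_loss(anchor, positive, negative, margin=0.1):
    sim_ap = F.cosine_similarity(anchor, positive)   # same-speaker similarity
    sim_an = F.cosine_similarity(anchor, negative)   # cross-speaker similarity
    return F.relu(sim_an - sim_ap + margin).mean()

a, p, n = (F.normalize(torch.randn(8, 256), dim=1) for _ in range(3))
loss = triplet_loss(a, p, n)
print(loss.item())
```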

Reference

To recognize speakers against the reference set, run cli.py.

https://github.com/HarryVolek/PyTorch_Speaker_Verification

https://github.com/philipperemy/deep-speaker
