
zacharyclam / speaker_recognition

License: Apache-2.0
Speaker recognition using Keras

Programming Languages

Python: 139,335 projects (#7 most used programming language)
Shell: 77,523 projects

Projects that are alternatives to or similar to speaker_recognition

Speaker-Recognition
This repo contains my attempt to create a Speaker Recognition and Verification system using SideKit-1.3.1
Stars: ✭ 94 (+176.47%)
Mutual labels:  speaker-recognition
AESRC2020
a deep accent recognition network
Stars: ✭ 35 (+2.94%)
Mutual labels:  speaker-recognition
meta-SR
Pytorch implementation of Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs (Interspeech, 2020)
Stars: ✭ 58 (+70.59%)
Mutual labels:  speaker-recognition
Piwho
Speaker recognition library based on MARF for raspberry pi and other SBCs.
Stars: ✭ 50 (+47.06%)
Mutual labels:  speaker-recognition
Huawei-Challenge-Speaker-Identification
Trained speaker embedding deep learning models and evaluation pipelines in PyTorch and TensorFlow for speaker recognition.
Stars: ✭ 34 (+0%)
Mutual labels:  speaker-recognition
speaker-recognition-papers
Share some recent speaker recognition papers and their implementations.
Stars: ✭ 92 (+170.59%)
Mutual labels:  speaker-recognition
AutoSpeech
[InterSpeech 2020] "AutoSpeech: Neural Architecture Search for Speaker Recognition" by Shaojin Ding*, Tianlong Chen*, Xinyu Gong, Weiwei Zha, Zhangyang Wang
Stars: ✭ 195 (+473.53%)
Mutual labels:  speaker-recognition
kaldi-timit-sre-ivector
Develop speaker recognition model based on i-vector using TIMIT database
Stars: ✭ 17 (-50%)
Mutual labels:  speaker-recognition
meta-embeddings
Meta-embeddings are a probabilistic generalization of embeddings in machine learning.
Stars: ✭ 22 (-35.29%)
Mutual labels:  speaker-recognition
KaldiBasedSpeakerVerification
Kaldi based speaker verification
Stars: ✭ 43 (+26.47%)
Mutual labels:  speaker-recognition
deepaudio-speaker
neural network based speaker embedder
Stars: ✭ 19 (-44.12%)
Mutual labels:  speaker-recognition
wavenet-classifier
Keras Implementation of Deepmind's WaveNet for Supervised Learning Tasks
Stars: ✭ 54 (+58.82%)
Mutual labels:  speaker-recognition
speaker-recognition-pytorch
Speaker recognition, voiceprint recognition
Stars: ✭ 49 (+44.12%)
Mutual labels:  speaker-recognition
GE2E-Loss
Pytorch implementation of Generalized End-to-End Loss for speaker verification
Stars: ✭ 72 (+111.76%)
Mutual labels:  speaker-recognition
MiniVox
Code for our ACML and INTERSPEECH papers: "Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox".
Stars: ✭ 15 (-55.88%)
Mutual labels:  speaker-recognition
bob
Bob is a free signal-processing and machine learning toolbox originally developed by the Biometrics group at Idiap Research Institute, in Switzerland. - Mirrored from https://gitlab.idiap.ch/bob/bob
Stars: ✭ 38 (+11.76%)
Mutual labels:  speaker-recognition
D-TDNN
PyTorch implementation of Densely Connected Time Delay Neural Network
Stars: ✭ 60 (+76.47%)
Mutual labels:  speaker-recognition
Speaker-Identification
A program for automatic speaker identification using deep learning techniques.
Stars: ✭ 84 (+147.06%)
Mutual labels:  speaker-recognition
FreeSR
A free library for speaker recognition (verification), implemented with ncnn.
Stars: ✭ 21 (-38.24%)
Mutual labels:  speaker-recognition
VoiceprintRecognition-Pytorch
Voiceprint recognition implemented with the EcapaTdnn model.
Stars: ✭ 140 (+311.76%)
Mutual labels:  speaker-recognition

Speaker Recognition


Dataset used: AISHELL-ASR0009-OS1 (download)

The model architecture follows the paper "Deep Neural Networks for Small Footprint Text-Dependent Speaker Verification". In this implementation, the first two layers of the four-layer DNN proposed in the paper are replaced by two 1-D convolutional layers. The network is trained through a Softmax classifier; for enrollment and verification the Softmax layer is removed, the DNN output is used as the d-vector, and the cosine distance is computed to decide whether a speaker belongs to the enrollment set.
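
As a rough illustration of this architecture, here is a minimal Keras sketch. The filter counts, kernel sizes, and layer widths are assumptions chosen for illustration, not values taken from this repository:

    # Minimal Keras sketch of the d-vector network described above.
    # All layer sizes are illustrative assumptions, not the repo's values.
    from tensorflow.keras import layers, models

    NUM_SPEAKERS = 340        # speakers in the training set (see results below)
    INPUT_SHAPE = (100, 40)   # assumed: 100 frames x 40 log-fbank coefficients

    def build_model(num_speakers=NUM_SPEAKERS, input_shape=INPUT_SHAPE):
        inputs = layers.Input(shape=input_shape)
        # The paper's first two DNN layers are replaced by 1-D convolutions.
        x = layers.Conv1D(64, 5, activation="relu", padding="same")(inputs)
        x = layers.Conv1D(64, 5, activation="relu", padding="same")(x)
        x = layers.Flatten()(x)
        x = layers.Dense(256, activation="relu")(x)
        dvector = layers.Dense(256, activation="relu", name="dvector")(x)
        # Softmax classifier used only during training.
        outputs = layers.Dense(num_speakers, activation="softmax")(dvector)
        return models.Model(inputs, outputs)

    # For enrollment/verification, drop the Softmax layer and read the d-vector:
    model = build_model()
    embedder = models.Model(model.input, model.get_layer("dvector").output)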

  • Project structure

    - code
      -- 0-input          # data preprocessing
      -- 1-development    # model definition and training
      -- 2-enrollment     # speaker enrollment
      -- 3-evalution      # evaluation against unknown (impostor) speakers
      -- 4-roc_curve      # plot the ROC curve, compute the EER and threshold
      -- utils
    - data                # datasets
    - docs                # reference papers
    - logs                # TensorBoard log files
    - model               # saved model files
    - results
      -- features         # d-vectors computed by the model for enrolled and unknown speakers
      -- plots            # finished ROC curve plots
      -- scores           # scores used to plot the ROC curve
    
  • Training

    • First, apply voice activity detection (VAD) to the downloaded dataset; the code is at code/0-input/vad.py. Usage is shown below, followed by an illustrative sketch.

      usage:

      python vad.py --save_dir="../../data/vad_data"  --data_dir="path to the extracted dataset" \
      --category="data category to process, e.g. train, test, dev"
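
      The repository's vad.py is not reproduced here; as an illustration of this step, the sketch below uses the webrtcvad package (an assumption; the repo may implement VAD differently) to keep only the voiced 30 ms frames of a 16-bit mono WAV file:

      # Illustrative VAD sketch using webrtcvad (assumed; vad.py may differ).
      import wave
      import webrtcvad

      def filter_voiced(in_path, out_path, aggressiveness=2):
          vad = webrtcvad.Vad(aggressiveness)   # 0 (least) .. 3 (most aggressive)
          with wave.open(in_path, "rb") as wf:
              rate = wf.getframerate()          # webrtcvad needs 8/16/32/48 kHz
              assert wf.getnchannels() == 1 and wf.getsampwidth() == 2
              pcm = wf.readframes(wf.getnframes())
          frame_bytes = int(rate * 0.03) * 2    # 30 ms of 16-bit samples
          voiced = b"".join(
              pcm[i:i + frame_bytes]
              for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes)
              if vad.is_speech(pcm[i:i + frame_bytes], rate))
          with wave.open(out_path, "wb") as out:
              out.setnchannels(1)
              out.setsampwidth(2)
              out.setframerate(rate)
              out.writeframes(voiced)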
    • Extract log-fbank features from the VAD-processed data; this step uses the python_speech_features library (see the sketch after the usage below).

      usage:

      python process_data.py --data_dir="../../data/vad_data"  --save_dir="directory for the extracted log-fbank .bin files" \
      --category="data category to process" \
      --validata_scale="when processing the training set, the fraction held out for validation, e.g. 0.05; set it to 0 for other categories"
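
      With python_speech_features, the extraction itself can look like the sketch below; the window length, step, and filter count are assumptions, and the .bin file is written as a flat float32 array:

      # Illustrative log-fbank extraction (parameter values are assumptions).
      import numpy as np
      from scipy.io import wavfile
      from python_speech_features import logfbank

      def wav_to_bin(wav_path, bin_path, nfilt=40):
          rate, signal = wavfile.read(wav_path)
          feats = logfbank(signal, samplerate=rate,
                           winlen=0.025, winstep=0.01, nfilt=nfilt)
          feats.astype(np.float32).tofile(bin_path)  # flat float32 .bin file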
    • Write the training and validation data file paths into txt lists, so the data can be shuffled and fed to the model during training (a short sketch follows the usage below).

      usage:

      python get_data_list.py --save_dir="../../data/bin/" --category="validate"  # validation set list
      python get_data_list.py --save_dir="../../data/bin/" --category="train"     # training set list
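
      A list file of this kind takes only a few lines to produce; the one-path-per-line layout below is an assumption about the repo's format:

      # Illustrative sketch: write shuffled .bin paths into a txt list.
      import glob
      import random

      def write_data_list(bin_dir, list_path):
          paths = sorted(glob.glob(f"{bin_dir}/*.bin"))
          random.shuffle(paths)            # shuffle once; training may reshuffle
          with open(list_path, "w") as f:
              f.write("\n".join(paths))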
    • Start training by running train.py; a minimal compile-and-fit sketch follows the usage below.

      usage:

      python train.py --batch_size=128 --num_epochs=1000 --learn_rate=0.0001
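
      With these flags, training reduces to a standard Keras compile-and-fit loop, sketched below. It reuses build_model() from the architecture sketch above; the checkpoint filename pattern is inferred from the released checkpoint-00484-0.99.h5 and, like the stand-in data, is an assumption:

      # Minimal training sketch (see code/1-development for the real script).
      import numpy as np
      from tensorflow.keras.callbacks import ModelCheckpoint
      from tensorflow.keras.optimizers import Adam

      model = build_model()                 # from the architecture sketch above
      model.compile(optimizer=Adam(learning_rate=1e-4),   # --learn_rate=0.0001
                    loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])
      ckpt = ModelCheckpoint("model/checkpoint-{epoch:05d}-{val_accuracy:.2f}.h5",
                             monitor="val_accuracy", save_best_only=True)
      # Stand-in data so the sketch runs; real features come from the .bin lists.
      train_x = np.random.rand(256, 100, 40)
      train_y = np.random.randint(0, 340, 256)
      val_x = np.random.rand(64, 100, 40)
      val_y = np.random.randint(0, 340, 64)
      model.fit(train_x, train_y, batch_size=128, epochs=1000,
                validation_data=(val_x, val_y), callbacks=[ckpt])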
  • Evaluating the model

    Run the model_test.sh script to plot the ROC curve and compute the EER. The script takes the model path as its argument, and the result files are saved under the results directory; a sketch of the EER computation follows the usage below.

    usage:

    model_test.sh "model/checkpoint-00484-0.99.h5"
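
    The EER and threshold reported below can be recovered from the saved scores roughly as in this scikit-learn sketch (the repo's code/4-roc_curve implementation may differ):

    # Illustrative EER computation from verification scores.
    import numpy as np
    from sklearn.metrics import roc_curve

    def compute_eer(labels, scores):
        # labels: 1 for target (enrolled) trials, 0 for impostor trials.
        fpr, tpr, thresholds = roc_curve(labels, scores)
        fnr = 1.0 - tpr
        idx = np.nanargmin(np.abs(fnr - fpr))  # operating point where FPR ~ FNR
        return (fpr[idx] + fnr[idx]) / 2.0, thresholds[idx]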
    
  • Model parameter counts

    Total params: 5,781,524

    Trainable params: 5,781,428

    Non-trainable params: 96

    A diagram of the model architecture can be found in the results directory.

  • Experimental results:

    Speech from 340 speakers was used for training. After training, the 40 speakers of the dev set were enrolled: the data were split into an enrollment part and a verification part, with 15 s of audio per speaker used for enrollment and 100 one-second clips per speaker used for verification, counting the numbers of true positives (TP) and false positives (FP). The 20 speakers of the test set were used for impostor verification: for each of 100 one-second clips per speaker, the highest cosine-distance score against the enrollment set was recorded. From these test data an ROC curve was plotted, giving an EER of 12.2% at a threshold of 0.7824 (a sketch of the decision rule follows below).
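
    The accept/reject decision described above, i.e. the highest cosine score against the enrollment set compared with the threshold, can be sketched as follows (the helper name and array layout are hypothetical):

    # Illustrative verification sketch: cosine scoring against enrolled d-vectors.
    import numpy as np

    def verify(test_dvector, enroll_dvectors, threshold=0.7824):
        # enroll_dvectors: (num_speakers, dim) array, one d-vector per speaker.
        a = test_dvector / np.linalg.norm(test_dvector)
        b = enroll_dvectors / np.linalg.norm(enroll_dvectors, axis=1, keepdims=True)
        scores = b @ a                     # cosine similarity per enrolled speaker
        best = int(np.argmax(scores))
        accepted = scores[best] >= threshold
        return accepted, best, float(scores[best])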

  • Model download:

    Baidu Netdisk link: https://pan.baidu.com/s/1rrVCKEIiqzZ3fTr4sKzr1Q  password: 3gri

    The checkpoint-00484-0.99.h5 file includes the Softmax layer; checkpoint-00484-0.99_notop.h5 has the Softmax layer removed.

    Training hyperparameters: batch_size=128, learn_rate=0.0001. Training took about 5 hours on a single Titan GPU.
