yuyq96 / D-TDNN

Licence: other
PyTorch implementation of Densely Connected Time Delay Neural Network

Densely Connected Time Delay Neural Network

PyTorch implementation of the Densely Connected Time Delay Neural Network (D-TDNN) from our paper "Densely Connected Time Delay Neural Network for Speaker Verification" (INTERSPEECH 2020).

News

  • [2021-09-05] TimeDelay is now replaced by Conv1d by default, because convolution is better optimized across deep learning frameworks. Note: the pretrained models were converted directly from the old ones, so the results may differ slightly from those in the paper.

  • [2021-08-28] D-TDNN and D-TDNN-SS outperform the SOTA system on the AP20-OLR-dialect-task of the Oriental Language Recognition (OLR) Challenge 2020 (WeChat article / paper), showing their potential for other speech processing tasks.

  • [2021-02-01] The following paper has been accepted by ICASSP 2021.

    Y.-Q. Yu, S. Zheng, H. Suo, Y. Lei, and W.-J. Li, "CAM: Context-Aware Masking for Robust Speaker Verification"

    • D-TDNN + CAM (w/o data augmentation, 4M params)

      Metric     VoxCeleb1-E  VoxCeleb1-H
      EER        1.183        2.152
      DCF_0.01   0.1257       0.1966
      DCF_0.001  0.2405       0.3106
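
The 2021-09-05 note above swaps TimeDelay for Conv1d. As a rough illustration of why such a swap is possible in general, the sketch below shows that a time-delay layer with symmetric context offsets computes the same values as a dilated 1-D convolution. It uses plain Python on a single channel; the function names, the offsets (-2, 0, 2), and the toy data are assumptions made for this example and are not taken from the repository code.

```python
def time_delay_layer(frames, weights, offsets):
    """Apply a single-channel time-delay layer.

    frames  : list of floats, one value per time step
    weights : one weight per context offset
    offsets : context offsets relative to the current frame, e.g. (-2, 0, 2)
    Only time steps where every offset falls inside the sequence are emitted.
    """
    left, right = -min(offsets), max(offsets)
    out = []
    for t in range(left, len(frames) - right):
        out.append(sum(w * frames[t + o] for w, o in zip(weights, offsets)))
    return out


def dilated_conv1d(frames, kernel, dilation):
    """Unpadded 1-D cross-correlation with a dilated kernel."""
    span = dilation * (len(kernel) - 1)
    out = []
    for start in range(len(frames) - span):
        out.append(sum(k * frames[start + i * dilation] for i, k in enumerate(kernel)))
    return out


frames = [0.5, -1.0, 2.0, 3.0, -0.5, 1.5, 4.0]
weights = [0.2, -0.1, 0.7]

# Context {-2, 0, 2} corresponds to kernel size 3 with dilation 2.
tdnn_out = time_delay_layer(frames, weights, offsets=(-2, 0, 2))
conv_out = dilated_conv1d(frames, weights, dilation=2)

print(tdnn_out == conv_out)
```

Both functions visit the same input positions in the same order, so the two output lists match element for element.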

Pretrained Models

We provide pretrained models that can be used in tasks such as:

  • Speaker Verification
  • Speaker-Dependent Speech Separation
  • Multi-Speaker Text-to-Speech
  • Voice Conversion

D-TDNN & D-TDNN-SS

Usage

Data preparation

You can either use the Kaldi toolkit:

  • Download VoxCeleb1 test set and unzip it.
  • Place prepare_voxceleb1_test.sh under $kaldi_root/egs/voxceleb/v2 and change the $datadir and $voxceleb1_root in it.
  • Run chmod +x prepare_voxceleb1_test.sh && ./prepare_voxceleb1_test.sh to generate 30-dim MFCCs.
  • Place the trials under $datadir/test_no_sil.

Or check out the kaldifeat branch if you do not want to install Kaldi.

Test

  • Download the pretrained D-TDNN model and run:
python evaluate.py --root $datadir/test_no_sil --model D-TDNN --checkpoint dtdnn.pth --device cuda
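
The internals of evaluate.py are not reproduced here. As a rough illustration of what a cosine backend does with a trial list, the sketch below scores pairs of embeddings by cosine similarity. The utterance IDs, the toy 3-dimensional embeddings, and the helper names cosine_score and score_trials are hypothetical and are not taken from the repository.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_trials(embeddings, trials):
    """Score each (enroll_id, test_id, is_target) trial.

    Returns a list of (score, is_target) pairs in trial order.
    """
    return [
        (cosine_score(embeddings[enroll], embeddings[test]), is_target)
        for enroll, test, is_target in trials
    ]


# Hypothetical toy embeddings (real ones would be 128- or 512-dimensional).
embeddings = {
    "spk1-utt1": [1.0, 0.0, 0.0],
    "spk1-utt2": [2.0, 0.0, 0.0],
    "spk2-utt1": [0.0, 3.0, 0.0],
}
trials = [
    ("spk1-utt1", "spk1-utt2", True),
    ("spk1-utt1", "spk2-utt1", False),
]

for score, is_target in score_trials(embeddings, trials):
    print(f"{score:.3f} {'target' if is_target else 'nontarget'}")
```

Cosine similarity ignores vector length, so two embeddings that point in the same direction score 1.0 regardless of their norms.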

Evaluation

VoxCeleb1-O

Model          Emb.  Params (M)  Loss         Backend  EER (%)  DCF_0.01  DCF_0.001
TDNN           512   4.2         Softmax      PLDA     2.34     0.28      0.38
E-TDNN         512   6.1         Softmax      PLDA     2.08     0.26      0.41
F-TDNN         512   12.4        Softmax      PLDA     1.89     0.21      0.29
D-TDNN         512   2.8         Softmax      Cosine   1.81     0.20      0.28
D-TDNN-SS (0)  512   3.0         Softmax      Cosine   1.55     0.20      0.30
D-TDNN-SS      512   3.5         Softmax      Cosine   1.41     0.19      0.24
D-TDNN-SS      128   3.1         AAM-Softmax  Cosine   1.22     0.13      0.20
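
For readers unfamiliar with the metrics in the table, the sketch below computes an equal error rate (EER) and a normalized minimum detection cost (minDCF) from a list of trial scores. It assumes that DCF_0.01 and DCF_0.001 denote minDCF with target priors of 0.01 and 0.001 and unit miss and false-alarm costs; the README does not state this, and the function names and toy scores are illustrative only. Tied scores are not handled specially.

```python
def sweep_thresholds(scores, labels):
    """Return (miss_rate, false_alarm_rate) for every candidate threshold.

    A trial is accepted when its score is at or above the threshold.
    labels are True for target trials and False for nontarget trials.
    """
    n_tar = sum(1 for is_target in labels if is_target)
    n_non = len(labels) - n_tar
    misses, false_alarms = 0, n_non
    points = [(0.0, 1.0)]  # threshold below every score: accept everything
    for _, is_target in sorted(zip(scores, labels)):
        # Raise the threshold just above this score, rejecting the trial.
        if is_target:
            misses += 1
        else:
            false_alarms -= 1
        points.append((misses / n_tar, false_alarms / n_non))
    return points


def equal_error_rate(points):
    """Approximate the EER at the point where the two rates are closest."""
    miss, fa = min(points, key=lambda p: abs(p[0] - p[1]))
    return (miss + fa) / 2


def min_dcf(points, p_target, c_miss=1.0, c_fa=1.0):
    """Minimum detection cost, normalized by the best trivial system."""
    best = min(
        c_miss * miss * p_target + c_fa * fa * (1 - p_target)
        for miss, fa in points
    )
    return best / min(c_miss * p_target, c_fa * (1 - p_target))


# Toy scores: one target trial (0.4) scores below one nontarget trial (0.6).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [True, True, True, False, True, False, False, False]

points = sweep_thresholds(scores, labels)
print(f"EER       = {100 * equal_error_rate(points):.2f}%")
print(f"DCF_0.01  = {min_dcf(points, p_target=0.01):.4f}")
print(f"DCF_0.001 = {min_dcf(points, p_target=0.001):.4f}")
```

With many trials the miss and false-alarm curves cross almost exactly; on small lists like this one, the closest pair of rates is averaged instead.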

Citation

If you find D-TDNN helpful for your research, please cite:

@inproceedings{DBLP:conf/interspeech/YuL20,
  author    = {Ya-Qi Yu and
               Wu-Jun Li},
  title     = {Densely Connected Time Delay Neural Network for Speaker Verification},
  booktitle = {Annual Conference of the International Speech Communication Association (INTERSPEECH)},
  pages     = {921--925},
  year      = {2020}
}

Revision of the Paper

References:

[16] X. Li, W. Wang, X. Hu, and J. Yang, "Selective Kernel Networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 510-519.
