
deepaudio / deepaudio-speaker

Licence: other
neural network based speaker embedder

Programming Languages

python
shell

Projects that are alternatives to or similar to deepaudio-speaker

GE2E-Loss
Pytorch implementation of Generalized End-to-End Loss for speaker verification
Stars: ✭ 72 (+278.95%)
Mutual labels:  speaker-recognition, speaker-verification, speaker-diarization
D-TDNN
PyTorch implementation of Densely Connected Time Delay Neural Network
Stars: ✭ 60 (+215.79%)
Mutual labels:  speaker-recognition, speaker-verification, speaker-diarization
Speaker-Recognition
This repo contains my attempt to create a Speaker Recognition and Verification system using SideKit-1.3.1
Stars: ✭ 94 (+394.74%)
Mutual labels:  speaker-recognition, speaker-verification
meta-SR
Pytorch implementation of Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs (Interspeech, 2020)
Stars: ✭ 58 (+205.26%)
Mutual labels:  speaker-recognition, speaker-verification
bob
Bob is a free signal-processing and machine learning toolbox originally developed by the Biometrics group at Idiap Research Institute, in Switzerland. - Mirrored from https://gitlab.idiap.ch/bob/bob
Stars: ✭ 38 (+100%)
Mutual labels:  speaker-recognition, speaker-verification
Huawei-Challenge-Speaker-Identification
Trained speaker embedding deep learning models and evaluation pipelines in PyTorch and TensorFlow for speaker recognition.
Stars: ✭ 34 (+78.95%)
Mutual labels:  speaker-recognition, speaker-verification
speaker-recognition-papers
Share some recent speaker recognition papers and their implementations.
Stars: ✭ 92 (+384.21%)
Mutual labels:  speaker-recognition, speaker-verification
kaldi-timit-sre-ivector
Develop speaker recognition model based on i-vector using TIMIT database
Stars: ✭ 17 (-10.53%)
Mutual labels:  speaker-recognition, speaker-verification
MiniVox
Code for our ACML and INTERSPEECH papers: "Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox".
Stars: ✭ 15 (-21.05%)
Mutual labels:  speaker-recognition, speaker-diarization
lightning-hydra-template
PyTorch Lightning + Hydra. A very user-friendly template for rapid and reproducible ML experimentation with best practices. ⚡🔥⚡
Stars: ✭ 1,905 (+9926.32%)
Mutual labels:  hydra, pytorch-lightning
dropclass speaker
DropClass and DropAdapt - repository for the paper accepted to Speaker Odyssey 2020
Stars: ✭ 20 (+5.26%)
Mutual labels:  speaker-recognition, speaker-verification
lightning-transformers
Flexible components pairing 🤗 Transformers with Pytorch Lightning
Stars: ✭ 551 (+2800%)
Mutual labels:  hydra, pytorch-lightning
wavenet-classifier
Keras Implementation of Deepmind's WaveNet for Supervised Learning Tasks
Stars: ✭ 54 (+184.21%)
Mutual labels:  speaker-recognition, speaker-verification
KaldiBasedSpeakerVerification
Kaldi based speaker verification
Stars: ✭ 43 (+126.32%)
Mutual labels:  speaker-recognition, speaker-verification
Speaker-Identification
A program for automatic speaker identification using deep learning techniques.
Stars: ✭ 84 (+342.11%)
Mutual labels:  speaker-recognition, speaker-verification
lightning-asr
Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.
Stars: ✭ 36 (+89.47%)
Mutual labels:  hydra, pytorch-lightning
pytorch tempest
My repo for training neural nets using pytorch-lightning and hydra
Stars: ✭ 124 (+552.63%)
Mutual labels:  hydra, pytorch-lightning
Hdcycles
Cycles Hydra Delegate
Stars: ✭ 197 (+936.84%)
Mutual labels:  hydra
Fast-AgingGAN
A deep learning model to age faces in the wild, currently runs at 60+ fps on GPUs
Stars: ✭ 133 (+600%)
Mutual labels:  pytorch-lightning
Rlcycle
A library for ready-made reinforcement learning agents and reusable components for neat prototyping
Stars: ✭ 184 (+868.42%)
Mutual labels:  hydra

Content

What is deepaudio-speaker?

Deepaudio-speaker is a framework for training neural-network-based speaker embedders. It supports online audio augmentation via torch-audiomentations, and it includes (or will include) popular neural network architectures and loss functions for speaker embedding.
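Many of the losses used to train speaker embedders are margin-based softmax variants; the training examples below use AAM-softmax criteria (`pyannote_aamsoftmax`, `adaptive_aamsoftmax`). As a rough illustration of the idea, and not this framework's code, here is a minimal NumPy sketch of additive angular margin (AAM) softmax: embeddings and class weights are L2-normalised, a margin is added to the angle of the target class only, and the cosines are multiplied by a scale before a standard cross-entropy. The helper names and the defaults `margin=0.2`, `scale=30.0` are assumptions chosen for illustration.

```python
import numpy as np


def aam_softmax_logits(embeddings, weights, labels, margin=0.2, scale=30.0):
    """AAM-softmax logits (hypothetical helper, not deepaudio-speaker's code).

    embeddings: (batch, dim) speaker embeddings
    weights:    (num_speakers, dim) one learned centre per training speaker
    labels:     (batch,) integer speaker indices
    """
    # L2-normalise so that a dot product equals the cosine of the angle.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cosine = np.clip(emb @ w.T, -1.0, 1.0)

    # Replace the target-class cosine cos(theta) with cos(theta + margin).
    rows = np.arange(len(labels))
    theta = np.arccos(cosine[rows, labels])
    logits = cosine.copy()
    logits[rows, labels] = np.cos(theta + margin)
    return scale * logits


def cross_entropy(logits, labels):
    # Numerically stable log-softmax followed by the mean negative log-likelihood.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(4, 8))       # 4 utterances, 8-dim embeddings
    centres = rng.normal(size=(3, 8))   # 3 training speakers
    labels = np.array([0, 1, 2, 0])
    print(cross_entropy(aam_softmax_logits(emb, centres, labels), labels))
```

This sketch skips the special handling that many implementations add for the case where θ + m exceeds π (for example an "easy margin" option), since cos(θ + m) stops decreasing there.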

To make features such as mixed-precision training, multi-node training and TPU training easy to use, I built this framework on PyTorch Lightning and Hydra (as pyannote-audio and openspeech also do).

Deepaudio-tts is coming soon.

Installation

conda create -n deepaudio python=3.8.5
conda activate deepaudio
conda install numpy cffi
conda install libsndfile=1.0.28 -c conda-forge
git clone https://github.com/deepaudio/deepaudio-speaker.git
cd deepaudio-speaker
pip install -e .

Get Started

Supported Datasets

VoxCeleb2

/path/to/voxceleb/voxceleb1/dev/wav/id10001/1zcIwhmdeo4/00001.wav
/path/to/voxceleb/voxceleb1/test/wav/id10270/5r0dWxy17C8/00001.wav
/path/to/voxceleb/voxceleb2/dev/aac/id00012/21Uxsk56VDQ/00001.m4a
/path/to/voxceleb/voxceleb2/test/aac/id00017/01dfn2spqyE/00001.m4a
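In the layout above, the first directory below `wav/` (VoxCeleb1) or `aac/` (VoxCeleb2) is the speaker ID, the second is the YouTube video ID, and the file name numbers the utterance within that video. Assuming that layout, a small sketch like the following can group files by speaker, for instance to sanity-check a download before training. `index_speakers` is a hypothetical helper written for this illustration; it is not part of deepaudio-speaker.

```python
from collections import defaultdict
from pathlib import Path


def index_speakers(root, extensions=(".wav", ".m4a")):
    """Map speaker ID -> sorted list of audio files (hypothetical helper).

    Assumes the VoxCeleb layout <root>/<speaker_id>/<video_id>/<utterance>.<ext>.
    """
    root = Path(root)
    speakers = defaultdict(list)
    for path in root.rglob("*"):
        if path.suffix not in extensions:
            continue  # skips directories and non-audio files
        rel = path.relative_to(root)
        if len(rel.parts) != 3:
            continue  # not speaker/video/utterance, so ignore it
        speakers[rel.parts[0]].append(path)
    return {spk: sorted(files) for spk, files in sorted(speakers.items())}
```

For example, `index_speakers('/path/to/voxceleb/voxceleb2/dev/aac')` should return one entry per speaker directory such as `id00012`.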

Training examples

  • Example 1: Train the ECAPA-TDNN model with filterbank (fbank) features on a GPU.
$ deepaudio-speaker-train  \
    dataset=voxceleb2 \
    dataset.dataset_path=/your/path/to/voxceleb2/dev/wav/ \
    model=clovaai_ecapa \
    model.channels=1024 \
    feature=fbank \
    lr_scheduler=reduce_lr_on_plateau \
    trainer=gpu \
    criterion=pyannote_aamsoftmax
  • Example 2: Train the ECAPA model to reach an EER of around 1.13% on the original VoxCeleb1 trial list (VoxCeleb1-O), without score normalization.
$ git clone https://github.com/deepaudio/deepaudio-database.git
$ cd deepaudio-database
$ vim database.yml # edit the list path and wav path
$ deepaudio-speaker-train  \
    dataset=dataframe \
    dataset.database_yml=/your/path/to/deepaudio-database/database.yml \
    dataset.dataset_name=voxceleb2_dev \
    model=clovaai_ecapa \
    model.channels=1024 \
    model.embed_dim=256 \
    model.min_num_frames=200 \
    model.max_num_frames=300 \
    feature=fbank \
    lr_scheduler=warmup_adaptive_reduce_lr_on_plateau \
    lr_scheduler.warmup_steps=30000 \
    lr_scheduler.lr_factor=0.8 \
    trainer=gpu \
    trainer.batch_size=128 \
    trainer.max_epochs=30 \
    trainer.num_checkpoints=30 \
    criterion=adaptive_aamsoftmax \
    criterion.increase_steps=300000 \
    augment.apply_spec_augment=True \
    augment.time_mask_num=1 \
    augment.apply_noise_augment=True \
    augment.apply_reverb_augment=True \
    augment.apply_noise_reverb_augment=True \
    augment.noise_augment_weight=2 \
    augment.noise_dataset_dir=/your/path/to/musan \
    augment.rir_dataset_dir=/your/path/to/RIRS_NOISES/simulated_rirs/
  • Example 3: Compute the equal error rate (EER).
from deepaudio.speaker.datasets.dataframe.utils import load_trial_dataframe, get_dataset_items
from deepaudio.speaker.models.inference import Inference
from deepaudio.speaker.metrics.eer import model_eer

# Look up the wav directory and trial list of the VoxCeleb1-O protocol in database.yml
trial_meta = get_dataset_items('/your/path/to/deepaudio-database/database.yml',
                               'voxceleb1_o', 'trial')
wav_dir, trial_path = trial_meta[0]
trials = load_trial_dataframe(wav_dir, trial_path)

# Load a trained checkpoint and compute the EER (and its threshold) over all trials
inference = Inference('/your/path/to/checkpoint.ckpt')
eer, thresh = model_eer(inference, trials)
print(f'EER: {eer}, threshold: {thresh}')
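  • Example 4: Export a TorchScript model.
from deepaudio.speaker.models.inference import Inference

# .model exposes the trained Lightning module held by Inference
model = Inference('/your/path/to/checkpoint.ckpt').model
# LightningModule.to_torchscript scripts the model and saves it to the given path
model.to_torchscript('filepath/to/model')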
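Example 3 delegates the metric to `model_eer`. For readers unfamiliar with the equal error rate, the sketch below shows the usual definition in NumPy: score each trial (here with the cosine similarity of two embeddings), sweep a decision threshold over the observed scores, and report the point where the false acceptance rate (FAR) equals the false rejection rate (FRR). It is an illustrative reimplementation, not the code behind `model_eer`; `cosine_score` and `compute_eer` are hypothetical names, and picking the threshold with the smallest |FAR − FRR| gap instead of interpolating between thresholds is a simplification.

```python
import numpy as np


def cosine_score(emb_a, emb_b):
    # Cosine similarity between two speaker embeddings (hypothetical helper).
    return float(np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))


def compute_eer(scores, labels):
    """Return (eer, threshold) for target (1) / non-target (0) trials.

    Hypothetical helper; assumes both trial types are present.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    best_gap, eer, best_thresh = np.inf, 1.0, None
    # Candidate thresholds: every observed score (accept if score >= threshold).
    for thresh in np.unique(scores):
        accept = scores >= thresh
        far = np.sum(accept & (labels == 0)) / n_nontarget  # impostors accepted
        frr = np.sum(~accept & (labels == 1)) / n_target    # true speakers rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer, best_thresh = gap, (far + frr) / 2, thresh
    return eer, best_thresh


if __name__ == "__main__":
    trial_scores = [0.9, 0.4, 0.6, 0.1]
    trial_labels = [1, 1, 0, 0]
    print(compute_eer(trial_scores, trial_labels))
```

With real trial lists (tens of thousands of trials), sorting the scores once and computing FAR/FRR cumulatively is much faster than the loop above, which is kept for readability.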

Model Architecture

ECAPA-TDNN: an unofficial implementation by @lawlict. Please find more details in this link.

ECAPA-TDNN: the implementation by @joonson. Please find more details in this link.

ResNetSE34L: borrowed from voxceleb_trainer.

ResNetSE34V2: borrowed from voxceleb_trainer.

ResNet101: proposed by BUT for speaker diarization. Please note that the features used in this framework differ from those used in BUT's VB-HMM pipeline.
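Speaker embedders of this kind turn a variable-length sequence of frame-level features into one fixed-size vector through a pooling layer, which is why training crops of different lengths (e.g. `model.min_num_frames=200` to `model.max_num_frames=300`) still yield embeddings of the same size. As an illustration only, here is a NumPy sketch of attentive statistics pooling: an attention-weighted mean and standard deviation over time. It uses one scalar weight per frame; ECAPA-TDNN extends this with channel-dependent attention and global context, and none of the implementations credited above is reproduced here. The parameter shapes of the small attention MLP (`w`, `b`, `v`) are assumptions.

```python
import numpy as np


def attentive_stats_pooling(frames, w, b, v):
    """Pool (T, C) frame features into a (2 * C,) vector (hypothetical helper).

    frames: (T, C) frame-level features
    w: (C, H), b: (H,), v: (H,) parameters of an assumed attention MLP
    """
    # Scalar attention score per frame: v . tanh(h_t W + b).
    scores = np.tanh(frames @ w + b) @ v            # (T,)
    scores = scores - scores.max()                  # stabilise the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()   # (T,), sums to 1

    # Attention-weighted mean and standard deviation over time.
    mean = alpha @ frames                           # (C,)
    var = alpha @ (frames ** 2) - mean ** 2
    std = np.sqrt(np.clip(var, 1e-8, None))         # clip for numerical safety
    return np.concatenate([mean, std])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w, b, v = rng.normal(size=(4, 3)), np.zeros(3), rng.normal(size=3)
    for num_frames in (200, 300):  # different crop lengths, same output size
        pooled = attentive_stats_pooling(rng.normal(size=(num_frames, 4)), w, b, v)
        print(num_frames, pooled.shape)
```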

How to contribute to deepaudio-speaker

This is a personal project, so I don't have enough GPU resources to run many experiments. I appreciate any kind of feedback or contribution. Please feel free to open a pull request for small issues such as bug fixes or experiment results. If you have any questions, please open an issue.

Acknowledgements

I borrowed a lot of code from openspeech and pyannote-audio.
