Alternatives and detailed information of deepaudio-speaker

Bob is a free signal-processing and machine learning toolbox originally developed by the Biometrics group at Idiap Research Institute, in Switzerland. - Mirrored from https://gitlab.idiap.ch/bob/bob

Stars: ✭ 38 (+100%)

Mutual labels: speaker-recognition, speaker-verification

Huawei-Challenge-Speaker-Identification

Trained speaker embedding deep learning models and evaluation pipelines in PyTorch and TensorFlow for speaker recognition.

Stars: ✭ 34 (+78.95%)

Mutual labels: speaker-recognition, speaker-verification

speaker-recognition-papers

Share some recent speaker recognition papers and their implementations.

Stars: ✭ 92 (+384.21%)

Mutual labels: speaker-recognition, speaker-verification

kaldi-timit-sre-ivector

Develop speaker recognition model based on i-vector using TIMIT database

Stars: ✭ 17 (-10.53%)

Mutual labels: speaker-recognition, speaker-verification

MiniVox

Code for our ACML and INTERSPEECH papers: "Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox".

Stars: ✭ 15 (-21.05%)

Mutual labels: speaker-recognition, speaker-diarization

lightning-hydra-template

PyTorch Lightning + Hydra. A very user-friendly template for rapid and reproducible ML experimentation with best practices. ⚡🔥⚡

Stars: ✭ 1,905 (+9926.32%)

Mutual labels: hydra, pytorch-lightning

dropclass speaker

DropClass and DropAdapt - repository for the paper accepted to Speaker Odyssey 2020

Stars: ✭ 20 (+5.26%)

Mutual labels: speaker-recognition, speaker-verification

lightning-transformers

Flexible components pairing 🤗 Transformers with Pytorch Lightning

Stars: ✭ 551 (+2800%)

Mutual labels: hydra, pytorch-lightning

wavenet-classifier

Keras Implementation of Deepmind's WaveNet for Supervised Learning Tasks

Stars: ✭ 54 (+184.21%)

Mutual labels: speaker-recognition, speaker-verification

KaldiBasedSpeakerVerification

Kaldi based speaker verification

Stars: ✭ 43 (+126.32%)

Mutual labels: speaker-recognition, speaker-verification

Speaker-Identification

A program for automatic speaker identification using deep learning techniques.

Stars: ✭ 84 (+342.11%)

Mutual labels: speaker-recognition, speaker-verification

lightning-asr

Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.

Stars: ✭ 36 (+89.47%)

Mutual labels: hydra, pytorch-lightning

pytorch tempest

My repo for training neural nets using pytorch-lightning and hydra

Stars: ✭ 124 (+552.63%)

Mutual labels: hydra, pytorch-lightning

Hdcycles

Cycles Hydra Delegate

Stars: ✭ 197 (+936.84%)

Mutual labels: hydra

Fast-AgingGAN

A deep learning model to age faces in the wild, currently runs at 60+ fps on GPUs

Stars: ✭ 133 (+600%)

Mutual labels: pytorch-lightning

Rlcycle

A library for ready-made reinforcement learning agents and reusable components for neat prototyping

Stars: ✭ 184 (+868.42%)

Mutual labels: hydra


Content

What is deepaudio-speaker?

Deepaudio-speaker is a framework for training neural-network-based speaker embedders. It supports online audio augmentation thanks to torch-audiomentations. It includes, or will include, popular neural network architectures and losses used for speaker embedding.

To make features such as mixed-precision, multi-node, and TPU training easy to use, I introduced PyTorch Lightning and Hydra into this framework (just as pyannote-audio and openspeech do).

Deepaudio-tts is coming soon.

Installation

conda create -n deepaudio python=3.8.5
conda activate deepaudio
conda install numpy cffi
conda install libsndfile=1.0.28 -c conda-forge
git clone https://github.com/deepaudio/deepaudio-speaker.git
cd deepaudio-speaker
pip install -e .

Get Started

Supported Datasets

VoxCeleb2

Download the VoxCeleb dataset and follow this script to obtain the following directory structure:

/path/to/voxceleb/voxceleb1/dev/wav/id10001/1zcIwhmdeo4/00001.wav
/path/to/voxceleb/voxceleb1/test/wav/id10270/5r0dWxy17C8/00001.wav
/path/to/voxceleb/voxceleb2/dev/aac/id00012/21Uxsk56VDQ/00001.m4a
/path/to/voxceleb/voxceleb2/test/aac/id00017/01dfn2spqyE/00001.m4a
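The layout above encodes the corpus, split, speaker ID, and video ID in the path itself. As a minimal sketch, assuming only that layout, a path can be split into those fields with the standard library. The `Utterance` type and `parse_utterance_path` helper are hypothetical names used for illustration and are not part of deepaudio-speaker.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class Utterance(NamedTuple):
    corpus: str      # "voxceleb1" or "voxceleb2"
    split: str       # "dev" or "test"
    speaker_id: str  # e.g. "id10001"
    video_id: str    # e.g. "1zcIwhmdeo4"
    filename: str    # e.g. "00001.wav"


def parse_utterance_path(path: str) -> Utterance:
    """Split a path laid out as .../<corpus>/<split>/<codec>/<speaker>/<video>/<file>.

    Hypothetical helper: it only mirrors the directory structure shown above.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 6:
        raise ValueError(f"path too short for the expected layout: {path}")
    # The codec directory ("wav" or "aac") is not needed for the metadata.
    corpus, split, _codec, speaker_id, video_id, filename = parts[-6:]
    return Utterance(corpus, split, speaker_id, video_id, filename)


example = parse_utterance_path(
    "/path/to/voxceleb/voxceleb1/dev/wav/id10001/1zcIwhmdeo4/00001.wav"
)
print(example.speaker_id, example.split)
```

A check like this can be run over a few files after preparing the data to confirm that the directory structure matches what the training command expects.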

Training examples

Example 1: Train the ECAPA-TDNN model with fbank features on a GPU.

$ deepaudio-speaker-train  \
    dataset=voxceleb2 \
    dataset.dataset_path=/your/path/to/voxceleb2/dev/wav/ \
    model=clovaai_ecapa \
    model.channels=1024 \
    feature=fbank \
    lr_scheduler=reduce_lr_on_plateau \
    trainer=gpu \
    criterion=pyannote_aamsoftmax
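The command-line arguments above follow Hydra's `key=value` override style: a bare key such as `model=clovaai_ecapa` selects a config group option, and a dotted key such as `model.channels=1024` sets a field inside that group. The sketch below is a simplified illustration of how such dotted keys map onto a nested configuration. It is not Hydra's actual implementation; the `apply_overrides` function and the `_choice` key are hypothetical, and values are kept as plain strings.

```python
from typing import Any, Dict, List


def apply_overrides(overrides: List[str]) -> Dict[str, Any]:
    """Fold 'a.b=value' strings into a nested dict (illustration only, not Hydra)."""
    config: Dict[str, Any] = {}
    for item in overrides:
        key, _, value = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            existing = node.get(name)
            if not isinstance(existing, dict):
                # A group choice such as model=clovaai_ecapa is kept under "_choice"
                # so that fields like model.channels can live next to it.
                existing = {} if existing is None else {"_choice": existing}
                node[name] = existing
            node = existing
        if isinstance(node.get(leaf), dict):
            node[leaf]["_choice"] = value
        else:
            node[leaf] = value
    return config


config = apply_overrides([
    "dataset=voxceleb2",
    "model=clovaai_ecapa",
    "model.channels=1024",
    "trainer=gpu",
])
print(config["model"])
```

In the real framework, Hydra resolves each group choice to a config definition and applies type conversion; the sketch only shows why `model.channels` belongs to the group chosen by `model=`.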

Example 2: Train the ECAPA model to reach an EER of around 1.13% on the VoxCeleb1 trials (original version, without norm operation).

$ git clone https://github.com/deepaudio/deepaudio-database.git
$ cd deepaudio-database
$ vim database.yml # edit the list path and wav path
$ deepaudio-speaker-train  \
    dataset=dataframe \
    dataset.database_yml=/your/path/to/deepaudio-database/database.yml \
    dataset.dataset_name=voxceleb2_dev \
    model=clovaai_ecapa \
    model.channels=1024 \
    model.embed_dim=256 \
    model.min_num_frames=200 \
    model.max_num_frames=300 \
    feature=fbank \
    lr_scheduler=warmup_adaptive_reduce_lr_on_plateau \
    lr_scheduler.warmup_steps=30000 \
    lr_scheduler.lr_factor=0.8 \
    trainer=gpu \
    trainer.batch_size=128 \
    trainer.max_epochs=30 \
    trainer.num_checkpoints=30 \
    criterion=adaptive_aamsoftmax \
    criterion.increase_steps=300000 \
    augment.apply_spec_augment=True \
    augment.time_mask_num=1 \
    augment.apply_noise_augment=True \
    augment.apply_reverb_augment=True \
    augment.apply_noise_reverb_augment=True \
    augment.noise_augment_weight=2 \
    augment.noise_dataset_dir=/your/path/to/musan \
    augment.rir_dataset_dir=/your/path/to/RIRS_NOISES/simulated_rirs/

Example 3: Compute the equal error rate (EER)

from deepaudio.speaker.datasets.dataframe.utils import load_trial_dataframe, get_dataset_items
from deepaudio.speaker.models.inference import Inference
from deepaudio.speaker.metrics.eer import model_eer

# Look up the 'voxceleb1_o' trial entries listed in database.yml.
trial_meta = get_dataset_items('/your/path/to/deepaudio-database/database.yml',
                               'voxceleb1_o', 'trial')
# The first entry holds the wav directory and the path to the trial list.
wav_dir, trial_path = trial_meta[0]
trials = load_trial_dataframe(wav_dir, trial_path)

# Load the trained checkpoint and score the trials.
inference = Inference('/your/path/to/checkpoint.ckpt')
eer, thresh = model_eer(inference, trials)
print(eer, thresh)
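For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (non-target trials scored above the threshold) equals the false rejection rate (target trials scored below it). The following is a minimal pure-Python sketch of that definition under stated assumptions: higher scores mean "same speaker", and labels are 1 for target and 0 for non-target trials. It is not the implementation behind `model_eer`; the `equal_error_rate` function is a hypothetical name, and the quadratic loop is only meant for small trial lists.

```python
from typing import List, Tuple


def equal_error_rate(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (eer, threshold); label 1 = same speaker, 0 = different speaker."""
    num_target = sum(labels)
    num_nontarget = len(labels) - num_target
    if num_target == 0 or num_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer, threshold = float("inf"), 1.0, 0.0
    for candidate in sorted(set(scores)):
        # A trial is accepted when its score is at or above the candidate threshold.
        false_accepts = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= candidate)
        false_rejects = sum(1 for s, l in zip(scores, labels) if l == 1 and s < candidate)
        far = false_accepts / num_nontarget
        frr = false_rejects / num_target
        # Keep the threshold where the two error rates are closest to each other.
        if abs(far - frr) < best_gap:
            best_gap, eer, threshold = abs(far - frr), (far + frr) / 2, candidate
    return eer, threshold


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
print(equal_error_rate(scores, labels))
```

With real trial lists, the rates rarely cross exactly at one threshold, which is why the sketch reports the average of the two rates at the closest point.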

Example 4: Export a TorchScript model

from deepaudio.speaker.models.inference import Inference

# Load the trained model from a checkpoint, then export it as TorchScript.
model = Inference('/your/path/to/checkpoint.ckpt').model
model.to_torchscript('filepath/to/model')

Model Architecture

ECAPA-TDNN: This is an unofficial implementation from @lawlict. Please find more details in this link.

ECAPA-TDNN: This is implemented by @joonson. Please find more details in this link.

ResNetSE34L: This is borrowed from voxceleb trainer.

ResNetSE34V2: This is borrowed from voxceleb trainer.

ResNet101: This is proposed by BUT for speaker diarization. Please note that the feature used in this framework is different from VB-HMM.

How to contribute to deepaudio-speaker

This is a personal project, so I don't have enough GPU resources to run many experiments. I appreciate any kind of feedback or contribution. Please feel free to open a pull request for small things such as bug fixes or experiment results. If you have any questions, please open an issue.

Acknowledgements

I borrowed a lot of code from openspeech and pyannote-audio.

