wq2012 / Awesome Diarization

Licence: apache-2.0

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

Labels

deep-learning machine-learning awesome awesome-list speech-recognition speech-processing

Awesome Speaker Diarization

Table of contents

Overview
Publications
Software
Datasets
Conferences
Leaderboards
Other learning materials
Products

Overview

This is a curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

The purpose of this repo is to organize the world’s resources for speaker diarization, and make them universally accessible and useful.

To add items to this page, simply send a pull request. (contributing guide)

Publications

Special topics

Review & survey papers

Supervised diarization

Joint diarization and ASR

Challenges

Other

2020

2019

2018

2017

2016

A Speaker Diarization System for Studying Peer-Led Team Learning Groups

2015

Diarization resegmentation in the factor analysis subspace

2014

2013

Unsupervised methods for speaker diarization: An integrated and iterative approach

2011

2009

Speaker Diarization for Meeting Room Audio

2008

Stream-based speaker segmentation using speaker factors and eigenvoices

2006

Software

Framework

Link	Language	Description
SpeechBrain	Python & PyTorch	SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.
SIDEKIT for diarization (s4d)	Python	An open source package extension of SIDEKIT for Speaker diarization.
pyAudioAnalysis	Python	Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications.
AaltoASR	Python & Perl	Speaker diarization scripts, based on AaltoASR.
LIUM SpkDiarization	Java	LIUM_SpkDiarization is software dedicated to speaker diarization (i.e. speaker segmentation and clustering). It is written in Java, and includes the most recent developments in the domain (as of 2013).
kaldi-asr	Bash	Example scripts for speaker diarization on a portion of CALLHOME used in the 2000 NIST speaker recognition evaluation.
Alize LIA_SpkSeg	C++	ALIZÉ is an open-source platform for speaker recognition. LIA_SpkSeg provides the tools for speaker diarization.
pyannote-audio	Python	Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding.
pyBK	Python	Speaker diarization using binary key speaker modelling. Computationally light solution that does not require external training data.
Speaker-Diarization	Python	Speaker diarization using uis-rnn and GhostVLAD. An easier way to support openset speakers.
EEND	Python & Bash & Perl	End-to-End Neural Diarization.
VBDiarization	Python	Speaker diarization based on Kaldi x-vectors using pretrained model trained in Kaldi (kaldi-asr/kaldi) and converted to ONNX format (onnx/onnx) running in ONNXRuntime (Microsoft/onnxruntime).
RE-VERB	Python & JavaScript	RE: VERB is a speaker diarization system that lets the user send or record audio of a conversation and receive timestamps of who spoke when.

Evaluation

Link	Language	Description
pyannote-metrics	Python	A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems.
SimpleDER	Python	A lightweight library to compute Diarization Error Rate (DER).
NIST md-eval	Perl	(1) modified md-eval.pl from Mary Tai Knox; (2) md-eval-v21.pl from jitendra; (3) md-eval-22.pl from nryant
dscore	Python & Perl	Diarization scoring tools.
Sequence Match Accuracy	Python	Compute the matching accuracy of two label sequences with the Hungarian algorithm.
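
As a rough illustration of the label-matching idea behind the tools above, the sketch below scores two frame-level label sequences under the best one-to-one mapping between their labels. It is not the API of any library listed here: the function name `sequence_match_accuracy` is hypothetical, and it swaps the Hungarian algorithm for a brute-force search over permutations, which gives the same optimum but is only practical for a handful of speakers.

```python
from itertools import permutations


def sequence_match_accuracy(reference, hypothesis):
    """Fraction of positions that agree under the best one-to-one label mapping.

    Hypothetical helper for illustration; brute force stands in for the
    Hungarian algorithm, so keep the number of distinct labels small.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same length")
    if not reference:
        return 1.0

    # Distinct labels in order of first appearance.
    ref_labels = list(dict.fromkeys(reference))
    hyp_labels = list(dict.fromkeys(hypothesis))

    # Count how often each (hypothesis label, reference label) pair co-occurs.
    overlap = {}
    for ref, hyp in zip(reference, hypothesis):
        overlap[(hyp, ref)] = overlap.get((hyp, ref), 0) + 1

    # Pad the smaller label set with None so every mapping is one-to-one.
    size = max(len(ref_labels), len(hyp_labels))
    ref_padded = ref_labels + [None] * (size - len(ref_labels))
    hyp_padded = hyp_labels + [None] * (size - len(hyp_labels))

    # Try every assignment of reference labels to hypothesis labels.
    best = 0
    for perm in permutations(ref_padded):
        matched = sum(overlap.get((hyp, ref), 0)
                      for hyp, ref in zip(hyp_padded, perm))
        best = max(best, matched)
    return best / len(reference)


if __name__ == "__main__":
    print(sequence_match_accuracy([0, 0, 1, 1], [1, 1, 0, 0]))
    print(sequence_match_accuracy([0, 0, 1, 1, 2], ["a", "a", "b", "b", "b"]))
```

The permutation search grows factorially with the number of labels; for larger label sets, a Hungarian solver such as `scipy.optimize.linear_sum_assignment` applied to the same overlap counts would be the more reasonable choice.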

Clustering

Link	Language	Description
uis-rnn	Python & PyTorch	Google's Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, for Fully Supervised Speaker Diarization. This clustering algorithm is supervised.
uis-rnn-sml	Python & PyTorch	A variant of UIS-RNN, for the paper Supervised Online Diarization with Sample Mean Loss for Multi-Domain Data.
DNC	Python & ESPnet	Transformer-based Discriminative Neural Clustering (DNC) for Speaker Diarisation. Like UIS-RNN, it is supervised.
SpectralCluster	Python	Spectral clustering with affinity matrix refinement operations.
sklearn.cluster	Python	scikit-learn clustering algorithms.
PLDA	Python	Probabilistic Linear Discriminant Analysis & classification, written in Python.
PLDA	C++	Open-source implementation of simplified PLDA (Probabilistic Linear Discriminant Analysis).
Auto-Tuning Spectral Clustering	Python	Auto-tuning Spectral Clustering method that does not need development set or supervised tuning.

Speaker embedding

Link	Method	Language	Description
resemble-ai/Resemblyzer	d-vector	Python & PyTorch	PyTorch implementation of generalized end-to-end loss for speaker verification, which can be used for voice cloning and diarization.
Speaker_Verification	d-vector	Python & TensorFlow	Tensorflow implementation of generalized end-to-end loss for speaker verification.
PyTorch_Speaker_Verification	d-vector	Python & PyTorch	PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al. With UIS-RNN integration.
Real-Time Voice Cloning	d-vector	Python & PyTorch	Implementation of "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" (SV2TTS) with a vocoder that works in real-time.
deep-speaker	d-vector	Python & Keras	Third party implementation of the Baidu paper Deep Speaker: an End-to-End Neural Speaker Embedding System.
x-vector-kaldi-tf	x-vector	Python & TensorFlow & Perl	Tensorflow implementation of x-vector topology on top of Kaldi recipe.
kaldi-ivector	i-vector	C++ & Perl	Extension to Kaldi implementing the standard i-vector hyperparameter estimation and i-vector extraction procedure.
voxceleb-ivector	i-vector	Perl	Voxceleb1 i-vector based speaker recognition system.
pytorch_xvectors	x-vector	Python & PyTorch	PyTorch implementation of Voxceleb x-vectors. Additionally, includes meta-learning architectures for embedding training. Evaluated with speaker diarization and speaker verification.
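
Speaker embeddings such as d-vectors, x-vectors, and i-vectors are commonly compared with cosine similarity. The sketch below shows that comparison on plain Python lists. It does not use any of the libraries above: the helper names and the 0.75 decision threshold are assumptions chosen for illustration, and a real threshold would be tuned on held-out data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.75):
    """Hypothetical decision rule: accept if similarity reaches the threshold.

    The default threshold is an illustrative assumption, not a tuned value.
    """
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    enrolled = [0.2, 0.9, 0.1]   # toy 3-dimensional "embedding"
    test = [0.25, 0.85, 0.05]
    print(cosine_similarity(enrolled, test))
    print(same_speaker(enrolled, test))
```

In a diarization pipeline the same pairwise score would typically fill an affinity matrix over all segments, which a clustering method then partitions into speakers.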

Speaker change detection

Link	Language	Description
change_detection	Python & Keras	Code for Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks.

Audio feature extraction

Link	Language	Description
LibROSA	Python	Python library for audio and music analysis. https://librosa.github.io/
python_speech_features	Python	This library provides common speech features for ASR including MFCCs and filterbank energies. https://python-speech-features.readthedocs.io/en/latest/
pyAudioAnalysis	Python	Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications.

Audio data augmentation

Link	Language	Description
pyroomacoustics	Python	Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios. https://pyroomacoustics.readthedocs.io
gpuRIR	Python	Python library for Room Impulse Response (RIR) simulation with GPU acceleration
rir_simulator_python	Python	Room impulse response simulator using Python

Other software

Link	Language	Description
VB Diarization	Python	VB Diarization with Eigenvoice and HMM Priors.

Datasets

Diarization datasets

Audio	Diarization ground truth	Language	Pricing	Additional information
2000 NIST Speaker Recognition Evaluation	Disk-6 (Switchboard), Disk-8 (CALLHOME)	Multiple	$2400.00	Evaluation Plan
2003 NIST Rich Transcription Evaluation Data	Included with the audio	en, ar, zh	$2000.00	telephone speech, broadcast news
CALLHOME American English Speech	CALLHOME American English Transcripts	en	$1500.00 + $1000.00	CH109 whitelist
The ICSI Meeting Corpus	Included with the audio	en	Free	License
The AMI Meeting Corpus	Included with the audio (needs to be processed)	Multiple	Free	License
Fisher English Training Speech Part 1 Speech	Fisher English Training Speech Part 1 Transcripts	en	$7000.00 + $1000.00
Fisher English Training Part 2, Speech	Fisher English Training Part 2, Transcripts	en	$7000.00 + $1000.00
VoxConverse	TBD	TBD	Free	VoxConverse is an audio-visual diarisation dataset consisting of over 50 hours of multispeaker clips of human speech, extracted from YouTube videos

Speaker embedding training sets

Name	Utterances	Speakers	Language	Pricing	Additional information
TIMIT	6K+	630	en	$250.00	Published in 1993, the TIMIT corpus of read speech is one of the earliest speaker recognition datasets.
VCTK	43K+	109	en	Free	Most sentences were selected from a newspaper, plus the Rainbow Passage and an elicitation paragraph intended to identify the speaker's accent.
LibriSpeech	292K	2K+	en	Free	Large-scale (1000 hours) corpus of read English speech.
Multilingual LibriSpeech (MLS)	?	?	en, de, nl, es, fr, it, pt, pl	Free	The Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages: English, German, Dutch, Spanish, French, Italian, Portuguese, and Polish.
LibriVox	180K	9K+	Multiple	Free	Free public domain audiobooks. LibriSpeech is a processed subset of LibriVox. Each original unsegmented utterance could be very long.
VoxCeleb 1&2	1M+	7K	Multiple	Free	VoxCeleb is an audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube.
The Spoken Wikipedia Corpora	5K	879	en, de, nl	Free	Volunteer readers reading Wikipedia articles.
CN-Celeb	130K+	1K	zh	Free	A Free Chinese Speaker Recognition Corpus Released by CSLT@Tsinghua University.
BookTubeSpeech	8K	8K	en	Free	Audio samples extracted from BookTube videos - videos where people share their opinions on books - from YouTube. The dataset can be downloaded using BookTubeSpeech-download.
DeepMine	540K	1850	fa, en	Unknown	A speech database in Persian and English designed to build and evaluate speaker verification, as well as Persian ASR systems.
NISP-Dataset	?	345	hi, kn, ml, ta, te (all Indian languages)	Free	This dataset contains speech recordings along with speaker physical parameters (height, weight, ... ) as well as regional information and linguistic information.

Augmentation noise sources

Name	Utterances	Pricing	Additional information
AudioSet	2M	Free	A large-scale dataset of manually annotated audio events.
MUSAN	N/A	Free	MUSAN is a corpus of music, speech, and noise recordings.

Conferences

Conference/Workshop	Frequency	Page Limit	Organization	Blind Review
ICASSP	Annual	4 + 1 (ref)	IEEE	No
InterSpeech	Annual	4 + 1 (ref)	ISCA	No
Speaker Odyssey	Biennial	8 + 2 (ref)	ISCA	No
SLT	Biennial	6 + 2 (ref)	IEEE	Yes
ASRU	Biennial	6 + 2 (ref)	IEEE	Yes
WASPAA	Biennial	4 + 1 (ref)	IEEE	No

Other learning materials

Books

Voice Identity Techniques: From core algorithms to engineering practice (Chinese) by Quan Wang, 2020

Tech blogs

Video tutorials

pyannote audio: neural building blocks for speaker diarization by Hervé Bredin
Google's Diarization System: Speaker Diarization with LSTM by Google
Fully Supervised Speaker Diarization: Say Goodbye to clustering by Google
Speaker Diarization: Optimal Clustering and Learning Speaker Embeddings by Microsoft Research
Robust Speaker Diarization for Meetings: the ICSI system by Microsoft Research
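【机器之心&博文视点】入门声纹技术｜第二讲：声纹分割聚类与其他应用 (Synced & Broadview: Introduction to Voiceprint Technology, Lecture 2: Speaker Diarization and Other Applications) by Quan Wang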

Products

Company	Product
Google	Google Cloud Speech-to-Text API
Amazon	Amazon Transcribe
IBM	Watson Speech To Text API
DeepAffects	Speaker Diarization API
