
xuchenglin28 / speaker_extraction

Licence: GPL-3.0
target speaker extraction and verification for multi-talker speech

Programming Languages

Python, Shell, MATLAB

Projects that are alternatives to or similar to speaker_extraction

Speaker-Identification
A program for automatic speaker identification using deep learning techniques.
Stars: ✭ 84 (-1.18%)
Mutual labels:  speaker-verification
Voice-ML
MobileNet trained with VoxCeleb dataset and used for voice verification
Stars: ✭ 15 (-82.35%)
Mutual labels:  speaker-verification
wavenet-classifier
Keras Implementation of Deepmind's WaveNet for Supervised Learning Tasks
Stars: ✭ 54 (-36.47%)
Mutual labels:  speaker-verification
dropclass speaker
DropClass and DropAdapt - repository for the paper accepted to Speaker Odyssey 2020
Stars: ✭ 20 (-76.47%)
Mutual labels:  speaker-verification
Awesome Speech Recognition Speech Synthesis Papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Stars: ✭ 2,085 (+2352.94%)
Mutual labels:  speaker-verification
GE2E-Loss
Pytorch implementation of Generalized End-to-End Loss for speaker verification
Stars: ✭ 72 (-15.29%)
Mutual labels:  speaker-verification
UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
Stars: ✭ 224 (+163.53%)
Mutual labels:  speaker-verification
audio source separation
An implementation of audio source separation tools.
Stars: ✭ 41 (-51.76%)
Mutual labels:  source-separation
bob
Bob is a free signal-processing and machine learning toolbox originally developed by the Biometrics group at Idiap Research Institute, in Switzerland. - Mirrored from https://gitlab.idiap.ch/bob/bob
Stars: ✭ 38 (-55.29%)
Mutual labels:  speaker-verification
wavenet
Audio source separation (mixture to vocal) using the Wavenet
Stars: ✭ 20 (-76.47%)
Mutual labels:  source-separation
3D-convolutional-speaker-recognition-pytorch
🔈 Deep Learning & 3D Convolutional Neural Networks for Speaker Verification
Stars: ✭ 106 (+24.71%)
Mutual labels:  speaker-verification
Kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+13018.82%)
Mutual labels:  speaker-verification
SpleeterRT
Real-time monaural source separation based on a fully convolutional neural network operating in the time-frequency domain.
Stars: ✭ 111 (+30.59%)
Mutual labels:  source-separation
ASVspoof2019 system
D3M - Dynamic Data Discrepancy Mitigation for Anti-spoofing - Implementation of work Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection
Stars: ✭ 22 (-74.12%)
Mutual labels:  speaker-verification
soundscape IR
Tools for soundscape information retrieval. This repository is a developing project; please go to https://github.com/meil-brcas-org/soundscape_IR for full releases.
Stars: ✭ 23 (-72.94%)
Mutual labels:  source-separation
kaldi-timit-sre-ivector
Develop a speaker recognition model based on i-vectors using the TIMIT database
Stars: ✭ 17 (-80%)
Mutual labels:  speaker-verification
Speaker-Recognition
This repo contains my attempt to create a Speaker Recognition and Verification system using SideKit-1.3.1
Stars: ✭ 94 (+10.59%)
Mutual labels:  speaker-verification
WASE
PyTorch implementation of WASE described in our ICASSP 2021: "Wase: Learning When to Attend for Speaker Extraction in Cocktail Party Environments"
Stars: ✭ 18 (-78.82%)
Mutual labels:  speaker-extraction
Huawei-Challenge-Speaker-Identification
Trained speaker embedding deep learning models and evaluation pipelines in PyTorch and TensorFlow for speaker recognition.
Stars: ✭ 34 (-60%)
Mutual labels:  speaker-verification
deepaudio-speaker
neural network based speaker embedder
Stars: ✭ 19 (-77.65%)
Mutual labels:  speaker-verification

Target Speaker Extraction and Verification for Multi-talker Speech

The code here performs speaker extraction: given the target speaker's characteristics, only that speaker's voice is extracted from a multi-talker mixture. In paper 2), we use a small network to jointly learn the target speaker's characteristics from a different utterance of the same speaker. You can also replace this network with an i-vector or x-vector network.

If you are interested in speech separation, i.e. recovering all speakers' voices from the mixture, please see https://github.com/xuchenglin28/speech_separation

Papers

Please cite:

  1. Chenglin Xu, Wei Rao, Xiong Xiao, Eng Siong Chng and Haizhou Li, "Single Channel Speech Separation with Constrained Utterance Level Permutation Invariant Training Using Grid LSTM", in Proc. of ICASSP 2018, pp. 6-10.
  2. Chenglin Xu, Wei Rao, Eng Siong Chng, and Haizhou Li, "Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss", in Proc. of ICASSP 2019, pp. 6990-6994.
  3. Wei Rao, Chenglin Xu, Eng Siong Chng, and Haizhou Li, "Target Speaker Extraction for Multi-Talker Speaker Verification", in Proc. of Interspeech 2019, pp. 1273-1277.

Data Generation:

If you are using wsj0 to simulate data as in papers 2) and 3), please read the code in run_data_generation.sh for details and change the paths accordingly.
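
As a sketch of the workflow (the wsj0 location is site-specific and has to be edited inside the script first, as noted above):

  # Generate the simulated two-talker data; edit the wsj0 path inside the
  # script before running.
  bash run_data_generation.sh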

The lists of files and SNRs for the training, development, and test sets are in simulation/mix_2_spk_{tr,cv,tt}_extr.txt. In these files, the first column is the target speaker's utterance, used both to generate the mixture and as the clean target that supervises network learning. The second column is the interfering speaker's utterance used to generate the mixture. The third column is another utterance of the target speaker, used to obtain the speaker's characteristics.
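
To see this column layout concretely, you can print the first entry of the training list (utterance IDs are wsj0-specific, so only the structure matters here):

  # Columns: target utterance, interference utterance, another utterance of
  # the target speaker, plus the SNR values described above.
  head -n 1 simulation/mix_2_spk_tr_extr.txt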

After running the .sh script, there will be three folders {mix, aux, s1} for each of the three sets {tr, cv, tt}. The mix folder contains the mixture speech, the aux folder contains the utterances used to obtain the speaker's characteristics, and the s1 folder contains the clean target speech. File names are consistent across all three folders for each example.
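
Assuming the root directory used later in this README, the resulting layout is (a sketch only; your root path may differ):

  data/wsj0_2mix_extr/wav8k/max/
    tr/{mix,aux,s1}/...
    cv/{mix,aux,s1}/...
    tt/{mix,aux,s1}/...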

Speaker Extraction

This part includes feature extraction, model training, and run-time inference. Please read run.sh for details and revise it accordingly.

noisy_dir: the folder containing your simulated data, for example "data/wsj0_2mix_extr/wav8k/max". Under this path there should be three folders for the training, development, and test sets (tr, cv, tt), and within each set three folders named (mix, aux, s1), as described in the Data Generation part.

(The folder names for the training and development sets are hard-coded. If you want to use different folder names, please change the parameters of the read_list() function in train.py.)

Once the path to noisy_dir is set, you can simply run run.sh to extract features, train the model, and perform run-time inference.
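
For example (assuming noisy_dir and any other site-specific paths have already been edited inside run.sh):

  # Runs feature extraction, model training, and run-time inference in sequence.
  bash run.sh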

If you want to reproduce the results in the published papers, you need to set dur=0 in run.sh, because utterances of varying duration were used without fixing the utterance length.
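
One way to make that change from the command line, as a sketch (this assumes dur is assigned on a line of its own in run.sh; please verify before running):

  # Use full variable-length utterances, then rerun the pipeline.
  sed -i 's/^dur=.*/dur=0/' run.sh
  bash run.sh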

Speaker Verification:

Here we only provide the key files for paper 3) on speaker verification. Please read the paper for details.

verification/keys: key files of the simulated trials for the multi-talker speaker verification system.

Environments:

Python: 2.7

TensorFlow: 1.12 (some APIs are from older versions, but remain compatible with 1.12)
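
A minimal environment sketch matching those versions (the virtualenv name is arbitrary; tensorflow 1.12.0 is the matching PyPI release for Python 2.7):

  # Create a Python 2.7 environment and pin TensorFlow 1.12.
  virtualenv -p python2.7 venv
  . venv/bin/activate
  pip install tensorflow==1.12.0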

Contact

e-mail: [email protected]

Licence

The code and models in this repository are licensed under the GNU General Public License Version 3.

Citation

If you would like to cite this work, use the following:

@inproceedings{xu2018single,
  title={Single channel speech separation with constrained utterance level permutation invariant training using grid lstm},
  author={Xu, Chenglin and Rao, Wei and Xiao, Xiong and Chng, Eng Siong and Li, Haizhou},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6--10},
  year={2018}
}
@inproceedings{xu2019optimization,
  title={Optimization of speaker extraction neural network with magnitude and temporal spectrum approximation loss},
  author={Xu, Chenglin and Rao, Wei and Chng, Eng Siong and Li, Haizhou},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6990--6994},
  year={2019}
}
@inproceedings{rao2019target,
  title={Target speaker extraction for multi-talker speaker verification},
  author={Rao, Wei and Xu, Chenglin and Chng, Eng Siong and Li, Haizhou},
  booktitle={Proc. of INTERSPEECH},
  pages={1273--1277},
  year={2019}
}