
xuchenglin28 / speaker_extraction

Licence: GPL-3.0
target speaker extraction and verification for multi-talker speech

Programming Languages

Python, Shell, MATLAB

Projects that are alternatives to or similar to speaker_extraction

Speaker-Identification
A program for automatic speaker identification using deep learning techniques.
Stars: ✭ 84 (-1.18%)
Mutual labels:  speaker-verification
Voice-ML
MobileNet trained with VoxCeleb dataset and used for voice verification
Stars: ✭ 15 (-82.35%)
Mutual labels:  speaker-verification
wavenet-classifier
Keras Implementation of Deepmind's WaveNet for Supervised Learning Tasks
Stars: ✭ 54 (-36.47%)
Mutual labels:  speaker-verification
dropclass speaker
DropClass and DropAdapt - repository for the paper accepted to Speaker Odyssey 2020
Stars: ✭ 20 (-76.47%)
Mutual labels:  speaker-verification
Awesome Speech Recognition Speech Synthesis Papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Stars: ✭ 2,085 (+2352.94%)
Mutual labels:  speaker-verification
GE2E-Loss
Pytorch implementation of Generalized End-to-End Loss for speaker verification
Stars: ✭ 72 (-15.29%)
Mutual labels:  speaker-verification
UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
Stars: ✭ 224 (+163.53%)
Mutual labels:  speaker-verification
audio source separation
An implementation of audio source separation tools.
Stars: ✭ 41 (-51.76%)
Mutual labels:  source-separation
bob
Bob is a free signal-processing and machine learning toolbox originally developed by the Biometrics group at Idiap Research Institute, in Switzerland. - Mirrored from https://gitlab.idiap.ch/bob/bob
Stars: ✭ 38 (-55.29%)
Mutual labels:  speaker-verification
wavenet
Audio source separation (mixture to vocal) using the Wavenet
Stars: ✭ 20 (-76.47%)
Mutual labels:  source-separation
3D-convolutional-speaker-recognition-pytorch
🔈 Deep Learning & 3D Convolutional Neural Networks for Speaker Verification
Stars: ✭ 106 (+24.71%)
Mutual labels:  speaker-verification
Kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+13018.82%)
Mutual labels:  speaker-verification
SpleeterRT
Real-time monaural source separation based on a fully convolutional neural network operating in the time-frequency domain.
Stars: ✭ 111 (+30.59%)
Mutual labels:  source-separation
ASVspoof2019 system
D3M - Dynamic Data Discrepancy Mitigation for Anti-spoofing - Implementation of work Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection
Stars: ✭ 22 (-74.12%)
Mutual labels:  speaker-verification
soundscape IR
Tools for soundscape information retrieval. This repository is a developing project; please go to https://github.com/meil-brcas-org/soundscape_IR for full releases.
Stars: ✭ 23 (-72.94%)
Mutual labels:  source-separation
kaldi-timit-sre-ivector
Develop a speaker recognition model based on i-vectors using the TIMIT database
Stars: ✭ 17 (-80%)
Mutual labels:  speaker-verification
Speaker-Recognition
This repo contains my attempt to create a Speaker Recognition and Verification system using SideKit-1.3.1
Stars: ✭ 94 (+10.59%)
Mutual labels:  speaker-verification
WASE
PyTorch implementation of WASE described in our ICASSP 2021: "Wase: Learning When to Attend for Speaker Extraction in Cocktail Party Environments"
Stars: ✭ 18 (-78.82%)
Mutual labels:  speaker-extraction
Huawei-Challenge-Speaker-Identification
Trained speaker embedding deep learning models and evaluation pipelines in PyTorch and TensorFlow for speaker recognition.
Stars: ✭ 34 (-60%)
Mutual labels:  speaker-verification
deepaudio-speaker
neural network based speaker embedder
Stars: ✭ 19 (-77.65%)
Mutual labels:  speaker-verification

Target Speaker Extraction and Verification for Multi-talker Speech

The code here performs speaker extraction: given the target speaker's characteristics, only that speaker's voice is extracted from a multi-talker mixture. In paper 2), we use a small network to jointly learn the target speaker's characteristics from a different utterance of the same speaker. You can also replace this network with an i-vector or x-vector network.

If you are interested in speech separation, i.e. recovering all speakers' voices from the mixture, please see https://github.com/xuchenglin28/speech_separation

Papers

Please cite:

  1. Chenglin Xu, Wei Rao, Xiong Xiao, Eng Siong Chng and Haizhou Li, "Single Channel Speech Separation with Constrained Utterance Level Permutation Invariant Training Using Grid LSTM", in Proc. of ICASSP 2018, pp. 6-10.
  2. Chenglin Xu, Wei Rao, Eng Siong Chng, and Haizhou Li, "Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss", in Proc. of ICASSP 2019, pp. 6990-6994.
  3. Wei Rao, Chenglin Xu, Eng Siong Chng, and Haizhou Li, "Target Speaker Extraction for Multi-Talker Speaker Verification", in Proc. of Interspeech 2019, pp. 1273-1277.

Data Generation:

If you are using wsj0 to simulate data as in papers 2) and 3), please read the code in run_data_generation.sh for details and change the paths accordingly.
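
As a sketch of the workflow (the wsj0 location is site-specific and has to be edited inside the script first, as noted above):

  # Generate the simulated two-talker data; edit the wsj0 path inside the
  # script before running.
  bash run_data_generation.sh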

The lists of files and SNRs for the training, development, and test sets are in simulation/mix_2_spk_{tr,cv,tt}_extr.txt. In these files, the first column is the target speaker's utterance, used both to generate the mixture and as the clean target that supervises network learning. The second column is the interfering speaker's utterance used to generate the mixture. The third column is another utterance of the target speaker, used to obtain the speaker's characteristics.
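
To see this column layout concretely, you can print the first entry of the training list (utterance IDs are wsj0-specific, so only the structure matters here):

  # Columns: target utterance, interference utterance, another utterance of
  # the target speaker, plus the SNR values described above.
  head -n 1 simulation/mix_2_spk_tr_extr.txt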

After running the .sh script, there will be three folders {mix, aux, s1} for each of the three sets {tr, cv, tt}. The mix folder contains the mixture speech, the aux folder contains the utterances used to obtain the speaker's characteristics, and the s1 folder contains the clean target speech. File names are consistent across all three folders for each example.
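
Assuming the root directory used later in this README, the resulting layout is (a sketch only; your root path may differ):

  data/wsj0_2mix_extr/wav8k/max/
    tr/{mix,aux,s1}/...
    cv/{mix,aux,s1}/...
    tt/{mix,aux,s1}/...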

Speaker Extraction

This part includes feature extraction, model training, and run-time inference. Please read run.sh for details and revise it accordingly.

noisy_dir: the folder containing your simulated data, for example "data/wsj0_2mix_extr/wav8k/max". Under this path there should be three folders for the training, development, and test sets (tr, cv, tt), and within each set three folders named (mix, aux, s1), as described in the Data Generation part.

(The folder names for the training and development sets are hard-coded. If you want to use different folder names, please change the parameters of the read_list() function in train.py.)

Once the path to noisy_dir is set, you can simply run run.sh to extract features, train the model, and perform run-time inference.
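
For example (assuming noisy_dir and any other site-specific paths have already been edited inside run.sh):

  # Runs feature extraction, model training, and run-time inference in sequence.
  bash run.sh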

If you want to reproduce the results in the published papers, you need to set dur=0 in run.sh, because utterances of varying duration were used without fixing the utterance length.
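
One way to make that change from the command line, as a sketch (this assumes dur is assigned on a line of its own in run.sh; please verify before running):

  # Use full variable-length utterances, then rerun the pipeline.
  sed -i 's/^dur=.*/dur=0/' run.sh
  bash run.sh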

Speaker Verification:

Here we only provide the key files for paper 3) on speaker verification. Please read the paper for details.

verification/keys: key files of the simulated trials for the multi-talker speaker verification system.

Environments:

Python: 2.7

TensorFlow: 1.12 (some APIs are from older versions, but remain compatible with 1.12)
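
A minimal environment sketch matching those versions (the virtualenv name is arbitrary; tensorflow 1.12.0 is the matching PyPI release for Python 2.7):

  # Create a Python 2.7 environment and pin TensorFlow 1.12.
  virtualenv -p python2.7 venv
  . venv/bin/activate
  pip install tensorflow==1.12.0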

Contact

e-mail: [email protected]

Licence

The code and models in this repository are licensed under the GNU General Public License Version 3.

Citation

If you would like to cite this work, use the following:

@inproceedings{xu2018single,
  title={Single channel speech separation with constrained utterance level permutation invariant training using grid lstm},
  author={Xu, Chenglin and Rao, Wei and Xiao, Xiong and Chng, Eng Siong and Li, Haizhou},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6--10},
  year={2018}
}
@inproceedings{xu2019optimization,
  title={Optimization of speaker extraction neural network with magnitude and temporal spectrum approximation loss},
  author={Xu, Chenglin and Rao, Wei and Chng, Eng Siong and Li, Haizhou},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6990--6994},
  year={2019}
}
@inproceedings{rao2019target,
  title={Target speaker extraction for multi-talker speaker verification},
  author={Rao, Wei and Xu, Chenglin and Chng, Eng Siong and Li, Haizhou},
  booktitle={Proc. of INTERSPEECH},
  pages={1273--1277},
  year={2019}
}