gemengtju / Tutorial_separation

This repo summarizes the tutorials, datasets, papers, code, and tools for the speech separation and speaker extraction tasks. Pull requests are welcome.

Speech Separation and Extraction via Deep Learning


Tutorials
Datasets
Papers
Tools
- System Tools
- Evaluation Tools
Results on WSJ0-2mix

Tutorials

[Speech Separation, Hung-yi Lee, 2020] [Video (Subtitle)] [Video] [Slide]
[Advances in End-to-End Neural Source Separation, Yi Luo, 2020] [Video (BiliBili)] [Video] [Slide]
[Audio Source Separation and Speech Enhancement, Emmanuel Vincent, 2018] [Book]
[Audio Source Separation, Shoji Makino, 2018] [Book]
[Overview Papers] [Paper (Daniel Michelsanti)] [Paper (DeLiang Wang)] [Paper (Bo Xu)] [Paper (Zafar Rafii)] [Paper (Sharon Gannot)]
[Overview Slides] [Slide (DeLiang Wang)] [Slide (Haizhou Li)] [Slide (Meng Ge)]
[Handbook] [Ongoing]

Datasets

[Dataset Introduction] [Pure Speech Dataset Slide (Meng Ge)] [Audio-Visual Dataset Slide (Zexu Pan)]
[WSJ0] [Dataset]
[WSJ0-2mix] [Script]
[WSJ0-2mix-extr] [Script]
[WHAM & WHAMR] [Paper (WHAM)] [Paper (WHAMR)] [Dataset]
[LibriMix] [Paper] [Script]
[LibriCSS] [Paper] [Script]
[SparseLibriMix] [Script]
[VCTK-2Mix] [Script]
[CHiME-5 & CHiME-6 Challenge] [Dataset]
[AudioSet] [Dataset]
[Microsoft DNS Challenge] [Dataset]
[AVSpeech] [Dataset]
[LRW] [Dataset]
[LRS2] [Dataset]
[LRS3] [Dataset] [Script]
[VoxCeleb] [Dataset]

Papers

Speech Separation based on Brain Studies

[Attentional Selection in a Cocktail Party Environment Can Be Decoded from Single-Trial EEG, James O'Sullivan, Cerebral Cortex 2012] [Paper]
[Selective cortical representation of attended speaker in multi-talker speech perception, Nima Mesgarani, Nature 2012] [Paper]
[Neural decoding of attentional selection in multi-speaker environments without access to clean sources, James O'Sullivan, Journal of Neural Engineering 2017] [Paper]
[Speech synthesis from neural decoding of spoken sentences, Gopala K. Anumanchipalli, Nature 2019] [Paper]
[Towards reconstructing intelligible speech from the human auditory cortex, Hassan Akbari, Scientific Reports 2019] [Paper] [Code]

Pure Speech Separation

[Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation, Po-Sen Huang, TASLP 2015] [Paper] [Code (posenhuang)]
[Complex Ratio Masking for Monaural Speech Separation, DS Williamson, TASLP 2015] [Paper]
[Deep clustering: Discriminative embeddings for segmentation and separation, JR Hershey, ICASSP 2016] [Paper] [Code (Kai Li)] [Code (Jian Wu)] [Code (asteroid)]
[Single-channel multi-speaker separation using deep clustering, Y Isik, Interspeech 2016] [Paper] [Code (Kai Li)] [Code (Jian Wu)]
[Permutation invariant training of deep models for speaker-independent multi-talker speech separation, Dong Yu, ICASSP 2017] [Paper] [Code (Kai Li)] [Code (Sining Sun)]
[Recognizing Multi-talker Speech with Permutation Invariant Training, Dong Yu, ICASSP 2017] [Paper]
[Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks, M Kolbæk, TASLP 2017] [Paper] [Code (Kai Li)]
[Deep attractor network for single-microphone speaker separation, Zhuo Chen, ICASSP 2017] [Paper] [Code (Kai Li)]
[Alternative Objective Functions for Deep Clustering, Zhong-Qiu Wang, ICASSP 2018] [Paper]
[Listen, Think and Listen Again: Capturing Top-down Auditory Attention for Speaker-independent Speech Separation, Jing Shi, IJCAI 2018] [Paper]
[End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction, Zhong-Qiu Wang et al. 2018] [Paper]
[Modeling Attention and Memory for Auditory Selection in a Cocktail Party Environment, Jiaming Xu, AAAI 2018] [Paper] [Code]
[Speaker-independent Speech Separation with Deep Attractor Network, Luo Yi, TASLP 2018] [Paper] [Code (Kai Li)]
[Listening to Each Speaker One by One with Recurrent Selective Hearing Networks, Keisuke Kinoshita, ICASSP 2018] [Paper]
[Tasnet: time-domain audio separation network for real-time, single-channel speech separation, Luo Yi, ICASSP 2018] [Paper] [Code (Kai Li)] [Code (asteroid)]
[Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation, Luo Yi, TASLP 2019] [Paper] [Code (Kai Li)] [Code (asteroid)]
[Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation, Yuzhou Liu, TASLP 2019] [Paper] [Code] [Code]
[Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering, Gene-Ping Yang, Interspeech 2019] [Paper] [Code]
[Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, Luo Yi, Arxiv 2019] [Paper] [Code (Kai Li)]
[A comprehensive study of speech separation: spectrogram vs waveform separation, Fahimeh Bahmaninezhad, Interspeech 2019] [Paper]
[Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features, Cunhang Fan, Interspeech 2019] [Paper]
[Interrupted and cascaded permutation invariant training for speech separation, Gene-Ping Yang, ICASSP 2020] [Paper]
[FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks, Liwen Zhang, MMM 2020] [Paper]
[Filterbank design for end-to-end speech separation, Manuel Pariente et al., ICASSP 2020] [Paper]
[Voice Separation with an Unknown Number of Multiple Speakers, Eliya Nachmani, Arxiv 2020] [Paper] [Demo]
[An Empirical Study of Conv-TasNet, Berkan Kadıoglu, Arxiv 2020] [Paper] [Code]
[Wavesplit: End-to-End Speech Separation by Speaker Clustering, Neil Zeghidour et al., Arxiv 2020] [Paper]
[La Furca: Iterative Context-Aware End-to-End Monaural Speech Separation Based on Dual-Path Deep Parallel Inter-Intra Bi-LSTM with Attention, Ziqiang Shi, Arxiv 2020] [Paper]
[Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method, Cunhang Fan, Arxiv 2020] [Paper]
[Identify Speakers in Cocktail Parties with End-to-End Attention, Junzhe Zhu, Arxiv 2018] [Paper] [Code]
[Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals, Jing Shi, Arxiv 2020] [Paper] [Code/Demo]
[Speaker-Conditional Chain Model for Speech Separation and Extraction, Jing Shi, Arxiv 2020] [Paper] [Code/Demo]
[Improving Voice Separation by Incorporating End-to-end Speech Recognition, Naoya Takahashi, ICASSP 2020] [Paper] [Code]
[A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet, David Ditter, ICASSP 2020] [Paper] [Code]
[Two-Step Sound Source Separation: Training on Learned Latent Targets, Efthymios Tzinis, ICASSP 2020] [Paper] [Code (Asteroid)] [Code (Tzinis)]
[Unsupervised Sound Separation Using Mixtures of Mixtures, Scott Wisdom, Arxiv] [Paper]
[Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss, Ziqiang Shi, 2020] [Paper]

Multi-Modal Speech Separation

[Deep Audio-Visual Learning: A Survey, Hao Zhu, Arxiv 2020] [Paper]
[Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks, Jen-Cheng Hou, TETCI 2017] [Paper] [Code]
[The Sound of Pixels, Hang Zhao, ECCV 2018] [Paper/Demo]
[Learning to Separate Object Sounds by Watching Unlabeled Video, Ruohan Gao, ECCV 2018] [Paper]
[The Conversation: Deep Audio-Visual Speech Enhancement, Triantafyllos Afouras, Interspeech 2018] [Paper]
[End-to-end audiovisual speech recognition, Stavros Petridis, ICASSP 2018] [Paper] [Code]
[The Sound of Pixels, Hang Zhao, ECCV 2018] [Paper] [Code]
[Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation, Ariel Ephrat, ACM Transactions on Graphics 2018] [Paper] [Code]
[Time domain audio visual speech separation, Jian Wu, Arxiv 2019] [Paper]
[Recursive Visual Sound Separation Using Minus-Plus Net, Xudong Xu, ICCV 2019] [Paper]
[The Sound of Motions, Hang Zhao, ICCV 2019] [Paper]
[Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network, Ke Tan, Arxiv 2019] [Paper]
[Co-Separating Sounds of Visual Objects, Ruohan Gao, ICCV 2019] [Paper] [Code]
[Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments, Giovanni Morrone, Arxiv 2019] [Paper] [Code]
[Music Gesture for Visual Sound Separation, Chuang Gan, CVPR 2020] [Paper]
[FaceFilter: Audio-visual speech separation using still images, Soo-Whan Chung, Arxiv 2020] [Paper]
[Awesome Audio-Visual, Github, Kranti Kumar Parida] [Github Link]

Multi-channel Speech Separation

[FaSNet: Low-latency Adaptive Beamforming for Multi-microphone Audio Processing, Yi Luo, Arxiv 2019] [Paper]
[MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition, Xuankai Chang et al., ASRU 2019] [Paper]
[End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation, Yi Luo et al., ICASSP 2020] [Paper] [Code]
[Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning, Rongzhi Gu, ICASSP 2020] [Paper]
[Multi-modal Multi-channel Target Speech Separation, Rongzhi Gu, J-STSP 2020] [Paper]

Speaker Extraction

[Single channel target speaker extraction and recognition with speaker beam, Marc Delcroix, ICASSP 2018] [Paper]
[VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking, Quan Wang, Interspeech 2019] [Paper] [Code (Jian Wu)]
[Single-Channel Speech Extraction Using Speaker Inventory and Attention Network, Xiong Xiao et al, ICASSP 2019] [Paper]
[Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss, Chenglin Xu, ICASSP 2019] [Paper] [Code]
[Time-domain speaker extraction network, Chenglin Xu, ASRU 2019] [Paper]
[SpEx: Multi-Scale Time Domain Speaker Extraction Network, Chenglin Xu, TASLP 2020] [Paper]
[Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam, Marc Delcroix, ICASSP 2020] [Paper]
[SpEx+: A Complete Time Domain Speaker Extraction Network, Meng Ge, Arxiv 2020] [Paper] [Code]

Tools

System Tools

[Asteroid: the PyTorch-based audio source separation toolkit for researchers, Manuel Pariente et al., ICASSP 2020] [Tool Link]
[ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration, Chenda Li et al., Arxiv] [Paper Link]

Evaluation Tools

[Performance measurement in blind audio source separation, Emmanuel Vincent et al., TASLP 2006] [Paper] [Tool Link]
[SDR – Half-baked or Well Done?, Jonathan Le Roux, ICASSP 2019] [Paper] [Tool Link]

Results on WSJ0-2mix

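Speech separation (SS) and speaker extraction (SE) results on the WSJ0-2mix dataset (8 kHz sampling rate, "min" version). SDRi and SI-SDRi are the improvements in SDR and scale-invariant SDR over the unprocessed mixture; a dash marks a value that is not reported.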

Task	Method	Model Size (parameters)	SDRi (dB)	SI-SDRi (dB)
SS	DPCL++	13.6M	-	10.8
SS	uPIT-BLSTM-ST	92.7M	10.0	-
SS	DANet	9.1M	-	10.5
SS	cuPIT-Grid-RD	53.2M	10.2	-
SS	SDC-G-MTL	53.9M	10.5	-
SS	CBLDNN-GAT	39.5M	11.0	-
SS	Chimera++	32.9M	12.0	11.5
SS	WA-MISI-5	32.9M	13.1	12.6
SS	BLSTM-TasNet	23.6M	13.6	13.2
SS	Conv-TasNet	5.1M	15.6	15.3
SE	SpEx	10.8M	17.0	16.6
SE	SpEx+	11.1M	17.6	17.4
SS	DeepCASA	12.8M	18.0	17.7
SS	FurcaNeXt	51.4M	18.4	-
SS	DPRNN-TasNet	2.6M	19.0	18.8
SS	Wavesplit	-	19.2	19.0
SS	Wavesplit + Dynamic mixing	-	20.6	20.4
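
The SI-SDRi column can be illustrated with a small sketch. It assumes the scale-invariant SDR definition from the "SDR – Half-baked or Well Done?" paper listed under Evaluation Tools, removes the signal means first, and uses plain Python lists instead of an audio library. The function names `si_sdr` and `si_sdr_improvement` and the toy sine-wave signals are illustrative assumptions, not part of any tool linked above; published numbers should be computed with the linked evaluation tools.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    # Remove the mean so a constant offset does not influence the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0.0:
        raise ValueError("reference signal has zero energy")

    # Project the estimate onto the reference to get the scaled target.
    alpha = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        return math.inf
    if target_energy == 0.0:
        return -math.inf
    return 10.0 * math.log10(target_energy / residual_energy)


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy example: two sine "speakers" at 8 kHz, one second long.
sample_rate = 8000
s1 = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
s2 = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(sample_rate)]
mixture = [a + b for a, b in zip(s1, s2)]
# Hypothetical separator output: speaker 1 with 10% of speaker 2 leaking in.
estimate = [a + 0.1 * b for a, b in zip(s1, s2)]

print(round(si_sdr_improvement(estimate, mixture, s1), 3))  # about 20.0 dB here
```

In this toy case the mixture scores about 0 dB against speaker 1 (both sines have equal energy) and the estimate scores about 20 dB, so the improvement is about 20 dB. Because the target is rescaled by the projection, multiplying the estimate by any nonzero constant leaves the score unchanged.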
