gemengtju / Tutorial_separation

This repo summarizes the tutorials, datasets, papers, code, and tools for the speech separation and speaker extraction tasks. Pull requests are welcome.



Speech Separation and Extraction via Deep Learning

Table of Contents

Tutorials

Datasets

Papers

Speech Separation based on Brain Studies

  • [Attentional Selection in a Cocktail Party Environment Can Be Decoded from Single-Trial EEG, James O'Sullivan, Cerebral Cortex 2012] [Paper]

  • [Selective cortical representation of attended speaker in multi-talker speech perception, Nima Mesgarani, Nature 2012] [Paper]

  • [Neural decoding of attentional selection in multi-speaker environments without access to clean sources, James O'Sullivan, Journal of Neural Engineering 2017] [Paper]

  • [Speech synthesis from neural decoding of spoken sentences, Gopala K. Anumanchipalli, Nature 2019] [Paper]

  • [Towards reconstructing intelligible speech from the human auditory cortex, Hassan Akbari, Scientific Reports 2019] [Paper] [Code]

Pure Speech Separation

  • [Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation, Po-Sen Huang, TASLP 2015] [Paper] [Code (posenhuang)]

  • [Complex Ratio Masking for Monaural Speech Separation, DS Williamson, TASLP 2015] [Paper]

  • [Deep clustering: Discriminative embeddings for segmentation and separation, JR Hershey, ICASSP 2016] [Paper] [Code (Kai Li)] [Code (Jian Wu)] [Code (asteroid)]

  • [Single-channel multi-speaker separation using deep clustering, Y Isik, Interspeech 2016] [Paper] [Code (Kai Li)] [Code (Jian Wu)]

  • [Permutation invariant training of deep models for speaker-independent multi-talker speech separation, Dong Yu, ICASSP 2017] [Paper] [Code (Kai Li)] [Code (Sining Sun)]

  • [Recognizing Multi-talker Speech with Permutation Invariant Training, Dong Yu, ICASSP 2017] [Paper]

  • [Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks, M Kolbæk, TASLP 2017] [Paper] [Code (Kai Li)]

  • [Deep attractor network for single-microphone speaker separation, Zhuo Chen, ICASSP 2017] [Paper] [Code (Kai Li)]

  • [Alternative Objective Functions for Deep Clustering, Zhong-Qiu Wang, ICASSP 2018] [Paper]

  • [Listen, Think and Listen Again: Capturing Top-down Auditory Attention for Speaker-independent Speech Separation, Jing Shi, IJCAI 2018] [Paper]

  • [End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction, Zhong-Qiu Wang et al., 2018] [Paper]

  • [Modeling Attention and Memory for Auditory Selection in a Cocktail Party Environment, Jiaming Xu, AAAI 2018] [Paper] [Code]

  • [Speaker-independent Speech Separation with Deep Attractor Network, Yi Luo, TASLP 2018] [Paper] [Code (Kai Li)]

  • [Listening to Each Speaker One by One with Recurrent Selective Hearing Networks, Keisuke Kinoshita, ICASSP 2018] [Paper]

  • [TasNet: Time-domain audio separation network for real-time, single-channel speech separation, Yi Luo, ICASSP 2018] [Paper] [Code (Kai Li)] [Code (asteroid)]

  • [Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation, Yi Luo, TASLP 2019] [Paper] [Code (Kai Li)] [Code (asteroid)]

  • [Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation, Yuzhou Liu, TASLP 2019] [Paper] [Code] [Code]

  • [Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering, Gene-Ping Yang, Interspeech 2019] [Paper] [Code]

  • [Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, Yi Luo, Arxiv 2019] [Paper] [Code (Kai Li)]

  • [A comprehensive study of speech separation: spectrogram vs waveform separation, Fahimeh Bahmaninezhad, Interspeech 2019] [Paper]

  • [Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features, Cunhang Fan, Interspeech 2019] [Paper]

  • [Interrupted and cascaded permutation invariant training for speech separation, Gene-Ping Yang, ICASSP 2020] [Paper]

  • [FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks, Liwen Zhang, MMM 2020] [Paper]

  • [Filterbank design for end-to-end speech separation, Manuel Pariente et al., ICASSP 2020] [Paper]

  • [Voice Separation with an Unknown Number of Multiple Speakers, Eliya Nachmani, Arxiv 2020] [Paper] [Demo]

  • [An Empirical Study of Conv-TasNet, Berkan Kadıoglu, Arxiv 2020] [Paper] [Code]

  • [Wavesplit: End-to-End Speech Separation by Speaker Clustering, Neil Zeghidour et al., Arxiv 2020] [Paper]

  • [La Furca: Iterative Context-Aware End-to-End Monaural Speech Separation Based on Dual-Path Deep Parallel Inter-Intra Bi-LSTM with Attention, Ziqiang Shi, Arxiv 2020] [Paper]

  • [Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method, Cunhang Fan, Arxiv 2020] [Paper]

  • [Identify Speakers in Cocktail Parties with End-to-End Attention, Junzhe Zhu, Arxiv 2018] [Paper] [Code]

  • [Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals, Jing Shi, Arxiv 2020] [Paper] [Code/Demo]

  • [Speaker-Conditional Chain Model for Speech Separation and Extraction, Jing Shi, Arxiv 2020] [Paper] [Code/Demo]

  • [Improving Voice Separation by Incorporating End-to-end Speech Recognition, Naoya Takahashi, ICASSP 2020] [Paper] [Code]

  • [A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet, David Ditter, ICASSP 2020] [Paper] [Code]

  • [Two-Step Sound Source Separation: Training on Learned Latent Targets, Efthymios Tzinis, ICASSP 2020] [Paper] [Code (Asteroid)] [Code (Tzinis)]

  • [Unsupervised Sound Separation Using Mixtures of Mixtures, Scott Wisdom, Arxiv] [Paper]

  • [Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss, Ziqiang Shi, 2020] [Paper]

Multi-Modal Speech Separation

  • [Deep Audio-Visual Learning: A Survey, Hao Zhu, Arxiv 2020] [Paper]

  • [Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks, Jen-Cheng Hou, TETCI 2017] [Paper] [Code]

  • [The Sound of Pixels, Hang Zhao, ECCV 2018] [Paper/Demo] [Code]

  • [Learning to Separate Object Sounds by Watching Unlabeled Video, Ruohan Gao, ECCV 2018] [Paper]

  • [The Conversation: Deep Audio-Visual Speech Enhancement, Triantafyllos Afouras, Interspeech 2018] [Paper]

  • [End-to-end audiovisual speech recognition, Stavros Petridis, ICASSP 2018] [Paper] [Code]

  • [Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation, Ariel Ephrat, ACM Transactions on Graphics 2018] [Paper] [Code]

  • [Time domain audio visual speech separation, Jian Wu, Arxiv 2019] [Paper]

  • [Recursive Visual Sound Separation Using Minus-Plus Net, Xudong Xu, ICCV 2019] [Paper]

  • [The Sound of Motions, Hang Zhao, ICCV 2019] [Paper]

  • [Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network, Ke Tan, Arxiv 2019] [Paper]

  • [Co-Separating Sounds of Visual Objects, Ruohan Gao, ICCV 2019] [Paper] [Code]

  • [Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments, Giovanni Morrone, Arxiv 2019] [Paper] [Code]

  • [Music Gesture for Visual Sound Separation, Chuang Gan, CVPR 2020] [Paper]

  • [FaceFilter: Audio-visual speech separation using still images, Soo-Whan Chung, Arxiv 2020] [Paper]

  • [Awesome Audio-Visual, Github, Kranti Kumar Parida] [Github Link]

Multi-channel Speech Separation

  • [FaSNet: Low-latency Adaptive Beamforming for Multi-microphone Audio Processing, Yi Luo, Arxiv 2019] [Paper]

  • [MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition, Xuankai Chang et al., ASRU 2019] [Paper]

  • [End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation, Yi Luo et al., ICASSP 2020] [Paper] [Code]

  • [Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning, Rongzhi Gu, ICASSP 2020] [Paper]

  • [Multi-modal Multi-channel Target Speech Separation, Rongzhi Gu, J-STSP 2020] [Paper]

Speaker Extraction

  • [Single channel target speaker extraction and recognition with speaker beam, Marc Delcroix, ICASSP 2018] [Paper]

  • [VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking, Quan Wang, INTERSPEECH 2018] [Paper] [Code (Jian Wu)]

  • [Single-Channel Speech Extraction Using Speaker Inventory and Attention Network, Xiong Xiao et al., ICASSP 2019] [Paper]

  • [Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss, Chenglin Xu, ICASSP 2019] [Paper] [Code]

  • [Time-domain speaker extraction network, Chenglin Xu, ASRU 2019] [Paper]

  • [SpEx: Multi-Scale Time Domain Speaker Extraction Network, Chenglin Xu, TASLP 2020] [Paper]

  • [Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam, Marc Delcroix, ICASSP 2020] [Paper]

  • [SpEx+: A Complete Time Domain Speaker Extraction Network, Meng Ge, Arxiv 2020] [Paper] [Code]

Tools

System Tools

  • [Asteroid: the PyTorch-based audio source separation toolkit for researchers, Manuel Pariente et al., ICASSP 2020] [Tool Link]
  • [ESPnet-SE: End-to-end speech enhancement and separation toolkit designed for ASR integration, Chenda Li et al., Arxiv] [Paper Link]

Evaluation Tools

  • [Performance measurement in blind audio source separation, Emmanuel Vincent et al., TASLP 2006] [Paper] [Tool Link]

  • [SDR – Half-baked or Well Done?, Jonathan Le Roux, ICASSP 2019] [Paper] [Tool Link]

Results on WSJ0-2mix

Speech separation (SS) and speaker extraction (SE) results on the WSJ0-2mix dataset (8 kHz sampling rate, min mode).

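| Task | Method | Model Size | SDRi (dB) | SI-SDRi (dB) |
|------|--------|------------|-----------|--------------|
| SS | DPCL++ | 13.6M | - | 10.8 |
| SS | uPIT-BLSTM-ST | 92.7M | 10.0 | - |
| SS | DANet | 9.1M | - | 10.5 |
| SS | cuPIT-Grid-RD | 53.2M | 10.2 | - |
| SS | SDC-G-MTL | 53.9M | 10.5 | - |
| SS | CBLDNN-GAT | 39.5M | 11.0 | - |
| SS | Chimera++ | 32.9M | 12.0 | 11.5 |
| SS | WA-MISI-5 | 32.9M | 13.1 | 12.6 |
| SS | BLSTM-TasNet | 23.6M | 13.6 | 13.2 |
| SS | Conv-TasNet | 5.1M | 15.6 | 15.3 |
| SE | SpEx | 10.8M | 17.0 | 16.6 |
| SE | SpEx+ | 11.1M | 17.6 | 17.4 |
| SS | DeepCASA | 12.8M | 18.0 | 17.7 |
| SS | FurcaNeXt | 51.4M | 18.4 | - |
| SS | DPRNN-TasNet | 2.6M | 19.0 | 18.8 |
| SS | Wavesplit | - | 19.2 | 19.0 |
| SS | Wavesplit + Dynamic mixing | - | 20.6 | 20.4 |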
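
For readers unfamiliar with the SI-SDRi column, the sketch below shows one common way to compute scale-invariant SDR and its improvement over the unprocessed mixture. It is a minimal illustration, not the evaluation script behind the numbers above. The function names `si_sdr` and `si_sdr_improvement`, the zero-mean step, the `eps` constant, and the random toy signals are all assumptions made for this example.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals of equal length."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Assumption: remove the mean first, as many implementations do.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))


def si_sdr_improvement(estimate, reference, mixture):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy data (made up for illustration): two random "speakers", one second at 8 kHz.
rng = np.random.default_rng(0)
s1 = rng.standard_normal(8000)
s2 = rng.standard_normal(8000)
mixture = s1 + s2
# Hypothetical separator output: the target plus a small residual of the interferer.
estimate = s1 + 0.1 * s2

print(f"SI-SDR of mixture:  {si_sdr(mixture, s1):.2f} dB")
print(f"SI-SDR of estimate: {si_sdr(estimate, s1):.2f} dB")
print(f"SI-SDRi:            {si_sdr_improvement(estimate, s1, mixture):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged. For a two-speaker mixture the score is normally averaged over both speakers under the best speaker-to-output assignment, which this single-source sketch does not cover.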