seanwood / Gcc Nmf

Licence: MIT
Real-time GCC-NMF Blind Speech Separation and Enhancement

GCC-NMF

GCC-NMF is a blind source separation and denoising algorithm that combines the GCC spatial localization method with the NMF unsupervised dictionary learning algorithm. GCC-NMF has been used for stereo speech separation and enhancement in both offline and real-time settings. Though we have focused on speech applications so far, GCC-NMF is a generic source separation and denoising algorithm and may well be applicable to other types of signals.

This GitHub repository provides:

  1. A standalone Python executable to run and visualize GCC-NMF in real time.

  2. A series of iPython notebooks presenting GCC-NMF in tutorial style, building towards the low latency, real-time context.

Journal Papers

Conference Papers

Real-time Speech Enhancement: RT-GCC-NMF

The Real-time Speech Enhancement standalone Python executable is an implementation of the RT-GCC-NMF real-time speech enhancement algorithm. Users may interactively modify system parameters, including the NMF dictionary size and the GCC-NMF masking function parameters, and hear the effect on speech enhancement quality in real time.

Offline Speech Separation

The Offline Speech Separation iPython notebook shows how GCC-NMF can be used to separate multiple concurrent speakers in an offline fashion. The NMF dictionary is first learned directly from the mixture signal, and sources are subsequently separated by attributing each atom at each time to a single source based on the dictionary atoms' estimated time delay of arrival (TDOA). Source localization is achieved with GCC-PHAT.
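As a rough illustration of the localization step, the sketch below estimates a single TDOA between two channels with a textbook GCC-PHAT. It is not taken from the notebooks: the function name `gcc_phat_tdoa`, the `max_delay_s` search range and the sign convention (positive when the right channel lags the left) are all assumptions made for this example.

```python
import numpy as np

def gcc_phat_tdoa(left, right, sample_rate, max_delay_s=0.001):
    """Estimate the delay of `right` relative to `left`, in seconds.

    Hypothetical helper for illustration; a positive result means the
    right channel lags the left channel.
    """
    n = len(left) + len(right)              # zero-pad to avoid circular wrap-around
    L = np.fft.rfft(left, n=n)
    R = np.fft.rfft(right, n=n)
    cross = R * np.conj(L)                  # cross-power spectrum
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)           # generalized cross-correlation
    max_shift = max(1, int(round(max_delay_s * sample_rate)))
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / sample_rate

# Usage: a noise signal delayed by 5 samples on the right channel.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
delayed = np.concatenate((np.zeros(5), x[:-5]))
print(gcc_phat_tdoa(x, delayed, sample_rate=16000))
```

With several concurrent speakers, the cross-correlation would show one peak per source rather than a single maximum, so a peak-picking step would replace the plain `argmax` used here.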

Offline Speech Enhancement

The Offline Speech Enhancement iPython notebook demonstrates how GCC-NMF can be used for offline speech enhancement, where instead of multiple speakers, we have a single speaker plus noise. In this case, individual atoms are attributed either to the speaker or to noise at each point in time based on the atom TDOAs, as above. The target speaker is again localized with GCC-PHAT.
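The attribution step can be sketched as a binary mask over the NMF coefficients. The example below is a minimal illustration under stated assumptions: the per-atom, per-frame TDOA estimates are taken as given, and the function name `split_by_atom_tdoa` and the fixed `tolerance` window around the target TDOA are hypothetical choices, not the notebook's masking function.

```python
import numpy as np

def split_by_atom_tdoa(W, H, atom_tdoas, target_tdoa, tolerance):
    """Split an NMF model W @ H into speech and noise magnitude estimates.

    W          : (n_freq, n_atoms) non-negative dictionary
    H          : (n_atoms, n_frames) non-negative coefficients
    atom_tdoas : (n_atoms, n_frames) TDOA estimate of each atom in each frame
    An atom counts as speech in a frame when its TDOA lies within
    `tolerance` of `target_tdoa` (an assumed rule for this sketch).
    """
    mask = np.abs(atom_tdoas - target_tdoa) <= tolerance
    speech = W @ (H * mask)      # atoms attributed to the target speaker
    noise = W @ (H * ~mask)      # everything else
    return speech, noise

# Usage with random placeholder data.
rng = np.random.default_rng(0)
W = rng.random((6, 4))
H = rng.random((4, 5))
atom_tdoas = rng.uniform(-1e-3, 1e-3, size=(4, 5))
speech, noise = split_by_atom_tdoa(W, H, atom_tdoas, target_tdoa=0.0, tolerance=2e-4)
print(np.allclose(speech + noise, W @ H))
```

Because every atom goes to exactly one side in each frame, the two estimates always sum back to the full model `W @ H`.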

Online Speech Enhancement

The Online Speech Enhancement iPython notebook demonstrates an online variant of GCC-NMF that works in a frame-by-frame fashion to perform speech enhancement in real-time. Here, the NMF dictionary is pre-learned from a different dataset than used at test time, NMF coefficients are inferred frame-by-frame, and speaker localization is performed with an accumulated GCC-PHAT method.
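Frame-by-frame coefficient inference with a fixed dictionary might look like the sketch below. It uses the standard multiplicative update for the KL divergence; the choice of divergence, the iteration count and the name `infer_frame_coefficients` are assumptions for illustration and may differ from what the notebook does.

```python
import numpy as np

def infer_frame_coefficients(W, v, n_iter=100, eps=1e-12):
    """Infer non-negative coefficients h such that W @ h approximates v.

    W : (n_freq, n_atoms) pre-learned dictionary, kept fixed
    v : (n_freq,) magnitude spectrum of the current frame
    Uses KL-divergence multiplicative updates (an assumed choice).
    """
    h = np.full(W.shape[1], v.mean() + eps)   # flat, strictly positive start
    col_sums = W.sum(axis=0) + eps
    for _ in range(n_iter):
        approx = W @ h + eps
        h *= (W.T @ (v / approx)) / col_sums  # update keeps h non-negative
    return h

# Usage: one synthetic frame built from a known dictionary.
rng = np.random.default_rng(0)
W = rng.random((64, 8)) + 0.01
v = W @ rng.random(8)
h = infer_frame_coefficients(W, v)
print(np.abs(W @ h - v).max())
```

Since the dictionary is never updated, the cost per frame is a fixed number of matrix-vector products, which is what makes this kind of inference practical in an online setting.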

Low Latency Speech Enhancement

In the Low Latency Speech Enhancement iPython notebook, we extend the online GCC-NMF approach to reduce algorithmic latency via an asymmetric STFT windowing strategy. Long analysis windows maintain the high spectral resolution required by GCC-NMF, while short synthesis windows drastically reduce algorithmic latency with little effect on speech enhancement quality. Algorithmic latency can be reduced from over 64 ms using traditional symmetric STFT windowing to below 2 ms with the proposed asymmetric STFT windowing, provided sufficient computational power is available.
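To make the latency figures concrete, the short calculation below converts window lengths to milliseconds, under the simplifying assumption that the window-related latency equals the duration of the synthesis window. The 16 kHz sample rate and the 1024- and 16-sample window lengths are illustrative assumptions, not values stated above.

```python
def window_latency_ms(window_samples, sample_rate):
    """Duration of a window in milliseconds (window-related latency only)."""
    return 1000.0 * window_samples / sample_rate

sample_rate = 16000  # assumed sample rate in Hz

# Symmetric windowing: the synthesis window is as long as the analysis window.
print(window_latency_ms(1024, sample_rate))  # 64.0

# Asymmetric windowing: long analysis window, much shorter synthesis window.
print(window_latency_ms(16, sample_rate))    # 1.0
```

This counts only the windowing contribution; processing time per frame adds to it, which is why the low-latency figure depends on having enough computational power.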