
jtkim-kaist / Speech Enhancement

License: GPL-2.0
Deep neural network based speech enhancement toolkit

Programming Languages

MATLAB

Projects that are alternatives of or similar to Speech Enhancement

Fullsubnet
PyTorch implementation of "A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
Stars: ✭ 51 (-69.46%)
Mutual labels:  speech-processing
Tf Kaldi Speaker
Neural speaker recognition/verification system based on Kaldi and Tensorflow
Stars: ✭ 117 (-29.94%)
Mutual labels:  speech-processing
Zzz Retired openstt
RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:
Stars: ✭ 146 (-12.57%)
Mutual labels:  speech-processing
Gcommandspytorch
ConvNets for Audio Recognition using Google Commands Dataset
Stars: ✭ 65 (-61.08%)
Mutual labels:  speech-processing
Wave U Net For Speech Enhancement
Implementation of Wave-U-Net in PyTorch, migrated to speech enhancement.
Stars: ✭ 106 (-36.53%)
Mutual labels:  speech-processing
Deepvoice3 pytorch
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
Stars: ✭ 1,654 (+890.42%)
Mutual labels:  speech-processing
Formant Analyzer
iOS application for finding formants in spoken sounds
Stars: ✭ 43 (-74.25%)
Mutual labels:  speech-processing
Speech signal processing and classification
Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers which can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. These traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian Mixture Model-based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
Stars: ✭ 155 (-7.19%)
Mutual labels:  speech-processing
Tfg Voice Conversion
Deep Learning-based Voice Conversion system
Stars: ✭ 115 (-31.14%)
Mutual labels:  speech-processing
Wavenet vocoder
WaveNet vocoder
Stars: ✭ 1,926 (+1053.29%)
Mutual labels:  speech-processing
Sptk
A modified version of Speech Signal Processing Toolkit (SPTK)
Stars: ✭ 71 (-57.49%)
Mutual labels:  speech-processing
Pytorch Kaldi Neural Speaker Embeddings
A light weight neural speaker embeddings extraction based on Kaldi and PyTorch.
Stars: ✭ 99 (-40.72%)
Mutual labels:  speech-processing
A Convolutional Recurrent Neural Network For Real Time Speech Enhancement
A minimal unofficial implementation of "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" (CRN) using PyTorch
Stars: ✭ 123 (-26.35%)
Mutual labels:  speech-processing
Dnc
Discriminative Neural Clustering for Speaker Diarisation
Stars: ✭ 60 (-64.07%)
Mutual labels:  speech-processing
Dtln
Tensorflow 2.x implementation of the DTLN real time speech denoising model. With TF-lite, ONNX and real-time audio processing support.
Stars: ✭ 147 (-11.98%)
Mutual labels:  speech-processing
Keras Sincnet
Keras (tensorflow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)
Stars: ✭ 47 (-71.86%)
Mutual labels:  speech-processing
Nonautoreggenprogress
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)
Stars: ✭ 118 (-29.34%)
Mutual labels:  speech-processing
Vocgan
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Stars: ✭ 158 (-5.39%)
Mutual labels:  speech-processing
Tutorial separation
This repo summarizes the tutorials, datasets, papers, codes and tools for speech separation and speaker extraction task. You are kindly invited to pull requests.
Stars: ✭ 151 (-9.58%)
Mutual labels:  speech-processing
Pb bss
Collection of EM algorithms for blind source separation of audio signals
Stars: ✭ 127 (-23.95%)
Mutual labels:  speech-processing

Speech Enhancement Toolkit

This toolkit is the implementation of the following paper:

J. Kim and M. Hahn, "Speech Enhancement Using a Two-Stage Network for an Efficient Boosting Strategy," in IEEE Signal Processing Letters. doi: 10.1109/LSP.2019.2905660

URL: https://ieeexplore.ieee.org/document/8668449

Speech enhancement (SE) removes the noise signal from a noisy speech signal.
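Under the common additive-noise assumption, the observed signal is y(t) = x(t) + n(t), where x(t) is the clean speech and n(t) is the noise; SE estimates x(t) given only y(t).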

Currently, the SE in this toolkit is based on a deep neural network (DNN). The model proposed in the paper will be uploaded.

We hope this toolkit will serve as a baseline for the SE research area.

This toolkit provides the following:

  • Data generation scripts for building the noisy training and test datasets from speech and noise corpora. (MATLAB)

  • Training and test scripts. (Python 3)

Prerequisites

Setup

  1. Install the aforementioned prerequisites.

  2. Open MATLAB and add the directories ./SE and ./Datamake, including their sub-directories, to the MATLAB path (e.g., via addpath(genpath('./SE'))).

  3. Install matlab.engine (the MATLAB Engine API for Python):

cd "matlabroot/extern/engines/python"
python3 setup.py install
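If the installation succeeded, the engine should be importable from Python. A quick sanity check (this launches a real MATLAB session, so MATLAB must be installed and licensed on the machine):

# Sanity check for the MATLAB Engine API for Python.
import matlab.engine

eng = matlab.engine.start_matlab()   # start a headless MATLAB session
print(eng.sqrt(16.0))                # calls MATLAB's sqrt; prints 4.0
eng.quit()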

Generate the training and test data

  1. Prepare the speech and noise data. In general, the TIMIT corpus is used for the speech data; the noise data can be found in Hu's corpus, the USTC corpus, and NOISEX-92.

  2. project_directory(prj_dir)/Datamake/make_train_noisy.m builds the training set from your data. The script sequentially loads each clean utterance and synthesizes noisy speech at a randomly selected SNR; the noise type is also drawn at random from your training noise dataset (see the mixing sketch after this list). To reduce the number of files, the script concatenates all generated noisy speech, so if you do not have enough RAM you should modify the code. All generated data are written in '.raw' format with 'int16' datatype.

  3. project_directory(prj_dir)/Datamake/make_test_noisy.m builds the test set from your data. The script sequentially loads each clean utterance and synthesizes noisy speech at the desired SNRs, using every noise type in the test noise dataset. All generated data are written in '.raw' format with 'int16' datatype.
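For reference, the core of the synthesis step is mixing clean speech with noise scaled to a target SNR. Below is a minimal Python sketch of that idea; mix_at_snr is a hypothetical helper for illustration only, not part of the toolkit (the actual generation is done by the MATLAB scripts above).

import numpy as np

def mix_at_snr(clean, noise, snr_db):
    # Hypothetical helper: mix clean speech with noise at a target SNR in dB.
    clean = clean.astype(np.float64)
    noise = noise.astype(np.float64)
    # Tile the noise if it is shorter than the utterance, then crop a random segment.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    start = np.random.randint(0, len(noise) - len(clean) + 1)
    noise = noise[start:start + len(clean)]
    # Scale the noise so that 10*log10(P_clean / P_noise) equals snr_db.
    gain = np.sqrt(np.mean(clean ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10.0) + 1e-12))
    return clean + gain * noise

In the toolkit's workflow, snr_db corresponds to a value drawn from snr_list and noise to a randomly chosen file from the noise corpus.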

Usage of make_train_noisy.m

Before running the code, place your training speech and noise datasets so that they match the paths in the code below.

% prj_dir/Datamake/make_train_noisy.m
% Clean training speech: TIMIT training set.
timit_list = dirPlus('./speech/TIMIT/TRAIN', 'FileFilter', '\.(wav|WAV)$');

% Training noise: Hu's corpus (Nonspeech) and an additional noise set.
hu_list = dirPlus('./noise/Nonspeech', 'FileFilter', '\.(wav|WAV)$');
ad_list = dirPlus('./noise/noise-15', 'FileFilter', '\.(wav|WAV)$');

Options

  • You can set the SNRs of the noisy speech by adjusting snr_list.
  • You can generate more plentiful (augmented) training data by adjusting aug.

Results

The generated dataset will be saved in prj_dir/SE/data/train/noisy and prj_dir/SE/data/train/clean.

Usage of make_test_noisy.m

Before running the code, place your test speech and noise datasets so that they match the paths in the code below.

% prj_dir/Datamake/make_test_noisy.m
% Clean test speech: TIMIT core test set.
timit_list = dirPlus('./speech/timit_coretest', 'FileFilter', '\.(wav|WAV)$');
% Test noise: NOISEX-92 (16 kHz version).
noise_list = dirPlus('./noise/NOISEX-92_16000');

Options

  • You can set the SNRs of the noisy speech by adjusting snr_list.

Results

The generated dataset will be saved in prj_dir/SE/data/test/noisy and prj_dir/SE/data/test/clean.

Validation dataset

To run the training code, a validation set is needed. Randomly select about 50 noisy utterances, together with the corresponding clean utterances, from the test set and move them to prj_dir/SE/data/valid/noisy and prj_dir/SE/data/valid/clean.
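A hedged Python sketch of that selection step (it assumes noisy and clean files share the same names, which may not match your layout):

import os
import random
import shutil

prj_dir = '.'  # adjust to your project directory
src_noisy = os.path.join(prj_dir, 'SE/data/test/noisy')
src_clean = os.path.join(prj_dir, 'SE/data/test/clean')
dst_noisy = os.path.join(prj_dir, 'SE/data/valid/noisy')
dst_clean = os.path.join(prj_dir, 'SE/data/valid/clean')
os.makedirs(dst_noisy, exist_ok=True)
os.makedirs(dst_clean, exist_ok=True)

# Move ~50 noisy utterances together with their clean counterparts.
for name in random.sample(sorted(os.listdir(src_noisy)), 50):
    shutil.move(os.path.join(src_noisy, name), os.path.join(dst_noisy, name))
    shutil.move(os.path.join(src_clean, name), os.path.join(dst_clean, name))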

Generate the normalization factor

The training code applies Z-score normalization to the input features, so a normalization factor computed from the training dataset is needed.

To compute the normalization factor, run prj_dir/SE/get_norm.py.
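Conceptually, Z-score normalization standardizes each feature dimension with statistics from the training set. A minimal sketch of the idea (illustrative only, not the toolkit's get_norm.py):

import numpy as np

def zscore_stats(train_features):
    # Per-dimension statistics over the training set (shape: frames x feature_dim).
    mu = train_features.mean(axis=0)
    sigma = train_features.std(axis=0) + 1e-8  # guard against zero variance
    return mu, sigma

def normalize(features, mu, sigma):
    # Apply the training-set statistics to any split (train/valid/test).
    return (features - mu) / sigma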

Options

  • You can use multiple cores by adjusting distribution_num.

Results

The generated normalization factor will be saved in prj_dir/SE/data/train/norm.

Training

Run prj_dir/SE/main.py.

Model

You can check the training model in prj_dir/SE/lib/trnmodel.py

Configuration

You can check the training configuration in prj_dir/SE/lib/config.py

Tensorboard

During training, you can use TensorBoard to monitor the training procedure:

tensorboard --logdir='prj_dir/SE/logs_dir/your log directory'

This toolkit supports the following:

  • PESQ, STOI, LSD, and SSNR objective measures (a sketch of SSNR follows this list).

  • Clean, noisy, and enhanced spectrograms.

  • Clean, noisy, and enhanced waveforms.

  • Configuration.
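PESQ and STOI are usually computed with reference implementations, but segmental SNR (SSNR) is simple enough to sketch. A hedged Python version, clamping frame SNRs to the conventional [-10, 35] dB range (illustrative, not the toolkit's exact code):

import numpy as np

def ssnr(clean, enhanced, frame_len=512, eps=1e-10):
    # Mean segmental SNR in dB over non-overlapping frames.
    n_frames = min(len(clean), len(enhanced)) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len].astype(np.float64)
        e = enhanced[i * frame_len:(i + 1) * frame_len].astype(np.float64)
        frame_snr = 10.0 * np.log10(np.sum(s ** 2) / (np.sum((s - e) ** 2) + eps) + eps)
        snrs.append(np.clip(frame_snr, -10.0, 35.0))  # conventional clamping
    return float(np.mean(snrs))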

Reference

[1] Xu, Yong, et al., "A regression approach to speech enhancement based on deep neural networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 23, no. 1, 2015, pp. 7-19.

[2] Brookes, Mike, "VOICEBOX: Speech Processing Toolbox for MATLAB," 2011.

[3] Jacob D., "SoundZone_Tools," GitHub repository, 2017, https://github.com/JacobD10/SoundZone_Tools.

[4] Loizou, P. C., "Speech Enhancement: Theory and Practice," CRC Press, 2013, pp. 83-84.
