
jtkim-kaist / Speech Enhancement

License: GPL-2.0
Deep neural network based speech enhancement toolkit

Programming Languages

MATLAB

Projects that are alternatives of or similar to Speech Enhancement

Fullsubnet
PyTorch implementation of "A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
Stars: ✭ 51 (-69.46%)
Mutual labels:  speech-processing
Tf Kaldi Speaker
Neural speaker recognition/verification system based on Kaldi and Tensorflow
Stars: ✭ 117 (-29.94%)
Mutual labels:  speech-processing
Zzz Retired openstt
RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:
Stars: ✭ 146 (-12.57%)
Mutual labels:  speech-processing
Gcommandspytorch
ConvNets for Audio Recognition using Google Commands Dataset
Stars: ✭ 65 (-61.08%)
Mutual labels:  speech-processing
Wave U Net For Speech Enhancement
Implementation of Wave-U-Net in PyTorch, migrated to speech enhancement.
Stars: ✭ 106 (-36.53%)
Mutual labels:  speech-processing
Deepvoice3 pytorch
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
Stars: ✭ 1,654 (+890.42%)
Mutual labels:  speech-processing
Formant Analyzer
iOS application for finding formants in spoken sounds
Stars: ✭ 43 (-74.25%)
Mutual labels:  speech-processing
Speech signal processing and classification
Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers which can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. These traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian Mixture Model-based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
Stars: ✭ 155 (-7.19%)
Mutual labels:  speech-processing
Tfg Voice Conversion
Deep Learning-based Voice Conversion system
Stars: ✭ 115 (-31.14%)
Mutual labels:  speech-processing
Wavenet vocoder
WaveNet vocoder
Stars: ✭ 1,926 (+1053.29%)
Mutual labels:  speech-processing
Sptk
A modified version of Speech Signal Processing Toolkit (SPTK)
Stars: ✭ 71 (-57.49%)
Mutual labels:  speech-processing
Pytorch Kaldi Neural Speaker Embeddings
A light weight neural speaker embeddings extraction based on Kaldi and PyTorch.
Stars: ✭ 99 (-40.72%)
Mutual labels:  speech-processing
A Convolutional Recurrent Neural Network For Real Time Speech Enhancement
A minimal unofficial implementation of "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" (CRN) using PyTorch
Stars: ✭ 123 (-26.35%)
Mutual labels:  speech-processing
Dnc
Discriminative Neural Clustering for Speaker Diarisation
Stars: ✭ 60 (-64.07%)
Mutual labels:  speech-processing
Dtln
Tensorflow 2.x implementation of the DTLN real time speech denoising model. With TF-lite, ONNX and real-time audio processing support.
Stars: ✭ 147 (-11.98%)
Mutual labels:  speech-processing
Keras Sincnet
Keras (tensorflow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)
Stars: ✭ 47 (-71.86%)
Mutual labels:  speech-processing
Nonautoreggenprogress
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)
Stars: ✭ 118 (-29.34%)
Mutual labels:  speech-processing
Vocgan
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Stars: ✭ 158 (-5.39%)
Mutual labels:  speech-processing
Tutorial separation
This repo summarizes the tutorials, datasets, papers, codes and tools for speech separation and speaker extraction task. You are kindly invited to pull requests.
Stars: ✭ 151 (-9.58%)
Mutual labels:  speech-processing
Pb bss
Collection of EM algorithms for blind source separation of audio signals
Stars: ✭ 127 (-23.95%)
Mutual labels:  speech-processing

Speech Enhancement Toolkit

This toolkit is the implementation of the following paper:

J. Kim and M. Hahn, "Speech Enhancement Using a Two-Stage Network for an Efficient Boosting Strategy," in IEEE Signal Processing Letters. doi: 10.1109/LSP.2019.2905660

URL: https://ieeexplore.ieee.org/document/8668449

Speech enhancement (SE) removes the noise signal from a noisy speech signal.
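Under the common additive-noise assumption, the observed signal is y(t) = x(t) + n(t), where x(t) is the clean speech and n(t) is the noise; SE estimates x(t) given only y(t).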

Currently, the SE in this toolkit is based on a deep neural network (DNN). The model proposed in the paper will be uploaded.

We hope this toolkit will serve as a baseline for the SE research area.

This toolkit provides the following:

  • Data generation scripts for building the noisy training and test datasets from speech and noise corpora. (MATLAB)

  • Training and test scripts. (Python 3)

Prerequisites

Setup

  1. Install the aforementioned prerequisites.

  2. Open MATLAB and add the directories ./SE and ./Datamake, including their sub-directories, to the MATLAB path (e.g., via addpath(genpath('./SE'))).

  3. Install matlab.engine (the MATLAB Engine API for Python):

cd "matlabroot/extern/engines/python"
python3 setup.py install
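If the installation succeeded, the engine should be importable from Python. A quick sanity check (this launches a real MATLAB session, so MATLAB must be installed and licensed on the machine):

# Sanity check for the MATLAB Engine API for Python.
import matlab.engine

eng = matlab.engine.start_matlab()   # start a headless MATLAB session
print(eng.sqrt(16.0))                # calls MATLAB's sqrt; prints 4.0
eng.quit()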

Generate the training and test data

  1. Prepare the speech and noise data. In general, the TIMIT corpus is used for the speech data; the noise data can be found in Hu's corpus, the USTC corpus, and NOISEX-92.

  2. project_directory(prj_dir)/Datamake/make_train_noisy.m builds the training set from your data. The script sequentially loads each clean utterance and synthesizes noisy speech at a randomly selected SNR; the noise type is also drawn at random from your training noise dataset (see the mixing sketch after this list). To reduce the number of files, the script concatenates all generated noisy speech, so if you do not have enough RAM you should modify the code. All generated data are written in '.raw' format with 'int16' datatype.

  3. project_directory(prj_dir)/Datamake/make_test_noisy.m builds the test set from your data. The script sequentially loads each clean utterance and synthesizes noisy speech at the desired SNRs, using every noise type in the test noise dataset. All generated data are written in '.raw' format with 'int16' datatype.
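For reference, the core of the synthesis step is mixing clean speech with noise scaled to a target SNR. Below is a minimal Python sketch of that idea; mix_at_snr is a hypothetical helper for illustration only, not part of the toolkit (the actual generation is done by the MATLAB scripts above).

import numpy as np

def mix_at_snr(clean, noise, snr_db):
    # Hypothetical helper: mix clean speech with noise at a target SNR in dB.
    clean = clean.astype(np.float64)
    noise = noise.astype(np.float64)
    # Tile the noise if it is shorter than the utterance, then crop a random segment.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    start = np.random.randint(0, len(noise) - len(clean) + 1)
    noise = noise[start:start + len(clean)]
    # Scale the noise so that 10*log10(P_clean / P_noise) equals snr_db.
    gain = np.sqrt(np.mean(clean ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10.0) + 1e-12))
    return clean + gain * noise

In the toolkit's workflow, snr_db corresponds to a value drawn from snr_list and noise to a randomly chosen file from the noise corpus.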

Usage of make_train_noisy.m

Before running the code, place your training speech and noise datasets so that they match the paths in the code below.

% prj_dir/Datamake/make_train_noisy.m
% Clean training speech: TIMIT training set.
timit_list = dirPlus('./speech/TIMIT/TRAIN', 'FileFilter', '\.(wav|WAV)$');

% Training noise: Hu's corpus (Nonspeech) and an additional noise set.
hu_list = dirPlus('./noise/Nonspeech', 'FileFilter', '\.(wav|WAV)$');
ad_list = dirPlus('./noise/noise-15', 'FileFilter', '\.(wav|WAV)$');

Options

  • You can set the SNRs of the noisy speech by adjusting snr_list.
  • You can generate more plentiful (augmented) training data by adjusting aug.

Results

The generated dataset will be saved in prj_dir/SE/data/train/noisy and prj_dir/SE/data/train/clean.

Usage of make_test_noisy.m

Before running the code, place your test speech and noise datasets so that they match the paths in the code below.

% prj_dir/Datamake/make_test_noisy.m
% Clean test speech: TIMIT core test set.
timit_list = dirPlus('./speech/timit_coretest', 'FileFilter', '\.(wav|WAV)$');
% Test noise: NOISEX-92 (16 kHz version).
noise_list = dirPlus('./noise/NOISEX-92_16000');

Options

  • You can set the SNRs of the noisy speech by adjusting snr_list.

Results

The generated dataset will be saved in prj_dir/SE/data/test/noisy and prj_dir/SE/data/test/clean.

Validation dataset

To run the training code, a validation set is needed. Randomly select about 50 noisy utterances, together with the corresponding clean utterances, from the test set and move them to prj_dir/SE/data/valid/noisy and prj_dir/SE/data/valid/clean.
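A hedged Python sketch of that selection step (it assumes noisy and clean files share the same names, which may not match your layout):

import os
import random
import shutil

prj_dir = '.'  # adjust to your project directory
src_noisy = os.path.join(prj_dir, 'SE/data/test/noisy')
src_clean = os.path.join(prj_dir, 'SE/data/test/clean')
dst_noisy = os.path.join(prj_dir, 'SE/data/valid/noisy')
dst_clean = os.path.join(prj_dir, 'SE/data/valid/clean')
os.makedirs(dst_noisy, exist_ok=True)
os.makedirs(dst_clean, exist_ok=True)

# Move ~50 noisy utterances together with their clean counterparts.
for name in random.sample(sorted(os.listdir(src_noisy)), 50):
    shutil.move(os.path.join(src_noisy, name), os.path.join(dst_noisy, name))
    shutil.move(os.path.join(src_clean, name), os.path.join(dst_clean, name))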

Generate the normalization factor

The training code applies Z-score normalization to the input features, so a normalization factor computed from the training dataset is needed.

To compute the normalization factor, run prj_dir/SE/get_norm.py.
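Conceptually, Z-score normalization standardizes each feature dimension with statistics from the training set. A minimal sketch of the idea (illustrative only, not the toolkit's get_norm.py):

import numpy as np

def zscore_stats(train_features):
    # Per-dimension statistics over the training set (shape: frames x feature_dim).
    mu = train_features.mean(axis=0)
    sigma = train_features.std(axis=0) + 1e-8  # guard against zero variance
    return mu, sigma

def normalize(features, mu, sigma):
    # Apply the training-set statistics to any split (train/valid/test).
    return (features - mu) / sigma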

Options

  • You can use multiple cores by adjusting distribution_num.

Results

The generated normalization factor will be saved in prj_dir/SE/data/train/norm.

Training

Run prj_dir/SE/main.py.

Model

You can check the training model in prj_dir/SE/lib/trnmodel.py

Configuration

You can check the training configuration in prj_dir/SE/lib/config.py

Tensorboard

During training, you can use TensorBoard to monitor the training procedure:

tensorboard --logdir='prj_dir/SE/logs_dir/your log directory'

This toolkit supports the following:

  • PESQ, STOI, LSD, and SSNR objective measures (a sketch of SSNR follows this list).

  • Clean, noisy, and enhanced spectrograms.

  • Clean, noisy, and enhanced waveforms.

  • Configuration.
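PESQ and STOI are usually computed with reference implementations, but segmental SNR (SSNR) is simple enough to sketch. A hedged Python version, clamping frame SNRs to the conventional [-10, 35] dB range (illustrative, not the toolkit's exact code):

import numpy as np

def ssnr(clean, enhanced, frame_len=512, eps=1e-10):
    # Mean segmental SNR in dB over non-overlapping frames.
    n_frames = min(len(clean), len(enhanced)) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len].astype(np.float64)
        e = enhanced[i * frame_len:(i + 1) * frame_len].astype(np.float64)
        frame_snr = 10.0 * np.log10(np.sum(s ** 2) / (np.sum((s - e) ** 2) + eps) + eps)
        snrs.append(np.clip(frame_snr, -10.0, 35.0))  # conventional clamping
    return float(np.mean(snrs))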

Reference

[1] Xu, Yong, et al., "A regression approach to speech enhancement based on deep neural networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 23, no. 1, 2015, pp. 7-19.

[2] Brookes, Mike, "VOICEBOX: Speech Processing Toolbox for MATLAB," 2011.

[3] Jacob D., "SoundZone_Tools," GitHub repository, 2017, https://github.com/JacobD10/SoundZone_Tools.

[4] Loizou, P. C., "Speech Enhancement: Theory and Practice," CRC Press, 2013, pp. 83-84.
