ifnspaml / Enhancement-Coded-Speech



Enhancement-Coded-Speech

This repository contains the scripts accompanying the paper Convolutional Neural Networks to Enhance Coded Speech. It provides the cepstral domain approach with framework structure III.

The code was written by Ziyue Zhao and Huijun Liu.

LATEST

Some of the Python code has been updated for TensorFlow 2 (the original code was written for TensorFlow 1). See Prerequisites and Installation for details on how to get started.

Introduction

An approach based on a convolutional neural network (CNN) is proposed to enhance coded (i.e., encoded and decoded) speech using cepstral domain features. It improves the quality of coded speech without modifying the codec (i.e., encoder and decoder) itself.

Prerequisites and Installation

  • NVIDIA GPU with CUDA and cuDNN (the code is tested with CUDA version 11.4)
  • Install Anaconda
  • Start Anaconda Prompt
  • Create a new environment and activate: conda create -n tf-gpu-new python=3.8.5, conda activate tf-gpu-new
  • Install TensorFlow-GPU and SciPy: pip install tensorflow-gpu==2.4.1, pip install scipy
  • Install Matlab (the code is tested with MATLAB 2016 and later)
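
After installation, it can help to confirm that the packages actually landed in the active conda environment. The following is a minimal sketch using only the Python standard library; the function name installed_versions is hypothetical and not part of this repository, and the distribution names are simply those from the pip commands above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the pip commands in the list above.
    for name, version in installed_versions(["tensorflow-gpu", "scipy"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Once TensorFlow imports cleanly, tf.config.list_physical_devices("GPU") should return a non-empty list if the GPU is visible to TensorFlow 2.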

Getting Started

Testing with the provided CNN model

  • Two example files: example_s1_g711_coded.raw and example_s2_g711_coded.raw (the original speech samples are from the ITU-T test signals of American English) for the G.711-coded speech are included in the dataset folder
  • Please note that the two example files are split from the file named A_eng_f5.wav in the ITU-T test signals dataset and the splitting point is at 7.0812 s.
  • Run the Matlab script to prepare the input data for the CNN model. It takes the G.711-coded speech sample ./dataset/exapmle_s_g711_coded.raw and the means and standard deviations of the training data ./data/mean_std_of_TrainData_g711_best.mat, and outputs the CNN input data ./data/type_3_cnn_input_ceps_v73.mat, the residual cepstral coefficients ./data/type_3_ceps_resi.mat, and the phase angle vector ./data/type_3_pha_ang.mat:
matlab Test_InputPrepare.m
  • Run the Python script to use the CNN model, with the CNN input data ./data/type_3_cnn_input_ceps_v73.mat and the provided CNN model ./data/cnn_weights_ceps_g711_best.h5, outputting the CNN output data ./data/type_3_cnn_output_ceps.mat:
python CepsDomCNN_Test.py
  • Run the Matlab script to obtain the final enhanced speech. It takes the CNN output data ./data/type_3_cnn_output_ceps.mat, the residual cepstral coefficients ./data/type_3_ceps_resi.mat, the phase angle vector ./data/type_3_pha_ang.mat, and the G.711-coded speech sample ./dataset/exapmle_s_g711_coded.raw, and outputs the enhanced speech waveform ./dataset/example_s1_g711_coded_cnn_proc.raw or ./dataset/example_s2_g711_coded_cnn_proc.raw:
matlab Test_WaveformRecons.m
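
The three steps above pass around three pieces of data per frame: cepstral coefficients for the CNN, residual cepstral coefficients, and a phase angle vector. The sketch below illustrates, in plain Python, how such an analysis-synthesis round trip can work in principle. It is not the authors' Matlab code: the function names, the frame length, the number of coefficients handed to the CNN, and the assumption that the "residual" coefficients are exactly those that bypass the CNN are all hypothetical, and the CNN is replaced by an identity mapping.

```python
import cmath
import math

EPS = 1e-12  # avoids log(0) for empty spectral bins


def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for one short illustrative frame."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def analyse(frame, num_cnn_coeffs):
    """Split an even-length frame into CNN-input cepstrum, residual cepstrum, phase."""
    spectrum = dft(frame)
    phase = [cmath.phase(s) for s in spectrum]
    log_mag = [math.log(abs(s) + EPS) for s in spectrum]
    # The real cepstrum is symmetric, so the first N/2 + 1 values describe it fully.
    cepstrum = [c.real for c in dft(log_mag, inverse=True)]
    half = cepstrum[: len(frame) // 2 + 1]
    return half[:num_cnn_coeffs], half[num_cnn_coeffs:], phase


def synthesise(cnn_coeffs, residual, phase):
    """Rebuild the time-domain frame from (possibly enhanced) cepstral coefficients."""
    half = list(cnn_coeffs) + list(residual)
    full = half + half[-2:0:-1]  # restore the mirrored half of the cepstrum
    log_mag = [v.real for v in dft(full)]
    spectrum = [cmath.rect(math.exp(m), p) for m, p in zip(log_mag, phase)]
    return [v.real for v in dft(spectrum, inverse=True)]


if __name__ == "__main__":
    frame = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(16)]
    cnn_in, resi, pha = analyse(frame, num_cnn_coeffs=6)
    enhanced = cnn_in  # identity stand-in for the CNN
    rebuilt = synthesise(enhanced, resi, pha)
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(frame, rebuilt)))
```

With the identity stand-in, the rebuilt frame matches the input up to numerical precision, which shows that only the coefficients routed through the CNN can change the output; the residual coefficients and the phase are carried through untouched.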

Reproduce the results

The results reported in the paper were obtained on the NTT wideband speech database, so reproducing the exact results requires testing with the same speech data (see the paper for details).

Training with your own dataset

  • Run the Matlab script to prepare the CNN training data. It takes the uncoded speech for training ./dataset/example_uncoded_train_s.raw, uncoded speech for validation ./dataset/example_uncoded_valid_s.raw, coded speech for training ./dataset/example_coded_train_s.raw, and coded speech for validation ./dataset/example_coded_valid_s.raw, and outputs the training input ./data/Train_inputSet_g711.mat, training target ./data/Train_targetSet_g711.mat, validation input ./data/Validation_inputSet_g711.mat, validation target ./data/Validation_targetSet_g711.mat, and the means and standard deviations of the training data ./data/mean_std_of_TrainData_g711_example.mat:
matlab Training_Data.m
  • Run the Python script to train the CNN model with the above-mentioned CNN training data, outputting the trained CNN weights ./data/cnn_weights_ceps_g711_example.h5:
python CepsDomCNN_Train.py
  • Note that your own dataset needs to replace the example speech files (the example speech samples are from the ITU-T test signals of American English).
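
The training step stores means and standard deviations of the training data, and the test step loads them again when preparing the CNN input. The sketch below shows the general idea of fitting such statistics on training frames and reusing them for other data. Whether the repository normalises per coefficient or globally, and which standard-deviation estimator it uses, is not stated here, so the per-coefficient population estimate and the function names fit_mean_std and normalise are assumptions for illustration only.

```python
import statistics


def fit_mean_std(frames):
    """Per-coefficient mean and population standard deviation over training frames."""
    columns = list(zip(*frames))
    means = [statistics.fmean(col) for col in columns]
    # A constant coefficient has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def normalise(frames, means, stds):
    """Apply previously fitted statistics to any set of frames."""
    return [[(v - m) / s for v, m, s in zip(frame, means, stds)] for frame in frames]


if __name__ == "__main__":
    train = [[1.0, 10.0], [3.0, 10.0]]
    means, stds = fit_mean_std(train)
    # Validation and test frames reuse the training statistics unchanged.
    print(normalise([[2.0, 11.0]], means, stds))
```

The key design point is that the statistics are computed once on the training set and then reused as-is for validation and test data, which is why a model trained on your own dataset needs its own mean and standard deviation file.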

Codecs and processing functions

  • To obtain G.711-coded speech samples, some processing functions and the ITU-T G.711 codec are needed.
  • Download the processing functions from ITU-T G.191 and compile the relevant files to obtain the programs: filter.exe, sv56demo.exe, and g711demo.exe.
  • Put the compiled programs in the root directory.
  • Run the Matlab script to obtain G.711-coded speech, with a raw speech sample ./dataset/exapmle_s.raw and the above-mentioned programs, outputting G.711-coded speech ./dataset/exapmle_s_g711_coded.raw:
matlab CodedSpeech_Obtain.m
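
The .raw speech files used in this README cannot be opened directly by most audio players. A minimal sketch for wrapping one in a WAV container is given below. It assumes the files are headerless, 16-bit, little-endian, mono PCM sampled at 8 kHz; none of these properties is stated in this README, so check them against your data and adjust the parameters. The function name raw_to_wav is hypothetical.

```python
import wave


def raw_to_wav(raw_path, wav_path, sample_rate=8000, sample_width=2, channels=1):
    """Wrap headerless PCM bytes in a WAV container without altering the samples.

    Assumes little-endian PCM, which is what the WAV format expects.
    Returns the number of frames written.
    """
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return len(pcm) // (sample_width * channels)
```

For example, raw_to_wav("./dataset/example_s1_g711_coded_cnn_proc.raw", "enhanced.wav") would make the enhanced output from the testing steps playable, provided the format assumptions hold.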

Citation

If you use the scripts in your research, please cite

@article{zhao2019convolutional,
  author = {Z. Zhao and H. Liu and T. Fingscheidt},
  title = {{Convolutional Neural Networks to Enhance Coded Speech}},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year = {2019},
  month = apr,
  volume = {27},
  number = {4},
  pages = {663--678}
}
@misc{cnn2codedspeech,
  author =  {Z. Zhao and H. Liu and T. Fingscheidt},
  title =   {{Convolutional Neural Networks to Enhance Coded Speech}},
  howpublished = {\url{https://github.com/ifnspaml/Enhancement-Coded-Speech}},
  year =    {2018},
  month =   jun
}
