
linksense / ConvolutionaNeuralNetworksToEnhanceCodedSpeech

Licence: BSD-3-Clause

Programming Languages

python

Projects that are alternatives to or similar to ConvolutionaNeuralNetworksToEnhanceCodedSpeech

spafe
🔉 spafe: Simplified Python Audio Features Extraction
Stars: ✭ 310 (+1140%)
Mutual labels:  mfcc, speech-processing
awesome-speech-enhancement
A curated list of awesome Speech Enhancement papers, libraries, datasets, and other resources.
Stars: ✭ 48 (+92%)
Mutual labels:  speech-processing, speech-enhancement
scim
[WIP] Speech recognition toolbox written in Nim. Based on Arraymancer.
Stars: ✭ 17 (-32%)
Mutual labels:  mfcc, speech-processing
torchsubband
PyTorch implementation of subband decomposition
Stars: ✭ 63 (+152%)
Mutual labels:  speech-processing, speech-enhancement
InstancedMotionVector
Shows how to support rendering per-instance motion vectors within Indirect instanced drawing of Unity.
Stars: ✭ 45 (+80%)
Mutual labels:  post-processing
BasicsMusicalInstrumClassifi
Basics of Musical Instruments Classification using Machine Learning
Stars: ✭ 27 (+8%)
Mutual labels:  mfcc
DTW Digital Voice Recognition
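Speech recognition of the digits 0-9 based on DTW and MFCC features. Keywords: DTW, MFCC, speech recognition, Chinese and English data, endpoint detection, Digital Voice Recognition.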
Stars: ✭ 28 (+12%)
Mutual labels:  mfcc
pytorch-mfcc
A PyTorch implementation of MFCC.
Stars: ✭ 30 (+20%)
Mutual labels:  mfcc
wavelet-denoiser
A wavelet audio denoiser written in Python
Stars: ✭ 29 (+16%)
Mutual labels:  speech-processing
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+1080%)
Mutual labels:  speech-processing
PyTelTools
Python Telemac Tools for post-processing tasks (includes a workflow)
Stars: ✭ 24 (-4%)
Mutual labels:  post-processing
Robust-and-efficient-post-processing-for-video-object-detection
No description or website provided.
Stars: ✭ 107 (+328%)
Mutual labels:  post-processing
Postprocessing
Post Processing Stack
Stars: ✭ 3,524 (+13996%)
Mutual labels:  post-processing
Speaker-Identification
A program for automatic speaker identification using deep learning techniques.
Stars: ✭ 84 (+236%)
Mutual labels:  mfcc
sonopy
A simple audio feature extraction library
Stars: ✭ 72 (+188%)
Mutual labels:  mfcc
vamp-aubio-plugins
aubio plugins for Vamp
Stars: ✭ 38 (+52%)
Mutual labels:  mfcc
UHV-OTS-Speech
A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
Stars: ✭ 94 (+276%)
Mutual labels:  speech-processing
gridpp
Software to post-process gridded weather forecasts
Stars: ✭ 33 (+32%)
Mutual labels:  post-processing
VoxelTerrain
This project's main goal is to generate and visualize terrain built using voxels. It was achieved using different approaches and computing technologies just for the sake of performance and implementation comparison.
Stars: ✭ 37 (+48%)
Mutual labels:  post-processing
Aubio
a library for audio and music analysis
Stars: ✭ 2,601 (+10304%)
Mutual labels:  mfcc

Convolutional Neural Networks to Enhance Coded Speech

(This repository contains only part of the project code. Not for commercial use!)

Abstract—Enhancing coded speech suffering from far-end acoustic background noise, quantization noise, and potentially transmission errors, is a challenging task. In this work we propose two postprocessing approaches applying convolutional neural networks (CNNs) either in the time domain or the cepstral domain to enhance the coded speech without any modification of the codecs. The time domain approach follows an end-to-end fashion, while the cepstral domain approach uses analysis-synthesis with cepstral domain features. The proposed postprocessors in both domains are evaluated for various narrowband and wideband speech codecs in a wide range of conditions. The proposed postprocessor improves speech quality (PESQ) by up to 0.25 MOS-LQO points for G.711, 0.30 points for G.726, 0.82 points for G.722, and 0.26 points for the adaptive multirate wideband codec (AMR-WB). In a subjective CCR listening test, the proposed postprocessor on G.711-coded speech exceeds the speech quality of an ITU-T standardized postfilter by 0.36 CMOS points, and obtains a clear preference of 1.77 CMOS points compared to G.711, even on par with uncoded speech.
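
To make the idea of cepstral domain analysis-synthesis more concrete, here is a minimal sketch for a single frame using only the Python standard library. It is not the authors' implementation: the function names (`dft`, `idft`, `enhance_frame`), the use of the real cepstrum via an inverse DFT of the log magnitude, the reuse of the coded frame's phase, and the identity placeholder standing in for the trained CNN are all assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for short demo frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def enhance_frame(frame, enhance_cepstrum=lambda c: c, eps=1e-12):
    """Analysis-synthesis of one frame with a pluggable cepstral enhancer.

    `enhance_cepstrum` is a hypothetical stand-in for a trained CNN that maps
    cepstral coefficients of coded speech to enhanced ones. The default is
    the identity, so the frame is simply reconstructed.
    """
    # Analysis: spectrum -> log magnitude and phase -> real cepstrum.
    spectrum = dft(frame)
    log_mag = [math.log(abs(s) + eps) for s in spectrum]
    phase = [cmath.phase(s) for s in spectrum]
    cepstrum = [c.real for c in idft(log_mag)]

    # Enhancement step (assumed to operate on cepstral coefficients only).
    enhanced = enhance_cepstrum(cepstrum)

    # Synthesis: cepstrum -> log magnitude -> spectrum with the original
    # (coded) phase -> time-domain frame.
    new_log_mag = [v.real for v in dft(enhanced)]
    new_spectrum = [math.exp(m) * cmath.exp(1j * p)
                    for m, p in zip(new_log_mag, phase)]
    return [v.real for v in idft(new_spectrum)]


# Demo on a short synthetic frame (not real coded speech).
frame = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(32)]
reconstructed = enhance_frame(frame)
max_error = max(abs(a - b) for a, b in zip(frame, reconstructed))
```

With the identity placeholder, the synthesis stage should reproduce the input frame up to numerical precision, which is a convenient sanity check before plugging in a real model. The time domain approach described above would instead map waveform frames to waveform frames directly, without this analysis and synthesis around the network.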

Index Terms—convolutional neural networks, speech codecs, speech enhancement.

If you use Convolutional Neural Networks to Enhance Coded Speech in your research, please cite:

@article{cnn2codedspeech,
  title={Convolutional Neural Networks to Enhance Coded Speech},
  author={Zhao, Ziyue and Liu, Huijun and Fingscheidt, Tim},
  journal={Transactions on Audio, Speech and Language Processing},
  year={2018}
}
