linksense / ConvolutionaNeuralNetworksToEnhanceCodedSpeech

License: BSD-3-Clause

In this work we propose two postprocessing approaches applying convolutional neural networks (CNNs) either in the time domain or the cepstral domain to enhance the coded speech without any modification of the codecs. The time domain approach follows an end-to-end fashion, while the cepstral domain approach uses analysis-synthesis with cepstral d…

Programming Languages

python


Projects that are alternatives of or similar to ConvolutionaNeuralNetworksToEnhanceCodedSpeech

spafe

🔉 spafe: Simplified Python Audio Features Extraction

Stars: ✭ 310 (+1140%)

Mutual labels: mfcc, speech-processing

awesome-speech-enhancement

A curated list of awesome Speech Enhancement papers, libraries, datasets, and other resources.

Stars: ✭ 48 (+92%)

Mutual labels: speech-processing, speech-enhancement

scim

[WIP] Speech recognition toolbox written in Nim, based on Arraymancer.

Stars: ✭ 17 (-32%)

Mutual labels: mfcc, speech-processing

torchsubband

PyTorch implementation of subband decomposition

Stars: ✭ 63 (+152%)

Mutual labels: speech-processing, speech-enhancement

InstancedMotionVector

Shows how to support rendering per-instance motion vectors within Indirect instanced drawing of Unity.

Stars: ✭ 45 (+80%)

Mutual labels: post-processing

BasicsMusicalInstrumClassifi

Basics of Musical Instruments Classification using Machine Learning

Stars: ✭ 27 (+8%)

Mutual labels: mfcc

DTW Digital Voice Recognition

Speech recognition of the digits 0-9 based on DTW and MFCC features. DTW, MFCC, speech recognition, Chinese and English data, endpoint detection, Digital Voice Recognition.

Stars: ✭ 28 (+12%)

Mutual labels: mfcc

pytorch-mfcc

A PyTorch implementation of MFCC.

Stars: ✭ 30 (+20%)

Mutual labels: mfcc

wavelet-denoiser

A wavelet audio denoiser written in Python

Stars: ✭ 29 (+16%)

Mutual labels: speech-processing

IMS-Toucan

Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.

Stars: ✭ 295 (+1080%)

Mutual labels: speech-processing

PyTelTools

Python Telemac Tools for post-processing tasks (includes a workflow)

Stars: ✭ 24 (-4%)

Mutual labels: post-processing

Robust-and-efficient-post-processing-for-video-object-detection

No description or website provided.

Stars: ✭ 107 (+328%)

Mutual labels: post-processing

Postprocessing

Post Processing Stack

Stars: ✭ 3,524 (+13996%)

Mutual labels: post-processing

Speaker-Identification

A program for automatic speaker identification using deep learning techniques.

Stars: ✭ 84 (+236%)

Mutual labels: mfcc

sonopy

A simple audio feature extraction library

Stars: ✭ 72 (+188%)

Mutual labels: mfcc

vamp-aubio-plugins

aubio plugins for Vamp

Stars: ✭ 38 (+52%)

Mutual labels: mfcc

UHV-OTS-Speech

A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.

Stars: ✭ 94 (+276%)

Mutual labels: speech-processing

gridpp

Software to post-process gridded weather forecasts

Stars: ✭ 33 (+32%)

Mutual labels: post-processing

VoxelTerrain

This project's main goal is to generate and visualize terrain built using voxels. It was achieved using different approaches and computing technologies just for the sake of performance and implementation comparison.

Stars: ✭ 37 (+48%)

Mutual labels: post-processing

Aubio

a library for audio and music analysis

Stars: ✭ 2,601 (+10304%)

Mutual labels: mfcc


Convolutional Neural Networks to Enhance Coded Speech

(Only part of the project code is provided here. Not for commercial use.)

Abstract—Enhancing coded speech suffering from far-end acoustic background noise, quantization noise, and potentially transmission errors is a challenging task. In this work we propose two postprocessing approaches applying convolutional neural networks (CNNs) either in the time domain or the cepstral domain to enhance the coded speech without any modification of the codecs. The time domain approach follows an end-to-end fashion, while the cepstral domain approach uses analysis-synthesis with cepstral domain features. The proposed postprocessors in both domains are evaluated for various narrowband and wideband speech codecs in a wide range of conditions. The proposed postprocessor improves speech quality (PESQ) by up to 0.25 MOS-LQO points for G.711, 0.30 points for G.726, 0.82 points for G.722, and 0.26 points for the adaptive multirate wideband codec (AMR-WB). In a subjective CCR listening test, the proposed postprocessor on G.711-coded speech exceeds the speech quality of an ITU-T standardized postfilter by 0.36 CMOS points, and obtains a clear preference of 1.77 CMOS points compared to G.711, even on par with uncoded speech.

Index Terms—convolutional neural networks, speech codecs, speech enhancement.
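The cepstral-domain analysis-synthesis idea mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the frame length, the number of envelope coefficients `n_env`, and the `model` callable standing in for a trained CNN are all hypothetical, and reusing the phase of the coded frame for synthesis is likewise an assumption of this sketch.

```python
import numpy as np


def analyze(frame):
    """Split a real-valued frame into its real cepstrum and spectral phase."""
    spectrum = np.fft.rfft(frame)
    # Small offset avoids log(0) for empty frequency bins.
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    # Real cepstrum: inverse DFT of the log magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n=len(frame))
    return cepstrum, np.angle(spectrum)


def synthesize(cepstrum, phase):
    """Rebuild a time-domain frame from a real cepstrum and a phase."""
    log_mag = np.fft.rfft(cepstrum).real
    spectrum = np.exp(log_mag) * np.exp(1j * phase)
    return np.fft.irfft(spectrum, n=len(cepstrum))


def postprocess(frame, model, n_env=32):
    """Enhance the low-order cepstral coefficients of one coded frame.

    `model` is a hypothetical stand-in for a trained CNN: any callable
    that maps n_env coefficients to n_env enhanced coefficients.
    """
    cepstrum, phase = analyze(frame)
    enhanced = cepstrum.copy()
    enhanced[:n_env] = model(cepstrum[:n_env])
    # Mirror the change so the cepstrum stays symmetric and the
    # log magnitude spectrum stays real.
    enhanced[len(frame) - n_env + 1:] = enhanced[1:n_env][::-1]
    # Assumption: the phase of the coded frame is reused unchanged.
    return synthesize(enhanced, phase)


# Usage with an identity "model", which should leave the frame unchanged.
rng = np.random.default_rng(0)
coded_frame = rng.standard_normal(256)
restored = postprocess(coded_frame, lambda c: c)
```

In this sketch the low-order coefficients describe the spectral envelope, so a model that only touches them changes the coarse spectral shape while the fine structure and the phase come from the coded signal.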

If you use Convolutional Neural Networks to Enhance Coded Speech in your research, please cite:

@article{cnn2codedspeech,
  title={Convolutional Neural Networks to Enhance Coded Speech},
  author={Zhao, Ziyue and Liu, Huijun and Fingscheidt, Tim},
  journal={Transactions on Audio, Speech and Language Processing},
  year={2018}
}
