madhavmk / Noise2Noise-audio_denoising_without_clean_training_data

License: MIT License

Programming Languages

Jupyter Notebook, Python

Projects that are alternatives to or similar to Noise2Noise-audio_denoising_without_clean_training_data

awesome-speech-enhancement
A curated list of awesome Speech Enhancement papers, libraries, datasets, and other resources.
Stars: ✭ 48 (-2.04%)
Mutual labels:  noise-reduction, speech-enhancement, speech-denoising
Speech Enhancement MMSE-STSA
A statistical model-based speech enhancement method using MMSE-STSA
Stars: ✭ 54 (+10.2%)
Mutual labels:  speech-enhancement, speech-denoising
audio noise clustering
An experiment with a variety of clustering (and clustering-like) techniques to reduce noise in an audio speech recording. Results: https://dodiku.github.io/audio_noise_clustering/results/
Stars: ✭ 24 (-51.02%)
Mutual labels:  speech, noise-reduction
Deep-Restore-PyTorch
Deep CNN for learning image restoration without clean data!
Stars: ✭ 59 (+20.41%)
Mutual labels:  noise-reduction, noise2noise
minutes
🔭 Speaker diarization via transfer learning
Stars: ✭ 25 (-48.98%)
Mutual labels:  speech
editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech
Stars: ✭ 74 (+51.02%)
Mutual labels:  speech
RpiANC
Active Noise Control on Raspberry Pi
Stars: ✭ 49 (+0%)
Mutual labels:  noise-reduction
Babler
Data Collection System For NLP/Speech Recognition
Stars: ✭ 21 (-57.14%)
Mutual labels:  data-collection
magic-mic
Open Source Noise Cancellation App for Virtual Meetings
Stars: ✭ 59 (+20.41%)
Mutual labels:  noise-reduction
Cifar-Autoencoder
A look at some simple autoencoders for the Cifar10 dataset, including a denoising autoencoder. Python code included.
Stars: ✭ 42 (-14.29%)
Mutual labels:  autoencoder
video autoencoder
A video LSTM autoencoder built with PyTorch. https://arxiv.org/pdf/1502.04681.pdf
Stars: ✭ 32 (-34.69%)
Mutual labels:  autoencoder
Speech256
An FPGA implementation of a classic '80s speech synthesizer. Done for the Retro Challenge 2017/10.
Stars: ✭ 51 (+4.08%)
Mutual labels:  speech
SAE-NAD
The implementation of "Point-of-Interest Recommendation: Exploiting Self-Attentive Autoencoders with Neighbor-Aware Influence"
Stars: ✭ 48 (-2.04%)
Mutual labels:  autoencoder
mirapy
MiraPy: A Python package for Deep Learning in Astronomy
Stars: ✭ 40 (-18.37%)
Mutual labels:  autoencoder
Keras Autoencoder
Autoencoders using Keras
Stars: ✭ 74 (+51.02%)
Mutual labels:  autoencoder
sldm4-h2o
Statistical Learning & Data Mining IV - H2O Presentation & Tutorial
Stars: ✭ 26 (-46.94%)
Mutual labels:  autoencoder
sova-asr
SOVA ASR (Automatic Speech Recognition)
Stars: ✭ 123 (+151.02%)
Mutual labels:  speech
T3
[EMNLP 2020] "T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack" by Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li
Stars: ✭ 25 (-48.98%)
Mutual labels:  autoencoder
Voice-Denoising-AN
A Conditional Generative Adversarial Network (cGAN) adapted for the task of source denoising of noisy voice auditory images. The base architecture is adapted from Pix2Pix.
Stars: ✭ 42 (-14.29%)
Mutual labels:  speech-enhancement
AE-CNN
ICVGIP' 18 Oral Paper - Classification of thoracic diseases on ChestX-Ray14 dataset
Stars: ✭ 33 (-32.65%)
Mutual labels:  autoencoder

Speech Denoising without Clean Training Data: a Noise2Noise Approach

Source code for the Interspeech 2021 paper titled "Speech Denoising without Clean Training Data: a Noise2Noise Approach". This paper removes the heavy dependence on clean speech data of deep-learning-based audio denoising methods by showing that it is possible to train deep speech denoising networks using only noisy speech samples. Furthermore, it shows that training regimes using only noisy audio targets achieve superior denoising performance over conventional regimes that use clean training targets, in cases involving complex noise distributions and low signal-to-noise ratios (high-noise environments). This is demonstrated through experiments studying the efficacy of our proposed approach on both real-world and synthetic noise, using the 20-layer Deep Complex U-Net architecture. We aim to incentivise the collection of audio data even when circumstances do not allow it to be perfectly clean. We believe this could significantly advance the prospects of speech denoising technologies for various low-resource languages, owing to the decreased costs and barriers of data collection.
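
As a conceptual illustration of the two training regimes, the only difference lies in how the training pair is constructed. This is a minimal sketch, not the repository's code; all names here are hypothetical:

import numpy as np

def make_training_pair(clean, noise_a, noise_b, noise2noise=True):
    # Noise2Noise: the target is a differently noised copy of the same utterance.
    # Conventional: the target is the clean utterance itself.
    noisy_input = clean + noise_a
    target = clean + noise_b if noise2noise else clean
    return noisy_input, target

# Two independent white-noise realisations over the same (stand-in) utterance.
clean = np.zeros(16000)
x, y = make_training_pair(clean, np.random.randn(16000), np.random.randn(16000))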

Research Paper and Citation

You can find the paper as part of the Interspeech 2021 proceedings at https://www.isca-speech.org/archive/interspeech_2021/kashyap21_interspeech.html. You can also view it on arXiv.

If you would like to cite this work, please use the following BibTeX entry:

@inproceedings{kashyap21_interspeech,
author={Madhav Mahesh Kashyap and Anuj Tambwekar and Krishnamoorthy Manohara and S. Natarajan},
title={{Speech Denoising Without Clean Training Data: A Noise2Noise Approach}},
year=2021,
booktitle={Proc. Interspeech 2021},
pages={2716--2720},
doi={10.21437/Interspeech.2021-1130}
}

Python Requirements

We recommend Python 3.8.8. The required package versions are listed in requirements.txt; we suggest installing them with the Conda package manager:

conda create --name <env> --file requirements.txt
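
For example, with a hypothetical environment name n2n:

conda create --name n2n --file requirements.txt
conda activate n2n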

Dataset Generation

We use two standard datasets: UrbanSound8K (for real-world noise samples) and Voice Bank + DEMAND (for speech samples). Please download them from urbansounddataset.weebly.com/urbansound8k.html and datashare.ed.ac.uk/handle/10283/2791 respectively, then extract and organize the files into the Datasets folder as shown below:

Noise2Noise-audio_denoising_without_clean_training_data
│     README.md
│     speech_denoiser_DCUNet.ipynb
│     ...
│_____Datasets
      │     clean_testset_wav
      │     clean_trainset_28spk_wav
      │     noisy_testset_wav
      │     noisy_trainset_28spk_wav
      │_____UrbanSound8K
            │_____audio
                  │_____fold1
                  ...
                  │_____fold10

To generate the dataset for training a white noise denoising model, run the script:

python white_noise_dataset_generator.py

To generate the dataset for training an UrbanSound noise-class denoising model, run the script and select the noise class:

python urban_sound_noise_dataset_generator.py

0 : air_conditioner
1 : car_horn
2 : children_playing
3 : dog_bark
4 : drilling
5 : engine_idling
6 : gun_shot
7 : jackhammer
8 : siren
9 : street_music

The train and test datasets for the specified noise will be generated in the 'Datasets' directory.
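
At its core, each generator additively mixes a noise clip into a clean utterance at a chosen signal-to-noise ratio. Below is a minimal sketch of that step, not the repository's script (which also handles resampling, train/test pairing, and file layout); the paths are examples, and the audio is assumed to be mono at a common sample rate:

import numpy as np
import soundfile as sf

def mix_at_snr(speech, noise, snr_db):
    # Tile or trim the noise clip to the length of the speech clip.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]
    # Scale the noise so the mixture reaches the requested SNR (in dB).
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

speech, sr = sf.read("Datasets/clean_trainset_28spk_wav/example.wav")  # example path
noise, _ = sf.read("Datasets/UrbanSound8K/audio/fold1/example.wav")    # example path
sf.write("noisy_example.wav", mix_at_snr(speech, noise, snr_db=0), sr)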

Training a New Model

In the 'speech_denoiser_DCUNet.ipynb' notebook, specify the type of noise you want the model to denoise (you must generate that noise dataset first). You can choose whether to train using our Noise2Noise approach (noisy audio for both the training inputs and targets) or the conventional approach (noisy audio as the training inputs and clean audio as the training targets). If you are using Windows, set 'soundfile' as the torchaudio backend; if you are using Linux, set 'sox'. A weights .pth file is saved after each training epoch in the 'Weights' directory.
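
For the backend selection described above, something like the following works (a sketch; note that recent torchaudio releases have deprecated set_audio_backend):

import platform
import torchaudio

# Use 'soundfile' on Windows and 'sox' on Linux, as recommended above.
backend = "soundfile" if platform.system() == "Windows" else "sox"
torchaudio.set_audio_backend(backend)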

Testing Model Inference on Pretrained Weights

We have trained our model with both the Noise2Noise and Noise2Clean approaches, for all 10 UrbanSound noise classes (numbered 0-9) and white Gaussian noise. All of our pretrained model weights are available in the 'Pretrained_Weights' directory, under the 'Noise2Noise' and 'Noise2Clean' subdirectories.
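
Loading a weights file follows the usual PyTorch pattern. In this sketch, DCUnet20 stands for the model class defined in the notebook and is assumed to be in scope, and the path is a placeholder:

import torch

model = DCUnet20()  # the notebook's model class (assumed in scope here)
# Assumes each .pth file stores a state_dict, as saved during training.
state = torch.load("Pretrained_Weights/Noise2Noise/weights_file.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()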

In the 'speech_denoiser_DCUNet.ipynb' notebook, select the weights .pth file for the model to use, and point to the test folders containing the audio you want to denoise. Audio quality metrics will also be calculated. The noisy, clean, and denoised WAV files will be saved in the 'Samples' directory.
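
As an independent illustration of such metrics, PESQ and STOI scores can be computed with the third-party pesq and pystoi packages (an assumption about tooling, not necessarily the notebook's own implementation; the file names are examples):

import soundfile as sf
from pesq import pesq
from pystoi import stoi

clean, sr = sf.read("Samples/clean_example.wav")      # example file names
denoised, _ = sf.read("Samples/denoised_example.wav")
print("PESQ (wb):", pesq(sr, clean, denoised, "wb"))  # wideband mode expects sr == 16000
print("STOI:", stoi(clean, denoised, sr, extended=False))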

Example

Noisy audio waveform

static/noisy_waveform.PNG

Model denoised audio waveform

static/denoised_waveform.PNG

True clean audio waveform

static/clean_waveform.PNG

20-Layer Deep Complex U-Net (DCUnet20) Model Used

static/dcunet20.PNG

Results

static/results.PNG

