csteinmetz1 / Auraloss

Licence: apache-2.0
Collection of audio-focused loss functions in PyTorch

auraloss

A collection of audio-focused loss functions in PyTorch.

Setup

pip install auraloss

Usage

import torch
import auraloss

mrstft = auraloss.freq.MultiResolutionSTFTLoss()

# Random signals shaped (batch, channels, samples): 8 mono clips of 44100 samples each
input = torch.rand(8, 1, 44100)
target = torch.rand(8, 1, 44100)

loss = mrstft(input, target)

Loss functions

We categorize the loss functions as either time-domain or frequency-domain approaches. Additionally, we include perceptual transforms.

| Loss function | Interface | Reference |
|---|---|---|
| Time domain | | |
| Error-to-signal ratio (ESR) | auraloss.time.ESRLoss() | Wright & Välimäki, 2019 |
| DC error (DC) | auraloss.time.DCLoss() | Wright & Välimäki, 2019 |
| Log hyperbolic cosine (Log-cosh) | auraloss.time.LogCoshLoss() | Chen et al., 2019 |
| Signal-to-noise ratio (SNR) | auraloss.time.SNRLoss() | |
| Scale-invariant signal-to-distortion ratio (SI-SDR) | auraloss.time.SISDRLoss() | Le Roux et al., 2018 |
| Scale-dependent signal-to-distortion ratio (SD-SDR) | auraloss.time.SDSDRLoss() | Le Roux et al., 2018 |
| Frequency domain | | |
| Aggregate STFT | auraloss.freq.STFTLoss() | Arik et al., 2018 |
| Aggregate Mel-scaled STFT | auraloss.freq.MelSTFTLoss(sample_rate) | |
| Multi-resolution STFT | auraloss.freq.MultiResolutionSTFTLoss() | Yamamoto et al., 2019* |
| Random-resolution STFT | auraloss.freq.RandomResolutionSTFTLoss() | Steinmetz & Reiss, 2020 |
| Sum and difference STFT loss | auraloss.freq.SumAndDifferenceSTFTLoss() | Steinmetz et al., 2020 |
| Perceptual transforms | | |
| Sum and difference signal transform | auraloss.perceptual.SumAndDifference() | |
| FIR pre-emphasis filters | auraloss.perceptual.FIRFilter() | Wright & Välimäki, 2019 |
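
To make two of the time-domain entries concrete, here is a minimal sketch of ESR and SI-SDR written directly in PyTorch. It illustrates the commonly cited formulas and is not the code behind auraloss.time.ESRLoss() or auraloss.time.SISDRLoss(). The function names `esr` and `si_sdr`, the `eps` constant, and the mean-removal step are assumptions made for this example.

```python
import torch


def esr(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Error-to-signal ratio: error energy divided by target energy (hypothetical helper)."""
    error_energy = torch.sum((target - pred) ** 2, dim=-1)
    target_energy = torch.sum(target ** 2, dim=-1)
    # Average the per-signal ratios over batch and channel dimensions.
    return torch.mean(error_energy / (target_energy + eps))


def si_sdr(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale-invariant SDR in dB, higher is better (hypothetical helper)."""
    # Assumption: remove the mean so a constant offset is not counted as distortion.
    pred = pred - pred.mean(dim=-1, keepdim=True)
    target = target - target.mean(dim=-1, keepdim=True)
    # Project the estimate onto the target to find the optimal scaling of the target.
    alpha = torch.sum(pred * target, dim=-1, keepdim=True) / (
        torch.sum(target ** 2, dim=-1, keepdim=True) + eps
    )
    scaled_target = alpha * target
    residual = pred - scaled_target
    ratio = torch.sum(scaled_target ** 2, dim=-1) / (torch.sum(residual ** 2, dim=-1) + eps)
    return torch.mean(10.0 * torch.log10(ratio + eps))
```

Because `si_sdr` returns a quality measure in dB, a training loss built on it would typically minimize its negative.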

* Wang et al., 2019 also propose a multi-resolution spectral loss (that Engel et al., 2020 follow), but they do not include both the log magnitude (L1 distance) and spectral convergence terms, introduced in Arik et al., 2018, and then extended for the multi-resolution case in Yamamoto et al., 2019.
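
The two terms named in the footnote can be sketched in a few lines of plain PyTorch. This is an illustrative reading of the spectral convergence and log magnitude terms averaged over several STFT resolutions, not the implementation of auraloss.freq.MultiResolutionSTFTLoss(). The helper names, the (batch, samples) input shape, and the resolution values are assumptions chosen for the example and are not the library defaults.

```python
import torch


def stft_magnitude(x: torch.Tensor, fft_size: int, hop_size: int) -> torch.Tensor:
    """Magnitude spectrogram of a (batch, samples) tensor (hypothetical helper)."""
    window = torch.hann_window(fft_size, device=x.device)
    spec = torch.stft(x, n_fft=fft_size, hop_length=hop_size,
                      window=window, return_complex=True)
    # Clamp so the logarithm taken later stays finite.
    return spec.abs().clamp(min=1e-8)


def spectral_convergence(pred_mag: torch.Tensor, target_mag: torch.Tensor) -> torch.Tensor:
    """Frobenius norm of the magnitude error relative to the target magnitude."""
    error_norm = torch.sqrt(torch.sum((target_mag - pred_mag) ** 2))
    target_norm = torch.sqrt(torch.sum(target_mag ** 2))
    return error_norm / target_norm


def log_magnitude_l1(pred_mag: torch.Tensor, target_mag: torch.Tensor) -> torch.Tensor:
    """Mean L1 distance between log magnitudes."""
    return torch.mean(torch.abs(torch.log(target_mag) - torch.log(pred_mag)))


def multi_resolution_stft_loss(pred, target,
                               resolutions=((1024, 256), (2048, 512), (512, 128))):
    """Average of both terms over (fft_size, hop_size) pairs; values are illustrative."""
    total = 0.0
    for fft_size, hop_size in resolutions:
        pred_mag = stft_magnitude(pred, fft_size, hop_size)
        target_mag = stft_magnitude(target, fft_size, hop_size)
        total = total + spectral_convergence(pred_mag, target_mag)
        total = total + log_magnitude_l1(pred_mag, target_mag)
    return total / len(resolutions)
```

Using several resolutions trades off time and frequency detail, so an error that is hard to see at one window size can still be penalized at another.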

Examples

We currently include one example, which uses a set of the loss functions to train a TCN that models an analog dynamic range compressor. For details, see examples/compressor. We provide pre-trained models, evaluation scripts to compute the metrics in the paper, and scripts to retrain the models.

The STFTLoss class also supports more advanced configurations. For example, you can compute both linear- and log-scaled STFT errors, as in Engel et al., 2020. In this case we do not include the spectral convergence term.

stft_loss = auraloss.freq.STFTLoss(
    w_log_mag=1.0,
    w_lin_mag=1.0,
    w_sc=0.0,
)
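
As a rough illustration of what these weights control, the sketch below combines the three magnitude terms as a weighted sum over precomputed magnitude spectrograms. It assumes the weights act as simple multipliers on each term. The function `weighted_stft_loss` and its `eps` constant are hypothetical and stand in for, rather than reproduce, the internals of STFTLoss.

```python
import torch


def weighted_stft_loss(pred_mag: torch.Tensor, target_mag: torch.Tensor,
                       w_sc: float = 0.0, w_log_mag: float = 1.0,
                       w_lin_mag: float = 1.0, eps: float = 1e-8) -> torch.Tensor:
    """Weighted sum of spectral convergence, log and linear magnitude errors (sketch)."""
    # Spectral convergence: relative Frobenius norm of the magnitude error.
    sc = torch.sqrt(torch.sum((target_mag - pred_mag) ** 2)) / (
        torch.sqrt(torch.sum(target_mag ** 2)) + eps
    )
    # L1 distance between log magnitudes.
    log_mag = torch.mean(torch.abs(torch.log(target_mag + eps) - torch.log(pred_mag + eps)))
    # L1 distance between linear magnitudes.
    lin_mag = torch.mean(torch.abs(target_mag - pred_mag))
    return w_sc * sc + w_log_mag * log_mag + w_lin_mag * lin_mag
```

With `w_sc=0.0` the spectral convergence term drops out, leaving only the linear and log magnitude errors.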

There is also a Mel-scaled STFT loss, which has two requirements: you must set the sample rate and specify the device on which the loss will run.

sample_rate = 44100
melstft_loss = auraloss.freq.MelSTFTLoss(sample_rate, device="cuda")

You can also build a multi-resolution Mel-scaled STFT loss with 64 bins. Make sure the device you pass matches the device of the tensors you are comparing.

mrmelstft_loss = auraloss.freq.MultiResolutionSTFTLoss(scale="mel", 
                                                       n_bins=64,
                                                       sample_rate=sample_rate,
                                                       device="cuda")

Development

We currently have no tests, so use caution for now; tests are planned. Future loss functions will target neural-network-based perceptual losses, which tend to be more sophisticated than those included so far.

If you are interested in adding a loss function please make a pull request.

Cite

If you use this code in your work please consider citing us.

@inproceedings{steinmetz2020auraloss,
    title={auraloss: {A}udio focused loss functions in {PyTorch}},
    author={Steinmetz, Christian J. and Reiss, Joshua D.},
    booktitle={Digital Music Research Network One-day Workshop (DMRN+15)},
    year={2020}}