

KinWaiCheuk / Nnaudio

License: MIT

Audio processing using PyTorch 1D convolutional networks


Labels

pytorch neural-network audio-processing

Projects that are alternatives of or similar to Nnaudio

Audio/Video Processing Service

Stars: ✭ 55 (-87.15%)

Mutual labels: audio-processing

eloquent-ffmpeg

High-level API for FFmpeg's Command Line Tools

Stars: ✭ 71 (-83.41%)

Mutual labels: audio-processing

macOS System-wide Audio Equalizer & Volume Mixer 🎧

Stars: ✭ 3,947 (+822.2%)

Mutual labels: audio-processing

🎵 Creates a vaporwave (slowed, with reverb) remix of a given MP3 file, with the option of playing over a looped GIF as a video.

Stars: ✭ 14 (-96.73%)

Mutual labels: audio-processing

netpd-instruments

instruments (synths, sequencers, utilities, etc) to be used with netpd

Stars: ✭ 18 (-95.79%)

Mutual labels: audio-processing

Spleeter is Deezer's source separation library with pretrained models, written in Python and using TensorFlow. It makes it easy to train a source separation model (assuming you have a dataset of isolated sources), and provides already-trained state-of-the-art models for performing various flavours of separation.

Stars: ✭ 18,128 (+4135.51%)

Mutual labels: audio-processing

looking-to-listen-at-cocktail-party

Looking to listen at cocktail party

Stars: ✭ 33 (-92.29%)

Mutual labels: audio-processing

A Shazam-like tool to store song fingerprints and retrieve them

Stars: ✭ 388 (-9.35%)

Mutual labels: audio-processing

A fast, versatile, easy-to-use and cross-platform Media Encoder based on FFmpeg

Stars: ✭ 66 (-84.58%)

Mutual labels: audio-processing

Novoic's audio feature extraction library

Stars: ✭ 318 (-25.7%)

Mutual labels: audio-processing

Audio-Classification-using-CNN-MLP

Multi class audio classification using Deep Learning (MLP, CNN): The objective of this project is to build a multi class classifier to identify sound of a bee, cricket or noise.

Stars: ✭ 36 (-91.59%)

Mutual labels: audio-processing

video-audio-tools

To process/edit video and audio with Python+FFmpeg. [Simple and practical] Video and audio processing/editing based on Python+FFmpeg.

Stars: ✭ 164 (-61.68%)

Mutual labels: audio-processing

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

Stars: ✭ 3,624 (+746.73%)

Mutual labels: audio-processing

Demonstrations for the interactive exploration of selected core concepts of audio, image and video processing as well as related topics

Stars: ✭ 12 (-97.2%)

Mutual labels: audio-processing

Audio plugin framework. VST2/VST3/AU/AAX/LV2 for Linux/macOS/Windows.

Stars: ✭ 341 (-20.33%)

Mutual labels: audio-processing

A Java program to implement a DMTF Decoder.

Stars: ✭ 28 (-93.46%)

Mutual labels: audio-processing

Different implementations of "Weighted Prediction Error" for speech dereverberation

Stars: ✭ 265 (-38.08%)

Mutual labels: audio-processing

Auto-Editor: Effort free video editing!

Stars: ✭ 382 (-10.75%)

Mutual labels: audio-processing

Aaxaudioconverter

Convert Audible aax files to mp3 and m4a/m4b

Stars: ✭ 336 (-21.5%)

Mutual labels: audio-processing

Vector Hub - Library for easy discovery, and consumption of State-of-the-art models to turn data into vectors. (text2vec, image2vec, video2vec, graph2vec, bert, inception, etc)

Stars: ✭ 317 (-25.93%)

Mutual labels: audio-processing


nnAudio

nnAudio is an audio processing toolbox that uses a PyTorch convolutional neural network as its backend. As a result, spectrograms can be generated from audio on the fly during neural network training, and the Fourier kernels (e.g. STFT or CQT kernels) can be trained. Kapre follows a similar concept: it also uses 1D convolutional neural networks to extract spectrograms, but is based on Keras.
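To make the idea concrete: an STFT frame is just the dot product of a windowed slice of audio with a bank of cosine and sine kernels, so sliding those kernels along the signal with a stride equal to the hop length is a strided 1D convolution. In nnAudio the kernels live in PyTorch convolution layers, which is what makes them trainable; the sketch below shows only the underlying arithmetic in NumPy, assuming a Hann window and no padding. The function names are hypothetical and not part of nnAudio's API.

```python
import numpy as np

def fourier_kernels(n_fft, window):
    # One row per frequency bin k = 0 .. n_fft // 2: windowed cosine and sine basis functions.
    n = np.arange(n_fft)
    k = np.arange(n_fft // 2 + 1)[:, None]
    cos_k = np.cos(2 * np.pi * k * n / n_fft) * window
    sin_k = np.sin(2 * np.pi * k * n / n_fft) * window
    return cos_k, sin_k

def conv_stft(x, n_fft=256, hop=64):
    # Hann window and no centre padding are assumptions made for this sketch.
    window = np.hanning(n_fft)
    cos_k, sin_k = fourier_kernels(n_fft, window)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slicing with a step of `hop` and taking dot products with each kernel row
    # is exactly a "valid" 1D convolution (cross-correlation) with stride=hop.
    frames = np.stack([x[m * hop : m * hop + n_fft] for m in range(n_frames)])
    real = frames @ cos_k.T          # shape (n_frames, n_fft // 2 + 1)
    imag = -(frames @ sin_k.T)       # DFT uses exp(-i*theta) = cos - i*sin
    return (real + 1j * imag).T      # shape (freq_bins, n_frames)
```

The result matches `np.fft.rfft` applied to each windowed frame; the point of the convolutional formulation is that `cos_k` and `sin_k` become ordinary layer weights that a framework can update by gradient descent.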

Other GPU audio processing tools include torchaudio and tf.signal. However, they do not use the neural network approach, so their Fourier bases cannot be trained. As of PyTorch 1.6.0, torchaudio is still very difficult to install on Windows because of sox. nnAudio is more portable across operating systems, since it relies mostly on PyTorch convolutional layers. The name nnAudio comes from torch.nn.

Documentation

https://kinwaicheuk.github.io/nnAudio/index.html

Comparison with other libraries

Feature	nnAudio	torch.stft	kapre	torchaudio	tf.signal	torch-stft	librosa
Trainable	✅	❌	✅	❌	❌	✅	❌
Differentiable	✅	✅	✅	✅	✅	✅	❌
Linear frequency STFT	✅	✅	✅	✅	✅	✅	✅
Logarithmic frequency STFT	✅	❌	✅	❌	❌	❌	❌
Inverse STFT	✅	✅	✅	✅	✅	✅	✅
Griffin-Lim	✅	❌	❌	✅	✅	❌	✅
Mel	✅	❌	✅	✅	✅	❌	✅
MFCC	✅	❌	❌	✅	✅	❌	✅
CQT	✅	❌	❌	❌	❌	❌	✅
Gammatone	✅	❌	❌	❌	❌	❌	❌
CFP¹	✅	❌	❌	❌	❌	❌	❌
GPU support	✅	✅	✅	✅	✅	✅	❌

✅: Fully supported ☑️: In development (only available in the dev version) ❌: Not supported

¹ Combining Spectral and Temporal Representations for Multipitch Estimation of Polyphonic Music

News & Changelog

version 0.2.2 (1 March 2021): Added filter scale support to the various versions of the CQT classes, as requested in #54. Different normalization methods were also added to the forward() method of each CQT class via the normalization_type argument. A bug was discovered in CQT2010: its output is problematic (#85).
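As a rough illustration of what a filter scale does, assume the librosa-style CQT definition: the quality factor is Q = filter_scale / (2^(1/bins_per_octave) - 1), bin k is centred at fmin * 2^(k/bins_per_octave), and its kernel spans ceil(Q * sr / f_k) samples. A filter scale below 1 therefore shortens every kernel (better time resolution, wider filters). The NumPy sketch below assumes that convention and a Hann-windowed, L1-normalised kernel; the function names and the normalisation choice are hypothetical, not nnAudio's internals or its normalization_type options.

```python
import numpy as np

def cqt_kernel_lengths(sr, fmin, n_bins, bins_per_octave=12, filter_scale=1.0):
    # Quality factor under the assumed librosa-style convention.
    Q = filter_scale / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    # Geometrically spaced centre frequencies: bins_per_octave bins per doubling.
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    # Low frequencies need long kernels, high frequencies short ones.
    lengths = np.ceil(Q * sr / freqs).astype(int)
    return freqs, lengths

def cqt_kernel(sr, freq, length):
    # One time-domain basis function: a Hann-windowed complex exponential,
    # centred on zero and L1-normalised (a hypothetical normalisation choice).
    n = np.arange(length) - length // 2
    kernel = np.hanning(length) * np.exp(2j * np.pi * freq * n / sr)
    return kernel / np.sum(np.abs(kernel))
```

Comparing `cqt_kernel_lengths(..., filter_scale=0.5)` with the default shows every kernel length halved (up to rounding), which is the time/frequency trade-off the filter scale controls.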

This version can be obtained via: pip install git+https://github.com/KinWaiCheuk/nnAudio.git#subdirectory=Installation.

version 0.2.1 (15 Jan 2021): Fixed bugs #80 and #82, and fulfilled request #83. The nnAudio version can now be checked with nnAudio.__version__ inside Python. Added two more spectrogram types: Gammatonegram() and Combined_Frequency_Periodicity().

version 0.2.0 (8 Nov 2020): It is now possible to call stft_layer.to(device) to move the spectrogram layers between devices. The device argument is no longer needed when creating the spectrogram layers.

To use this version, run pip install nnAudio==0.2.0.

version 0.1.5: Much better iSTFT and Griffin-Lim. Griffin-Lim is now a separate PyTorch class and requires torch >= 1.6.0 to run. STFT has also been refactored and now consumes less memory.
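Griffin-Lim recovers a waveform from a magnitude-only spectrogram by alternating two steps: invert a spectrogram that combines the target magnitude with the current phase estimate, then take the STFT of the result and keep only its phase. The NumPy sketch below illustrates the algorithm, not nnAudio's Griffin-Lim class. It assumes no centre padding, a periodic Hann window, and a full complex FFT so that the overlap-add inverse is an exact least-squares solution; under those assumptions the spectral error it records never increases. All function names are hypothetical.

```python
import numpy as np

def stft(x, window, hop):
    n_fft = len(window)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[m * hop : m * hop + n_fft] for m in range(n_frames)])
    # Full complex FFT (not rfft) keeps the least-squares inverse below exact.
    return np.fft.fft(frames * window, axis=1)  # shape (n_frames, n_fft)

def istft(spec, window, hop, length):
    # Least-squares inverse: overlap-add window * frame, then divide by the
    # summed squared window. Samples no window touches are set to zero.
    n_fft = len(window)
    num = np.zeros(length)
    den = np.zeros(length)
    frames = np.real(np.fft.ifft(spec, axis=1))
    for m, frame in enumerate(frames):
        num[m * hop : m * hop + n_fft] += window * frame
        den[m * hop : m * hop + n_fft] += window ** 2
    out = np.zeros(length)
    nz = den > 1e-10
    out[nz] = num[nz] / den[nz]
    return out

def griffin_lim(magnitude, window, hop, length, n_iter=50, seed=0):
    # Expects n_iter >= 1 and length == (n_frames - 1) * hop + len(window).
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random initial phase
    errors = []
    for _ in range(n_iter):
        x = istft(magnitude * phase, window, hop, length)
        spec = stft(x, window, hop)
        errors.append(np.linalg.norm(np.abs(spec) - magnitude))
        phase = np.exp(1j * np.angle(spec))  # keep the phase, discard the magnitude
    return x, errors
```

The returned `errors` list makes convergence easy to inspect; in practice, libraries add refinements such as momentum to speed this loop up.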

To use this version, run pip install nnAudio==0.1.5.

version 0.1.4a0: Finalized iSTFT and Griffin-Lim. They are now more accurate and stable.

version 0.1.2.dev3: Add win_length to STFT so that it has the same functionality as librosa.
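In librosa, when win_length is smaller than n_fft, the window is zero-padded on both sides to n_fft samples, so each frame is still n_fft samples long but only its centre win_length samples are weighted. A minimal NumPy sketch of that padding, assuming a periodic Hann window; `padded_window` is a hypothetical helper, not an nnAudio or librosa function.

```python
import numpy as np

def padded_window(win_length, n_fft):
    # Periodic Hann window of win_length samples, centre-aligned inside n_fft samples.
    if win_length > n_fft:
        raise ValueError("win_length must not exceed n_fft")
    n = np.arange(win_length)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / win_length)
    left = (n_fft - win_length) // 2
    # Any odd leftover zero goes on the right-hand side.
    return np.pad(window, (left, n_fft - win_length - left))
```

For example, `padded_window(400, 512)` yields 56 zeros, a 400-sample Hann window, then 56 more zeros.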

version 0.1.2.dev2: Fix a bug where the inverse could not be computed on the GPU, and add a separate iSTFT layer class.

version 0.1.2.dev1: Add Inverse STFT and Griffin-Lim. They are still under development, please use with care.

version 0.1.1 (1 June 2020): Add MFCC
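A common recipe for MFCCs, and the default in librosa.feature.mfcc, is to convert a power mel spectrogram to decibels and apply an orthonormal type-II DCT along the mel axis, keeping the first n_mfcc coefficients. The NumPy sketch below follows that recipe without librosa's top_db clipping; nnAudio's exact scaling may differ, and the function names are hypothetical.

```python
import numpy as np

def dct_ii_ortho_matrix(n):
    # Orthonormal DCT-II basis: row k is cos(pi * k * (2i + 1) / (2n)), scaled so B @ B.T = I.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return basis

def mfcc_from_mel(mel_spec, n_mfcc=20, amin=1e-10):
    # mel_spec: power mel spectrogram of shape (n_mels, n_frames).
    log_mel = 10.0 * np.log10(np.maximum(mel_spec, amin))  # power -> dB, floored at amin
    basis = dct_ii_ortho_matrix(mel_spec.shape[0])
    return basis[:n_mfcc] @ log_mel  # shape (n_mfcc, n_frames)
```

Because the DCT decorrelates the log-mel bands, a perfectly flat mel spectrum produces energy only in coefficient 0.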

How to cite nnAudio

The paper for nnAudio is available in IEEE Access:

K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, "nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks," in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.

BibTeX

@ARTICLE{9174990,
  author={K. W. {Cheuk} and H. {Anderson} and K. {Agres} and D. {Herremans}},
  journal={IEEE Access},
  title={nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks},
  year={2020},
  volume={8},
  number={},
  pages={161981-162003},
  doi={10.1109/ACCESS.2020.3019084}}

Call for Contributions

nnAudio is a fast-growing package. With the increasing number of feature requests, we welcome anyone who is familiar with digital signal processing and neural networks to contribute to nnAudio. The current list of pending features includes:

Invertible Constant Q Transform (CQT)
CQT with filter scale factor (see issue #54)
Variable Q Transform (see VQT: https://www.researchgate.net/publication/274009051_A_Matlab_Toolbox_for_Efficient_Perfect_Reconstruction_Time-Frequency_Transforms_with_Log-Frequency_Resolution)
Speed and Performance improvements for Griffin-Lim (see issue #41)
Data Augmentation (see issue #49)

(Quick tip for unit tests: cd into the Installation folder, then run pytest. You need at least 1931 MiB of GPU memory to pass all the unit tests.)

Alternatively, you may also contribute by:

Refactoring the code structure (currently all functions are in the same file, but with the increasing number of features, I think we need to break it down into smaller modules)
Making a better demonstration code or tutorial

Dependencies

Numpy 1.14.5

Scipy 1.2.0

PyTorch >= 1.6.0 (Griffin-Lim only available after 1.6.0)

Python >= 3.6

librosa = 0.7.0 (In theory, nnAudio depends on librosa, but it only needs a single function, mel from librosa.filters. To save users the trouble of installing librosa for this one function, I copied the code corresponding to mel into nnAudio, so that nnAudio runs without librosa installed)
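For readers curious what that copied function computes: a mel filterbank is a matrix of triangular filters whose centres are evenly spaced on the mel scale, mapping n_fft // 2 + 1 linear-frequency STFT bins to n_mels bands. The NumPy sketch below uses the simpler HTK mel formula (mel = 2595 * log10(1 + f / 700)) and unnormalised triangles. librosa.filters.mel uses the Slaney-style scale and normalisation by default, so this hypothetical `mel_filterbank` is an illustration of the technique, not a drop-in replacement.

```python
import numpy as np

def hz_to_mel(f):
    # HTK mel scale (an assumption; librosa's default is the Slaney variant).
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=40, fmin=0.0, fmax=None):
    fmax = sr / 2.0 if fmax is None else fmax
    # Centre frequency of each rfft bin, from 0 Hz to Nyquist.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 points equally spaced in mel: the edges and centres of the triangles.
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    weights = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lower, centre, upper = mel_pts[i], mel_pts[i + 1], mel_pts[i + 2]
        rising = (fft_freqs - lower) / (centre - lower)
        falling = (upper - fft_freqs) / (upper - centre)
        # Triangle peaking at 1 on the centre frequency, zero outside [lower, upper].
        weights[i] = np.maximum(0.0, np.minimum(rising, falling))
    return weights
```

Multiplying this (n_mels, n_fft // 2 + 1) matrix with a power spectrogram of shape (n_fft // 2 + 1, n_frames) gives a mel spectrogram; in nnAudio's setting such a matrix is simply another set of weights applied after the STFT layer.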

Other similar libraries


Stars: ✭ 428
