keunwoochoi / Kapre

Licence: mit
kapre: Keras Audio Preprocessors

Kapre

Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU in real time.

Tested on Python 3.6 and 3.7

Why Kapre?

vs. Pre-computation

  • You can optimize DSP parameters
  • Your model deployment becomes much simpler and more consistent.
  • Your code and model have fewer dependencies

vs. Your own implementation

  • Quick and easy!
  • Consistent with 1D/2D tensorflow batch shapes
  • Data format agnostic (channels_first and channels_last)
  • Less error-prone - Kapre layers are tested against Librosa (stft, decibel, etc.) - which is (trust me) trickier than you think.
  • Kapre layers have some extended APIs over the default tf.signal implementation, such as:
    • A perfectly invertible STFT and InverseSTFT pair
    • Mel-spectrogram with more options
  • Reproducibility - Kapre is available on pip with versioning

Workflow with Kapre

  1. Preprocess your audio dataset. Resample the audio to the right sampling rate and store the audio signals (waveforms).
  2. In your ML model, add a Kapre layer, e.g. kapre.time_frequency.STFT(), as the first layer of the model.
  3. The data loader simply loads audio signals and feeds them into the model.
  4. In your hyperparameter search, include DSP parameters like n_fft to boost the performance.
  5. When deploying the final model, all you need to remember is the sampling rate of the signal. No dependency or preprocessing!
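
Since DSP parameters such as n_fft become hyperparameters in this workflow, it helps to see how they change the size of the feature map that the rest of the model receives. Below is a minimal pure-Python sketch, not part of Kapre: the helper name stft_output_shape is hypothetical, and it assumes no end padding (frames counted as 1 + (n_samples - win_length) // hop_length, the rule tf.signal.frame documents for pad_end=False) and a (time, frequency, channel) layout for channels_last.

```python
def stft_output_shape(n_samples, n_channels, n_fft, win_length, hop_length,
                      data_format="channels_last"):
    """Estimate the per-example output shape of an STFT layer (no batch axis).

    Hypothetical helper for illustration only; assumes no end padding,
    so only complete analysis frames are counted.
    """
    if n_samples < win_length:
        raise ValueError("signal is shorter than one analysis window")
    n_frames = 1 + (n_samples - win_length) // hop_length
    n_freq = n_fft // 2 + 1  # one-sided spectrum of a real-valued signal
    if data_format == "channels_last":
        return (n_frames, n_freq, n_channels)
    if data_format == "channels_first":
        return (n_channels, n_frames, n_freq)
    raise ValueError("data_format must be 'channels_last' or 'channels_first'")


# Compare candidate n_fft values for a 1-second, 6-channel signal at 44.1 kHz.
for n_fft in (512, 1024, 2048):
    print(n_fft, stft_output_shape(44100, 6, n_fft, n_fft, n_fft // 2))
# 512  -> (171, 257, 6)
# 1024 -> (85, 513, 6)
# 2048 -> (42, 1025, 6)
```

A larger n_fft trades time resolution for frequency resolution, so the layers after the STFT may need different strides or pooling for each setting in the search.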

Installation

pip install kapre

API Documentation

Please refer to Kapre API Documentation at https://kapre.readthedocs.io

One-shot example

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer

# 6 channels (!), maybe 1-sec audio signal, for an example.
input_shape = (44100, 6)
sr = 44100
model = Sequential()
# A STFT layer
model.add(STFT(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model.add(Magnitude())
model.add(MagnitudeToDecibel())  # these three layers can be replaced with get_stft_magnitude_layer()
# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer() 

# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())

# Compile the model
model.compile('adam', 'categorical_crossentropy') # if single-label classification

# train it with raw audio sample inputs
# for example, you may have functions that load your data as below.
x = load_x() # e.g., x.shape = (10000, 44100, 6), channels_last to match input_shape
y = load_y() # e.g., y.shape = (10000, 10) if it's 10-class classification
# then..
model.fit(x, y)
# Done!
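
The MagnitudeToDecibel layer in the example above maps magnitudes to a log scale. For intuition, here is a minimal pure-Python sketch of that kind of conversion. It is not Kapre's implementation: the function below and its defaults (ref_value, amin, dynamic_range) are assumptions for illustration, using 10 * log10 scaling with a floor and a dynamic-range clip in the style of librosa's power_to_db.

```python
import math


def magnitude_to_decibel(values, ref_value=1.0, amin=1e-5, dynamic_range=80.0):
    """Convert non-negative magnitudes to decibels relative to ref_value.

    Illustrative sketch only; the default values here are assumptions.
    Values below amin are floored before the log to avoid log10(0), and
    the result is clipped to dynamic_range dB below its maximum.
    """
    offset = 10.0 * math.log10(max(amin, ref_value))
    db = [10.0 * math.log10(max(v, amin)) - offset for v in values]
    floor = max(db) - dynamic_range
    return [max(d, floor) for d in db]


# 1.0 maps to 0 dB, 0.1 to about -10 dB, and 0.0 is floored at amin (about -50 dB).
print(magnitude_to_decibel([1.0, 0.1, 0.0]))
```

Small details such as the floor value and the clipping range are where independent implementations tend to drift apart, which is why testing against a reference such as Librosa matters.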

Citation

Please cite this paper if you use Kapre for your work.

@inproceedings{choi2017kapre,
  title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
  author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
  booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
  year={2017},
  organization={ICML}
}