chen0040 / keras-audio

Licence: MIT license
keras project for audio deep learning

Features

Audio Classification

  • ResNetV2AudioClassifier converts audio into a mel-spectrogram and uses a simplified ResNet deep convolutional neural network (DCNN) architecture to classify audio clips by their associated labels.
  • Cifar10AudioClassifier converts audio into a mel-spectrogram and uses the CIFAR-10 DCNN architecture to classify audio clips by their associated labels.
  • ResNet50AudioClassifier converts audio into a mel-spectrogram and uses the ResNet-50 DCNN architecture to classify audio clips by their associated labels.

The classifiers differ from those used in image classification in that:

  • they use ELU instead of ReLU activations.
  • they use an elongated max-pooling window (because the mel-spectrogram is an elongated "image").
  • they add dropout.
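
The three differences listed above can be illustrated with a small NumPy sketch. This is not the library's Keras code: the feature-map size (96 mel bands by 1366 frames), the 2x4 pooling window, and the 0.25 dropout rate are hypothetical values chosen only to show the idea.

```python
import numpy as np


def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))


def relu(x):
    """ReLU, shown only for comparison: negative inputs become 0."""
    return np.maximum(np.asarray(x, dtype=float), 0.0)


def max_pool_2d(image, pool_h, pool_w):
    """Non-overlapping max pooling with a (pool_h, pool_w) window."""
    h_out, w_out = image.shape[0] // pool_h, image.shape[1] // pool_w
    trimmed = image[:h_out * pool_h, :w_out * pool_w]
    blocks = trimmed.reshape(h_out, pool_h, w_out, pool_w)
    return blocks.max(axis=(1, 3))


def dropout(x, rate, rng):
    """Training-time inverted dropout: zero a fraction `rate`, rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


# Hypothetical feature map: 96 mel bands (height) by 1366 time frames (width).
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((96, 1366))

activated = elu(feature_map)
pooled = max_pool_2d(activated, pool_h=2, pool_w=4)  # wider than tall
dropped = dropout(pooled, rate=0.25, rng=rng)
print(pooled.shape)  # (48, 341)
```

Unlike ReLU, ELU keeps a small negative output for negative inputs, and the wider pooling window shrinks the long time axis faster than the mel axis.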

Usage: Audio Classification

Train an audio classifier

The audio classification demo uses the GTZAN data set to train a music classifier that recognizes the genre of songs.

Classification works by converting an audio or song file into a mel-spectrogram, which can be treated as a 3-dimensional matrix in much the same way as an image.
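
To make the "image-like" shape concrete, here is a minimal NumPy-only sketch of a log mel-spectrogram. The project itself relies on librosa for this step; the function names, the triangular filter bank, and the parameters below (n_fft=1024, hop=512, n_mels=64) are assumptions for illustration, not the library's settings.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(sample_rate, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale: (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_spectrogram(signal, sample_rate, n_fft=1024, hop=512, n_mels=64):
    """Return a log mel-spectrogram shaped (n_mels, n_frames, 1), like a 1-channel image."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2              # (n_frames, n_fft // 2 + 1)
    mel = mel_filter_bank(sample_rate, n_fft, n_mels) @ power.T   # (n_mels, n_frames)
    log_mel = 10.0 * np.log10(mel + 1e-10)                        # decibel-like scale
    return log_mel[:, :, np.newaxis]                              # add the channel axis


# Synthetic one-second 440 Hz tone standing in for a song clip.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
spec = mel_spectrogram(np.sin(2.0 * np.pi * 440.0 * t), sample_rate)
print(spec.shape)  # (64, 42, 1)
```

The result has a height (mel bands), a width (time frames), and a channel axis, so it can be fed to a convolutional network built for images.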

To train on the GTZAN data set, run the following command:

cd demo
python cifar10_train.py

The sample code below shows how to train Cifar10AudioClassifier to classify songs by their genre labels:

from keras_audio.library.cifar10 import Cifar10AudioClassifier
from keras_audio.library.utility.gtzan_loader import download_gtzan_genres_if_not_found


def load_audio_path_label_pairs(max_allowed_pairs=None):
    download_gtzan_genres_if_not_found('./very_large_data/gtzan')
    audio_paths = []
    with open('./data/lists/test_songs_gtzan_list.txt', 'rt') as file:
        for line in file:
            audio_path = './very_large_data/' + line.strip()
            audio_paths.append(audio_path)
    pairs = []
    with open('./data/lists/test_gt_gtzan_list.txt', 'rt') as file:
        for line in file:
            label = int(line)
            if max_allowed_pairs is None or len(pairs) < max_allowed_pairs:
                pairs.append((audio_paths[len(pairs)], label))
            else:
                break
    return pairs


def main():
    audio_path_label_pairs = load_audio_path_label_pairs()
    print('loaded: ', len(audio_path_label_pairs))

    classifier = Cifar10AudioClassifier()
    batch_size = 8
    epochs = 100
    history = classifier.fit(audio_path_label_pairs, model_dir_path='./models', batch_size=batch_size, epochs=epochs)


if __name__ == '__main__':
    main()

After training, the trained models are saved to demo/models.

  • The training accuracy reached over 80% after 29 epochs.
  • The training accuracy reached over 90% after 38 epochs.
  • The training accuracy after 100 epochs is 98.13%, with validation accuracy of 71%.

Model Comparison

Currently, ResNet50AudioClassifier is too expensive to run on my hardware (the GPU raises an out-of-memory exception). The figure below compares the training quality of ResNetV2AudioClassifier and Cifar10AudioClassifier:

[Figure: training comparison of ResNetV2AudioClassifier and Cifar10AudioClassifier]

Test trained model

To test the trained Cifar10AudioClassifier model, run the following command:

cd demo
python cifar10_predict.py

The sample code below shows how to test the trained Cifar10AudioClassifier model:

from random import shuffle

from keras_audio.library.cifar10 import Cifar10AudioClassifier
from keras_audio.library.utility.gtzan_loader import download_gtzan_genres_if_not_found, gtzan_labels


def load_audio_path_label_pairs(max_allowed_pairs=None):
    download_gtzan_genres_if_not_found('./very_large_data/gtzan')
    audio_paths = []
    with open('./data/lists/test_songs_gtzan_list.txt', 'rt') as file:
        for line in file:
            audio_path = './very_large_data/' + line.strip()
            audio_paths.append(audio_path)
    pairs = []
    with open('./data/lists/test_gt_gtzan_list.txt', 'rt') as file:
        for line in file:
            label = int(line)
            if max_allowed_pairs is None or len(pairs) < max_allowed_pairs:
                pairs.append((audio_paths[len(pairs)], label))
            else:
                break
    return pairs


def main():
    audio_path_label_pairs = load_audio_path_label_pairs()
    shuffle(audio_path_label_pairs)
    print('loaded: ', len(audio_path_label_pairs))

    classifier = Cifar10AudioClassifier()
    classifier.load_model(model_dir_path='./models')

    for i in range(0, 20):
        audio_path, actual_label_id = audio_path_label_pairs[i]
        predicted_label_id = classifier.predict_class(audio_path)
        print(audio_path)
        predicted_label = gtzan_labels[predicted_label_id]
        actual_label = gtzan_labels[actual_label_id]
        
        print('predicted: ', predicted_label, 'actual: ', actual_label)


if __name__ == '__main__':
    main()
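
The script above only prints each prediction. If an overall score is wanted, the (predicted, actual) pairs can be collected and summarized as sketched below. The helper summarize_predictions and the sample data are hypothetical and not part of keras-audio.

```python
from collections import Counter


def summarize_predictions(results):
    """results: iterable of (predicted_label, actual_label) tuples."""
    results = list(results)
    correct = sum(1 for predicted, actual in results if predicted == actual)
    accuracy = correct / len(results) if results else 0.0
    # Count each (actual, predicted) mistake to see which genres get confused.
    confusions = Counter(
        (actual, predicted) for predicted, actual in results if predicted != actual
    )
    return accuracy, confusions


# Hypothetical results, in the same order the demo prints them: predicted, actual.
sample = [('rock', 'rock'), ('jazz', 'blues'), ('pop', 'pop'), ('jazz', 'blues')]
accuracy, confusions = summarize_predictions(sample)
print('accuracy: %.2f' % accuracy)   # accuracy: 0.50
print(confusions.most_common(1))     # [(('blues', 'jazz'), 2)]
```

In the demo loop, one could append (predicted_label, actual_label) to a list on each iteration and pass that list to the helper after the loop.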

Configure to run on GPU on Windows

  • Step 1: Change tensorflow to tensorflow-gpu in requirements.txt and install tensorflow-gpu.
  • Step 2: Download and install the CUDA® Toolkit 9.0. (CUDA® Toolkit 9.1 is not yet supported by tensorflow, so download CUDA® Toolkit 9.0.)
  • Step 3: Download and unzip cuDNN 7.4 for CUDA® Toolkit 9.0 and add the bin folder of the unzipped directory to the PATH environment variable of your Windows environment.
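
After the steps above, it can help to confirm that tensorflow actually sees the GPU. The following is a minimal sketch, assuming a tensorflow build that exposes tensorflow.python.client.device_lib; the helper name gpu_is_visible is hypothetical.

```python
def gpu_is_visible():
    """Return True/False if tensorflow is installed, or None if it is not."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    # Each local device reports a type such as 'CPU' or 'GPU'.
    devices = device_lib.list_local_devices()
    return any(device.device_type == 'GPU' for device in devices)


result = gpu_is_visible()
if result is None:
    print('tensorflow is not installed')
elif result:
    print('tensorflow can see a GPU')
else:
    print('tensorflow only sees the CPU; check the CUDA and cuDNN setup')
```

If the check reports CPU only, the usual suspects are a CUDA or cuDNN version mismatch or a missing PATH entry.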

Note

On pre-processing

To speed up training, you can pre-generate the mel-spectrograms from the audio files by running the following script before training starts:

cd demo/utility
python gtzan_loader.py
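
The speed-up comes from computing each spectrogram once and reusing it in later epochs and runs. The sketch below shows that general caching idea under stated assumptions: cached_features, the '.mel.npy' suffix, and fake_compute are hypothetical names, and the library's own cache layout may differ.

```python
import os
import tempfile

import numpy as np


def cached_features(audio_path, compute_fn, cache_suffix='.mel.npy'):
    """Load features saved next to the audio file, computing them only once."""
    cache_path = audio_path + cache_suffix
    if os.path.exists(cache_path):
        return np.load(cache_path)
    features = compute_fn(audio_path)
    np.save(cache_path, features)
    return features


calls = []


def fake_compute(path):
    """Stand-in for the expensive audio-to-mel-spectrogram conversion."""
    calls.append(path)
    return np.zeros((96, 1366, 1))


with tempfile.TemporaryDirectory() as tmp:
    audio_path = os.path.join(tmp, 'song.au')
    first = cached_features(audio_path, fake_compute)   # computed and saved
    second = cached_features(audio_path, fake_compute)  # loaded from disk

print(len(calls))  # 1
```

The second call never reaches the conversion function, which is the saving that pre-generation provides for every training epoch.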

audioread.NoBackend

Audio processing depends on librosa version 0.6, which in turn depends on audioread.

If you are on Windows and see the error "audioread.NoBackend", download the shared-linking build of ffmpeg, unzip it to a local directory, and add the ffmpeg bin folder to the Windows PATH environment variable. Then restart cmd or PowerShell; Python should now be able to locate the backend for audioread in librosa.
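
A quick way to confirm the PATH change took effect is to ask Python where it finds the ffmpeg executable. This sketch assumes audioread is falling back to a command-line tool (ffmpeg or avconv); it does not check audioread's other backends, and the helper name is hypothetical.

```python
import shutil


def find_audio_backend_tools(commands=('ffmpeg', 'avconv')):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in commands}


found = find_audio_backend_tools()
for name, location in found.items():
    print(name, '->', location or 'not found on PATH')
```

If ffmpeg is still reported as not found after editing PATH, the shell was probably not restarted or the entry points at the wrong folder.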

Export trained model as tensorflow pb model file

To export the trained keras model as a tensorflow graph model file, run the following command:

cd demo
python cifar10_tensorflow_export_model.py

The script demo/cifar10_tensorflow_export_model.py exports the trained model as demo/models/tensorflow_models/cifar10/cifar10.pb.

To test the exported tensorflow graph model file, run the following command:

cd demo
python cifar10_tensorflow_classifier.py

The script demo/cifar10_tensorflow_classifier.py uses pure tensorflow code to load cifar10.pb and uses it to predict the genres of songs.
