IliaZenkov / sklearn-audio-classification

Licence: MIT license

An in-depth analysis of audio classification on the RAVDESS dataset: feature engineering, hyperparameter optimization, model evaluation, and cross-validation with a variety of ML techniques and an MLP.

Programming Languages

Jupyter Notebook


Projects that are alternatives to or similar to sklearn-audio-classification

Emotion and Polarity SO

An emotion classifier of text containing technical content from the SE domain

Stars: ✭ 74 (+138.71%)

Mutual labels: emotion, emotion-detection, emotion-recognition

XED

XED multilingual emotion datasets

Stars: ✭ 34 (+9.68%)

Mutual labels: classification, emotion-detection, emotion-recognition

playground

A Streamlit application to play with machine learning models directly from the browser

Stars: ✭ 48 (+54.84%)

Mutual labels: scikit-learn, sklearn, machine-learning-tutorials

skrobot

skrobot is a Python module for designing, running and tracking Machine Learning experiments / tasks. It is built on top of the scikit-learn framework.

Stars: ✭ 22 (-29.03%)

Mutual labels: scikit-learn, feature-engineering, model-evaluation

Hyperparameter hunter

Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries

Stars: ✭ 648 (+1990.32%)

Mutual labels: scikit-learn, sklearn, feature-engineering

emotion-recognition-GAN

This project is a semi-supervised approach to detect emotions on faces in-the-wild using GAN

Stars: ✭ 20 (-35.48%)

Mutual labels: emotion, emotion-recognition

RECCON

This repository contains the dataset and the PyTorch implementations of the models from the paper Recognizing Emotion Cause in Conversations.

Stars: ✭ 126 (+306.45%)

Mutual labels: emotion, emotion-recognition

KMeans elbow

Code for determining optimal number of clusters for K-means algorithm using the 'elbow criterion'

Stars: ✭ 35 (+12.9%)

Mutual labels: scikit-learn, sklearn

Resnet-Emotion-Recognition

Identifies emotion(s) from user facial expressions

Stars: ✭ 21 (-32.26%)

Mutual labels: emotion, emotion-recognition

imbalanced-ensemble

Class-imbalanced / Long-tailed ensemble learning in Python. Modular, flexible, and extensible.

Stars: ✭ 199 (+541.94%)

Mutual labels: scikit-learn, sklearn

Emotion-Investigator

An Exciting Deep Learning-based Flask web app that predicts the Facial Expressions of users and also does Graphical Visualization of the Expressions.

Stars: ✭ 44 (+41.94%)

Mutual labels: emotion, emotion-detection

STEP

Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits

Stars: ✭ 39 (+25.81%)

Mutual labels: emotion-detection, emotion-recognition

sklearn-pmml-model

A library to parse and convert PMML models into Scikit-learn estimators.

Stars: ✭ 71 (+129.03%)

Mutual labels: scikit-learn, sklearn

exemplary-ml-pipeline

Exemplary, annotated machine learning pipeline for any tabular data problem.

Stars: ✭ 23 (-25.81%)

Mutual labels: sklearn, feature-engineering

AutoTabular

Automatic machine learning for tabular data. ⚡🔥⚡

Stars: ✭ 51 (+64.52%)

Mutual labels: scikit-learn, feature-engineering

ml-workflow-automation

Python Machine Learning (ML) project that demonstrates the archetypal ML workflow within a Jupyter notebook, with automated model deployment as a RESTful service on Kubernetes.

Stars: ✭ 44 (+41.94%)

Mutual labels: sklearn, classification

hfusion

Multimodal sentiment analysis using hierarchical fusion with context modeling

Stars: ✭ 42 (+35.48%)

Mutual labels: emotion-detection, emotion-recognition

Igel

a delightful machine learning tool that allows you to train, test, and use models without writing code

Stars: ✭ 2,956 (+9435.48%)

Mutual labels: scikit-learn, sklearn

Orange3

🍊 📊 💡 Orange: Interactive data analysis

Stars: ✭ 3,152 (+10067.74%)

Mutual labels: scikit-learn, classification

converse

Conversational text Analysis using various NLP techniques

Stars: ✭ 147 (+374.19%)

Mutual labels: scikit-learn, emotion-recognition


Introduction to Audio Classification with Deep Neural Networks

See Notebook for Code Walk-Through

[Figures: Power Spectrogram, Chromagram, Mel Spectrogram, MFC Coefficients]

Abstract

Purpose

This notebook serves as an introduction to working with audio data for classification problems; it is meant as a learning resource rather than a demonstration of the state of the art. The techniques covered here apply not only to classification problems but also to regression problems and to problems involving other types of input data. I focus particularly on feature engineering techniques for audio data and provide an in-depth look at the logic, concepts, and properties of the Multilayer Perceptron (MLP) model, the precursor of today's deep neural networks (DNNs). I also introduce a few key machine learning models and the reasoning behind choosing their hyperparameters. These objectives are framed by the task of recognizing emotion from snippets of speech audio in the RAVDESS dataset.
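As a rough illustration of the kind of audio feature engineering described here, the following NumPy-only sketch turns a raw waveform into a power spectrogram, then a mel spectrogram, then MFC coefficients. It then averages the coefficients over time into a fixed-length feature vector. This is not the notebook's code. The frame size (`n_fft=512`), hop length (256), 40 mel filters, 13 coefficients, time-averaging, and the helper names (`power_spectrogram`, `mel_filterbank`, `dct_ii`, `extract_features`) are all illustrative assumptions. In practice, a dedicated audio library would usually handle these steps.

```python
import numpy as np


def power_spectrogram(signal, n_fft=512, hop_length=256):
    """|STFT|^2 with a Hann window: rows are frequency bins, columns are frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack(
        [signal[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0)) ** 2  # shape (n_fft // 2 + 1, n_frames)


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels=40):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along axis 0, keeping only the first n_coeffs rows."""
    n = x.shape[0]
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return basis @ x


def extract_features(signal, sr, n_fft=512, hop_length=256, n_mels=40, n_mfcc=13):
    power = power_spectrogram(signal, n_fft, hop_length)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power  # mel spectrogram, (n_mels, n_frames)
    log_mel = np.log(mel + 1e-10)                    # compress the dynamic range
    mfcc = dct_ii(log_mel, n_mfcc)                   # MFC coefficients, (n_mfcc, n_frames)
    return mfcc.mean(axis=1)                         # average over time -> fixed-length vector


# Usage: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
features = extract_features(tone, sr)
print(features.shape)  # (13,)
```

Averaging over time discards temporal ordering. That is a common simplification when the downstream model (SVM, kNN, Random Forest, MLP) expects one fixed-length vector per clip.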

Summary

Data cleansing and feature engineering are the most crucial aspects of preparing machine and deep learning models alike, and they are often the difference between success and failure. We can drastically improve the performance of a model by paying proper attention to feature engineering. This holds even for input data that is already usable for prediction: such data can still be transformed in myriad ways to improve predictive performance. For features to be useful in classification, they must vary sufficiently between different classes. We can further improve the performance of our models by understanding how their hyperparameters influence them and tuning those hyperparameters precisely, for which there are algorithmic aids such as Grid Search.
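Grid Search can be sketched with scikit-learn's `GridSearchCV`. The snippet below is a minimal example on synthetic data from `make_classification`, a placeholder standing in for extracted audio features, with 8 classes to mirror RAVDESS's eight emotion labels. The pipeline layout and the grid values are illustrative assumptions, not the notebook's settings. Placing `StandardScaler` inside the `Pipeline` means the scaler is re-fit on each training fold, so validation folds never leak into preprocessing.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: 400 samples, 40 features, 8 classes (synthetic, not RAVDESS)
X, y = make_classification(n_samples=400, n_features=40, n_informative=20,
                           n_classes=8, random_state=0)

# Hold out a test set before any tuning; the grid search never sees it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),                           # re-fit inside every CV fold
    ("mlp", MLPClassifier(max_iter=500, random_state=0)),
])

# Illustrative grid: every combination is scored with 5-fold cross-validation
param_grid = {
    "mlp__hidden_layer_sizes": [(64,), (128,)],
    "mlp__alpha": [1e-4, 1e-2],                            # L2 regularization strength
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best hyperparameters:", search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.3f}")
# Score the refit best model on the test set exactly once, after tuning is finished
print(f"test accuracy: {search.score(X_test, y_test):.3f}")
```

The grid grows multiplicatively with each added hyperparameter, so it is usually kept coarse at first and refined around the best region.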

Network architecture is a critical factor in determining the computational complexity of DNNs; often, however, simpler models with just one hidden layer perform better than more complicated ones. The importance of proper model evaluation cannot be overstressed: training data should be used strictly for training a model, validation data strictly for tuning it, and test data strictly for evaluating it once it is tuned. A model should never be tuned to perform better on test data. To this end, K-Fold Cross Validation is a staple tool. Finally, the Random Forest ensemble model makes a robust benchmark suitable for less-than-clean data with an unknown distribution, especially when strapped for time and wishing to evaluate the usability of features extracted from a dataset.
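The train/validation/test discipline and the Random Forest benchmark can be combined in a short scikit-learn sketch. As before, it runs on synthetic placeholder data rather than RAVDESS features. A test set is split off first and used only once. `StratifiedKFold` rotates the validation role across five folds of the training data. `RandomForestClassifier`, with 200 trees as an arbitrary choice, serves as the baseline. Random forests need no feature scaling, which is part of what makes them convenient for a quick check of whether extracted features carry signal.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Placeholder data (synthetic, 8 classes); substitute real extracted features here
X, y = make_classification(n_samples=400, n_features=40, n_informative=20,
                           n_classes=8, random_state=0)

# Lock away a test set first; it is touched exactly once, at the very end
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# K-fold CV on the training portion: each fold serves once as validation data
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)  # no scaling needed
fold_scores = cross_val_score(forest, X_train, y_train, cv=kfold, scoring="accuracy")
print("per-fold accuracy:", np.round(fold_scores, 3))
print(f"CV mean ± std: {fold_scores.mean():.3f} ± {fold_scores.std():.3f}")

# Only after all modelling choices are frozen: refit on all training data, score once
forest.fit(X_train, y_train)
print(f"test accuracy: {forest.score(X_test, y_test):.3f}")
```

The spread across folds gives a sense of how much a single train/validation split could mislead, which matters most on small datasets.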

Conclusions

Classic machine learning models such as Support Vector Machines (SVM), k Nearest Neighbours (kNN), and Random Forests have distinct advantages over deep neural networks in many tasks, but they do not match the performance of even the simplest deep neural network on this audio classification task. The Multilayer Perceptron (MLP) model is the simplest form of DNN suited to classification tasks; it provides decent off-the-shelf performance and can be precisely tuned to be accurate and relatively quick to train.
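One way to compare the classic models against an MLP under the same protocol is to score each on identical cross-validation folds. This is a hedged sketch on synthetic data. The hyperparameters (`C=10.0`, `n_neighbors=5`, 200 trees, one hidden layer of 128 units) are placeholders rather than tuned values. The printed numbers therefore say nothing about RAVDESS and do not reproduce the notebook's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data (synthetic, 8 classes)
X, y = make_classification(n_samples=400, n_features=40, n_informative=20,
                           n_classes=8, random_state=0)

# Scale-sensitive models get a StandardScaler; the Random Forest does not need one
models = {
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(C=10.0, gamma="scale")),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "MLP (1 hidden layer)": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0)),
}

# Same folds for every model, so the scores are directly comparable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name:<22} {scores.mean():.3f} ± {scores.std():.3f}")
```

Using a shared `cv` object removes fold assignment as a source of difference between models, so any gap reflects the models and their settings.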

The MLP provides appreciable accuracy on the RAVDESS dataset but suffers from the relatively small number of training samples this dataset affords. Long Short-Term Memory Recurrent Neural Networks (LSTM RNNs) and Convolutional Neural Networks (CNNs) are excellent DNN candidates for audio classification. LSTM RNNs are well suited because they interpret sequential data well, such as the audio waveform represented as a time series. CNNs are well suited because features engineered from audio data, such as spectrograms, bear a marked resemblance to images, and CNNs excel at recognizing and discriminating between distinct patterns in images.

Cite

If you find this work useful in your own research, please cite as follows:

@misc{Zenkov-sklearn-SER-basics,
  author = {Zenkov, Ilia},
  title = {sklearn-audio-classification},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/IliaZenkov/sklearn-audio-classification}},
}

Licence

MIT license
