IliaZenkov / sklearn-audio-classification

Licence: MIT license
An in-depth analysis of audio classification on the RAVDESS dataset. Feature engineering, hyperparameter optimization, model evaluation, and cross-validation with a variety of ML techniques and MLP

Introduction to Audio Classification with Deep Neural Networks

See Notebook for Code Walk-Through

Table of Contents

[Figures: Power Spectrogram, Chromagram, Mel Spectrogram, MFC Coefficients]

Abstract

Purpose

This notebook serves as an introduction to working with audio data for classification problems; it is meant as a learning resource rather than a demonstration of the state of the art. The techniques covered here apply not only to classification problems, but also to regression problems and to problems dealing with other types of input data. I focus particularly on feature engineering techniques for audio data and provide an in-depth look at the logic, concepts, and properties of the Multilayer Perceptron (MLP) model, the ancestor of today's deep neural networks (DNNs). I also introduce a few key machine learning models and the reasoning behind the choice of their hyperparameters. These objectives are framed by the task of recognizing emotion from snippets of speech audio in the RAVDESS dataset.

Summary

Data cleansing and feature engineering are the most crucial aspects of preparing machine learning and deep learning models alike, and are often the difference between success and failure. We can drastically improve the performance of a model by paying proper attention to feature engineering. This holds even for input data that is already usable for predictions: such data can still be transformed in myriad ways to improve predictive performance. For features to be useful in classification, they must vary sufficiently between different classes. We can further improve the performance of our models by understanding the influence of their hyperparameters and tuning them precisely, for which there are algorithmic aids such as Grid Search.
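As a minimal sketch of the Grid Search idea, the snippet below tunes two MLP hyperparameters with scikit-learn's `GridSearchCV`. It runs on synthetic data from `make_classification`, not on RAVDESS features, and the grid values are illustrative assumptions rather than the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an engineered feature matrix (NOT RAVDESS data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scaling sits inside the pipeline so it is re-fit on each CV training fold
# and never sees the corresponding validation fold.
pipeline = make_pipeline(StandardScaler(),
                         MLPClassifier(max_iter=500, random_state=0))

# Hypothetical grid; the 'mlpclassifier__' prefix is the step name that
# make_pipeline derives from the estimator's class name.
param_grid = {
    "mlpclassifier__hidden_layer_sizes": [(50,), (100,)],
    "mlpclassifier__alpha": [1e-4, 1e-2],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

best_params = search.best_params_
# The test split is scored once, after tuning is finished.
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Grid Search is exhaustive, so its cost grows with the product of the grid sizes; a small grid like this one keeps the example quick to run.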

Network architecture is a critical factor in determining the computational complexity of DNNs; often, however, simpler models with just one hidden layer perform better than more complicated models. The importance of proper model evaluation cannot be overstated: training data should be used strictly for training a model, validation data strictly for tuning it, and test data strictly for evaluating it once it is tuned. A model should never be tuned to perform better on test data. To this end, K-Fold Cross-Validation is a staple tool. Finally, the Random Forest ensemble model makes a robust benchmark suited to less-than-clean data with an unknown distribution, especially when you are short on time and want to evaluate the usability of features extracted from a dataset.
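The evaluation discipline described above can be sketched as follows: hold out a test set once, run K-Fold Cross-Validation on the remaining data only, and score the test set a single time at the end. The data is synthetic and the Random Forest settings are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)

# Synthetic stand-in for extracted audio features (NOT RAVDESS data).
X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)

# Hold out the test set once; it stays untouched until the final evaluation.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Random Forest as a quick benchmark of how usable the features are.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold CV on the development data: each fold serves once as validation data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(forest, X_dev, y_dev, cv=cv)
print("validation accuracy per fold:", fold_scores.round(3))

# Final step: fit on all development data, then score the test set once.
forest.fit(X_dev, y_dev)
test_score = forest.score(X_test, y_test)
print("test accuracy:", round(test_score, 3))
```

Stratified folds keep the class proportions roughly equal across splits, which matters when a dataset is small.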

Conclusions

Classic machine learning models such as Support Vector Machines (SVMs), k-Nearest Neighbours (kNN), and Random Forests have distinct advantages over deep neural networks in many tasks, but do not match the performance of even the simplest deep neural network in the task of audio classification. The Multilayer Perceptron (MLP) is the simplest form of DNN suited to classification tasks; it provides decent off-the-shelf performance and can be precisely tuned to be accurate and relatively quick to train.
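A comparison of this kind can be set up in a few lines with scikit-learn. The sketch below cross-validates an SVM, kNN, a Random Forest, and an MLP on the same synthetic features; the hyperparameters are untuned assumptions, and because the data is not audio, the resulting ranking should not be expected to reproduce the one reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for extracted audio features (NOT RAVDESS data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)

# SVM, kNN, and MLP are sensitive to feature scale, so they get a scaler;
# tree ensembles such as Random Forest do not need one.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=0)),
}

# Mean 5-fold cross-validated accuracy for each model on identical data.
mean_accuracy = {name: cross_val_score(model, X, y, cv=5).mean()
                 for name, model in models.items()}

for name, acc in sorted(mean_accuracy.items(), key=lambda kv: kv[1],
                        reverse=True):
    print(f"{name:>13}: {acc:.3f}")
```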

The MLP achieves appreciable accuracy on the RAVDESS dataset, but suffers from the relatively small number of training samples this dataset affords. Long Short-Term Memory Recurrent Neural Networks (LSTM RNNs) and Convolutional Neural Networks (CNNs) are excellent DNN candidates for audio classification: LSTM RNNs because of their ability to interpret sequential data such as an audio waveform represented as a time series, and CNNs because features engineered from audio data, such as spectrograms, closely resemble images, a domain in which CNNs excel at recognizing and discriminating between distinct patterns.
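To make the spectrogram-as-image point concrete, here is a small NumPy-only sketch that turns a waveform into a power spectrogram and then views it both as a single-channel image (for a CNN) and as a sequence of frames (for an LSTM). The helper `power_spectrogram`, the frame sizes, and the 16 kHz sample rate are assumptions made for this illustration; the signal is a synthetic 440 Hz tone, and the notebook itself may extract features differently.

```python
import numpy as np

def power_spectrogram(signal, frame_length=512, hop_length=256):
    """Hypothetical helper: squared-magnitude STFT of a 1-D signal.

    Returns an array of shape (frame_length // 2 + 1, n_frames),
    i.e. frequency bins along rows and time frames along columns.
    """
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)  # one FFT per frame
    return (np.abs(spectrum) ** 2).T

sample_rate = 16000                        # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate   # one second of audio
signal = np.sin(2 * np.pi * 440.0 * t)     # synthetic 440 Hz tone

spec = power_spectrogram(signal)

# CNN view: log-compressed 2-D "image" with a leading channel axis.
cnn_input = np.log1p(spec)[np.newaxis, ...]
# LSTM view: a sequence of time steps, each a vector of frequency bins.
lstm_input = spec.T

# Frequency bin with the most energy; bins are sample_rate / 512 Hz wide.
peak_bin = int(spec.mean(axis=1).argmax())
print(spec.shape, cnn_input.shape, lstm_input.shape, peak_bin)
```

The same array feeds both architectures; only the interpretation of its axes changes.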

Cite

If you find this work useful in your own research, please cite as follows:

@misc{Zenkov-sklearn-SER-basics,
  author = {Zenkov, Ilia},
  title = {sklearn-audio-classification},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/IliaZenkov/sklearn-audio-classification}},
}

Licence

License: MIT
