
amanbasu / Speech Emotion Recognition

Licence: gpl-3.0

Detecting emotions from MFCC features of human speech using deep learning

Labels

jupyter-notebook deep-learning tensorflow speech-recognition rnn emotion


Recognising Human Emotions From Raw Audio

Collaborators: Aman Agarwal, Aditya Mishra

In this project we use Mel-frequency cepstral coefficients (MFCC) to train a recurrent neural network (LSTM) that classifies human emotions into the happy, sad, angry, frustrated, neutral and fear categories.

The dataset used is the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database, collected by the University of Southern California. The link for the same can be found here.

The dataset

The IEMOCAP database is labelled with 10 emotions. We selected the 6 major emotions, viz. angry, neutral, frustrated, sad, excited and happy, for our training set. Features extracted from the raw audio of all sessions were saved along with their length and emotion. We used the first 20 MFCC coefficients as the feature vector; the extraction process can be found in the notebook.

To convert the data into a consistent shape, we applied bucket padding. The examples are first sorted by sequence length and then divided into a fixed number of buckets. Examples within a bucket therefore have similar lengths, which avoids unnecessary padding. This method is used in the Bucket Iterator, which returns a batch of the desired examples.

To select a batch, a bucket of sorted data is chosen at random, and a contiguous run of examples equal to the batch size is taken from it. The examples are padded to the maximum sequence length and then shuffled, which gives the desired batch. The code for the bucket iterator is taken from R2RT.
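The bucketing and batch selection described above can be sketched in plain Python. This is a minimal illustration, not the R2RT iterator used in the project: the function names `make_buckets` and `next_batch`, the zero padding value, and padding to the longest sequence in the batch are all assumptions made for the example.

```python
import random

def make_buckets(examples, num_buckets):
    """Sort examples by length and split them into roughly equal buckets.

    `examples` is assumed to be a non-empty list of sequences, where each
    sequence is a list of frames and each frame is a list of features.
    """
    ordered = sorted(examples, key=len)
    size = -(-len(ordered) // num_buckets)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def next_batch(buckets, batch_size, pad_value=0.0, rng=random):
    """Pick a random bucket, take contiguous examples, pad them, then shuffle."""
    bucket = rng.choice(buckets)
    start = rng.randrange(max(1, len(bucket) - batch_size + 1))
    chunk = bucket[start:start + batch_size]

    lengths = [len(seq) for seq in chunk]
    max_len = max(lengths)
    feat_dim = len(chunk[0][0])

    # Pad every sequence with constant frames up to the longest one in the batch.
    padded = [
        list(seq) + [[pad_value] * feat_dim for _ in range(max_len - len(seq))]
        for seq in chunk
    ]

    # Shuffle so the batch is not ordered by length.
    order = list(range(len(padded)))
    rng.shuffle(order)
    return [padded[i] for i in order], [lengths[i] for i in order]
```

Returning the original lengths alongside the padded batch lets a recurrent layer ignore the padded frames.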

Model

We used two layers of bidirectional LSTM, followed by attention in the last layer. The batch size was 128 and the learning rate was 1e-4.
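The text does not say which form of attention was used. As a rough illustration only, the sketch below shows one common choice, dot-product attention pooling: each time step of the LSTM output is scored against a vector `u` (learned in a real model, a hypothetical parameter here), the scores are normalised with a softmax, and the outputs are averaged with those weights.

```python
import math

def attention_pool(outputs, u):
    """Collapse a sequence of output vectors into one context vector.

    outputs: list of time steps, each a list of floats (e.g. BiLSTM outputs).
    u: scoring vector of the same dimension (hypothetical parameter).
    """
    # One scalar score per time step.
    scores = [sum(h_i * u_i for h_i, u_i in zip(h, u)) for h in outputs]

    # Numerically stable softmax over time.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the output vectors.
    dim = len(outputs[0])
    context = [sum(w * h[d] for w, h in zip(weights, outputs)) for d in range(dim)]
    return context, weights
```

The context vector would then be passed to the output layer in place of the final LSTM state.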

Results

The model was trained for 500 epochs, after which the curve had almost reached a plateau. The model overfit when dropout was not used, so we applied dropout with a keep probability of 0.8 between the last LSTM layer and the output layer.

Adding dropout reduced overfitting and increased the overall accuracy. The model reached an unweighted accuracy of 45% across the six emotions, with a validation accuracy of 42%.
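A keep probability of 0.8 corresponds to a dropout rate of 0.2. The following sketch of inverted dropout, written in plain Python rather than TensorFlow, illustrates the idea; the function name and the list-based input are assumptions for the example.

```python
import random

def dropout(values, keep_prob=0.8, rng=random):
    """Inverted dropout: keep each value with probability keep_prob.

    Kept values are scaled by 1 / keep_prob so the expected activation
    is unchanged; dropped values become 0.0. Used at training time only.
    """
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]
```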

[Figures: dropout of 0.2 | no dropout]
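The text does not define unweighted accuracy. In speech emotion recognition it usually means the average of the per-class recalls, so that every emotion counts equally regardless of how many examples it has. The sketch below assumes that definition.

```python
from collections import defaultdict

def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall (assumed definition of unweighted accuracy)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    # Average recall over the classes that appear in y_true.
    return sum(correct[c] / total[c] for c in total) / len(total)
```

For example, if four "angry" clips are all classified correctly and one "sad" clip is labelled "angry", plain accuracy is 0.8 but the unweighted accuracy is 0.5.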

Tensorflow model

A TensorFlow implementation of the model has been added. The repository contains two files: speech_emotion_gpu, which runs the model on a GPU, and speech_emotion_gpu_multi, which runs it in parallel on multiple GPUs.

Input data for the model can be downloaded from this link.

It consists of the following features: F0 (pitch), voice probability, zero-crossing rate, 12-dimensional Mel-frequency cepstral coefficients (MFCC) with log energy, and their first time derivatives. The features have been taken from this paper.
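The way the first time derivatives were computed is not stated here. As an illustration only, the sketch below appends a simple frame-to-frame difference to each feature vector; feature extraction toolkits often use a regression over a window of frames instead, so this is an assumption, not the method behind the downloadable data.

```python
def add_deltas(frames):
    """Append first-order differences to each frame.

    frames: list of feature vectors, one per time step.
    The delta of the first frame is defined as zero.
    """
    out = []
    for t, frame in enumerate(frames):
        prev = frames[t - 1] if t > 0 else frame
        delta = [x - p for x, p in zip(frame, prev)]
        out.append(list(frame) + delta)
    return out
```

If the derivatives are taken over all 16 static values (F0, voice probability, zero-crossing rate, 12 MFCCs and log energy), each frame would have 32 values; the text does not state the total.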
