
david-yoon / attentive-modality-hopping-for-SER

License: MIT
TensorFlow implementation of "Attentive Modality Hopping for Speech Emotion Recognition," ICASSP-20

Programming Languages

  • Python
  • Jupyter Notebook
  • Shell

Projects that are alternatives of or similar to attentive-modality-hopping-for-SER

muscaps
Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)
Stars: ✭ 39 (+56%)
Mutual labels:  multimodal-deep-learning
LIGHT-SERNET
Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition
Stars: ✭ 20 (-20%)
Mutual labels:  speech-emotion-recognition
MSAF
Offical implementation of paper "MSAF: Multimodal Split Attention Fusion"
Stars: ✭ 47 (+88%)
Mutual labels:  multimodal-deep-learning
slp
Utils and modules for Speech Language and Multimodal processing using pytorch and pytorch lightning
Stars: ✭ 17 (-32%)
Mutual labels:  multimodal-deep-learning
SpeechEmoRec
Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching
Stars: ✭ 44 (+76%)
Mutual labels:  speech-emotion-recognition
ser-with-w2v2
Official implementation of INTERSPEECH 2021 paper 'Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings'
Stars: ✭ 40 (+60%)
Mutual labels:  speech-emotion-recognition
Multimodal-Future-Prediction
The official repository for the CVPR 2019 paper "Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction"
Stars: ✭ 38 (+52%)
Mutual labels:  multimodal-deep-learning
Social-IQ
[CVPR 2019 Oral] Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence
Stars: ✭ 37 (+48%)
Mutual labels:  multimodal-deep-learning
soxan
Wav2Vec for speech recognition, classification, and audio classification
Stars: ✭ 113 (+352%)
Mutual labels:  speech-emotion-recognition
BBFN
This repository contains the implementation of the paper -- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis
Stars: ✭ 42 (+68%)
Mutual labels:  multimodal-deep-learning
wavenet-classifier
Keras Implementation of Deepmind's WaveNet for Supervised Learning Tasks
Stars: ✭ 54 (+116%)
Mutual labels:  speech-emotion-recognition
Interaction-Aware-Attention-Network
[ICASSP19] An Interaction-aware Attention Network for Speech Emotion Recognition in Spoken Dialogs
Stars: ✭ 32 (+28%)
Mutual labels:  speech-emotion-recognition
Speech Emotion Recognition
Using Convolutional Neural Networks in speech emotion recognition on the RAVDESS Audio Dataset.
Stars: ✭ 63 (+152%)
Mutual labels:  speech-emotion-recognition
scarches
Reference mapping for single-cell genomics
Stars: ✭ 175 (+600%)
Mutual labels:  multimodal-deep-learning
MultiGraphGAN
MultiGraphGAN for predicting multiple target graphs from a source graph using geometric deep learning.
Stars: ✭ 16 (-36%)
Mutual labels:  multimodal-deep-learning
Robust-Deep-Learning-Pipeline
Deep Convolutional Bidirectional LSTM for Complex Activity Recognition with Missing Data. Human Activity Recognition Challenge. Springer SIST (2020)
Stars: ✭ 20 (-20%)
Mutual labels:  multimodal-deep-learning
SER-datasets
A collection of datasets for the purpose of emotion recognition/detection in speech.
Stars: ✭ 74 (+196%)
Mutual labels:  speech-emotion-recognition
mmd
This repository contains the Pytorch implementation for our SCAI (EMNLP-2018) submission "A Knowledge-Grounded Multimodal Search-Based Conversational Agent"
Stars: ✭ 28 (+12%)
Mutual labels:  multimodal-deep-learning
circDeep
End-to-End learning framework for circular RNA classification from other long non-coding RNA using multimodal deep learning
Stars: ✭ 21 (-16%)
Mutual labels:  multimodal-deep-learning
speech-emotion-recognition
Speaker independent emotion recognition
Stars: ✭ 269 (+976%)
Mutual labels:  speech-emotion-recognition

attentive-modality-hopping-for-SER

This repository contains the source code used in the following paper:

Attentive Modality Hopping Mechanism for Speech Emotion Recognition, [paper]


[Notice]

I recently found that I used the "precision" metric for model evaluation. When I change the metric from "precision" to "accuracy," the models show similar performance in the "weighted" case but lower performance in the "unweighted" case. The same behavior is observed for the other models (MHA, MDRE).

I have already revised the source code accordingly. You can switch the metric in "project_config.py":

USE_PRECISION = True   --> "precision" metric
USE_PRECISION = False  --> "accuracy" metric
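
As a rough illustration only (not the repository's actual evaluation code), the flag can be thought of as choosing between scikit-learn's precision and accuracy scores; the helper function and sample labels below are hypothetical.

    from sklearn.metrics import accuracy_score, precision_score

    USE_PRECISION = True  # mirrors the setting in project_config.py

    def score(y_true, y_pred):
        # Hypothetical helper for illustration; the repository's evaluation code may differ.
        if USE_PRECISION:
            return precision_score(y_true, y_pred, average="weighted")
        return accuracy_score(y_true, y_pred)

    print(score([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]))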

Precision (previously misreported as accuracy)

Model       Modality   Weighted        Unweighted
MDRE[9]     A+T        0.557 ± 0.018   0.536 ± 0.030
MDRE[9]     T+V        0.585 ± 0.040   0.561 ± 0.046
MDRE[9]     A+V        0.481 ± 0.049   0.415 ± 0.047
MHA[12]     A+T        0.583 ± 0.025   0.555 ± 0.040
MHA[12]     T+V        0.590 ± 0.017   0.560 ± 0.032
MHA[12]     A+V        0.490 ± 0.049   0.434 ± 0.060
MDRE[9]     A+T+V      0.602 ± 0.033   0.575 ± 0.046
AMH (ours)  A+T+V      0.624 ± 0.022   0.597 ± 0.040

Accuracy (revised results)

Model       Modality   Weighted        Unweighted
MDRE[9]     A+T        0.498 ± 0.059   0.418 ± 0.077
MDRE[9]     T+V        0.579 ± 0.015   0.524 ± 0.021
MDRE[9]     A+V        0.477 ± 0.025   0.376 ± 0.024
MHA[12]     A+T        0.543 ± 0.026   0.491 ± 0.028
MHA[12]     T+V        0.580 ± 0.019   0.526 ± 0.024
MHA[12]     A+V        0.471 ± 0.047   0.371 ± 0.042
MDRE[9]     A+T+V      0.564 ± 0.043   0.490 ± 0.056
AMH (ours)  A+T+V      0.617 ± 0.016   0.547 ± 0.025
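
For context on the two columns: in SER work, "weighted" usually denotes plain accuracy over all utterances, while "unweighted" averages per-class recall so that rare classes such as "surprise" count as much as frequent ones; this convention is assumed here rather than stated by the repository. A toy sketch of how the two numbers can diverge on imbalanced data:

    import numpy as np
    from sklearn.metrics import accuracy_score, recall_score

    # Imbalanced toy labels: 9 utterances of class 0, 1 of class 1 (values are illustrative).
    y_true = np.array([0] * 9 + [1])
    y_pred = np.array([0] * 10)   # a model that always predicts the majority class

    weighted   = accuracy_score(y_true, y_pred)                  # 0.9 (overall accuracy)
    unweighted = recall_score(y_true, y_pred, average="macro")   # 0.5 (mean of per-class recalls)
    print(weighted, unweighted)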

[requirements]

tensorflow==1.14 (tested on cuda-10.1, cudnn-7.6)
python==3.7
scikit-learn>=0.20.0
nltk>=3.3
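
As a quick, purely illustrative sanity check, the pinned versions above can be confirmed from a Python session:

    import sys
    import nltk
    import sklearn
    import tensorflow as tf

    print(sys.version)          # expect 3.7.x
    print(tf.__version__)       # expect 1.14.x
    print(sklearn.__version__)  # expect >= 0.20.0
    print(nltk.__version__)     # expect >= 3.3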

[download data corpus]

  • IEMOCAP [link] [paper]
  • Download the IEMOCAP data from its original web page (a license agreement is required).

[preprocessing (our approach)]

  • Get the preprocessed dataset [application link]

    If you want to download the preprocessed dataset, please obtain the license from the IEMOCAP team first.

  • For video modality:

    • We first split each video frame into two sub-frames so that each segment contains only one actor.
    • Then we crop the center of each frame to 224×224 so as to focus on the actor and remove the background.
    • Finally, we extract 2,048-dimensional visual features from each video using a pretrained ResNet-101 at a rate of 3 frames per second (a rough sketch of this step appears after this list).
  • Format of the data for our experiments:

    Audio: [#samples, 1000, 120] - (#samples, sequence (max 10s), dims)
    Text (index): [#samples, 128] - (#samples, sequence (max))
    Video: [#samples, 32, 2048] - (#samples, sequence (max 10.6s), dims)

  • Emotion Classes:

    class       #samples
    angry       1,103
    excited     1,041
    happy       595
    sad         1,084
    frustrated  1,849
    surprise    107
    neutral     1,708
  • If you want to use the same processed data as in our experiments, please send us an email along with your IEMOCAP license.

  • We cannot provide the ASR-processed transcriptions due to license issues (a commercial API was used); however, it should be reasonably easy to extract ASR transcripts from the audio signal yourself (we used the Google Cloud Speech API).
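
The visual feature-extraction step described above might look roughly like the following; the repository does not specify its exact tooling, so torchvision, the helper function, and the dummy frames here are illustrative assumptions only.

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    # Pretrained ResNet-101 with the classification head removed -> 2,048-d pooled features.
    backbone = models.resnet101(pretrained=True)
    backbone.fc = torch.nn.Identity()
    backbone.eval()

    preprocess = T.Compose([
        T.CenterCrop(224),   # crop the actor's sub-frame to 224x224
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    @torch.no_grad()
    def extract_features(frames):
        # frames: PIL images sampled at ~3 fps from one actor's sub-frame.
        batch = torch.stack([preprocess(f) for f in frames])  # [T, 3, 224, 224]
        return backbone(batch)                                # [T, 2048]

    # 32 frames (~10.6 s at 3 fps) -> a [32, 2048] feature matrix per utterance.
    dummy_frames = [Image.new("RGB", (640, 480)) for _ in range(32)]
    print(extract_features(dummy_frames).shape)  # torch.Size([32, 2048])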

[source code]

  • This repository contains code for the following model:

    Attentive Modality Hopping (AMH)


[training]

  • Refer to "model/train_reference_script.sh".
  • Results will be displayed in the console.
  • The final test result will be stored in "./TEST_run_result.txt"

[cite]

  • Please cite our paper when you use our code, model, or dataset:

    @inproceedings{yoon2020attentive,
      title={Attentive modality hopping mechanism for speech emotion recognition},
      author={Yoon, Seunghyun and Dey, Subhadeep and Lee, Hwanhee and Jung, Kyomin},
      booktitle={ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
      pages={3362--3366},
      year={2020},
      organization={IEEE}
    }
