multimodal-speech-emotion

This repository contains the source code used in the following paper:

Multimodal Speech Emotion Recognition using Audio and Text, IEEE SLT-18, [paper]


[requirements]

tensorflow==1.4 (tested on cuda-8.0, cudnn-6.0)
python==2.7
scikit-learn==0.20.0
nltk==3.3
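
A quick sanity check of the environment (an illustrative snippet, not part of the original repository) can confirm the pinned versions from Python:

    # Verify the pinned dependency versions (illustrative check only).
    import sys
    import tensorflow as tf
    import sklearn
    import nltk

    print(sys.version)          # expected: 2.7.x
    print(tf.__version__)       # expected: 1.4.x
    print(sklearn.__version__)  # expected: 0.20.0
    print(nltk.__version__)     # expected: 3.3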

[download data corpus]

  • IEMOCAP [link] [paper]
  • Download the IEMOCAP data from its original web page (a license agreement is required)

[preprocessed-data schema (our approach)]

  • Get the preprocessed dataset [application link]

    If you want to download the preprocessed dataset, please request the license from the IEMOCAP team first.

  • For the preprocessing steps, refer to the code in "./preprocessing"

  • We cannot publish the ASR-processed transcriptions due to license restrictions (commercial API); however, it should be moderately easy to extract ASR transcripts from the audio signal yourself. We used google-cloud-speech-api (see the transcription sketch after the data-format list below).

  • Format of the data for our experiments (see the loading sketch below):

    MFCC : MFCC features of the audio signal (ex. train_audio_mfcc.npy)
    [#samples, 750, 39] - (#samples, sequence (max 7.5s), dims)

    MFCC-SEQN : valid length of the sequence of the audio signal (ex. train_seqN.npy)
    [#samples] - (#samples)

    PROSODY : prosody features of the audio signal (ex. train_audio_prosody.npy)
    [#samples, 35] - (#samples, dims)

    TRANS : sequence of the transcription (indexed) of each sample (ex. train_nlp_trans.npy)
    [#samples, 128] - (#samples, sequence (max))

    LABEL : target label of the audio signal (ex. train_label.npy)
    [#samples] - (#samples)
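
As a quick check of the schema above, the arrays can be loaded with NumPy and their shapes inspected (an illustrative sketch; the file names follow the examples listed above):

    # Load the preprocessed arrays and print their shapes (illustrative sketch).
    import numpy as np

    mfcc    = np.load("train_audio_mfcc.npy")     # (#samples, 750, 39)
    seq_len = np.load("train_seqN.npy")           # (#samples,)
    prosody = np.load("train_audio_prosody.npy")  # (#samples, 35)
    trans   = np.load("train_nlp_trans.npy")      # (#samples, 128)
    label   = np.load("train_label.npy")          # (#samples,)

    for name, arr in [("MFCC", mfcc), ("MFCC-SEQN", seq_len),
                      ("PROSODY", prosody), ("TRANS", trans), ("LABEL", label)]:
        print(name, arr.shape)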

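As mentioned above, the ASR transcripts were produced with google-cloud-speech-api. A rough sketch of transcribing one utterance with the current Google Cloud Speech-to-Text Python client follows (the file name, sample rate, and client-library version are assumptions, not taken from the original code):

    # Rough sketch: transcribe one utterance with Google Cloud Speech-to-Text.
    from google.cloud import speech

    client = speech.SpeechClient()

    with open("Ses01F_impro01_F000.wav", "rb") as f:   # placeholder file name
        audio = speech.RecognitionAudio(content=f.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,                        # placeholder sample rate
        language_code="en-US",
    )

    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        print(result.alternatives[0].transcript)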
[source code]

  • The repository contains code for the following models (an illustrative fusion sketch follows the list):

    Audio Recurrent Encoder (ARE)
    Text Recurrent Encoder (TRE)
    Multimodal Dual Recurrent Encoder (MDRE)
    Multimodal Dual Recurrent Encoder with Attention (MDREA)
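
To illustrate the overall idea, a minimal TensorFlow 1.x sketch of fusing an audio encoder and a text encoder is shown below. This is not the authors' implementation: it uses GRU cells, omits the prosody features and the attention variant, and all layer sizes, vocabulary size, and class count are placeholders.

    # Minimal sketch of dual recurrent encoders with late fusion (not the original model).
    import tensorflow as tf

    N_MFCC, MAX_AUDIO_LEN = 39, 750        # from the data schema above
    MAX_TEXT_LEN = 128                     # from the data schema above
    VOCAB, EMB = 10000, 100                # placeholder vocabulary / embedding sizes
    HIDDEN, N_CLASSES = 128, 4             # placeholder hidden size / class count

    audio = tf.placeholder(tf.float32, [None, MAX_AUDIO_LEN, N_MFCC])
    audio_len = tf.placeholder(tf.int32, [None])
    text = tf.placeholder(tf.int32, [None, MAX_TEXT_LEN])

    with tf.variable_scope("audio_encoder"):           # ARE-style branch
        cell_a = tf.nn.rnn_cell.GRUCell(HIDDEN)
        _, state_a = tf.nn.dynamic_rnn(cell_a, audio,
                                       sequence_length=audio_len, dtype=tf.float32)

    with tf.variable_scope("text_encoder"):            # TRE-style branch
        emb = tf.get_variable("emb", [VOCAB, EMB])
        cell_t = tf.nn.rnn_cell.GRUCell(HIDDEN)
        _, state_t = tf.nn.dynamic_rnn(cell_t, tf.nn.embedding_lookup(emb, text),
                                       dtype=tf.float32)

    # MDRE-style fusion: concatenate the two utterance encodings and classify.
    joint = tf.concat([state_a, state_t], axis=-1)
    logits = tf.layers.dense(joint, N_CLASSES)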


[training]

  • refer to "reference_script.sh"
  • the final result will be stored in "./TEST_run_result.txt"

[cite]

  • Please cite our paper when you use our code, model, or dataset:

    @inproceedings{yoon2018multimodal,
    title={Multimodal Speech Emotion Recognition Using Audio and Text},
    author={Yoon, Seunghyun and Byun, Seokhyun and Jung, Kyomin},
    booktitle={2018 IEEE Spoken Language Technology Workshop (SLT)},
    pages={112--118},
    year={2018},
    organization={IEEE}
    }
