Alternatives and detailed information of SER-datasets

Spoken Emotion Recognition Datasets: A collection of datasets for emotion recognition/detection in speech. The table is in reverse chronological order (newest first) and describes the content of each dataset along with the emotions it covers.

Dataset	Year	Content	Emotions	Format	Size	Language	Paper	Access	License
MuSe-CAR	2021	40 hours, 6,000+ recordings of 25,000+ sentences by 70+ English speakers (see db link for details).	Continuous emotion dimensions characterized using valence, arousal, and trustworthiness.	Audio, Video, Text	15 GB	English	The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements	Restricted access	Available under an Academic License & Commercial License
MSP-Podcast corpus	2020	100 hours by over 100 speakers (see db link for details).	This corpus is annotated with emotional labels using attribute-based descriptors (activation, dominance and valence) and categorical labels (anger, happiness, sadness, disgust, surprised, fear, contempt, neutral and other).	Audio	--	--	The MSP-Conversation Corpus	Restricted access	Available under an Academic License & Commercial License
emotiontts_open_db	2020	Recordings and their associated transcriptions by a diverse group of speakers.	4 emotions: general, joy, anger, and sadness.	Audio, Text	--	Korean	--	Partial open access	CC BY-NC-SA 4.0
URDU-Dataset	2020	400 utterances by 38 speakers (27 male and 11 female).	4 emotions: angry, happy, neutral, and sad.	Audio	~72.1 MB	Urdu	Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages	Open access	None specified
BAVED	2020	1935 recordings by 61 speakers (45 male and 16 female).	3 levels of emotion.	Audio	~195 MB	Arabic	--	Open access	None specified
VIVAE	2020	Non-speech, 1085 audio files by ~12 speakers.	Non-speech, 6 emotions: achievement, anger, fear, pain, pleasure, and surprise with 4 emotional intensities (low, moderate, strong, peak).	Audio	--	--	--	Restricted access	CC BY-NC-SA 4.0
SEWA	2019	More than 2000 minutes of audio-visual data of 398 people (201 male and 197 female) coming from 6 cultures.	Emotions are characterized using valence and arousal.	Audio, Video	--	Chinese, English, German, Greek, Hungarian and Serbian	SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild	Restricted access	EULA
MELD	2019	1400 dialogues and 14000 utterances from Friends TV series by multiple speakers.	7 emotions: anger, disgust, sadness, joy, neutral, surprise and fear. MELD also has sentiment (positive, negative and neutral) annotation for each utterance.	Audio, Video, Text	~10.1 GB	English	MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations	Open access	GPL-3.0 License
ShEMO	2019	3000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data from online radio plays by 87 native-Persian speakers.	6 emotions: anger, fear, happiness, sadness, neutral and surprise.	Audio	~1014 MB	Persian	ShEMO: a large-scale validated database for Persian speech emotion detection	Open access	None specified
DEMoS	2019	9365 emotional and 332 neutral samples produced by 68 native speakers (23 females, 45 males).	7 emotions: anger, sadness, happiness, fear, surprise, disgust, and the secondary emotion guilt.	Audio	--	Italian	DEMoS: An Italian emotional speech corpus. Elicitation methods, machine learning, and perception	Restricted access	EULA: End User License Agreement
AESDD	2018	Around 500 utterances by a diverse group of actors (over 5 actors) simulating various emotions.	5 emotions: anger, disgust, fear, happiness, and sadness.	Audio	~392 MB	Greek	Speech Emotion Recognition for Performance Interaction	Open access	None specified
Emov-DB	2018	Recordings for 4 speakers (2 males and 2 females).	The emotional styles are neutral, sleepiness, anger, disgust and amused.	Audio	5.88 GB	English	The emotional voices database: Towards controlling the emotion dimension in voice generation systems	Open access	None specified
RAVDESS	2018	7356 recordings by 24 actors.	7 emotions: calm, happy, sad, angry, fearful, surprise, and disgust.	Audio, Video	~24.8 GB	English	The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English	Open access	CC BY-NC-SA 4.0
JL corpus	2018	2400 recordings of 240 sentences by 4 actors (2 males and 2 females).	5 primary emotions: angry, sad, neutral, happy, excited. 5 secondary emotions: anxious, apologetic, pensive, worried, enthusiastic.	Audio	--	English	An Open Source Emotional Speech Corpus for Human Robot Interaction Applications	Open access	CC0 1.0
CaFE	2018	6 different sentences by 12 speakers (6 females + 6 males).	7 emotions: happy, sad, angry, fearful, surprise, disgust and neutral. Each emotion is acted in 2 different intensities.	Audio	~2 GB	French (Canadian)	--	Open access	CC BY-NC-SA 4.0
EmoFilm	2018	1115 audio instances of sentences extracted from various films.	5 emotions: anger, contempt, happiness, fear, and sadness.	Audio	--	English, Italian & Spanish	Categorical vs Dimensional Perception of Italian Emotional Speech	Restricted access	EULA: End User License Agreement
ANAD	2018	1384 recordings by multiple speakers.	3 emotions: angry, happy, surprised.	Audio	~2 GB	Arabic	Arabic Natural Audio Dataset	Open access	CC BY-NC-SA 4.0
EmoSynth	2018	144 audio files labelled by 40 listeners.	Emotion (no speech) defined in terms of valence and arousal.	Audio	103.4 MB	--	The Perceived Emotion of Isolated Synthetic Audio: The EmoSynth Dataset and Results	Open access	Creative Commons Attribution 4.0 International
CMU-MOSEI	2018	65 hours of annotated video from more than 1000 speakers and 250 topics.	6 emotions (happiness, sadness, anger, fear, disgust, surprise) + Likert scale.	Audio, Video	--	English	Multi-attention Recurrent Network for Human Communication Comprehension	Open access	License
CMU-MOSI	2017	2199 opinion utterances with annotated sentiment.	Sentiment annotated from very negative to very positive in seven Likert steps.	Audio, Video	--	English	Multi-attention Recurrent Network for Human Communication Comprehension	Open access	License
MSP-IMPROV	2017	20 sentences by 12 actors.	4 emotions: angry, sad, happy, and neutral, plus the labels "other" and "without agreement".	Audio, Video	--	English	MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception	Restricted access	Available under an Academic License & Commercial License
CREMA-D	2017	7442 clips of 12 sentences spoken by 91 actors (48 males and 43 females).	6 emotions: angry, disgusted, fearful, happy, neutral, and sad.	Audio, Video	--	English	CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset	Open access	Available under the Open Database License & Database Content License
Example emotion videos used in investigation of emotion perception in schizophrenia.	2017	6 videos: two example videos from each emotion category (angry, happy and neutral) by one female speaker.	3 emotions: angry, happy and neutral.	Audio, Video	~63 MB	English	--	Open access	Available under the Permitted Non-commercial Re-use with Acknowledgement
EMOVO	2014	6 actors who played 14 sentences.	6 emotions: disgust, fear, anger, joy, surprise, sadness.	Audio	~355 MB	Italian	EMOVO Corpus: an Italian Emotional Speech Database	Open access	None specified
RECOLA	2013	3.8 hours of recordings by 46 participants.	Negative and positive sentiment (valence and arousal).	Audio, Video	--	--	Introducing the RECOLA Multimodal Corpus of Remote Collaborative and Affective Interactions	Restricted access	Available under an Academic License & Commercial License
GEMEP corpus	2012	Videos of 10 actors portraying 10 states.	12 emotions: amusement, anxiety, cold anger (irritation), despair, hot anger (rage), fear (panic), interest, joy (elation), pleasure (sensory), pride, relief, and sadness. Plus, 5 additional emotions: admiration, contempt, disgust, surprise, and tenderness.	Audio, Video	--	French	Introducing the Geneva Multimodal Expression Corpus for Experimental Research on Emotion Perception	Restricted access	None specified
OGVC	2012	9114 spontaneous utterances and 2656 acted utterances by 4 professional actors (two male and two female).	9 emotional states: fear, surprise, sadness, disgust, anger, anticipation, joy, acceptance and the neutral state.	Audio	--	Japanese	Naturalistic emotional speech collection paradigm with online game and its psychological and acoustical assessment	Restricted access	None specified
LEGO corpus	2012	347 dialogs with 9,083 system-user exchanges.	Emotions classified as garbage, non-angry, slightly angry and very angry.	Audio	1.1 GB	--	A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let’s Go Bus Information System	Open access	License available with the data. Free of charge for research purposes only.
SEMAINE	2012	95 dyadic conversations from 21 subjects. Each subject converses with another playing one of four characters with emotions.	5 FeelTrace annotations: activation, valence, dominance, power, intensity.	Audio, Video, Text	104 GB	English	The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent	Restricted access	Academic EULA
SAVEE	2011	480 British English utterances by 4 male actors.	7 emotions: anger, disgust, fear, happiness, sadness, surprise and neutral.	Audio, Video	--	English (British)	Multimodal Emotion Recognition	Restricted access	Free of charge for research purposes only.
TESS	2010	2800 recordings by 2 actresses.	7 emotions: anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral.	Audio	--	English	BEHAVIOURAL FINDINGS FROM THE TORONTO EMOTIONAL SPEECH SET	Open access	CC BY-NC-ND 4.0
EEKK	2007	26 text passages read by 10 speakers.	4 main emotions: joy, sadness, anger and neutral.	--	~352 MB	Estonian	Estonian Emotional Speech Corpus	Open access	CC-BY license
IEMOCAP	2007	12 hours of audiovisual data by 10 actors.	5 emotions: happiness, anger, sadness, frustration and neutral.	--	--	English	IEMOCAP: Interactive emotional dyadic motion capture database	Restricted access	License
Keio-ESD	2006	A set of human speech with vocal emotion spoken by a Japanese male speaker.	47 emotions including angry, joyful, disgusting, downgrading, funny, worried, gentle, relief, indignation, shameful, etc.	Audio	--	Japanese	EMOTIONAL SPEECH SYNTHESIS USING SUBSPACE CONSTRAINTS IN PROSODY	Restricted access	Available for research purposes only
EMO-DB	2005	800 recordings spoken by 10 actors (5 males and 5 females).	7 emotions: anger, neutral, fear, boredom, happiness, sadness, disgust.	Audio	--	German	A Database of German Emotional Speech	Open access	None specified
eNTERFACE05	2005	Videos by 42 subjects, coming from 14 different nationalities.	6 emotions: anger, fear, surprise, happiness, sadness and disgust.	Audio, Video	~0.8 GB	English	The eNTERFACE’05 Audio-Visual Emotion Database	Open access	Free of charge for research purposes only
DES	2002	4 speakers (2 males and 2 females).	5 emotions: neutral, surprise, happiness, sadness and anger.	--	--	Danish	Documentation of the Danish Emotional Speech Database	--	--
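
Because the list is a flat table, it can be filtered programmatically. The following is a minimal sketch, assuming the table has been saved as tab-separated text with a header row; the names `SAMPLE_TSV`, `load_rows`, and `open_access` are hypothetical and not part of the repository, and the sample uses only a subset of the columns, with values copied from three rows of the table.

```python
import csv
import io

# Three rows copied from the table (subset of columns), as tab-separated text.
# SAMPLE_TSV is an illustrative stand-in for a saved copy of the full table.
SAMPLE_TSV = (
    "Dataset\tYear\tLanguage\tAccess\tLicense\n"
    "URDU-Dataset\t2020\tUrdu\tOpen access\tNone specified\n"
    "SEWA\t2019\tChinese, English, German, Greek, Hungarian and Serbian\t"
    "Restricted access\tEULA\n"
    "EMO-DB\t2005\tGerman\tOpen access\tNone specified\n"
)


def load_rows(tsv_text):
    """Parse tab-separated text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def open_access(rows, language=None):
    """Return open-access rows, newest first, optionally filtered by language."""
    selected = []
    for row in rows:
        if row["Access"].strip().lower() != "open access":
            continue
        if language is not None and language.lower() not in row["Language"].lower():
            continue
        selected.append(row)
    return sorted(selected, key=lambda r: int(r["Year"]), reverse=True)


rows = load_rows(SAMPLE_TSV)
for row in open_access(rows):
    print(row["Year"], row["Dataset"], row["License"])
```

Matching the language by substring keeps multilingual entries such as SEWA discoverable under each of their languages. An entry marked "Open access" with no specified license still needs its terms checked at the source before reuse.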

References

Monorama Swain, Aurobinda Routray, and Prithviraj Kabisatpathy, Databases, features and classifiers for speech emotion recognition: a review, International Journal of Speech Technology (paper)
Dimitrios Ververidis and Constantine Kotropoulos, A State of the Art Review on Emotional Speech Databases, Artificial Intelligence & Information Analysis Laboratory, Department of Informatics, Aristotle University of Thessaloniki (paper)
A. Pramod Reddy and V. Vijayarajan, Extraction of Emotions from Speech-A Survey, VIT University, International Journal of Applied Engineering Research (paper)
Emotional Speech Databases (document)
Expressive Synthetic Speech (website)
Untitled, Technical University of Munich (document)

Contribution

All contributions are welcome! If you know of a dataset that belongs here (see criteria) but is not listed, please feel free to add it. For more information on contributing, please refer to CONTRIBUTING.md.
If you notice a typo or a mistake, please report this as an issue and help us improve the quality of this list.

Disclaimer

The maintainer and the contributors try their best to keep this list up to date and to include only working links (using automated verification with the help of the urlchecker-action). However, we cannot guarantee that all listed links are up to date. Read more in DISCLAIMER.md.

SuperKogito / SER-datasets
