
wenliangdai / Modality-Transferable-MER

License: CC BY 4.0
Modality-Transferable-MER: a multimodal emotion recognition model with zero-shot and few-shot abilities.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Modality-Transferable-MER

sklearn-audio-classification
An in-depth analysis of audio classification on the RAVDESS dataset. Feature engineering, hyperparameter optimization, model evaluation, and cross-validation with a variety of ML techniques and MLP
Stars: ✭ 31 (-13.89%)
Mutual labels:  emotion-recognition
Hemuer
An AI Tool to record expressions of users as they watch a video and then visualize the funniest parts of it!
Stars: ✭ 22 (-38.89%)
Mutual labels:  emotion-recognition
MVGL
TCyb 2018: Graph learning for multiview clustering
Stars: ✭ 26 (-27.78%)
Mutual labels:  multimodal
NSP-BERT
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"
Stars: ✭ 166 (+361.11%)
Mutual labels:  zero-shot
Diverse-Structure-Inpainting
CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"
Stars: ✭ 131 (+263.89%)
Mutual labels:  multimodal
docarray
The data structure for unstructured data
Stars: ✭ 561 (+1458.33%)
Mutual labels:  multimodal
HiGRUs
Implementation of the paper "Hierarchical GRU for Utterance-level Emotion Recognition" in NAACL-2019.
Stars: ✭ 60 (+66.67%)
Mutual labels:  emotion-recognition
slp
Utils and modules for Speech Language and Multimodal processing using pytorch and pytorch lightning
Stars: ✭ 17 (-52.78%)
Mutual labels:  multimodal
lowshot-shapebias
Learning low-shot object classification with explicit shape bias learned from point clouds
Stars: ✭ 37 (+2.78%)
Mutual labels:  few-shot
OpenVINO-EmotionRecognition
OpenVINO+NCS2/NCS+MultiModel(FaceDetection, EmotionRecognition)+MultiStick+MultiProcess+MultiThread+USB Camera/PiCamera. RaspberryPi 3 compatible. Async.
Stars: ✭ 51 (+41.67%)
Mutual labels:  emotion-recognition
emotic
PyTorch implementation of Emotic CNN methodology to recognize emotions in images using context information.
Stars: ✭ 57 (+58.33%)
Mutual labels:  emotion-recognition
Deep-Learning-for-Expression-Recognition-in-Image-Sequences
The project uses state of the art deep learning on collected data for automatic analysis of emotions.
Stars: ✭ 26 (-27.78%)
Mutual labels:  emotion-recognition
erc
Emotion recognition in conversation
Stars: ✭ 34 (-5.56%)
Mutual labels:  emotion-recognition
dissertation
🎓 📜 This repository holds my final year and dissertation project during my time at the University of Lincoln titled 'Deep Learning for Emotion Recognition in Cartoons'.
Stars: ✭ 22 (-38.89%)
Mutual labels:  emotion-recognition
LAVT-pytorch
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Stars: ✭ 16 (-55.56%)
Mutual labels:  multimodal
RSTNet
RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words (CVPR 2021)
Stars: ✭ 71 (+97.22%)
Mutual labels:  multimodal
CPG
Steven C. Y. Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan, and Chu-Song Chen, "Compacting, Picking and Growing for Unforgetting Continual Learning," Thirty-third Conference on Neural Information Processing Systems, NeurIPS 2019
Stars: ✭ 91 (+152.78%)
Mutual labels:  emotion-recognition
few-shot-lm
The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)
Stars: ✭ 32 (-11.11%)
Mutual labels:  few-shot
EmotionalConversionStarGAN
This repository contains code to replicate results from the ICASSP 2020 paper "StarGAN for Emotional Speech Conversion: Validated by Data Augmentation of End-to-End Emotion Recognition".
Stars: ✭ 92 (+155.56%)
Mutual labels:  emotion-recognition
NER-Multimodal-pytorch
Pytorch Implementation of "Adaptive Co-attention Network for Named Entity Recognition in Tweets" (AAAI 2018)
Stars: ✭ 42 (+16.67%)
Mutual labels:  multimodal

Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition

CC BY 4.0

Paper accepted at AACL-IJCNLP 2020:

Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition, by Wenliang Dai, Zihan Liu, Tiezheng Yu, Pascale Fung.

[ACL Anthology][ArXiv][Semantic Scholar]

If your work is inspired by our paper, or you use any code snippets from this repo, please cite the paper; the BibTeX entry is shown below:

@inproceedings{dai-etal-2020-modality,
    title = "Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition",
    author = "Dai, Wenliang  and
      Liu, Zihan  and
      Yu, Tiezheng  and
      Fung, Pascale",
    booktitle = "Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing",
    month = dec,
    year = "2020",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.aacl-main.30",
    pages = "269--280",
    abstract = "Despite the recent achievements made in the multi-modal emotion recognition task, two problems still exist and have not been well investigated: 1) the relationship between different emotion categories are not utilized, which leads to sub-optimal performance; and 2) current models fail to cope well with low-resource emotions, especially for unseen emotions. In this paper, we propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues. We use pre-trained word embeddings to represent emotion categories for textual data. Then, two mapping functions are learned to transfer these embeddings into visual and acoustic spaces. For each modality, the model calculates the representation distance between the input sequence and target emotions and makes predictions based on the distances. By doing so, our model can directly adapt to the unseen emotions in any modality since we have their pre-trained embeddings and modality mapping functions. Experiments show that our model achieves state-of-the-art performance on most of the emotion categories. Besides, our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.",
}

Abstract

Despite the recent achievements made in the multi-modal emotion recognition task, two problems still exist and have not been well investigated: 1) the relationships between different emotion categories are not utilized, which leads to sub-optimal performance; and 2) current models fail to cope well with low-resource emotions, especially for unseen emotions. In this paper, we propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues. We use pre-trained word embeddings to represent emotion categories for textual data. Then, two mapping functions are learned to transfer these embeddings into visual and acoustic spaces. For each modality, the model calculates the representation distance between the input sequence and target emotions and makes predictions based on the distances. By doing so, our model can directly adapt to the unseen emotions in any modality since we have their pre-trained embeddings and modality mapping functions. Experiments show that our model achieves state-of-the-art performance on most of the emotion categories. In addition, our model outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
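
To make the distance-based prediction concrete, here is a minimal PyTorch sketch of how one modality's scoring head could look. It is only an illustration under assumptions, not the authors' implementation: the class name, pooling, and similarity choice are made up for clarity. Pre-trained emotion word embeddings are mapped into the modality space by a learned linear function, and each emotion is scored by the similarity between the utterance representation and its mapped embedding.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EmotionDistanceHead(nn.Module):
    # Illustrative sketch (not the repo's code): scores an utterance against each
    # emotion category by mapping pre-trained emotion word embeddings into the
    # modality space and measuring cosine similarity.
    def __init__(self, emo_word_embeddings, modality_dim):
        super().__init__()
        # emo_word_embeddings: (num_emotions, emb_dim) pre-trained vectors (e.g. GloVe)
        self.register_buffer("emo_word_embeddings", emo_word_embeddings)
        self.mapping = nn.Linear(emo_word_embeddings.size(1), modality_dim)

    def forward(self, utterance_repr):
        # utterance_repr: (batch, modality_dim) pooled output of a modality encoder
        mapped = self.mapping(self.emo_word_embeddings)   # (num_emotions, modality_dim)
        utt = F.normalize(utterance_repr, dim=-1)
        emo = F.normalize(mapped, dim=-1)
        return utt @ emo.t()                              # (batch, num_emotions) similarity scores

Because the emotion categories enter the model only through their word embeddings, an unseen emotion can be scored at test time by simply appending its embedding, which is what the zero-shot and few-shot settings rely on.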

Dataset

We use the pre-processed features from the CMU-Multimodal SDK.

Or you can directly download the data from here.

Preparation for running

  1. Create a new folder named data at the root of this project

  2. Download the emotion embeddings from here, and then put the file in the data folder.

  3. Download data

    • For a quick run
      • Just download our saved torch.utils.data.dataset.Dataset objects from here and unzip the archive at the root of this project (a loading sketch is shown after this list).
    • For a normal run
      • Download the data from here
      • Check the data_folder_structure.txt file, which shows how the data files should be organized
      • Put the data files in the corresponding locations
  4. Good to go!
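
If you take the quick-run route in step 3, the downloaded archive contains pickled torch.utils.data.dataset.Dataset objects that can be loaded directly. Below is a minimal sketch; the folder and file names are assumptions, so check the unzipped archive for the actual paths.

import torch
from torch.utils.data import DataLoader

# Hypothetical path; use the actual file name from the unzipped archive.
train_set = torch.load("data/saved_datasets/mosei_train.pt")
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

for batch in train_loader:
    pass  # each batch carries the pre-processed features and emotion labels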

Command line arguments and examples

usage: main.py [-h] -bs BATCH_SIZE -lr LEARNING_RATE [-wd WEIGHT_DECAY] -ep
               EPOCHS [-es EARLY_STOP] [-cu CUDA] [-mo MODEL] [-fu FUSION]
               [-cl CLIP] [-sc] [-se SEED] [-pa PATIENCE] [-ez] [--loss LOSS]
               [--optim OPTIM] [--threshold THRESHOLD] [--verbose]
               [-mod MODALITIES] [--valid] [--test] [--dataset DATASET]
               [--aligned] [--data-seq-len DATA_SEQ_LEN]
               [--data-folder DATA_FOLDER] [--glove-emo-path GLOVE_EMO_PATH]
               [--cap] [--iemocap4] [--iemocap9] [--zsl ZSL]
               [--zsl-test ZSL_TEST] [--fsl FSL] [--ckpt CKPT] [-dr DROPOUT]
               [-nl NUM_LAYERS] [-hs HIDDEN_SIZE]
               [-hss HIDDEN_SIZES [HIDDEN_SIZES ...]] [-bi] [--gru]
               [--hidden-dim HIDDEN_DIM]

Multimodal Emotion Recognition

optional arguments:
  -h, --help            show this help message and exit
  -bs BATCH_SIZE, --batch-size BATCH_SIZE
                        Batch size
  -lr LEARNING_RATE, --learning-rate LEARNING_RATE
                        Learning rate
  -wd WEIGHT_DECAY, --weight-decay WEIGHT_DECAY
                        Weight decay
  -ep EPOCHS, --epochs EPOCHS
                        Number of epochs
  -es EARLY_STOP, --early-stop EARLY_STOP
                        Early stop
  -cu CUDA, --cuda CUDA
                        CUDA device number
  -mo MODEL, --model MODEL
                        Model type: mult/rnn/transformer/eea
  -fu FUSION, --fusion FUSION
                        Modality fusion type: ef/lf
  -cl CLIP, --clip CLIP
                        Gradient clipping value
  -sc, --scheduler      Use a learning-rate scheduler with the optimizer
  -se SEED, --seed SEED
                        Random seed
  -pa PATIENCE, --patience PATIENCE
                        Patience of the scheduler
  -ez, --exclude-zero   Exclude zero in evaluation
  --loss LOSS           loss function: l1/mse/ce/bce
  --optim OPTIM         optimizer function: adam/sgd
  --threshold THRESHOLD
                        Threshold for multi-label emotion recognition
  --verbose             Verbose mode to print more logs
  -mod MODALITIES, --modalities MODALITIES
                        What modalities to use
  --valid               Valid mode
  --test                Test mode
  --dataset DATASET     Dataset to use
  --aligned             Aligned experiment or not
  --data-seq-len DATA_SEQ_LEN
                        Data sequence length
  --data-folder DATA_FOLDER
                        path for storing the dataset
  --glove-emo-path GLOVE_EMO_PATH
  --cap                 Capitalize the first letter of emotion words
  --iemocap4            Only use 4 emotions in IEMOCAP
  --iemocap9            Only use 9 emotions in IEMOCAP
  --zsl ZSL             Do zero shot learning on which emotion (index)
  --zsl-test ZSL_TEST   Indicate which emotion was held out as ZSL during training
  --fsl FSL             Do few shot learning on which emotion (index)
  --ckpt CKPT
  -dr DROPOUT, --dropout DROPOUT
                        dropout
  -nl NUM_LAYERS, --num-layers NUM_LAYERS
                        num of layers of LSTM
  -hs HIDDEN_SIZE, --hidden-size HIDDEN_SIZE
                        hidden vector size of LSTM
  -hss HIDDEN_SIZES [HIDDEN_SIZES ...], --hidden-sizes HIDDEN_SIZES [HIDDEN_SIZES ...]
                        hidden vector sizes of LSTM
  -bi, --bidirectional  Use Bi-LSTM
  --gru                 Use GRU rather than LSTM
  --hidden-dim HIDDEN_DIM
                        Transformers hidden unit size
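
Note that the CMU-MOSEI setup is multi-label: an utterance can express several emotions at once, which is why the example commands use --loss=bce and why --threshold is needed to turn per-emotion probabilities into predictions instead of taking a single argmax. A minimal sketch of such a decision rule (illustrative only; the function name and default value are not taken from the repo):

import torch

def multilabel_predict(logits, threshold=0.5):
    # logits: (batch, num_emotions) raw model scores
    probs = torch.sigmoid(logits)      # per-emotion probabilities
    return (probs > threshold).long()  # 1 = emotion predicted as present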

Run the code

main.py is the entry point of the whole project; use the corresponding command-line arguments for different purposes.

Training

Training the model on the CMU-MOSEI dataset

python main.py --cuda=0 -bs=64 -lr=1e-3 -ep=100 --model=eea -bi --hidden-sizes 300 200 100 --num-layers=2 --dropout=0.15 --data-folder=./data/cmu-mosei/ --data-seq-len=20 --dataset=mosei_emo --aligned --loss=bce --clip=1.0 --early-stop=8 -mod=tav --patience=5   

Training the model on the IEMOCAP dataset

python main.py --cuda=0 -bs=64 -lr=1e-3 -ep=100 --model=eea --data-folder=./data/iemocap/ --data-seq-len=50 --dataset=iemocap --loss=bce --clip=1.0 --early-stop=8 --hidden-sizes 300 200 100 -mod=tav --patience=5 --aligned -bi --num-layers=2 --dropout=0.15

Training an early-fusion LSTM baseline

python main.py --cuda=0 -bs=64 -lr=1e-3 -ep=100 --model=rnn --fusion=ef --data-folder=./data/iemocap/ --data-seq-len=50 --dataset=iemocap --loss=bce --clip=1.0 --early-stop=8 --hidden-sizes 300 200 100 -mod=tav --patience=5 --aligned -bi --num-layers=2 --dropout=0.15

Validating and testing

If you only want to run validation or testing on a trained model, add a --valid or --test flag to the original training command and include --ckpt=[PathToSavedCheckpoint] to point to the saved checkpoint.
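
For example, to test a trained CMU-MOSEI model, keep the training flags and append the test flag and a checkpoint path (the checkpoint path below is a placeholder):

python main.py --cuda=0 -bs=64 -lr=1e-3 -ep=100 --model=eea -bi --hidden-sizes 300 200 100 --num-layers=2 --dropout=0.15 --data-folder=./data/cmu-mosei/ --data-seq-len=20 --dataset=mosei_emo --aligned --loss=bce --clip=1.0 --early-stop=8 -mod=tav --patience=5 --test --ckpt=[PathToSavedCheckpoint]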

Zero-shot learning (ZSL)

Add a --zsl=[EmotionIndex] flag to the original training command, where EmotionIndex is the index of the emotion category you want to treat as unseen. As mentioned in the paper, because the CMU-MOSEI and IEMOCAP datasets use different strategies, --zsl=[EmotionIndex] has a slightly different meaning for each; the correct indices are listed below, followed by an example command:

For CMU-MOSEI (ZSL emotion data will be removed from the training data),

  • --zsl=0, do ZSL on anger
  • --zsl=1, do ZSL on disgust
  • --zsl=2, do ZSL on fear
  • --zsl=3, do ZSL on happy
  • --zsl=4, do ZSL on sad
  • --zsl=5, do ZSL on surprise

For IEMOCAP (the training data remains unchanged, as the ZSL emotion comes from extra low-resource data),

  • --zsl=1, do ZSL on excited
  • --zsl=4, do ZSL on surprised
  • --zsl=5, do ZSL on frustrated
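
For example, to train on CMU-MOSEI with happy held out as the unseen emotion, append --zsl=3 to the MOSEI training command:

python main.py --cuda=0 -bs=64 -lr=1e-3 -ep=100 --model=eea -bi --hidden-sizes 300 200 100 --num-layers=2 --dropout=0.15 --data-folder=./data/cmu-mosei/ --data-seq-len=20 --dataset=mosei_emo --aligned --loss=bce --clip=1.0 --early-stop=8 -mod=tav --patience=5 --zsl=3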

Few-shot learning (FSL)

For few-shot learning, the logic is similar to ZSL; just use --fsl=[EmotionIndex] instead.
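
For example, assuming the same emotion indices as in the IEMOCAP ZSL list above, few-shot learning on excited would be:

python main.py --cuda=0 -bs=64 -lr=1e-3 -ep=100 --model=eea --data-folder=./data/iemocap/ --data-seq-len=50 --dataset=iemocap --loss=bce --clip=1.0 --early-stop=8 --hidden-sizes 300 200 100 -mod=tav --patience=5 --aligned -bi --num-layers=2 --dropout=0.15 --fsl=1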

Requirements

  1. Python 3.6+
  2. PyTorch 1.4+
  3. An NVIDIA GTX 1080 Ti GPU (or better)