
habla-liaa / ser-with-w2v2

Licence: other
Official implementation of INTERSPEECH 2021 paper 'Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings'

Programming Languages

Jupyter Notebook
Python
Shell

Projects that are alternatives to or similar to ser-with-w2v2

SER-datasets
A collection of datasets for emotion recognition/detection in speech.
Stars: ✭ 74 (+85%)
Mutual labels:  speech, speech-emotion-recognition
wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (+412.5%)
Mutual labels:  speech, wav2vec2
nlp-class
A Natural Language Processing course taught by Professor Ghassemi
Stars: ✭ 95 (+137.5%)
Mutual labels:  speech
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (+30%)
Mutual labels:  speech
fade
A Simulation Framework for Auditory Discrimination Experiments
Stars: ✭ 12 (-70%)
Mutual labels:  speech
opensnips
Open source projects related to Snips https://snips.ai/.
Stars: ✭ 50 (+25%)
Mutual labels:  speech
jackpair
P2P speech-encrypting device with an analog audio interface, suitable for GSM phones
Stars: ✭ 26 (-35%)
Mutual labels:  speech
Voice2Mesh
CVPR 2022: Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?
Stars: ✭ 67 (+67.5%)
Mutual labels:  speech
torch-asg
Auto Segmentation Criterion (ASG) implemented in PyTorch
Stars: ✭ 42 (+5%)
Mutual labels:  speech
MelNet-SpeechGeneration
Implementation of MelNet in PyTorch to generate high-fidelity audio samples
Stars: ✭ 19 (-52.5%)
Mutual labels:  speech
nabaztag-php
A simple PHP implementation of a Nabaztag server
Stars: ✭ 14 (-65%)
Mutual labels:  speech
LIGHT-SERNET
Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition
Stars: ✭ 20 (-50%)
Mutual labels:  speech-emotion-recognition
speech recognition ctc
Chinese speech recognition using CTC, implemented with Keras
Stars: ✭ 40 (+0%)
Mutual labels:  speech
speech to text
How to use the Google Cloud Speech API to transcribe audio/video files.
Stars: ✭ 35 (-12.5%)
Mutual labels:  speech
Fre-GAN-pytorch
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Stars: ✭ 73 (+82.5%)
Mutual labels:  speech
TASNET
Time-domain Audio Separation Network (in PyTorch)
Stars: ✭ 18 (-55%)
Mutual labels:  speech
HTK
The Hidden Markov Model Toolkit (HTK) from University of Cambridge, with fixed issues.
Stars: ✭ 23 (-42.5%)
Mutual labels:  speech
ttslearn
ttslearn: Library for text-to-speech with Python, accompanying the book 'Pythonで学ぶ音声合成'
Stars: ✭ 158 (+295%)
Mutual labels:  speech
jarvis
Jarvis Home Automation
Stars: ✭ 81 (+102.5%)
Mutual labels:  speech
wikipron
Massively multilingual pronunciation mining
Stars: ✭ 167 (+317.5%)
Mutual labels:  speech

Official implementation of 'Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings'

Requirements:

We recommend running these scripts in a virtual environment (for example, a conda environment), with TensorFlow 2.4.1 and PyTorch 1.7.1 installed.
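For example, with conda (the environment name and Python version below are our own choices, not prescribed by this repo):

conda create -n ser-w2v2 python=3.7
conda activate ser-w2v2
pip install tensorflow==2.4.1 torch==1.7.1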

Install the required Python packages:

pip install -r requirements.txt

Install sox and libmediainfo on your system:

sudo apt-get install sox
sudo apt-get install libmediainfo-dev

The RAVDESS and IEMOCAP datasets need to be downloaded and placed in ~/Datasets with the following folder structure:

├── IEMOCAP
│   ├── Documentation
│   ├── README.txt
│   ├── Session1
│   ├── Session2
│   ├── Session3
│   ├── Session4
│   └── Session5
└── RAVDESS
    └── RAVDESS
        ├── song
        └── speech
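Before launching the experiments, you can quickly check that everything is in place (this snippet is our own sanity check, not part of the repo; it only verifies that the directories exist):

for d in IEMOCAP/Session{1..5} RAVDESS/RAVDESS/{song,speech}; do
    [ -d ~/Datasets/"$d" ] && echo "OK      $d" || echo "MISSING $d"
done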

Replicating our experiments

In our paper we ran many different experiments, using 5 seeds for each one. If you want to replicate that procedure, run in a terminal:

./run_seeds.sh <output_path>

If you want to run just one seed:

./run_paper_experiments.sh <seed_number> <output_path>
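For example, using the seed 0123 that appears in the output paths below and writing to a local experiments folder:

./run_paper_experiments.sh 0123 experiments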

If you don't want to run all the experiments performed in the paper, comment out the unwanted experiments in the run_paper_experiments.sh script. For example, our best-performing model is trained by the following lines:

# w2v2PT-fusion
# Retry paiprun until it exits successfully (exit status 0).
errors=1
while ((errors != 0)); do
    paiprun configs/main/w2v2-os-exps.yaml \
        --output_path "${OUTPUT_PATH}/w2v2PT-fusion/${SEED}" \
        --mods "${seed_mod}&global/wav2vec2_embedding_layer=enc_and_transformer&global/normalize=global"
    errors=$?
done

The experiment outputs will be saved at <output_path>. A cache folder will be generated in the directory from which the above command is called. Take into account that run_seeds.sh executes many experiments (all of those presented in the paper) and repeats them 5 times (using different seeds for the random number generators), so the process is expected to take a very long time and considerable drive space. We ran the experiments on multiple AWS p3.2xlarge instances, each of which has a Tesla V100 GPU.

Analyzing the outputs

The outputs saved at <output_path> can be examined from Python using joblib. For example, running:

import joblib
metrics = joblib.load('experiments/w2v2PT-fusion/0123/MainTask/DownstreamRavdess/RavdessMetrics/out')

will load the resulting metrics into the metrics variable.
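The exact contents of the loaded object depend on the task and experiment, so if you are unsure what a given output contains, a generic inspection is a safe first step (our sketch, assuming the output is a plain Python object such as a dict):

import joblib

metrics = joblib.load('experiments/w2v2PT-fusion/0123/MainTask/DownstreamRavdess/RavdessMetrics/out')
# Inspect the object generically before relying on a specific structure.
if isinstance(metrics, dict):
    for name, value in metrics.items():
        print(name, '->', value)
else:
    print(metrics)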

More examples of how the generated outputs can be analysed are given in this notebook. Moreover, we provide the results from all our experiments in the experiments folder, and the results.ipynb notebook generates the tables of our paper.

🔥🔥🔥 Using pretrained models 🔥🔥🔥

⚠️ WARNING: Like most speech emotion recognition models, the models we trained are very unlikely to generalize to datasets other than the ones used for training, which were recorded in clean conditions and with actors. ⚠️

Model                      Dataset   Links
w2v2PT-fusion              IEMOCAP   Folds: 1 2 3 4 5
w2v2PT-fusion              RAVDESS   Model
w2v2PT-alllayers-global    IEMOCAP   Folds: 1 2 3 4 5
w2v2PT-alllayers-global    RAVDESS   Model
w2v2PT-alllayers           IEMOCAP   Folds: 1 2 3 4 5
w2v2PT-alllayers           RAVDESS   Model
Issa et al. eval setup     RAVDESS   Folds: 1 2 3 4 5

Cite as: Pepino, L., Riera, P., Ferrer, L. (2021) Emotion Recognition from Speech Using wav2vec 2.0 Embeddings. Proc. Interspeech 2021, 3400-3404, doi: 10.21437/Interspeech.2021-703

@inproceedings{pepino21_interspeech,
  author={Leonardo Pepino and Pablo Riera and Luciana Ferrer},
  title={{Emotion Recognition from Speech Using wav2vec 2.0 Embeddings}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3400--3404},
  doi={10.21437/Interspeech.2021-703}
}