
RayanWang / Speech_emotion_recognition_blstm

License: MIT
Bidirectional LSTM network for speech emotion recognition.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Speech emotion recognition blstm

Mtan
The implementation of "End-to-End Multi-Task Learning with Attention" [CVPR 2019].
Stars: ✭ 364 (+79.31%)
Mutual labels:  attention-model
Deepattention
Deep Visual Attention Prediction (TIP18)
Stars: ✭ 65 (-67.98%)
Mutual labels:  attention-model
Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (-37.93%)
Mutual labels:  attention-model
Structured Self Attention
A Structured Self-attentive Sentence Embedding
Stars: ✭ 459 (+126.11%)
Mutual labels:  attention-model
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+387.68%)
Mutual labels:  attention-model
Code
ECG Classification
Stars: ✭ 78 (-61.58%)
Mutual labels:  attention-model
Attention ocr.pytorch
This repository implements the encoder-decoder model with attention for OCR
Stars: ✭ 278 (+36.95%)
Mutual labels:  attention-model
Pytorch Acnn Model
code of Relation Classification via Multi-Level Attention CNNs
Stars: ✭ 170 (-16.26%)
Mutual labels:  attention-model
Awesome Attention Mechanism In Cv
A PyTorch implementation collection of attention modules and other plug-and-play modules used in computer vision
Stars: ✭ 54 (-73.4%)
Mutual labels:  attention-model
Linear Attention Recurrent Neural Network
A recurrent attention module consisting of an LSTM cell which can query its own past cell states by means of windowed multi-head attention. The formulas are derived from the BN-LSTM and the Transformer Network. The LARNN cell with attention can easily be used inside a loop on the cell state, just like any other RNN. (LARNN)
Stars: ✭ 119 (-41.38%)
Mutual labels:  attention-model
Nmt Keras
Neural Machine Translation with Keras
Stars: ✭ 501 (+146.8%)
Mutual labels:  attention-model
Reading comprehension tf
Machine Reading Comprehension in Tensorflow
Stars: ✭ 37 (-81.77%)
Mutual labels:  attention-model
Attention Gated Networks
Use of Attention Gates in a Convolutional Neural Network / Medical Image Classification and Segmentation
Stars: ✭ 1,237 (+509.36%)
Mutual labels:  attention-model
Attention Ocr Chinese Version
Attention OCR Based On Tensorflow
Stars: ✭ 421 (+107.39%)
Mutual labels:  attention-model
Bamnet
Code & data accompanying the NAACL 2019 paper "Bidirectional Attentive Memory Networks for Question Answering over Knowledge Bases"
Stars: ✭ 140 (-31.03%)
Mutual labels:  attention-model
Attentiongan
AttentionGAN for Unpaired Image-to-Image Translation & Multi-Domain Image-to-Image Translation
Stars: ✭ 341 (+67.98%)
Mutual labels:  attention-model
Pytorch Attention Guided Cyclegan
Pytorch implementation of Unsupervised Attention-guided Image-to-Image Translation.
Stars: ✭ 67 (-67%)
Mutual labels:  attention-model
Snli Entailment
attention model for entailment on SNLI corpus implemented in Tensorflow and Keras
Stars: ✭ 181 (-10.84%)
Mutual labels:  attention-model
Sa Tensorflow
Soft attention mechanism for video caption generation
Stars: ✭ 154 (-24.14%)
Mutual labels:  attention-model
Transformer image caption
Image Captioning based on Bottom-Up and Top-Down Attention model
Stars: ✭ 94 (-53.69%)
Mutual labels:  attention-model

Speech_emotion_recognition_BLSTM

Bidirectional LSTM network for speech emotion recognition.

Environment:

  • Python 2.7/3.6
  • NVIDIA GeForce GTX 1060 6GB
  • Conda version 4.5

Dependencies

  • pyAudioAnalysis (feature extraction) and hyperas (hyperparameter tuning), both referenced in the sections below.

Datasets

  • The examples below use the Berlin emotional speech dataset ("berlin"), with 7 emotion classes.

Usage

  • The function "stFeatureSpeed" in pyAudioAnalysis does not work out of the box, so you have to modify the code in audioFeatureExtraction.py: for the index-related issues, cast the values to integers; for the issue in the method stHarmonic, cast M to an integer (M = int(M)); and comment out the invocation of the method 'mfccInitFilterBanks' inside stFeatureSpeed (see the sketch after this list).
  • If you run the code under Python 3, upgrade pyAudioAnalysis to the latest version that is compatible with Python 3.
  • You have to prepare at least two different sets of data: one for finding the best model and the other for testing.
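
Below is a rough sketch of those edits, assuming the audioFeatureExtraction.py layout current at the time of writing; the exact names and surrounding code depend on your pyAudioAnalysis version, so treat it as a guide rather than a drop-in patch.

# audioFeatureExtraction.py -- sketch of the three edits described above
# (names such as curPos, Win, Fs, nFFT follow that file; verify against your version)

# (1) Index-related errors: cast frame positions/window sizes to int before slicing:
frame = signal[int(curPos):int(curPos + Win)]   # was: signal[curPos:curPos + Win]

# (2) In stHarmonic, cast M to an integer right after it is computed:
M = int(M)

# (3) In stFeatureSpeed, comment out the invocation of mfccInitFilterBanks:
# [fbank, freqs] = mfccInitFilterBanks(Fs, nFFT)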
Long option             Option  Description
--dataset               -d      dataset type
--dataset_path          -p      dataset or predicted-data path
--load_data             -l      load the dataset and dump the data stream to a .p file
--feature_extract       -e      extract features from the data and dump them to a .p file
--model_path            -m      path of the model to load
--nb_classes            -c      number of classes in your data
--speaker_indipendence  -s      cross-validation uses different actors for the train and test sets
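
For reference, here is a minimal argparse setup consistent with the table above; it is illustrative only, and the project's actual parser may differ in defaults and help text.

import argparse

# Option names follow the table above (including the project's spelling of
# --speaker_indipendence); the defaults are illustrative assumptions.
parser = argparse.ArgumentParser(description='Speech emotion recognition (BLSTM)')
parser.add_argument('-d', '--dataset', default='berlin', help='dataset type')
parser.add_argument('-p', '--dataset_path', help='dataset or predicted-data path')
parser.add_argument('-l', '--load_data', action='store_true',
                    help='load the dataset and dump the data stream to a .p file')
parser.add_argument('-e', '--feature_extract', action='store_true',
                    help='extract features from the data and dump them to a .p file')
parser.add_argument('-m', '--model_path', help='path of the model to load')
parser.add_argument('-c', '--nb_classes', type=int, default=7,
                    help='number of classes in your data')
parser.add_argument('-s', '--speaker_indipendence', action='store_true',
                    help='use different actors for the train and test sets')
args = parser.parse_args()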

Example find_best_model.py:

python find_best_model.py -d "berlin" -p [berlin data path] -l -e -c 7
  • The first time you run the script, the -l and -e options are mandatory, since you need to load the data and extract features.
  • Every time you change the training data and/or the feature-engineering method, you have to specify -l and/or -e respectively to update your .p files.
  • You can also modify the code to tune other hyperparameters.

Example prediction.py:

python prediction.py -p [data path] -m [model path] -c 7

Example model_cross_validation.py:

python model_cross_validation.py -d "berlin" -p [berlin data path] -l -e -c 7
  • Use -s for k-fold cross-validation across different actors (speaker-independent evaluation); see the sketch below.
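
Speaker independence means no actor appears in both the train and test split of a fold. Here is a minimal illustration of that idea using scikit-learn's GroupKFold (the project implements its own scheme; the arrays below are dummy data):

import numpy as np
from sklearn.model_selection import GroupKFold

X = np.random.rand(10, 5)                              # dummy features
y = np.random.randint(0, 7, 10)                        # dummy labels (7 classes)
speakers = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])    # actor id per utterance

for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=speakers):
    # no actor ever ends up on both sides of a fold
    assert set(speakers[train_idx]).isdisjoint(set(speakers[test_idx]))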

Experimental result

  • hyperas is used to tune the optimizer, batch_size, and epochs; the remaining parameters take the values used in the paper referenced below.
  • The average accuracy is about 68.60% (+/- 1.88%) over 10-fold cross-validation on the Berlin dataset.
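
As a concrete illustration of that setup, here is a minimal hyperas sketch; it is not the project's actual tuning script, the dummy data() stands in for the dumped .p feature files, and the layer sizes and 34-dimensional features are assumptions.

from hyperopt import Trials, STATUS_OK, tpe
from hyperas import optim
from hyperas.distributions import choice

def data():
    # Stand-in for loading the dumped .p feature files.
    import numpy as np
    x_train = np.random.rand(100, 50, 34).astype('float32')  # (samples, timesteps, features)
    y_train = np.eye(7)[np.random.randint(0, 7, 100)]
    x_test = np.random.rand(20, 50, 34).astype('float32')
    y_test = np.eye(7)[np.random.randint(0, 7, 20)]
    return x_train, y_train, x_test, y_test

def create_model(x_train, y_train, x_test, y_test):
    from keras.models import Sequential
    from keras.layers import Bidirectional, LSTM, Dense
    model = Sequential()
    model.add(Bidirectional(LSTM(128), input_shape=x_train.shape[1:]))
    model.add(Dense(7, activation='softmax'))
    # hyperas expands each {{choice(...)}} template into a hyperopt search space
    model.compile(loss='categorical_crossentropy',
                  optimizer={{choice(['adam', 'rmsprop', 'sgd'])}},
                  metrics=['accuracy'])
    model.fit(x_train, y_train,
              batch_size={{choice([32, 64, 128])}},
              epochs={{choice([50, 100])}},
              validation_data=(x_test, y_test), verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    return {'loss': -acc, 'status': STATUS_OK, 'model': model}

best_run, best_model = optim.minimize(model=create_model, data=data,
                                      algo=tpe.suggest, max_evals=10,
                                      trials=Trials())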

References

  • S. Mirsamadi, E. Barsoum, and C. Zhang, “Automatic speech emotion recognition using recurrent neural networks with local attention,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, U.S.A., Mar. 2017, IEEE, pp. 2227–2231.

  • Fei Tao, Gang Liu, “Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition,” Submitted to 2018 IEEE International Conference on Acoustics, Speech and Signal Processing.

  • Video from Microsoft Research

Future work

  • The training data listed above (Berlin) may be insufficient: validation accuracy and loss stop improving while the training results are still not good.
  • Given sufficient training examples, the parameters of short-term characterization, long-term aggregation, and the attention model can be jointly optimized for best performance (a sketch of such an attention-pooling layer follows this list).
  • Update the current network architecture to improve the accuracy (already in progress).
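
For context on that last point, here is a minimal Keras sketch of attention-based weighted pooling over BLSTM outputs, in the spirit of Mirsamadi et al. (2017); the layer sizes and the 34-dimensional feature input are assumptions, not the project's exact architecture.

import keras
from keras import backend as K
from keras.layers import Input, Bidirectional, LSTM, Dense
from keras.models import Model

class AttentionPooling(keras.layers.Layer):
    # Learns per-frame weights alpha_t = softmax(u . h_t) and returns the
    # attention-weighted sum over time: sum_t alpha_t * h_t.
    def build(self, input_shape):
        self.u = self.add_weight(name='u', shape=(int(input_shape[-1]), 1),
                                 initializer='glorot_uniform', trainable=True)
        super(AttentionPooling, self).build(input_shape)

    def call(self, h):                                            # h: (batch, time, features)
        scores = K.softmax(K.squeeze(K.dot(h, self.u), axis=-1))  # (batch, time)
        return K.batch_dot(scores, h, axes=(1, 1))                # (batch, features)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])

inputs = Input(shape=(None, 34))     # variable-length utterances, 34 features per frame
h = Bidirectional(LSTM(128, return_sequences=True))(inputs)
outputs = Dense(7, activation='softmax')(AttentionPooling()(h))
model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])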