
glam-imperial / EmotionalConversionStarGAN

Licence: other

Programming Languages

  • python
  • shell

Projects that are alternatives to or similar to EmotionalConversionStarGAN

Data Augmentation Review
A list of useful data augmentation resources, including some less common techniques, libraries, links to GitHub repos, papers, and more.
Stars: ✭ 785 (+753.26%)
Mutual labels:  generative-adversarial-network, data-augmentation
coursera-gan-specialization
Programming assignments and quizzes from all courses within the GANs specialization offered by deeplearning.ai
Stars: ✭ 277 (+201.09%)
Mutual labels:  generative-adversarial-network, data-augmentation
Interaction-Aware-Attention-Network
[ICASSP19] An Interaction-aware Attention Network for Speech Emotion Recognition in Spoken Dialogs
Stars: ✭ 32 (-65.22%)
Mutual labels:  emotion-recognition, icassp
OpenVINO-EmotionRecognition
OpenVINO+NCS2/NCS+MultiModel(FaceDetection, EmotionRecognition)+MultiStick+MultiProcess+MultiThread+USB Camera/PiCamera. RaspberryPi 3 compatible. Async.
Stars: ✭ 51 (-44.57%)
Mutual labels:  emotion-recognition
polyagamma
An efficient and flexible sampler of the Pólya-Gamma distribution with a NumPy/SciPy compatible interface.
Stars: ✭ 15 (-83.7%)
Mutual labels:  data-augmentation
FaceRecognition
Face Recognition in real-world images [ICASSP 2017]
Stars: ✭ 36 (-60.87%)
Mutual labels:  icassp
RecycleGAN
The simplest implementation of the Recycle-GAN idea
Stars: ✭ 68 (-26.09%)
Mutual labels:  generative-adversarial-network
DeepSentiPers
Repository for the experiments described in the paper named "DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus"
Stars: ✭ 17 (-81.52%)
Mutual labels:  data-augmentation
private-data-generation
A toolbox for differentially private data generation
Stars: ✭ 80 (-13.04%)
Mutual labels:  generative-adversarial-network
keras-3dgan
Keras implementation of 3D Generative Adversarial Network.
Stars: ✭ 20 (-78.26%)
Mutual labels:  generative-adversarial-network
DeepFlow
PyTorch implementation of "DeepFlow: History Matching in the Space of Deep Generative Models"
Stars: ✭ 24 (-73.91%)
Mutual labels:  generative-adversarial-network
tt-vae-gan
Timbre transfer with variational autoencoding and cycle-consistent adversarial networks. Able to transfer the timbre of an audio source to that of another.
Stars: ✭ 37 (-59.78%)
Mutual labels:  generative-adversarial-network
ezgan
An extremely simple generative adversarial network, built with TensorFlow
Stars: ✭ 36 (-60.87%)
Mutual labels:  generative-adversarial-network
projects
things I help(ed) to build
Stars: ✭ 47 (-48.91%)
Mutual labels:  generative-adversarial-network
GAN-Ensemble-for-Anomaly-Detection
This repository is the PyTorch implementation of GAN Ensemble for Anomaly Detection.
Stars: ✭ 26 (-71.74%)
Mutual labels:  generative-adversarial-network
ADL2019
Applied Deep Learning (2019 Spring) @ NTU
Stars: ✭ 20 (-78.26%)
Mutual labels:  generative-adversarial-network
celeba-gan-pytorch
Generative Adversarial Networks in PyTorch
Stars: ✭ 35 (-61.96%)
Mutual labels:  generative-adversarial-network
TextBoxGAN
Generate text boxes from input words with a GAN.
Stars: ✭ 50 (-45.65%)
Mutual labels:  generative-adversarial-network
voicekit-examples
Examples on how to use Tinkoff Voicekit
Stars: ✭ 35 (-61.96%)
Mutual labels:  speech-synthesis
Tacotron pytorch
A Tacotron implementation in PyTorch
Stars: ✭ 12 (-86.96%)
Mutual labels:  speech-synthesis

EmotionalConversionStarGAN

This repository contains code to replicate results from the ICASSP 2020 paper "StarGAN for Emotional Speech Conversion: Validated by Data Augmentation of End-to-End Emotion Recognition".

  • stargan: code for training the Emotional StarGAN and performing emotional generation (originally hosted at https://github.com/max-elliott/StarGAN-Emotional-VC).

  • aug_evaluation: code for performing the data augmentation experiments (coming soon)

  • samples: selected converted samples (coming soon; we are checking with the IEMOCAP team whether these can be shared publicly under GDPR)

The IEMOCAP database requires signing an EULA; please contact its maintainers: https://sail.usc.edu/iemocap/

Preparing

- Requirements:

  • python>3.7.0
  • pytorch
  • numpy
  • argparse
  • librosa
  • scikit-learn
  • tensorflow < 2.0
  • pyworld
  • matplotlib
  • yaml

- Clone repository:

git clone https://github.com/glam-imperial/EmotionalConversionStarGAN.git
cd EmotionalConversionStarGAN

- Download IEMOCAP dataset from https://sail.usc.edu/iemocap/

IEMOCAP Preprocessing

Running the script run_preprocessing.py prepares IEMOCAP as needed for training the model. It assumes that IEMOCAP has already been downloaded and is stored in an arbitrary directory <DIR> with this file structure:

<DIR>
  |- Session1  
  |     |- Annotations  
  |     |- Ses01F_impro01  
  |     |- Ses01F_impro02  
  |     |- ...  
  |- ...
  |- Session5
        |- Annotations
        |- Ses05F_impro01
        |- Ses05F_impro02
        |- ...

where Annotations is a directory holding the label .txt files for the whole session (Ses01F_impro01.txt etc.), and each of the other directories (Ses01F_impro01, Ses01F_impro02, etc.) holds the .wav files for one scene in the session.
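
For reference, label files of this kind can be read with a small parser along these lines (a sketch assuming IEMOCAP's standard EmoEvaluation line format; the repository's own loading code may differ):

import re

# Matches EmoEvaluation-style lines such as:
# [6.2901 - 8.2357]	Ses01F_impro01_F000	neu	[2.5000, 2.5000, 2.5000]
# (format assumed from the public IEMOCAP release, not taken from this repo)
LINE_RE = re.compile(r"^\[([\d.]+)\s*-\s*([\d.]+)\]\s+(\S+)\s+(\w+)")

def read_labels(path):
    """Return a dict mapping utterance IDs to categorical emotion labels."""
    labels = {}
    with open(path) as f:
        for line in f:
            match = LINE_RE.match(line)
            if match:
                _start, _end, utt_id, emotion = match.groups()
                labels[utt_id] = emotion
    return labels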

To preprocess, run

python run_preprocessing.py --iemocap_dir <DIR> 

which will move all audio files to ./processed_data/audio and extract the WORLD features and labels needed for training. Features are extracted only for samples of the target emotions (angry, sad, happy) that fall under a hardcoded length threshold (to keep training time down). The script also creates dictionaries of F0 statistics, which are used to alter the F0 of a sample during conversion; a sketch of the feature extraction follows the directory listing below. After running you should have this file structure:

./processed_data
 |- annotations
 |- audio
 |- f0
 |- labels
 |- world
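
For orientation, the WORLD feature extraction and F0 statistics gathered in this step look roughly like the following (a minimal sketch using pyworld and librosa; the file name and the 16 kHz sample rate are assumptions, not the repository's exact code):

import librosa
import numpy as np
import pyworld

# Load a mono waveform; pyworld expects float64 samples.
wav, fs = librosa.load("Ses01F_impro01_F000.wav", sr=16000)
wav = wav.astype(np.float64)

# Decompose into WORLD features: F0 contour, spectral envelope, aperiodicity.
f0, sp, ap = pyworld.wav2world(wav, fs)

# Log-F0 mean/std over voiced frames only - the kind of per-speaker,
# per-emotion statistics stored in ./processed_data/f0 and used to
# shift F0 during conversion.
voiced = f0[f0 > 0]
log_f0_mean = float(np.log(voiced).mean())
log_f0_std = float(np.log(voiced).std())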

Training EmotionStarGAN

The main training script is train_main.py. However, to automatically train a three-emotion model (angry, sad, happy) as it was trained for "StarGAN for Emotional Speech Conversion: Validated by Data Augmentation of End-to-End Emotion Recognition", simply call:

./full_training_script.sh

This script runs three steps:

  1. Runs classifier_train.py - pretrains an auxiliary emotion classifier and saves the best checkpoint to ./checkpoints/cls_checkpoint.ckpt.
  2. Runs the main training for 200k iterations in --recon_only mode, meaning the model simply learns to reconstruct the input audio.
  3. Trains the model for a further 100k steps, introducing the pre-trained classifier.

A full training run takes roughly 24 hours on a decent GPU. The auxiliary emotion classifier can also be trained independently using classifier_train.py.
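
Conceptually, the two-phase schedule in steps 2 and 3 amounts to the following (a self-contained toy sketch in PyTorch; the modules, losses, and step counts are illustrative stand-ins, not the repository's actual training code):

import torch
import torch.nn as nn

# Toy stand-ins for the real generator and auxiliary emotion classifier.
generator = nn.Linear(80, 80)
classifier = nn.Linear(80, 3)              # 3 emotions: angry, sad, happy
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)
recon_loss = nn.L1Loss()
cls_loss = nn.CrossEntropyLoss()

RECON_ONLY_STEPS = 200                     # 200k in the real run
TOTAL_STEPS = 300                          # 300k in the real run

for step in range(TOTAL_STEPS):
    x = torch.randn(8, 80)                 # dummy acoustic feature batch
    y = torch.randint(0, 3, (8,))          # dummy emotion labels
    x_fake = generator(x)
    loss = recon_loss(x_fake, x)           # phase 1: reconstruction only
    if step >= RECON_ONLY_STEPS:           # phase 2: add the classifier term
        loss = loss + cls_loss(classifier(x_fake), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()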

Sample Conversion

Once a model is trained, you can convert IEMOCAP audio samples using convert.py. Running

python convert.py --checkpoint <path/to/model_checkpoint.ckpt> -o ./processed_data/converted

will load a model checkpoint and convert 10 random samples from the test set into each emotion, saving the converted samples in ./processed_data/converted (this mode is currently bugged; use the directory-based invocation below instead). Specifying an input directory will convert all the audio files in that directory:

python convert.py --checkpoint <path/to/model_checkpoint.ckpt> -i <path/to/wavs> -o ./processed_data/converted

The input files must currently be existing files from the IEMOCAP dataset; the code will be updated later to convert arbitrary samples.
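
The F0 dictionaries computed during preprocessing are typically applied with logarithm-Gaussian normalisation when converting a sample between emotions; a sketch of that transform (an assumption about how the stored statistics are used, not code taken from the repository):

import numpy as np

def convert_f0(f0, src_mean, src_std, trg_mean, trg_std):
    """Map a source F0 contour onto target log-F0 statistics.

    The means and stds are log-F0 statistics like those stored in
    ./processed_data/f0. Unvoiced frames (f0 == 0) are left untouched.
    """
    f0_out = f0.copy()
    voiced = f0 > 0
    f0_out[voiced] = np.exp(
        (np.log(f0[voiced]) - src_mean) / src_std * trg_std + trg_mean
    )
    return f0_out

For example, converting a sad utterance to angry would use the sad statistics as the source and the angry statistics as the target.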
