
AlexGidiotis / Multimodal-Gesture-Recognition-with-LSTMs-and-CTC

License: MIT
An end-to-end system that performs temporal recognition of gesture sequences using speech and skeletal input. The model combines three LSTM networks with a CTC output layer that recognises gestures from two continuous input streams.

Programming Languages

python

Projects that are alternatives to or similar to Multimodal-Gesture-Recognition-with-LSTMs-and-CTC

Kerasdeepspeech
A Keras CTC implementation of Baidu's DeepSpeech for model experimentation
Stars: ✭ 245 (+880%)
Mutual labels:  speech, ctc
torch-asg
Auto Segmentation Criterion (ASG) implemented in pytorch
Stars: ✭ 42 (+68%)
Mutual labels:  speech, ctc
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+1532%)
Mutual labels:  speech, ctc
speech recognition ctc
Chinese speech recognition using CTC, implemented in Keras
Stars: ✭ 40 (+60%)
Mutual labels:  speech, ctc
Pytorch Asr
ASR with PyTorch
Stars: ✭ 124 (+396%)
Mutual labels:  speech, ctc
Volute
Raspberry Pi + Nodejs = Speech Robot
Stars: ✭ 224 (+796%)
Mutual labels:  speech
Wavegrad
Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
Stars: ✭ 245 (+880%)
Mutual labels:  speech
Speech Enhancement
Deep learning for audio denoising
Stars: ✭ 207 (+728%)
Mutual labels:  speech
Naver-AI-Hackathon-Speech
2019 Clova AI Hackathon : Speech - Rank 12 / Team Kai.Lib
Stars: ✭ 26 (+4%)
Mutual labels:  speech
Speechbrain.github.io
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems, including speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.
Stars: ✭ 242 (+868%)
Mutual labels:  speech
Neural Voice Cloning With Few Samples
Implementation of Neural Voice Cloning with Few Samples Research Paper by Baidu
Stars: ✭ 211 (+744%)
Mutual labels:  speech
Source separation
Deep learning based speech source separation using Pytorch
Stars: ✭ 226 (+804%)
Mutual labels:  speech
Voice Gender
Gender recognition by voice and speech analysis
Stars: ✭ 248 (+892%)
Mutual labels:  speech
Speech Denoiser
A speech denoise lv2 plugin based on RNNoise library
Stars: ✭ 220 (+780%)
Mutual labels:  speech
idear
🎙️ Handsfree Audio Development Interface
Stars: ✭ 84 (+236%)
Mutual labels:  speech
Tts Cube
End-2-end speech synthesis with recurrent neural networks
Stars: ✭ 213 (+752%)
Mutual labels:  speech
Tacotron pytorch
PyTorch implementation of Tacotron speech synthesis model.
Stars: ✭ 242 (+868%)
Mutual labels:  speech
browser-apis
🦄 Cool & Fun Browser Web APIs 🥳
Stars: ✭ 21 (-16%)
Mutual labels:  speech
Lhotse
Stars: ✭ 236 (+844%)
Mutual labels:  speech
Gcc Nmf
Real-time GCC-NMF Blind Speech Separation and Enhancement
Stars: ✭ 231 (+824%)
Mutual labels:  speech

Multimodal-Gesture-Recognition-with-LSTMs-and-CTC

This repository contains code for my diploma thesis, "Multimodal Gesture Recognition with the Use of Deep Learning".

Overview

An end-to-end system that performs temporal recognition of gesture sequences using speech and skeletal input. The model combines three LSTM networks with a CTC output layer that spots and classifies gestures in two continuous input streams.

The basic modules of the model are two bidirectional LSTMs (BLSTMs). The first extracts features from speech and the second from skeletal data. A third bidirectional LSTM then combines the uni-modal features and performs the gesture recognition.
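
As a rough illustration of this architecture, here is a minimal Keras sketch. It is not the repository's exact code: the layer sizes, input dimensions and names (speech_input, skeletal_input, etc.) are placeholder assumptions, and it assumes the two streams have already been resampled to a common frame rate.

    from keras.models import Model
    from keras.layers import Input, LSTM, Bidirectional, Dense, TimeDistributed, concatenate

    # Placeholder dimensions: 39 MFCC-based audio features, an assumed number of
    # skeletal features, and 20 ChaLearn gesture classes plus the CTC blank.
    AUDIO_DIM, SKEL_DIM, N_CLASSES = 39, 30, 21

    # Uni-modal BLSTM feature extractors.
    speech_input = Input(shape=(None, AUDIO_DIM), name='speech_input')
    speech_feats = Bidirectional(LSTM(128, return_sequences=True))(speech_input)

    skeletal_input = Input(shape=(None, SKEL_DIM), name='skeletal_input')
    skeletal_feats = Bidirectional(LSTM(128, return_sequences=True))(skeletal_input)

    # Fusion BLSTM over the concatenated uni-modal features.
    fused = concatenate([speech_feats, skeletal_feats])
    fused = Bidirectional(LSTM(128, return_sequences=True))(fused)

    # Per-frame class posteriors (including the CTC blank) that feed the CTC output layer.
    y_pred = TimeDistributed(Dense(N_CLASSES, activation='softmax'))(fused)
    model = Model(inputs=[speech_input, skeletal_input], outputs=y_pred)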

Here we provide code for:

a) A BLSTM network for speech recognition.

b) A BLSTM network for skeletal recognition.

c) A BLSTM network that fuses the two uni-modal networks.

d) An implementation of the CTC loss output.

e) Decoders for the different networks.

f) Sample code for skeletal and speech feature extraction.

We used Keras and TensorFlow to build our model.
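
For items (d) and (e) above, the usual Keras pattern is to compute the CTC loss in a Lambda layer around K.ctc_batch_cost and to decode with K.ctc_decode. The snippet below is a generic sketch of that pattern, building on the model sketched in the Overview; it is not the repository's exact implementation, and the tensor names are assumptions.

    from keras import backend as K
    from keras.models import Model
    from keras.layers import Input, Lambda

    def ctc_lambda(args):
        # y_pred: (batch, time, n_classes) softmax outputs of the fusion BLSTM.
        y_pred, labels, input_length, label_length = args
        return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

    labels = Input(name='labels', shape=(None,), dtype='float32')
    input_length = Input(name='input_length', shape=(1,), dtype='int64')
    label_length = Input(name='label_length', shape=(1,), dtype='int64')

    # y_pred, speech_input and skeletal_input come from the fusion model sketched above.
    loss_out = Lambda(ctc_lambda, output_shape=(1,), name='ctc')(
        [y_pred, labels, input_length, label_length])
    train_model = Model(inputs=[speech_input, skeletal_input,
                                labels, input_length, label_length],
                        outputs=loss_out)
    # The Lambda layer already returns the loss, so the compiled loss just passes it through.
    train_model.compile(optimizer='adam', loss={'ctc': lambda y_true, y_pred: y_pred})

    def greedy_decode(probs, seq_lengths):
        # probs: (batch, time, n_classes) softmax outputs of a trained model;
        # seq_lengths: number of frames per sample.
        decoded, _ = K.ctc_decode(probs, seq_lengths, greedy=True)
        return K.get_value(decoded[0])  # best-path label sequences, padded with -1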

This project was built for the ChaLearn 2013 dataset. We trained and tested the model on the challenge data, which can be downloaded here: http://sunai.uoc.edu/chalearn/#tabs-2

This model achieves 94% accuracy on the test set of the ChaLearn 2013 challenge.

Usage

In order to train the models provided here, you first need to preprocess the data:

  1. MFCC features need to be extracted from the audio .wav files. We used 13 MFCC features as well as their first- and second-order derivatives (39 features in total). We used the HTK toolkit to extract the features; here we just provide the configuration file for HCopy (the feature extraction tool of HTK). If you want to use HTK for this purpose, you can find it at http://htk.eng.cam.ac.uk/ (a Python alternative for the MFCC extraction is sketched after this list).

  2. Once the MFCC features are extracted, put all of the training data into one big CSV file along with the labels (do the same for the validation and test data); you are then ready to train the speech LSTM network.

  3. For the skeletal features, provide the joint positions for each recording in its own CSV file and run the following scripts:

    a) extract_activity_feats.py

    b) gather_skeletal.py

    c) skeletal_feature_extraction.py

  4. Run util/mix_data.py to mix some of the dev data into the training set.

  5. Now you are ready to train the skeletal LSTM network.

  6. Once both networks are trained you can train the multimodal fusion network.

  7. Use the sequence_decoding.py script to evaluate the trained model with test data.
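
Step 1 above uses HTK's HCopy with the provided configuration file. If you would rather stay in Python, a roughly equivalent 39-dimensional feature vector (13 MFCCs plus first- and second-order derivatives) can be computed with librosa. This is only a sketch under assumed parameters (sample rate, default frame settings) and is not guaranteed to match the HTK configuration exactly.

    import librosa
    import numpy as np

    def extract_mfcc_39(wav_path, sr=16000, n_mfcc=13):
        """13 MFCCs plus first- and second-order derivatives: 39 features per frame."""
        # sr=16000 is an assumption; use the actual sample rate of the ChaLearn audio.
        signal, sr = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # (13, n_frames)
        delta = librosa.feature.delta(mfcc)                         # first-order derivatives
        delta2 = librosa.feature.delta(mfcc, order=2)               # second-order derivatives
        return np.vstack([mfcc, delta, delta2]).T                   # (n_frames, 39)

    # Example (placeholder path): an (n_frames, 39) array ready for the speech BLSTM.
    feats = extract_mfcc_39('path/to/recording.wav')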

Training the complete system takes approximately 100 hours on an NVIDIA GTX 1060.

Requirements

Run pip install -r requirements.txt to install the requirements.
