Dynamic 3D Hand Gesture Recognition by Learning Weighted Depth Motion Maps
Dynamic 3D human hand gesture recognition on RGB-D videos, with state-of-the-art results on public data sets. The method learns human actions by aggregating spatio-temporal descriptions from different representations. If this code helps with your research, please consider citing the following paper:
R. Azad, M. Asadi, S. Kasaei, S. Escalera, "Dynamic 3D Hand Gesture Recognition by Learning Weighted Depth Motion Maps", IEEE Transactions on Circuits and Systems for Video Technology (CSVT), 2018.
Updates
- September 2, 2017: First release (complete implementation for the MSR Action 3D data set)
- May 5, 2018: Complete implementation for the NTU RGB+D data set added. Accuracy rates of 75.16% and 68.66% are achieved with deep and non-deep features, respectively. Notably, our method achieves the highest performance on depth data (75.16%).
- July 14, 2018: Paper link added (IEEE Transactions on Circuits and Systems for Video Technology)
Prerequisites and Run
This code has been implemented in Matlab 2016a and tested on both Linux (Ubuntu) and Windows 10, though it should be compatible with any OS running Matlab. The following environment and library are needed to run the code:
- Matlab 2016
- VLFeat 0.9.20
Run Demo
Run Main_MSRAction3D() for both feature extraction and classification of dynamic 3D actions. Main_MSRAction3D uses Step1_Extract_Featues to extract spatio-temporal features from different representations of the 3D video, and Step2_Description_Classification to aggregate the descriptions and perform classification. These two functions can also be used separately. The following functions are implemented in the 'Video_Analyser' class:
- Video Summarization()
- Forward Backward Motion()
- Difference Forward Energy()
- Temporal Sequence Generating()
- Binary Weighted Mapping()
- extracting Regional LBP and HOG features()

The Description_Classification class contains the functions related to the VLAD representation and the dimension reduction phase.
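The motion-related steps build on depth motion maps (DMMs), which collapse a depth video into a single image by accumulating frame-to-frame depth changes. The sketch below shows a generic weighted DMM in plain Python rather than the repository's Matlab. The linear temporal ramp and the resulting "forward" and "backward" maps are hypothetical illustrations of how weighting can separate early and late motion; they are not the exact formulation used in Forward Backward Motion() or Binary Weighted Mapping().

```python
from typing import List, Tuple

Frame = List[List[float]]  # one depth frame, stored as rows of pixel values


def weighted_dmm(frames: List[Frame], weights: List[float]) -> Frame:
    """Sum weights[i] * |frames[i+1] - frames[i]| over all consecutive pairs.

    With every weight equal to 1.0 this reduces to the classic depth motion map.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    if len(weights) != len(frames) - 1:
        raise ValueError("need one weight per consecutive frame pair")
    height, width = len(frames[0]), len(frames[0][0])
    dmm = [[0.0] * width for _ in range(height)]
    for i, weight in enumerate(weights):
        current, following = frames[i], frames[i + 1]
        for r in range(height):
            for c in range(width):
                dmm[r][c] += weight * abs(following[r][c] - current[r][c])
    return dmm


def forward_backward_dmm(frames: List[Frame]) -> Tuple[Frame, Frame]:
    """Hypothetical forward/backward pair built from a linear temporal ramp."""
    n_pairs = len(frames) - 1
    ramp = [(i + 1) / n_pairs for i in range(n_pairs)]
    forward = weighted_dmm(frames, ramp)         # later motion weighs more
    backward = weighted_dmm(frames, ramp[::-1])  # earlier motion weighs more
    return forward, backward


if __name__ == "__main__":
    clip = [[[0, 0]], [[1, 0]], [[1, 2]]]  # three tiny 1x2 depth frames
    fwd, bwd = forward_backward_dmm(clip)
    print(fwd, bwd)  # [[0.5, 2.0]] [[1.0, 1.0]]
```

In the toy clip the left pixel moves early and the right pixel moves late, so the two weightings emphasize different pixels even though the unweighted DMM would be the same for both orders.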
Quick Overview
Results
For evaluating the performance of the proposed method, three public data sets have been considered (NTU RGB+D was added later; see Updates). Below, results of three different strategies for 3D action recognition are shown:
- Strategy 1: VLAD representation of spatio-temporal HOG features from different representations
- Strategy 2: VLAD representation of spatio-temporal LBP features from different representations
- Strategy 3: VLAD representation of spatio-temporal HOG+LBP features from different representations
Data Set (accuracy, %) | Strategy 1 | Strategy 2 | Strategy 3 |
---|---|---|---|
MSR Gesture 3D | 96.22 | 96.52 | 98.05 |
SKIG | 95.0 | 95.60 | 97.31 |
MSR Action 3D | 91.94 | 91.57 | 95.24 |
NTU RGB+D | - | - | 75.16 (deep features) |
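All three strategies encode local HOG and/or LBP descriptors with VLAD (Vector of Locally Aggregated Descriptors). A minimal sketch of standard VLAD encoding follows, assuming a codebook of K visual words is already available: each descriptor is hard-assigned to its nearest visual word, residuals are summed per word, and the concatenated K*D vector is power- and L2-normalized. The normalization choices are common defaults and an assumption here; the repository relies on VLFeat and may configure the encoding differently.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vlad_encode(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """Encode local descriptors as one VLAD vector of length K * D."""
    k, d = len(codebook), len(codebook[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest visual word.
        nearest = min(range(k), key=lambda j: squared_distance(x, codebook[j]))
        for t in range(d):
            residuals[nearest][t] += x[t] - codebook[nearest][t]
    flat = [value for block in residuals for value in block]
    # Signed square-root (power) normalization, then L2 normalization.
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


if __name__ == "__main__":
    words = [[0.0, 0.0], [10.0, 10.0]]  # K = 2 visual words, D = 2
    descs = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
    print(vlad_encode(descs, words))  # four values: three equal to 1/sqrt(3), then 0.0
```

Because the output length is K*D, the VLAD vector grows quickly with the number of visual words, which helps explain why a dimension reduction step follows the encoding.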
The appropriate number of visual words for each data set is related to its number of classes. The following table shows the effect of the number of visual words on the accuracy rate (%) for each data set:
Number of Visual Words | 25 | 30 | 40 | 50 | 70 | 100 | 128 |
---|---|---|---|---|---|---|---|
MSR Gesture 3D | 98.05 | 97.50 | 97.50 | 96.94 | 96.66 | 96.38 | 96.38 |
SKIG | 97.13 | 97.22 | 96.67 | 96.48 | 96.76 | 96.30 | 96.02 |
MSR Action 3D | 92.31 | 93.04 | 93.04 | 94.14 | 95.24 | 93.77 | 93.77 |
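The number of visual words in the table above is the codebook size K used by the VLAD encoding. VLAD codebooks are commonly learned with k-means (VLFeat provides vl_kmeans); assuming that here, a plain Lloyd's-iteration sketch is shown below. Seeding with the first k points is a simplification chosen for determinism; library implementations typically use random or k-means++ seeding.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def learn_codebook(points: List[List[float]], k: int, max_iters: int = 50) -> List[List[float]]:
    """Lloyd's k-means: returns k cluster centers (the visual words)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(map(float, p)) for p in points[:k]]  # simplified deterministic seeding
    for _ in range(max_iters):
        clusters: List[List[List[float]]] = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current center.
            j = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[j].append(p)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = [
            [sum(column) / len(members) for column in zip(*members)] if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments are stable: converged
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [10, 10], [10, 11]]
    print(learn_codebook(pts, k=2))  # [[0.0, 0.5], [10.0, 10.5]]
```

Sweeping k over values such as 25 to 128 and re-running the encoding and classification is one way a table like the one above could be produced.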
Choosing an appropriate number of PCA components
The following table shows the accuracy rate (%) for different numbers of PCA components.
PCA Components | 70 | 100 | 130 | 160 | 190 | 220 | 250 |
---|---|---|---|---|---|---|---|
MSR Gesture 3D | 97.50 | 97.77 | 98.05 | 97.50 | 98.05 | 97.50 | 97.50 |
SKIG | 96.57 | 97.13 | 97.22 | 97.31 | 97.31 | 96.94 | 97.31 |
MSR Action 3D | 94.54 | 94.87 | 95.24 | 95.25 | 94.87 | 94.87 | 94.87 |
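The PCA step projects the VLAD vectors onto their leading principal components before classification, and the table above varies how many components are kept. The sketch below illustrates PCA with power iteration and deflation on the sample covariance matrix, only to show what "number of PCA components" controls. A real pipeline would use an SVD-based routine (for example Matlab's pca); the uniform start vector is an arbitrary assumption and would miss a principal direction orthogonal to it.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def fit_pca(data: Matrix, n_components: int, iters: int = 500) -> Tuple[List[float], Matrix]:
    """Return (mean, components) using power iteration with deflation."""
    n, d = len(data), len(data[0])
    if n < 2:
        raise ValueError("need at least two samples")
    mean = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, mean)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    components: Matrix = []
    for _ in range(n_components):
        vec = [1.0 / math.sqrt(d)] * d  # arbitrary start vector (assumption)
        for _ in range(iters):
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm < 1e-12:
                return mean, components  # no variance left along this start vector
            vec = [v / norm for v in nxt]
        # Rayleigh quotient: the variance captured along vec.
        eigenvalue = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
        components.append(vec)
        # Deflate so the next pass finds the next principal direction.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    return mean, components


def project(x: List[float], mean: List[float], components: Matrix) -> List[float]:
    """Reduce one sample to len(components) coordinates."""
    centered = [a - m for a, m in zip(x, mean)]
    return [sum(c * v for c, v in zip(comp, centered)) for comp in components]


if __name__ == "__main__":
    samples = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    mean, comps = fit_pca(samples, n_components=1)
    print(project([2.0, 0.0], mean, comps))  # approximately [2.0]
```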
Query
All implementation was done by Reza Azad. For any queries, please contact us:
rezazad68@gmail.com