udion / pose2action

Licence: other
Experiments on classifying actions using poses.


Action-Recognition

Objective

Given a video containing human body motion, recognize the action that the agent is performing.

Solution Approaches

We started with action recognition from skeleton estimates of the human body. Given ground-truth 3D coordinates of human-body joints (obtained from Kinect cameras), we used LSTMs as well as temporal convolutions to learn skeleton representations for human activity recognition.
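
A minimal sketch of such a skeleton-sequence LSTM classifier is shown below; the layer sizes are illustrative, not the repo's exact ones.

```python
# Minimal sketch (not the exact repo code): a stacked LSTM that classifies
# a skeleton sequence of shape (batch, time, joints * 3).
import torch
import torch.nn as nn

class SkeletonLSTM(nn.Module):
    def __init__(self, num_joints=25, hidden_size=128, num_classes=8):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_joints * 3,  # x, y, z per joint
                            hidden_size=hidden_size,
                            num_layers=2,               # the "2-stacked" variant
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out, _ = self.lstm(x)          # x: (batch, time, joints * 3)
        return self.fc(out[:, -1, :])  # classify from the last time step

# e.g. a batch of 4 clips, 60 frames each, 25 Kinect joints:
logits = SkeletonLSTM()(torch.randn(4, 60, 75))
```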

We also tried fancier LSTM variants, in which we projected the 3D coordinates onto the x-y, y-z, and z-x planes, applied 1D convolutions, and then summed the outputs of the four LSTMs (x-y, y-z, z-x, and 3D). Additionally, we tried variants that use only three of the four LSTMs and compared performance across the different projections.
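
The following is a minimal sketch of this projection-fusion idea; the module names and layer sizes are our own illustration, assuming poses of shape (batch, time, joints, 3).

```python
# Sketch of the projection-fusion model: one conv+LSTM branch per
# coordinate view (x-y, y-z, z-x, 3D), with the branch outputs summed.
import torch.nn as nn

class ProjBranch(nn.Module):
    """1D convolution over time followed by an LSTM, for one coordinate view."""
    def __init__(self, in_dim, hidden=128):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, 64, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(64, hidden, batch_first=True)

    def forward(self, x):                 # x: (batch, time, in_dim)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.lstm(h)
        return out[:, -1, :]              # last-step hidden state

class FusionLSTM(nn.Module):
    def __init__(self, num_joints=25, num_classes=8):
        super().__init__()
        j = num_joints
        self.branch_xy = ProjBranch(2 * j)
        self.branch_yz = ProjBranch(2 * j)
        self.branch_zx = ProjBranch(2 * j)
        self.branch_3d = ProjBranch(3 * j)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, pose):              # pose: (batch, time, joints, 3)
        b, t = pose.shape[:2]
        xy = pose[..., [0, 1]].reshape(b, t, -1)
        yz = pose[..., [1, 2]].reshape(b, t, -1)
        zx = pose[..., [2, 0]].reshape(b, t, -1)
        full = pose.reshape(b, t, -1)
        fused = (self.branch_xy(xy) + self.branch_yz(yz) +
                 self.branch_zx(zx) + self.branch_3d(full))  # sum 4 branches
        return self.fc(fused)
```

Dropping one of the four branches gives the three-LSTM variants mentioned above.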

We then moved to action recognition from videos. We used a pretrained hourglass network to estimate the joints at each frame of a video and used similar LSTMs to perform action recognition on the resulting pose sequences.
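
A sketch of that video pipeline is below; `estimate_3d_pose` is a hypothetical stand-in for the per-frame inference code of the hourglass repository linked in the References, which predicts 16 joints.

```python
# Sketch of the video pipeline described above. `estimate_3d_pose` is a
# hypothetical placeholder, not an actual function of that repository.
import torch

def estimate_3d_pose(model, frame):
    """Placeholder: run the hourglass network on one frame and return a
    (num_joints, 3) tensor of joint coordinates."""
    raise NotImplementedError("wire this to pytorch-pose-hg-3d inference")

def video_to_pose_sequence(model, frames):
    """frames: iterable of HxWx3 images -> (1, time, joints, 3) batch."""
    poses = [estimate_3d_pose(model, f) for f in frames]
    return torch.stack(poses).unsqueeze(0)

# seq = video_to_pose_sequence(hourglass_model, frames)
# action_id = classifier(seq).argmax(dim=1)   # e.g. the FusionLSTM above
```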

Dataset

We used the NTU-RGBD action dataset for this project. It contains 60 classes of human activities, with 56,880 action samples in total. Of these 60 classes, we removed the last 11, which involve multiple people. We trained most of our models on subsets of this dataset consisting of

Action label                                                   Id
drink water                                                     0
throw                                                           1
tear up paper                                                   2
take off glasses                                                3
put something inside pocket / take out something from pocket    4
pointing to something with finger                               5
wipe face                                                       6
falling                                                         7

or

Action label     Id
drink water       0
wear jacket       1
hand waving       2
kick something    3
salute            4

We also trained some models on the complete dataset, using the remaining 49 classes. A sketch of how such label subsets can be constructed is shown below.
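
The sketch keeps only the chosen classes and remaps their labels to a contiguous 0..N-1 range; the class ids in the usage comment are hypothetical, not verified NTU-RGBD label numbers.

```python
# Illustrative sketch: filter samples to a chosen class subset and remap
# the surviving labels to 0..N-1 for training.
import numpy as np

def make_subset(samples, labels, keep_classes):
    """samples: (N, time, joints, 3) array; labels: (N,) original class ids."""
    remap = {orig: new for new, orig in enumerate(keep_classes)}
    mask = np.isin(labels, keep_classes)
    return samples[mask], np.array([remap[l] for l in labels[mask]])

# e.g. a 5-class subset (hypothetical ids):
# X5, y5 = make_subset(X, y, keep_classes=[0, 13, 22, 23, 37])
```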

Pipeline

The input is a sequence of frames (i.e., a video), which first passes through a trained model available here. This produces estimates of the pose in 3D; the 3D pose then passes through our network (which takes its various projections) and serves as the main feature for classifying the action into the above 8 categories.

[Pipeline figure: input video → pose network → 2D and 3D pose inputs → main model → predicted action]

predicted action: tear up paper (check load_testbed in the notebook to verify this example)

We also tried many variations of our classifier model, including a simple 2-layer LSTM network and LSTM models based on only one of the 2D projections of the pose (XY, YZ, or ZX); a sketch of such a single-projection variant is shown below.
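
This variant can be expressed with the same branch module sketched in the fusion model above; `dims` picks which pair of coordinates is kept (e.g. (0, 1) for the X-Y projection). Joint count and layer sizes are again illustrative.

```python
# Sketch of a single-projection classifier, assuming the ProjBranch module
# from the fusion sketch above is in scope.
import torch.nn as nn

class SingleProjLSTM(nn.Module):
    def __init__(self, num_joints=16, num_classes=8, dims=(0, 1)):
        super().__init__()
        self.dims = list(dims)
        self.branch = ProjBranch(2 * num_joints)  # defined in the fusion sketch
        self.fc = nn.Linear(128, num_classes)

    def forward(self, pose):                      # pose: (batch, time, joints, 3)
        b, t = pose.shape[:2]
        view = pose[..., self.dims].reshape(b, t, -1)
        return self.fc(self.branch(view))
```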

Some Results

Data                                        | Classifier                                   | Results (Accuracy) (val%, train%)
Ground-Truth-Skeletons - 5 classes          | Single LSTM, 3D coordinates                  | 75.5%, 79.5%
Ground-Truth-Skeletons - 5 classes          | 2-Stacked LSTMs, 3D coordinates              | 77.1%, 80.4%
Ground-Truth-Skeletons - 5 classes          | 3-Stacked LSTMs, 3D coordinates              | 77.2%, 85.6%
Ground-Truth-Skeletons - 49 classes         | 2-Stacked LSTMs, 3D coordinates              | 59.7%, 72.5%
Hourglass-Predicted-Skeletons - 8 classes   | 2-Stacked LSTMs, 3D coordinates              | 81.25%
Hourglass-Predicted-Skeletons - 8 classes   | 2D + 3D projection LSTMs + 1D conv + fusion  | 82.57%
Hourglass-Predicted-Skeletons - 8 classes   | All 2D projection LSTMs + 1D conv + fusion   | 77.235%
Hourglass-Predicted-Skeletons - 8 classes   | X-Y projection only + 1D conv                | 75.36%
Hourglass-Predicted-Skeletons - 8 classes   | Y-Z projection only + 1D conv                | 72.94%
Hourglass-Predicted-Skeletons - 8 classes   | Z-X projection only + 1D conv                | 73.86%
Hourglass-Predicted-Skeletons - 49 classes  | 2-Stacked LSTMs, 3D coordinates              | 54.56%

For the above-mentioned 8 classes, some of the top-accuracy models and their learning curves are shown below. Note that some of the models were not fully trained and would likely score higher if training were completed.



Here are the plots of the losses and accuracies for some of the best models:

  • 3D + 2D projection LSTMs (82.57% accuracy)

  • All 2D projections (77.235% accuracy)

  • Simple 2-Stacked LSTM (81.25% accuracy)

  • Simple 2-Stacked LSTM on the entire ground-truth data (59.7% accuracy)

  • Simple 2-Stacked LSTM on the entire estimated data (54.56% accuracy)

Some observations

One would expect the accuracy obtained using only the x-y coordinates to be significantly lower than that obtained using the full 3D pose data. However, we find accuracies of 75.36% and 82.57% respectively, which means that adding the z coordinate does not help action recognition as much as one would initially expect.
Also, when trained on the entire dataset, we get an accuracy of almost 60% using a simple doubly stacked LSTM on ground-truth poses, and around 55% when using the poses estimated by the hourglass model. This is better than expected, considering that the ground-truth pose has 25 joints while the estimated pose has only 16: using estimated poses in place of the ground truth does not lead to a very large decrease in accuracy.

Requirements

Kindly use requirements.txt to set up your machine to replicate these experiments. Some of the dependencies are:

matplotlib==2.1.1
numpy==1.11.3
torch==0.3.0
scipy==1.0.0
tensorflow==1.3.0
pandas==0.22.0

You can install these dependencies using pip install -r requirements.txt.

Instructions

To train the models, run python LSTM_classifierX3cuda<one_of_model_names>.py in the src folder. This starts training for 50 epochs and keeps saving the best and the most recent model so far, along with the accuracy and loss results, in tr_models and outputs respectively.
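
The checkpointing behaviour described above can be sketched as follows; the optimizer, learning rate, and file names are illustrative, not the scripts' exact choices.

```python
# Hedged sketch of a 50-epoch training loop that saves both the most
# recent and the best-so-far model, as the training scripts do.
import os
import torch

def train(model, train_loader, val_loader, epochs=50):
    os.makedirs("tr_models", exist_ok=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()
    best_acc = 0.0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        model.eval()                      # validation accuracy for this epoch
        correct = total = 0
        with torch.no_grad():
            for x, y in val_loader:
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
        acc = correct / total
        torch.save(model.state_dict(), "tr_models/last.pth")  # most recent
        if acc > best_acc:                                    # best so far
            best_acc = acc
            torch.save(model.state_dict(), "tr_models/best.pth")
```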

References

  • To get the poses from the images and videos for this experiment, we are using the awesome repository https://github.com/xingyizhou/pytorch-pose-hg-3d by @xingyizhou,

  • which is based on this paper.

  • Note that some of the ideas we implemented and tested are new, in the sense that they have not been presented in any paper yet.
