
skelemoa / synse-zsl

License: MIT
Official PyTorch code for the ICIP 2021 paper 'Syntactically Guided Generative Embeddings For Zero Shot Skeleton Action Recognition'

Programming Languages

  • Jupyter Notebook
  • Python

Projects that are alternatives to or similar to synse-zsl

gzsl-od
Out-of-Distribution Detection for Generalized Zero-Shot Action Recognition
Stars: ✭ 47 (+235.71%)
Mutual labels:  action-recognition, zero-shot-learning, generalized-zero-shot-learning
Mmskeleton
An OpenMMLAB toolbox for human pose estimation, skeleton-based action recognition, and action synthesis.
Stars: ✭ 2,378 (+16885.71%)
Mutual labels:  action-recognition, skeleton-based-action-recognition
Keras-for-Co-occurrence-Feature-Learning-from-Skeleton-Data-for-Action-Recognition
Keras implementation of Co-occurrence Feature Learning from Skeleton Data for Action Recognition
Stars: ✭ 44 (+214.29%)
Mutual labels:  action-recognition, skeleton-based-action-recognition
ntu-x
NTU-X, an extended version of the popular NTU dataset
Stars: ✭ 55 (+292.86%)
Mutual labels:  action-recognition, skeleton-based-action-recognition
tfvaegan
[ECCV 2020] Official Pytorch implementation for "Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification". SOTA results for ZSL and GZSL
Stars: ✭ 107 (+664.29%)
Mutual labels:  action-recognition, zero-shot-learning
zero shot learning
A Visual-semantic embedding model using word2vec and CNNs
Stars: ✭ 13 (-7.14%)
Mutual labels:  zero-shot-learning
VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Stars: ✭ 41 (+192.86%)
Mutual labels:  vision-and-language
UAV-Human
[CVPR2021] UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles
Stars: ✭ 122 (+771.43%)
Mutual labels:  action-recognition
TCE
This repository contains the code implementation used in the paper Temporally Coherent Embeddings for Self-Supervised Video Representation Learning (TCE).
Stars: ✭ 51 (+264.29%)
Mutual labels:  action-recognition
Zero-Shot-Learning
Zero-shot learning
Stars: ✭ 20 (+42.86%)
Mutual labels:  zero-shot-learning
clip playground
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities
Stars: ✭ 80 (+471.43%)
Mutual labels:  vision-and-language
Two-Stream-CNN
Two-stream CNN implemented in Keras for skeleton-based action recognition on the NTU RGB+D dataset
Stars: ✭ 75 (+435.71%)
Mutual labels:  action-recognition
pose2action
experiments on classifying actions using poses
Stars: ✭ 24 (+71.43%)
Mutual labels:  action-recognition
iMIX
A framework for Multimodal Intelligence research from Inspur HSSLAB.
Stars: ✭ 21 (+50%)
Mutual labels:  vision-and-language
wikiHow paper list
A paper list of research conducted based on wikiHow
Stars: ✭ 25 (+78.57%)
Mutual labels:  vision-and-language
ICCV2021-Paper-Code-Interpretation
A collection of ICCV 2021/2019/2017 papers, code, interpretations, and live sessions, compiled by the 极市 (Jishi) team
Stars: ✭ 2,022 (+14342.86%)
Mutual labels:  action-recognition
Pose2vec
A repository of human skeleton preprocessing steps in NumPy and TensorFlow, along with a TensorFlow model for learning pose embeddings.
Stars: ✭ 25 (+78.57%)
Mutual labels:  action-recognition
cvxpnpl
A Perspective-n-Points-and-Lines method.
Stars: ✭ 56 (+300%)
Mutual labels:  pose
CBP
Official Tensorflow Implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction"
Stars: ✭ 52 (+271.43%)
Mutual labels:  vision-and-language
MSPN
Multi-Stage Pose Network
Stars: ✭ 40 (+185.71%)
Mutual labels:  pose


SynSE - Syntactically Guided Generative Embeddings For Zero Shot Skeleton Action Recognition

Official PyTorch implementation of 'Syntactically Guided Generative Embeddings for Zero Shot Skeleton Action Recognition', accepted at the IEEE International Conference on Image Processing (ICIP) 2021.

TL;DR version of the work: HERE


Video overview: click the video thumbnail in the original README to watch.

Dependencies

  • Python >= 3.5
  • Torch == 1.2.0
  • Scikit-Learn

Data Preparation

Creating the test-train splits.

The unseen classes of the various splits are listed below. These splits are also provided under synse_resources/resources/label_splits, which can be downloaded from here. Place the resources folder in the root synse-zsl directory. For example, the 5 random unseen classes are listed in ru5.npy; this naming scheme is used for all splits (r: random, s: seen, u: unseen, v: validation split).
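As a quick sanity check, a split file can be loaded with NumPy. This is a minimal sketch assuming the resources folder has been placed in the repository root as described above; only ru5.npy is named explicitly, so treat any other file name as an assumption following the same r/s/u/v scheme.

    # Minimal sketch: inspect a label split file.
    # The path assumes the resources folder sits in the synse-zsl root as described above.
    import numpy as np

    split_dir = 'resources/label_splits'
    unseen_classes = np.load(f'{split_dir}/ru5.npy')   # the 5 random unseen classes
    print('unseen classes:', unseen_classes)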

NTU-60:

Unseen Classes (55/5 split):

A11 reading, A12 writing, A20 put on a hat/cap, A27 jump up, A57 touch pocket

Unseen Classes (48/12 split):

A4 brush hair, A6 pick up, A10 clapping, A13 tear up paper, A16 put on shoe,
A41 sneeze or cough, A43 falling down, A48 nausea or vomiting, A52 pushing, A57 touch pocket,
A59 walking towards, A60 walking apart

NTU-120:

Unseen Classes (110/10 split):

A5 drop, A14 put on jacket, A38 salute, A44 headache, A50 punch or slap,
A66 juggle table tennis table, A89 put object into bag, A96 cross arms, A100 butt kicks, A107 wield knife

Unseen Classes (96/24 split):

A6 pick up, A10 clapping, A12 writing, A17 take off shoe, A19 take off glasses,
A21 take off hat or cap, A23 hand waving, A30 type on keyboard, A36 shake head, A40 cross hands in front,
A46 back pain, A50 punch or slap, A60 walking apart, A69 thumb up, A71 make ok sign,
A82 fold paper, A85 apply cream on face, A88 take off bag, A94 throw up cap or hat, A95 capitulate,
A105 blow nose, A114 carry object, A115 take photo, A120 rock paper scissors

Visual Feature Generation:

We provide the visual features generated with Shift-GCN for the NTU-60 and NTU-120 datasets for the various splits. They can be found under the synse_resources/ntu_results directory, which is downloadable from here. train.npy contains the visual features of the training data from the seen classes, ztest.npy contains the test data from the unseen classes, and gtest.npy contains the test data from all classes.
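For reference, a minimal loading sketch for these files. The exact subdirectory for a given split is an assumption (it mirrors the shift_5_r weights path mentioned below); adjust it to match your download.

    # Minimal sketch: load the pre-extracted Shift-GCN visual features for one split.
    # The subdirectory name is an assumption based on the shift_5_r path noted below.
    import numpy as np

    feat_dir = 'synse_resources/ntu_results/shift_5_r'
    train_feats = np.load(f'{feat_dir}/train.npy')   # seen-class training features
    zsl_feats   = np.load(f'{feat_dir}/ztest.npy')   # unseen-class test features (ZSL)
    gzsl_feats  = np.load(f'{feat_dir}/gtest.npy')   # full test-set features (GZSL)
    print(train_feats.shape, zsl_feats.shape, gzsl_feats.shape)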

If you wish to generate the visual features yourself:

  1. Download the NTU-60 and NTU-120 datasets by requesting them from here.
  2. Create the test-train-val splits for the datasets using the split files created in the previous step.
  3. Train the visual feature generator. Follow this for training Shift-GCN. For each split, a new feature generator has to be trained under the zero-shot learning assumption (i.e., without any samples from the unseen classes). The trained Shift-GCN weights can be found under synse_resources/ntu_results/shift_5_r/weights/.
  4. Save the features for the training data, the unseen test data (ZSL), and the entire test data (GZSL); a minimal sketch follows this list.
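A minimal sketch of step 4, assuming you already have per-sample features, labels, and a train/test indicator from your Shift-GCN pipeline. The variable names, shapes, and label indexing below are placeholders, not the repository's actual interface.

    # Minimal sketch: split extracted features into the train / ztest / gtest files.
    import numpy as np

    # Placeholders for the outputs of your feature-extraction pipeline (assumed shapes).
    features = np.random.randn(100, 256)             # one feature vector per sample
    labels   = np.random.randint(0, 60, size=100)    # NTU-60 class indices (0-based assumed)
    is_train = np.random.rand(100) < 0.8             # True for samples in the training set

    unseen = np.load('resources/label_splits/ru5.npy')   # unseen classes for this split
    is_unseen = np.isin(labels, unseen)

    np.save('train.npy', features[is_train & ~is_unseen])   # seen-class training features
    np.save('ztest.npy', features[~is_train & is_unseen])   # unseen-class test features (ZSL)
    np.save('gtest.npy', features[~is_train])                # all test features (GZSL)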

Text feature generators

We also provide the generated language features for the labels in the NTU-60 and NTU-120 datasets. They can be found in ./synse_resources/resources/. Place the resources folder in the root synse-zsl directory.

If you wish to generate the language features yourself:

  1. Word2Vec: Download the pre-trained Word2Vec vectors and extract the contents of the archive. Generate the Word2Vec representations using the gensim Python module as described here.
  2. Sentence-BERT: We use the sentence-transformers package from here, with the stsb-bert-large model (see the sketch below).
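A minimal sketch of both generators, using the public gensim and sentence-transformers APIs. The Word2Vec file name, the word-averaging for multi-word labels, and the example labels are assumptions, not necessarily how the provided features were produced.

    # Minimal sketch: embed class labels with Word2Vec and Sentence-BERT.
    import numpy as np
    from gensim.models import KeyedVectors
    from sentence_transformers import SentenceTransformer

    labels = ['reading', 'writing', 'jump up', 'touch pocket']   # example NTU label names

    # Word2Vec: average the per-word vectors of each label
    # (assumes every label has at least one in-vocabulary word).
    w2v = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
    w2v_feats = np.stack([
        np.mean([w2v[w] for w in label.split() if w in w2v], axis=0)
        for label in labels
    ])

    # Sentence-BERT: encode each label with the stsb-bert-large model.
    sbert = SentenceTransformer('stsb-bert-large')
    sbert_feats = sbert.encode(labels)

    print(w2v_feats.shape, sbert_feats.shape)   # (4, 300) and (4, 1024)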

Experiments

We provide the scripts necessary to obtain the results reported in the paper. They include training and evaluation scripts for ReViSE [1], JPoSE [2], CADA-VAE [3], and our model SynSE. The scripts are present in their respective folders (jpose, revise, synse).
A README is present in each folder detailing the use of the provided scripts for both training and evaluation.
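For orientation, the quantities typically reported in this setting are ZSL accuracy on the unseen classes and, for GZSL, per-class accuracy on seen and unseen classes together with their harmonic mean. The sketch below is generic metric code under that assumption, not the repository's evaluation script; the predictions and class indices are placeholders.

    # Minimal sketch: ZSL / GZSL metrics from predictions over the full test set.
    import numpy as np

    def per_class_accuracy(y_true, y_pred, classes):
        accs = [np.mean(y_pred[y_true == c] == c) for c in classes if np.any(y_true == c)]
        return float(np.mean(accs))

    # Placeholder predictions and a placeholder 55/5 split (0-based indices assumed).
    y_true = np.random.randint(0, 60, size=500)
    y_pred = np.random.randint(0, 60, size=500)
    unseen = np.array([10, 11, 19, 26, 56])          # e.g. A11, A12, A20, A27, A57
    seen   = np.setdiff1d(np.arange(60), unseen)

    acc_u = per_class_accuracy(y_true, y_pred, unseen)    # unseen-class accuracy
    acc_s = per_class_accuracy(y_true, y_pred, seen)      # seen-class accuracy
    h = 2 * acc_s * acc_u / (acc_s + acc_u)               # GZSL harmonic mean
    print(f'seen={acc_s:.3f}  unseen={acc_u:.3f}  H={h:.3f}')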

References:

  1. Tsai, Yao-Hung Hubert, Liang-Kang Huang, and Ruslan Salakhutdinov. "Learning robust visual-semantic embeddings." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3571-3580. 2017.

  2. Wray, Michael, Diane Larlus, Gabriela Csurka, and Dima Damen. "Fine-grained action retrieval through multiple parts-of-speech embeddings." In Proceedings of the IEEE International Conference on Computer Vision, pp. 450-459. 2019.

  3. Schonfeld, Edgar, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, and Zeynep Akata. "Generalized zero-and few-shot learning via aligned variational autoencoders." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8247-8255. 2019.