PyTorch implementation of the paper "Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points", F. Baradel, C. Wolf, J. Mille, G.W. Taylor, CVPR 2018

Stars: ✭ 30 (+50%)

Mutual labels: video-understanding

vrview-react

⭐ Virtual Reality React Component for 360º photos, videos and virtual tour visualization

Stars: ✭ 29 (+45%)

Mutual labels: 360-video

Youtube 8m

The 2nd place Solution to the Youtube-8M Video Understanding Challenge by Team Monkeytyping (based on tensorflow)

Stars: ✭ 171 (+755%)

Mutual labels: video-understanding

CS231n

My solutions for Assignments of CS231n: Convolutional Neural Networks for Visual Recognition

Stars: ✭ 30 (+50%)

Mutual labels: saliency-map

NExT-QA

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Stars: ✭ 50 (+150%)

Mutual labels: video-understanding

360WebPlayer

The easiest way to stream 360 videos and pictures on your website or blog.

Stars: ✭ 31 (+55%)

Mutual labels: 360-video

Straas-android-sdk-sample

Straas Android SDK samples and documentation

Stars: ✭ 12 (-40%)

Mutual labels: 360-video

SSTDA

[CVPR 2020] Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation (PyTorch)

Stars: ✭ 150 (+650%)

Mutual labels: video-understanding

ls-psvr-encoder

A simple command line tool to encode your 180 and 360 videos for sideloading with Littlstar's VR Cinema app for PSVR.

Stars: ✭ 61 (+205%)

Mutual labels: 360-video

Paddlevideo

Comprehensive, up-to-date, and deployable video deep learning algorithms, covering video recognition, action localization, and temporal action detection tasks. It is a high-performance, lightweight codebase that provides practical models for video understanding research and applications.

Stars: ✭ 218 (+990%)

Mutual labels: video-understanding

just-ask

[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Stars: ✭ 57 (+185%)

Mutual labels: video-understanding

Step

STEP: Spatio-Temporal Progressive Learning for Video Action Detection. CVPR'19 (Oral)

Stars: ✭ 196 (+880%)

Mutual labels: video-understanding

STCNet

STCNet: Spatio-Temporal Cross Network for Industrial Smoke Detection

Stars: ✭ 29 (+45%)

Mutual labels: video-understanding

WhiteBox-Part1

In this part, I've introduced and experimented with ways to interpret and evaluate models in the field of image. (Pytorch)

Stars: ✭ 34 (+70%)

Mutual labels: saliency-map

Awesome-Temporally-Language-Grounding

A curated list of “Temporally Language Grounding” and related area

Stars: ✭ 97 (+385%)

Mutual labels: video-understanding

DINet

A dilated inception network for visual saliency prediction (TMM 2019)

Stars: ✭ 25 (+25%)

Mutual labels: saliency-map


CP-360-Weakly-Supervised-Saliency

This is the code for "Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos", including the ResNet-50 static feature extractor and the ConvLSTM temporal model.

Getting Started

Clone the repo:

git clone https://github.com/hsientzucheng/CP-360-Weakly-Supervised-Saliency.git

Requirements

Tested under

Python == 3.6
PyTorch >= 0.3
cv2 == 3.4.2
Other dependencies:
- tqdm, scipy, matplotlib, PIL, ruamel_yaml, collections
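
To compare a local environment against the versions listed above, a minimal sketch along these lines may help. The helper name `installed_versions` is hypothetical and not part of this repo, and the import names you pass in (for example `torch` for PyTorch or `cv2` for OpenCV) are assumptions about how the packages are installed.

```python
import importlib


def installed_versions(module_names):
    """Map each module name to its version string.

    The value is None when the module cannot be imported and
    "unknown" when it imports but exposes no __version__ attribute.
    """
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Calling `installed_versions(["torch", "cv2", "tqdm", "scipy", "matplotlib", "PIL"])` returns a dictionary that can be checked by eye against the list above; a `None` entry means the package is not importable.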

Model

Pretrained model

You can download our convolutional LSTM model here. The model should be placed at:

[CP-360-Weakly-Supervised-Saliency PATH]/checkpoint/CLSTM_model_released.pth
Performance: AUC 0.898; CC 0.494; AUCB 0.874
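
For readers unfamiliar with the CC figure above: in saliency evaluation, CC conventionally denotes the linear (Pearson) correlation coefficient between a predicted saliency map and a ground-truth heatmap. The following is a sketch of that conventional definition in plain Python. It is not taken from this repo, and the function name `correlation_coefficient` is hypothetical; the authors' evaluation code may differ in preprocessing or normalization.

```python
import math


def correlation_coefficient(pred, gt):
    """Pearson correlation between two equally sized 2-D maps (nested lists)."""
    p = [value for row in pred for value in row]
    g = [value for row in gt for value in row]
    if not p or len(p) != len(g):
        raise ValueError("maps must be non-empty and have the same size")

    mean_p = sum(p) / len(p)
    mean_g = sum(g) / len(g)

    covariance = sum((a - mean_p) * (b - mean_g) for a, b in zip(p, g))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_g = sum((b - mean_g) ** 2 for b in g)

    denominator = math.sqrt(var_p * var_g)
    # A constant map has zero variance; report 0.0 instead of dividing by zero.
    return covariance / denominator if denominator > 0 else 0.0
```

A value of 1 means the two maps are perfectly linearly correlated, 0 means no linear relationship, and -1 means they are inversely correlated.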

CubePadding

The cube padding module is implemented in cube_pad.py and can be run directly:

python [CP-360-Weakly-Supervised-Saliency PATH]/model/cube_pad.py

Dataset

To get the Wild-360 dataset, check our project website.

We use 25 videos for testing and 60 for training, as listed in the txt files in utils.

Ground truth annotated fixations + sample heatmap visualization

|- Wild360_GT
|	|- video_id_1.mp4
|	|	|- 00000.npy
|	|	|- 00001.npy
|	|	|	...
|	|	|- overlay
|	|	|	|- 00000.jpg
|	|	|	|- 00001.jpg
|	|	|	|	...
|	|- video_id_2.mp4
|	|	|	...
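
The ground-truth layout above can be walked with a short helper such as the sketch below. The function name `list_gt_frames` is hypothetical and not part of this repo. It assumes that each `video_id.mp4` entry under `Wild360_GT` is a directory holding one `.npy` file per frame, as the tree suggests, and it ignores the `overlay` subfolder.

```python
from pathlib import Path


def list_gt_frames(gt_root):
    """Return {video_id: sorted list of .npy frame paths} for a Wild360_GT-style tree."""
    frames = {}
    for video_dir in sorted(Path(gt_root).iterdir()):
        if video_dir.is_dir():
            # Only the per-frame .npy files directly inside the video folder
            # are collected; the overlay/ visualizations are skipped.
            frames[video_dir.name] = sorted(video_dir.glob("*.npy"))
    return frames
```

Each returned path can then be loaded with `numpy.load`; the array shape and value range are not documented here, so inspect one file before relying on them.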

Train/test videos (each ID in the test set has corresponding ground truth)

|- 360_Discovery
|	|- train
|	|	|- train_video_id_1.mp4
|	|	|- train_video_id_2.mp4
|	|	|	...
|	|- test
|	|	|- test_video_id_1.mp4
|	|	|	...
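
Since every test video is expected to have ground truth, a quick consistency check between the two trees can catch an incomplete download. The sketch below is hypothetical and not part of this repo. It assumes that a test video's file name (for example `some_id.mp4`) matches the name of its folder under `Wild360_GT` exactly; adjust the matching rule if your copy of the dataset is named differently.

```python
from pathlib import Path


def missing_ground_truth(discovery_root, gt_root):
    """Return sorted names of test videos that have no ground-truth folder."""
    test_dir = Path(discovery_root) / "test"
    gt_root = Path(gt_root)
    return sorted(
        video.name
        for video in test_dir.glob("*.mp4")
        if not (gt_root / video.name).is_dir()
    )
```

An empty list means every video under `360_Discovery/test` has a matching folder under `Wild360_GT`.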

Inference

To run inference, first modify the config file:

vim [CP-360-Weakly-Supervised-Saliency PATH]/config.yaml

After installing requirements and setting up the configurations, the static model can be run as:

cd static_model
python dataset_feat_extractor.py --mode resnet50 -oi -of

Having the features from the static model, run the temporal model by:

cd temporal_model
python test_temporal.py --dir ../output/static_resnet50 --model CLSTM_model_released.pth --overlay

These commands are collected in a script, so you can simply run:

bash inference.sh

Train

You might want to modify the config file first for some training args:

vim [CP-360-Weakly-Supervised-Saliency PATH]/config.yaml

Extract optical flow to train the temporal model:

cd static_model
python dataset_feat_extractor.py --mode resnet50 -om

Train your model by running:

bash train.sh

The trained model will be saved at the following path (see config.yaml for these args):

vim [CP-360-Weakly-Supervised-Saliency PATH]/checkpoint/CLSTM_s_[l_s]_t_[l_t]_m_[l_m]/CLSTM_[epoch]_[iter].pth
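
The naming pattern above can be reproduced programmatically, for example to locate the most recent checkpoint of a run. The sketch below is hypothetical and not part of this repo. It assumes that `[epoch]` and `[iter]` are written as plain integers and that the `l_s`, `l_t`, and `l_m` values appear in the folder name exactly as they are formatted with `str()`; the actual formatting used by the training script may differ.

```python
from pathlib import Path


def checkpoint_path(repo_root, l_s, l_t, l_m, epoch, iteration):
    """Build the checkpoint path following the pattern shown above."""
    run_dir = f"CLSTM_s_{l_s}_t_{l_t}_m_{l_m}"
    return Path(repo_root) / "checkpoint" / run_dir / f"CLSTM_{epoch}_{iteration}.pth"


def latest_checkpoint(run_dir):
    """Return the checkpoint with the highest (epoch, iter) pair, or None if there is none."""

    def epoch_and_iter(path):
        # File stems look like "CLSTM_<epoch>_<iter>"; both parts are assumed to be integers.
        _, epoch, iteration = path.stem.split("_")
        return int(epoch), int(iteration)

    candidates = list(Path(run_dir).glob("CLSTM_*_*.pth"))
    return max(candidates, key=epoch_and_iter) if candidates else None
```

Comparing epoch and iteration as integers avoids the lexicographic trap in which `CLSTM_10_5.pth` would sort before `CLSTM_2_50.pth`.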

Results

In each block, consecutive frames from the various methods, the ground truth, and the raw videos are shown in the left panel. We highlight regions for comparison using white dashed rectangles. In the right panel, one example is zoomed in (red box) and two salient NFoVs (yellow boxes) are rendered.

Notes

Our method for training the temporal model is only suitable for stationary videos (without camera motion). For more complicated cases, you might want to compensate for camera motion and apply 360 stabilization.

Citation

@inproceedings{cheng2018cube,
  title={Cube padding for weakly-supervised saliency prediction in 360 videos},
  author={Cheng, Hsien-Tzu and Chao, Chun-Hung and Dong, Jin-Dong and Wen, Hao-Kai and Liu, Tyng-Luh and Sun, Min},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={1420--1429},
  year={2018}
}
