hsientzucheng / CP-360-Weakly-Supervised-Saliency

License: MIT License

Programming Languages

python
shell

Projects that are alternatives to or similar to CP-360-Weakly-Supervised-Saliency

Actionvlad
ActionVLAD for video action classification (CVPR 2017)
Stars: ✭ 217 (+985%)
Mutual labels:  video-understanding
MTL-AQA
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Stars: ✭ 38 (+90%)
Mutual labels:  video-understanding
pytorch-smoothgrad
SmoothGrad implementation in PyTorch
Stars: ✭ 135 (+575%)
Mutual labels:  saliency-map
Awesome Grounding
awesome grounding: A curated list of research papers in visual grounding
Stars: ✭ 247 (+1135%)
Mutual labels:  video-understanding
glimpse clouds
Pytorch implementation of the paper "Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points", F. Baradel, C. Wolf, J. Mille , G.W. Taylor, CVPR 2018
Stars: ✭ 30 (+50%)
Mutual labels:  video-understanding
vrview-react
⭐ Virtual Reality React Component for 360º photos, videos and virtual tour visualization
Stars: ✭ 29 (+45%)
Mutual labels:  360-video
Youtube 8m
The 2nd place Solution to the Youtube-8M Video Understanding Challenge by Team Monkeytyping (based on tensorflow)
Stars: ✭ 171 (+755%)
Mutual labels:  video-understanding
CS231n
My solutions for Assignments of CS231n: Convolutional Neural Networks for Visual Recognition
Stars: ✭ 30 (+50%)
Mutual labels:  saliency-map
NExT-QA
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
Stars: ✭ 50 (+150%)
Mutual labels:  video-understanding
360WebPlayer
The easiest way to stream 360 videos and pictures on your website or blog.
Stars: ✭ 31 (+55%)
Mutual labels:  360-video
Straas-android-sdk-sample
Straas Android SDK samples and documentation
Stars: ✭ 12 (-40%)
Mutual labels:  360-video
SSTDA
[CVPR 2020] Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation (PyTorch)
Stars: ✭ 150 (+650%)
Mutual labels:  video-understanding
ls-psvr-encoder
A simple command line tool to encode your 180 and 360 videos for sideloading with Littlstar's VR Cinema app for PSVR.
Stars: ✭ 61 (+205%)
Mutual labels:  360-video
Paddlevideo
Comprehensive, up-to-date, and deployable video deep learning algorithms, including video recognition, action localization, and temporal action detection tasks. It is a high-performance, lightweight codebase that provides practical models for video understanding research and applications
Stars: ✭ 218 (+990%)
Mutual labels:  video-understanding
just-ask
[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Stars: ✭ 57 (+185%)
Mutual labels:  video-understanding
Step
STEP: Spatio-Temporal Progressive Learning for Video Action Detection. CVPR'19 (Oral)
Stars: ✭ 196 (+880%)
Mutual labels:  video-understanding
STCNet
STCNet: Spatio-Temporal Cross Network for Industrial Smoke Detection
Stars: ✭ 29 (+45%)
Mutual labels:  video-understanding
WhiteBox-Part1
In this part, I introduce and experiment with ways to interpret and evaluate image models. (PyTorch)
Stars: ✭ 34 (+70%)
Mutual labels:  saliency-map
Awesome-Temporally-Language-Grounding
A curated list of “Temporally Language Grounding” and related areas
Stars: ✭ 97 (+385%)
Mutual labels:  video-understanding
DINet
A dilated inception network for visual saliency prediction (TMM 2019)
Stars: ✭ 25 (+25%)
Mutual labels:  saliency-map

CP-360-Weakly-Supervised-Saliency

This is the code for Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos, including a ResNet-50 static feature extractor and a ConvLSTM temporal model.
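As a rough, hypothetical sketch of this two-stage pipeline (simplified; the actual models live in static_model/ and temporal_model/ and use cube padding rather than ordinary zero padding):

# Hypothetical, simplified sketch of the two-stage pipeline described above.
import torch
import torchvision

# Static stage: a ResNet-50 backbone without its pooling/classification head.
# In the repo, pretrained weights are loaded and cube padding is applied.
backbone = torchvision.models.resnet50()
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])

frames = torch.randn(8, 3, 224, 448)      # a short clip of equirectangular frames
with torch.no_grad():
    feats = feature_extractor(frames)     # [8, 2048, 7, 14] per-frame static features

# Temporal stage: the per-frame features are fed sequentially into a ConvLSTM,
# which aggregates temporal context and outputs a saliency map per frame.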

Getting Started

Clone the repo:

git clone https://github.com/hsientzucheng/CP-360-Weakly-Supervised-Saliency.git

Requirements

Tested under

  • Python == 3.6
  • PyTorch >= 0.3
  • cv2 == 3.4.2
  • Other dependencies:
    • tqdm, scipy, matplotlib, PIL, ruamel_yaml, collections

Model

Pretrained model

You can download our convolutional LSTM model here. The model should be placed at:

[CP-360-Weakly-Supervised-Saliency PATH]/checkpoint/CLSTM_model_released.pth
Performance: AUC 0.898; CC 0.494; AUCB 0.874
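For reference, CC here denotes the linear correlation coefficient between predicted and ground-truth saliency maps. A generic sketch of the metric (not the exact evaluation script behind the numbers above) is:

# Generic sketch of the CC (linear correlation coefficient) saliency metric.
import numpy as np

def cc(pred, gt, eps=1e-8):
    # Pearson correlation between a predicted and a ground-truth saliency map.
    pred = (pred - pred.mean()) / (pred.std() + eps)
    gt = (gt - gt.mean()) / (gt.std() + eps)
    return float((pred * gt).mean())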

CubePadding

The cube padding module is implemented in cube_pad.py and can be run standalone:

python [CP-360-Weakly-Supervised-Saliency PATH]/model/cube_pad.py
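To give an idea of what cube padding does, here is a simplified, hypothetical sketch that only handles the horizontal ring of four equatorial cube faces; the actual module in model/cube_pad.py also pads the top and bottom faces with the appropriate rotations at each seam.

# Simplified, hypothetical sketch of the cube-padding idea: pad each face with
# pixels from its neighbours instead of zeros, so convolutions see continuous
# content across face boundaries. Only the equatorial ring is handled here.
import torch

def ring_pad(faces, p=1):
    # faces: [4, C, H, W], equatorial faces ordered front, right, back, left.
    left_nbr = torch.roll(faces, shifts=1, dims=0)    # face to the left of each face
    right_nbr = torch.roll(faces, shifts=-1, dims=0)  # face to the right of each face
    left_pad = left_nbr[..., -p:]                     # rightmost columns of the left neighbour
    right_pad = right_nbr[..., :p]                    # leftmost columns of the right neighbour
    return torch.cat([left_pad, faces, right_pad], dim=-1)  # [4, C, H, W + 2p]

if __name__ == '__main__':
    x = torch.randn(4, 64, 32, 32)
    print(ring_pad(x).shape)  # torch.Size([4, 64, 32, 34])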

Dataset

To get the Wild-360 dataset, check our project website.

We use 25 videos for testing and 60 for training, as listed in the txt files under utils.

Ground truth annotated fixations + sample heatmap visualization

|- Wild360_GT
|	|- video_id_1.mp4
|	|	|- 00000.npy
|	|	|- 00001.npy
|	|	|	...
|	|	|- overlay
|	|	|	|- 00000.jpg
|	|	|	|- 00001.jpg
|	|	|	|	...
|	|- video_id_2.mp4
|	|	|	...
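As a hypothetical example of how this data might be inspected (file names, the heatmap layout, and its alignment with the frame are assumptions, not guarantees of the dataset):

# Hypothetical sketch: overlay one ground-truth heatmap on its video frame.
# Assumes each .npy file stores a 2D heatmap aligned with the equirectangular frame.
import cv2
import numpy as np

frame = cv2.imread('frame_00000.jpg')                    # equirectangular frame (illustrative path)
heat = np.load('Wild360_GT/video_id_1.mp4/00000.npy')    # ground-truth fixation heatmap
heat = cv2.resize(heat.astype(np.float32), (frame.shape[1], frame.shape[0]))
heat = (255 * (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)).astype(np.uint8)
color = cv2.applyColorMap(heat, cv2.COLORMAP_JET)
overlay = cv2.addWeighted(frame, 0.6, color, 0.4, 0)
cv2.imwrite('overlay_00000.jpg', overlay)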

Train/test videos (each ID in the test set has corresponding ground truth)

|- 360_Discovery
|	|- train
|	|	|- train_video_id_1.mp4
|	|	|- train_video_id_2.mp4
|	|	|	...
|	|- test
|	|	|- test_video_id_1.mp4
|	|	|	...

Inference

  • To run the inference process, first modify the config file:
vim [CP-360-Weakly-Supervised-Saliency PATH]/config.yaml
  • After installing the requirements and setting up the configuration, run the static model:
cd static_model
python dataset_feat_extractor.py --mode resnet50 -oi -of
  • Once the static features have been extracted, run the temporal model:
cd temporal_model
python test_temporal.py --dir ../output/static_resnet50 --model CLSTM_model_released.pth --overlay
  • These commands are collected in a script; just run:
bash inference.sh

Train

  • You might want to modify the config file first to set the training arguments:
vim [CP-360-Weakly-Supervised-Saliency PATH]/config.yaml
  • Extract optical flow to train the temporal model:
cd static_model
python dataset_feat_extractor.py --mode resnet50 -om
  • Train your model by running:
bash train.sh
  • The model you train will be saved as (see config.yaml for these args):
[CP-360-Weakly-Supervised-Saliency PATH]/checkpoint/CLSTM_s_[l_s]_t_[l_t]_m_[l_m]/CLSTM_[epoch]_[iter].pth

Results

In each block, consecutive frames from various methods, the ground truth, and the raw video are shown in the left panel. We highlight regions for comparison with white dashed rectangles. In the right panel, one example is zoomed in (red box) and two salient NFoVs (yellow boxes) are rendered.

Notes

  • Our method for training the temporal model is only suitable for stationary videos (without camera motion). For more complicated cases, you might want to compensate for camera motion and apply 360° stabilization.

Citation

@inproceedings{cheng2018cube,
  title={Cube padding for weakly-supervised saliency prediction in 360 videos},
  author={Cheng, Hsien-Tzu and Chao, Chun-Hung and Dong, Jin-Dong and Wen, Hao-Kai and Liu, Tyng-Luh and Sun, Min},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={1420--1429},
  year={2018}
}