CP-360-Weakly-Supervised-Saliency
This is the code for Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos, including the ResNet-50 static feature extractor and the ConvLSTM temporal model.
Getting Started
Clone the repo:
git clone https://github.com/hsientzucheng/CP-360-Weakly-Supervised-Saliency.git
Requirements
Tested under
- Python == 3.6
- PyTorch >= 0.3
- cv2 == 3.4.2
- Other dependencies:
- tqdm, scipy, matplotlib, PIL, ruamel_yaml, collections
Model
Pretrained model
You can download our convolutional LSTM model here. The model should be placed at:
[CP-360-Weakly-Supervised-Saliency PATH]/checkpoint/CLSTM_model_released.pth
Performance: AUC 0.898; CC 0.494; AUCB 0.874
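For readers unfamiliar with the metrics, CC in saliency evaluation usually denotes the Pearson linear correlation coefficient between a predicted saliency map and a ground-truth map. The sketch below assumes that definition; it is not the evaluation code used to produce the numbers above, and `pearson_cc` is a hypothetical helper name.

```python
import math


def pearson_cc(pred, gt):
    """Pearson correlation between two equally sized 2-D maps (lists of rows)."""
    # Flatten both maps so they can be compared element by element.
    p = [v for row in pred for v in row]
    g = [v for row in gt for v in row]
    if len(p) != len(g) or not p:
        raise ValueError("maps must be non-empty and the same size")
    mean_p = sum(p) / len(p)
    mean_g = sum(g) / len(g)
    cov = sum((a - mean_p) * (b - mean_g) for a, b in zip(p, g))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_g = sum((b - mean_g) ** 2 for b in g)
    if var_p == 0 or var_g == 0:
        # Correlation is undefined for a constant map; returning 0.0 is an
        # assumption made for this sketch.
        return 0.0
    return cov / math.sqrt(var_p * var_g)
```

A value near 1 means the prediction rises and falls with the ground truth; a value near 0 means no linear relationship.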
CubePadding
The cube padding module is in cube_pad.py and can be run directly:
python [CP-360-Weakly-Supervised-Saliency PATH]/model/cube_pad.py
Dataset
To get the Wild-360 dataset, check our project website.
We use 25 videos for testing and 60 for training, as listed in the txt files in utils.
Ground truth annotated fixations + sample heatmap visualization
|- Wild360_GT
| |- video_id_1.mp4
| | |- 00000.npy
| | |- 00001.npy
| | | ...
| | |- overlay
| | | |- 00000.jpg
| | | |- 00001.jpg
| | | | ...
| |- video_id_2.mp4
| | | ...
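The ground-truth layout above can be walked with a few lines of Python. This is a minimal sketch that assumes each video folder holds zero-padded `.npy` fixation files and an `overlay` subfolder with same-named `.jpg` files, as the tree suggests; `list_gt_frames` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def list_gt_frames(gt_root, video_id):
    """Return (fixation_npy, overlay_jpg_or_None) pairs for one video."""
    video_dir = Path(gt_root) / video_id
    pairs = []
    # Zero-padded names sort lexicographically in frame order.
    for npy in sorted(video_dir.glob("*.npy")):
        overlay = video_dir / "overlay" / (npy.stem + ".jpg")
        pairs.append((npy, overlay if overlay.exists() else None))
    return pairs
```

Each `.npy` file can then be read with `numpy.load`. The array shape and dtype are not documented here, so inspect them before use.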
Train/test videos (each ID in the test set has corresponding ground truth)
|- 360_Discovery
| |- train
| | |- train_video_id_1.mp4
| | |- train_video_id_2.mp4
| | | ...
| |- test
| | |- test_video_id_1.mp4
| | | ...
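Before running evaluation, it can help to confirm that every test video has a ground-truth folder. The sketch below assumes the ground-truth folder name equals the video file name (for example `video_id_1.mp4`), which the trees above suggest but do not guarantee; both function names are hypothetical.

```python
from pathlib import Path


def split_videos(discovery_root):
    """Map 'train' and 'test' to the sorted .mp4 file names in each folder."""
    root = Path(discovery_root)
    return {
        split: sorted(p.name for p in (root / split).glob("*.mp4"))
        for split in ("train", "test")
    }


def find_test_videos_without_gt(discovery_root, gt_root):
    """List test videos that have no same-named folder under gt_root."""
    # Assumption: ground-truth folders are named after the video file.
    gt_ids = {p.name for p in Path(gt_root).iterdir() if p.is_dir()}
    return [v for v in split_videos(discovery_root)["test"] if v not in gt_ids]
```

An empty list from `find_test_videos_without_gt` means every test video has ground truth under the assumed naming.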
Inference
- To run inference, first modify the config file:
vim [CP-360-Weakly-Supervised-Saliency PATH]/config.yaml
- After installing requirements and setting up the configurations, the static model can be run as:
cd static_model
python dataset_feat_extractor.py --mode resnet50 -oi -of
- With the features from the static model, run the temporal model:
cd temporal_model
python test_temporal.py --dir ../output/static_resnet50 --model CLSTM_model_released.pth --overlay
- These commands are collected in the inference.sh script; just run:
bash inference.sh
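If you prefer to drive the two steps from Python rather than from the shell script, the commands above can be wrapped as follows. This is only a sketch: it assumes `static_model` and `temporal_model` sit directly under the repository root, and `build_inference_steps` and `run_inference` are hypothetical names. The flags are copied unchanged from the commands above.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_steps(repo_root):
    """Return (working_dir, argv) for the static and temporal steps."""
    root = Path(repo_root)
    return [
        (
            root / "static_model",
            [sys.executable, "dataset_feat_extractor.py",
             "--mode", "resnet50", "-oi", "-of"],
        ),
        (
            root / "temporal_model",
            [sys.executable, "test_temporal.py",
             "--dir", "../output/static_resnet50",
             "--model", "CLSTM_model_released.pth", "--overlay"],
        ),
    ]


def run_inference(repo_root):
    """Run both steps in order; stop at the first failing command."""
    for cwd, argv in build_inference_steps(repo_root):
        subprocess.run(argv, cwd=str(cwd), check=True)
```

Running each step with its own working directory avoids relying on the shell's current directory between the two `cd` calls.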
Train
- You might want to modify the config file first for some training args:
vim [CP-360-Weakly-Supervised-Saliency PATH]/config.yaml
- Extract optical flow to train the temporal model:
cd static_model
python dataset_feat_extractor.py --mode resnet50 -om
- Train your model by running:
bash train.sh
- The trained model will be saved to the following path (see config.yaml for these args):
[CP-360-Weakly-Supervised-Saliency PATH]/checkpoint/CLSTM_s_[l_s]_t_[l_t]_m_[l_m]/CLSTM_[epoch]_[iter].pth
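To pick up the most recent checkpoint from a training run, the file-name pattern above can be parsed. The sketch below assumes `[epoch]` and `[iter]` are written as plain decimal integers and that "latest" means the largest (epoch, iter) pair; `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path

# Matches CLSTM_[epoch]_[iter].pth with integer epoch and iter (assumption).
_CKPT_RE = re.compile(r"^CLSTM_(\d+)_(\d+)\.pth$")


def latest_checkpoint(run_dir):
    """Return the checkpoint path with the highest (epoch, iter), or None."""
    best = None
    for path in Path(run_dir).glob("CLSTM_*.pth"):
        match = _CKPT_RE.match(path.name)
        if not match:
            continue  # skip files such as CLSTM_model_released.pth
        key = (int(match.group(1)), int(match.group(2)))
        if best is None or key > best[0]:
            best = (key, path)
    return None if best is None else best[1]
```

Comparing the numbers as integers avoids the ordering problem where a plain string sort places `CLSTM_10_5.pth` before `CLSTM_2_100.pth`.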
Results
In each block, consecutive frames from various methods, the ground truth, and the raw videos are shown in the left panel. We highlight regions for comparison with white dashed rectangles. In the right panel, one example is zoomed in (red box) and two salient NFoVs (yellow boxes) are rendered.
Notes
- Our method for training the temporal model is only suitable for stationary videos (without camera motion). For more complicated cases, you might want to compensate for camera motion and apply 360° stabilization.
Citation
@inproceedings{cheng2018cube,
title={Cube padding for weakly-supervised saliency prediction in 360 videos},
author={Cheng, Hsien-Tzu and Chao, Chun-Hung and Dong, Jin-Dong and Wen, Hao-Kai and Liu, Tyng-Luh and Sun, Min},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={1420--1429},
year={2018}
}