
hkchengrex / MiVOS

License: GPL-3.0
[CVPR 2021] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion. Semi-supervised VOS as well!

Programming Languages

Python
139335 projects - #7 most used programming language
Cuda
1817 projects
C++
36643 projects - #6 most used programming language
Cython
566 projects

Projects that are alternatives of or similar to MiVOS

Mask-Propagation
[CVPR 2021] MiVOS - Mask Propagation module. Reproduced STM (and better) with training code 🌟. Semi-supervised video object segmentation evaluation.
Stars: ✭ 71 (-76.49%)
Mutual labels:  segmentation, video-segmentation, video-object-segmentation, cvpr2021
BCNet
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers [CVPR 2021]
Stars: ✭ 434 (+43.71%)
Mutual labels:  segmentation, cvpr2021
RMNet
Implementation of "Efficient Regional Memory Network for Video Object Segmentation". (Xie et al., CVPR 2021)
Stars: ✭ 76 (-74.83%)
Mutual labels:  video-object-segmentation, cvpr2021
root painter
RootPainter: Deep Learning Segmentation of Biological Images with Corrective Annotation
Stars: ✭ 28 (-90.73%)
Mutual labels:  segmentation, interactive-segmentation
DeepSegmentor
Sequence Segmentation using Joint RNN and Structured Prediction Models (ICASSP 2017)
Stars: ✭ 17 (-94.37%)
Mutual labels:  segmentation
AttentionGatedVNet3D
Attention Gated VNet3D Model for KiTS19 (2019 Kidney Tumor Segmentation Challenge)
Stars: ✭ 35 (-88.41%)
Mutual labels:  segmentation
wasr network
WaSR Segmentation Network for Unmanned Surface Vehicles v0.5
Stars: ✭ 32 (-89.4%)
Mutual labels:  segmentation
CVPR2021 PLOP
Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation
Stars: ✭ 102 (-66.23%)
Mutual labels:  cvpr2021
VNet
Prostate MR Image Segmentation 2012
Stars: ✭ 54 (-82.12%)
Mutual labels:  segmentation
superpixelRefinement
Superpixel-based Refinement for Object Proposal Generation (ICPR 2020)
Stars: ✭ 24 (-92.05%)
Mutual labels:  segmentation
nicMSlesions
Easy multiple sclerosis white matter lesion segmentation using convolutional deep neural networks.
Stars: ✭ 33 (-89.07%)
Mutual labels:  segmentation
Fast-SCNN pytorch
A PyTorch Implementation of Fast-SCNN: Fast Semantic Segmentation Network (PyTorch >= 1.4)
Stars: ✭ 30 (-90.07%)
Mutual labels:  segmentation
ProtoTree
ProtoTrees: Neural Prototype Trees for Interpretable Fine-grained Image Recognition, published at CVPR2021
Stars: ✭ 47 (-84.44%)
Mutual labels:  cvpr2021
Brain-MRI-Segmentation
Smart India Hackathon 2019 project given by the Department of Atomic Energy
Stars: ✭ 29 (-90.4%)
Mutual labels:  segmentation
mmrazor
OpenMMLab Model Compression Toolbox and Benchmark.
Stars: ✭ 644 (+113.25%)
Mutual labels:  segmentation
vesseg
Brain vessel segmentation using 3D convolutional neural networks
Stars: ✭ 27 (-91.06%)
Mutual labels:  segmentation
semantic-guidance
Code for our CVPR-2021 paper on Combining Semantic Guidance and Deep Reinforcement Learning For Generating Human Level Paintings.
Stars: ✭ 19 (-93.71%)
Mutual labels:  cvpr2021
Point2Sequence
Point2Sequence: Learning the Shape Representation of 3D Point Clouds with an Attention-based Sequence to Sequence Network
Stars: ✭ 34 (-88.74%)
Mutual labels:  segmentation
Basic-Image-Processing
Implementation of Basic Digital Image Processing Tasks in Python / OpenCV
Stars: ✭ 102 (-66.23%)
Mutual labels:  segmentation
Polygonization-by-Frame-Field-Learning
This repository contains the code for our fast polygonal building extraction from overhead images pipeline.
Stars: ✭ 161 (-46.69%)
Mutual labels:  segmentation

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion (MiVOS)

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

CVPR 2021

[arXiv] [Paper PDF] [Project Page] [Demo] [Papers with Code] [Supplementary Material]

New! See the STCN branch for a better and faster version.

[Demo GIFs: demo1, demo2, demo3]

Credit (left to right): DAVIS 2017, Academy of Historical Fencing, Modern History TV

We manage the project using three different repositories, corresponding to the three modules in the paper title. This is the main repo; see also Mask-Propagation and Scribble-to-Mask.

Overall structure and capabilities

Capability                                 MiVOS   Mask-Propagation   Scribble-to-Mask
DAVIS/YouTube semi-supervised evaluation              ✔️
DAVIS interactive evaluation                 ✔️
User interaction GUI tool                    ✔️
Dense Correspondences                                 ✔️
Train propagation module                              ✔️
Train S2M (interaction) module                                              ✔️
Train fusion module                          ✔️
Generate more synthetic data                 ✔️

Framework

[Framework diagram]

Requirements

We used these packages/versions in the development of this project. Higher versions of the same packages will likely also work. This is not an exhaustive list -- other common Python packages (e.g. Pillow) are expected and not listed.

Refer to the official PyTorch guide for installing PyTorch/torchvision. The rest can be installed by:

pip install PyQt5 davisinteractive progressbar2 opencv-python networkx gitpython gdown Cython
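
If PyTorch/torchvision are not installed yet, a plain pip install also works on many setups; this is only one option, and you should follow the official guide to pick the build matching your CUDA version:

pip install torch torchvision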

Quick start

GUI

  1. python download_model.py to get all the required models.
  2. python interactive_gui.py --video <path to video> or python interactive_gui.py --images <path to a folder of images>. A video has been prepared for you at examples/example.mp4 (a complete example session is shown after this list).
  3. If you need to label more than one object, additionally specify --num_objects <number_of_objects>. See all the argument options with python interactive_gui.py --help.
  4. There are instructions in the GUI. You can also watch the demo videos for some ideas.
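
Putting the steps together, a typical session on the provided example video looks like this (labeling two objects here is only an illustration):

python download_model.py
python interactive_gui.py --video examples/example.mp4 --num_objects 2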

DAVIS Interactive VOS

See eval_interactive_davis.py. If you have downloaded the datasets and pretrained models using our script, you only need to specify the output path, i.e., python eval_interactive_davis.py --output [somewhere].
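
For example, with the datasets and models in their default download locations, a run could be (the output folder name is arbitrary):

python eval_interactive_davis.py --output output/davis_interactive

Add --save_mask if you also want the predicted masks written to disk; the results below were generated without it.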

DAVIS/YouTube Semi-supervised VOS

Go to this repo: Mask-Propagation.

Main Results

DAVIS/YouTube semi-supervised results: see the Mask-Propagation repo.

DAVIS Interactive Track

All results are generated using the unmodified official DAVIS interactive bot without saving masks (--save_mask not specified) and with an RTX 2080Ti. We follow the official protocol.

Precomputed result, with the json summary: [Google Drive] [OneDrive]

The following results are generated by eval_interactive_davis.py:

Model                                             AUC-J&F   J&F @ 60s
Baseline                                          86.0      86.6
(+) Top-k                                         87.2      87.8
(+) BL30K pretraining                             87.4      88.0
(+) Learnable fusion                              87.6      88.2
(+) Difference-aware fusion (full model)          87.9      88.5
Full model, without BL30K for propagation/fusion  87.4      88.0
Full model, STCN backbone                         88.4      88.8

Pretrained models

python download_model.py should get you all the models that you need. (pip install gdown required.)

[OneDrive Mirror]

Training

Data preparation

Datasets should be arranged in the following layout. You can use download_datasets.py (same as the one in Mask-Propagation) to get the DAVIS dataset, and manually download and extract fusion_data ([OneDrive]) and BL30K (a possible command sequence is given after the layout).

├── BL30K
├── DAVIS
│   └── 2017
│       ├── test-dev
│       │   ├── Annotations
│       │   └── ...
│       └── trainval
│           ├── Annotations
│           └── ...
├── fusion_data
└── MiVOS
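
One possible way to obtain this layout, assuming the download scripts write to their default locations and fusion_data is fetched manually from the OneDrive link above:

python download_datasets.py     # DAVIS
python download_bl30k.py        # BL30K (about 700GB; see the BL30K section below)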

BL30K

BL30K is a synthetic dataset rendered using Blender with ShapeNet's data. We break the dataset into six segments, each with approximately 5K videos. The videos are organized in the same format as DAVIS and YouTubeVOS, so dataloaders for those datasets can be used directly. Each video is 160 frames long, and each frame has a resolution of 768×512. There are 3-5 objects per video, and each object follows a random smooth trajectory -- we greedily optimized the trajectories to reduce object intersection, but intersections are not strictly prevented and occlusions still occur frequently. See generation/blender/generate_yaml.py for details.

We noted that using roughly half of the data is sufficient to reach full performance (although we still used all of it), while using less than one-sixth (5K videos) is insufficient.

Download

You can either use the automatic script download_bl30k.py or download it manually below. Note that each segment is about 115GB in size -- 700GB in total. You are going to need ~1TB of free disk space to run the script (including extraction buffer).

Google Drive is much faster in my experience. Your mileage might vary.

Manual download: [Google Drive] [OneDrive]

[UST Mirror] (Reliability not guaranteed, speed throttled, do not use if others are available): ckcpu1.cse.ust.hk:8080/MiVOS/BL30K_{a-f}.tar (Replace {a-f} with the part that you need).

MD5 Checksum:

35312550b9a75467b60e3b2be2ceac81  BL30K_a.tar
269e2f9ad34766b5f73fa117166c1731  BL30K_b.tar
a3f7c2a62028d0cda555f484200127b9  BL30K_c.tar
e659ed7c4e51f4c06326855f4aba8109  BL30K_d.tar
d704e86c5a6a9e920e5e84996c2e0858  BL30K_e.tar
bf73914d2888ad642bc01be60523caf6  BL30K_f.tar
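
To verify a downloaded segment against the checksums above, a standard md5sum call is sufficient, e.g.:

md5sum BL30K_a.tar

The printed hash should match 35312550b9a75467b60e3b2be2ceac81.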

Generation

  1. Download ShapeNet.
  2. Install Blender (we used 2.82).
  3. Download a set of background and texture images. We used this repo (we specified "non-commercial reuse" in the script); the lists of keywords are provided in generation/blender/*.json.
  4. Generate a list of configuration files (generation/blender/generate_yaml.py).
  5. Run rendering on the configurations. See here (not documented in detail; ask if you have questions).

Fusion data

We run the propagation module on some data to obtain its real outputs, which are then used to train the fusion module. See the script generate_fusion.py.

Or you can download pre-generated fusion data: [Google Drive] [OneDrive]

Training commands

These commands are to train the fusion module only.

CUDA_VISIBLE_DEVICES=[a,b] OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port [cccc] --nproc_per_node=2 train.py --id [defg] --stage [h]

We implemented training with Distributed Data Parallel (DDP) on two 11GB GPUs. Replace a, b with the GPU ids, cccc with an unused port number, defg with a unique experiment identifier, and h with the training stage (0/1).

The model is trained progressively with different stages (0: BL30K; 1: DAVIS). After each stage finishes, we start the next stage by loading the trained weight. A pretrained propagation model is required to train the fusion module.

One concrete example is:

Pre-training on the BL30K dataset: CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 7550 --nproc_per_node=2 train.py --load_prop saves/propagation_model.pth --stage 0 --id retrain_s0

Main training: CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 7550 --nproc_per_node=2 train.py --load_prop saves/propagation_model.pth --stage 1 --id retrain_s012 --load_network [path_to_trained_s0.pth]

Credit

f-BRS: https://github.com/saic-vul/fbrs_interactive_segmentation

ivs-demo: https://github.com/seoungwugoh/ivs-demo

deeplab: https://github.com/VainF/DeepLabV3Plus-Pytorch

STM: https://github.com/seoungwugoh/STM

BlenderProc: https://github.com/DLR-RM/BlenderProc

Citation

Please cite our paper if you find this repo useful!

@inproceedings{cheng2021mivos,
  title={Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion},
  author={Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2021}
}

And if you want to cite the datasets:


@inproceedings{shi2015hierarchicalECSSD,
  title={Hierarchical image saliency detection on extended CSSD},
  author={Shi, Jianping and Yan, Qiong and Xu, Li and Jia, Jiaya},
  booktitle={TPAMI},
  year={2015},
}

@inproceedings{wang2017DUTS,
  title={Learning to Detect Salient Objects with Image-level Supervision},
  author={Wang, Lijun and Lu, Huchuan and Wang, Yifan and Feng, Mengyang 
  and Wang, Dong and Yin, Baocai and Ruan, Xiang},
  booktitle={CVPR},
  year={2017}
}

@inproceedings{FSS1000,
  title = {FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation},
  author = {Li, Xiang and Wei, Tianhan and Chen, Yau Pun and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2020}
}

@inproceedings{zeng2019towardsHRSOD,
  title = {Towards High-Resolution Salient Object Detection},
  author = {Zeng, Yi and Zhang, Pingping and Zhang, Jianming and Lin, Zhe and Lu, Huchuan},
  booktitle = {ICCV},
  year = {2019}
}

@inproceedings{cheng2020cascadepsp,
  title={{CascadePSP}: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement},
  author={Cheng, Ho Kei and Chung, Jihoon and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2020}
}

@inproceedings{xu2018youtubeVOS,
  title={Youtube-vos: A large-scale video object segmentation benchmark},
  author={Xu, Ning and Yang, Linjie and Fan, Yuchen and Yue, Dingcheng and Liang, Yuchen and Yang, Jianchao and Huang, Thomas},
  booktitle = {ECCV},
  year={2018}
}

@inproceedings{perazzi2016benchmark,
  title={A benchmark dataset and evaluation methodology for video object segmentation},
  author={Perazzi, Federico and Pont-Tuset, Jordi and McWilliams, Brian and Van Gool, Luc and Gross, Markus and Sorkine-Hornung, Alexander},
  booktitle={CVPR},
  year={2016}
}

@inproceedings{denninger2019blenderproc,
  title={BlenderProc},
  author={Denninger, Maximilian and Sundermeyer, Martin and Winkelbauer, Dominik and Zidan, Youssef and Olefir, Dmitry and Elbadrawy, Mohamad and Lodhi, Ahsan and Katam, Harinandan},
  booktitle={arXiv:1911.01911},
  year={2019}
}

@inproceedings{shapenet2015,
  title       = {{ShapeNet: An Information-Rich 3D Model Repository}},
  author      = {Chang, Angel Xuan and Funkhouser, Thomas and Guibas, Leonidas and Hanrahan, Pat and Huang, Qixing and Li, Zimo and Savarese, Silvio and Savva, Manolis and Song, Shuran and Su, Hao and Xiao, Jianxiong and Yi, Li and Yu, Fisher},
  booktitle   = {arXiv:1512.03012},
  year        = {2015}
}

Contact: [email protected]
