
prstrive / EPCDepth

Licence: MIT license
[ICCV 2021] Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

Programming Languages

python

Projects that are alternatives of or similar to EPCDepth

SGDepth
[ECCV 2020] Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance
Stars: ✭ 162 (+54.29%)
Mutual labels:  depth-estimation, monocular-depth-estimation
FisheyeDistanceNet
FisheyeDistanceNet
Stars: ✭ 33 (-68.57%)
Mutual labels:  depth-estimation, monocular-depth-estimation
Visualizing-CNNs-for-monocular-depth-estimation
official implementation of "Visualization of Convolutional Neural Networks for Monocular Depth Estimation"
Stars: ✭ 120 (+14.29%)
Mutual labels:  depth-estimation, monocular-depth-estimation
DiverseDepth
The code and data of DiverseDepth
Stars: ✭ 150 (+42.86%)
Mutual labels:  depth-estimation, monocular-depth-estimation
rectified-features
[ECCV 2020] Single image depth prediction allows us to rectify planar surfaces in images and extract view-invariant local features for better feature matching
Stars: ✭ 57 (-45.71%)
Mutual labels:  depth-estimation, monocular-depth-estimation
BridgeDepthFlow
Bridging Stereo Matching and Optical Flow via Spatiotemporal Correspondence, CVPR 2019
Stars: ✭ 114 (+8.57%)
Mutual labels:  depth-estimation, monodepth
Semantic-Mono-Depth
Geometry meets semantics for semi-supervised monocular depth estimation - ACCV 2018
Stars: ✭ 98 (-6.67%)
Mutual labels:  depth-estimation, monodepth
diode-devkit
DIODE Development Toolkit
Stars: ✭ 58 (-44.76%)
Mutual labels:  depth-estimation, monodepth
Depth estimation
Deep learning model to estimate the depth of an image.
Stars: ✭ 62 (-40.95%)
Mutual labels:  depth-estimation, monocular-depth-estimation
Indoor-SfMLearner
[ECCV'20] Patch-match and Plane-regularization for Unsupervised Indoor Depth Estimation
Stars: ✭ 115 (+9.52%)
Mutual labels:  depth-estimation, self-supervised
SupervisedDepthPrediction
Pytorch framework for supervised depth prediction
Stars: ✭ 36 (-65.71%)
Mutual labels:  depth-estimation, monocular
Learning2AdaptForStereo
Code for: "Learning To Adapt For Stereo" accepted at CVPR2019
Stars: ✭ 73 (-30.48%)
Mutual labels:  stereo, depth-estimation
Monodepth2
[ICCV 2019] Monocular depth estimation from a single image
Stars: ✭ 2,714 (+2484.76%)
Mutual labels:  depth-estimation, monodepth
tf-monodepth2
Tensorflow implementation(unofficial) of "Digging into Self-Supervised Monocular Depth Prediction"
Stars: ✭ 75 (-28.57%)
Mutual labels:  depth-estimation, monodepth
mrnet
Building an ACL tear detector to spot knee injuries from MRIs with PyTorch (MRNet)
Stars: ✭ 98 (-6.67%)
Mutual labels:  data-augmentation
Image-Rotation-and-Cropping-tensorflow
Image rotation and cropping out the black borders in TensorFlow
Stars: ✭ 14 (-86.67%)
Mutual labels:  data-augmentation
WereSoCool
A language for composing microtonal music built in Rust. Make cool sounds. Impress your friends/pets/plants.
Stars: ✭ 41 (-60.95%)
Mutual labels:  stereo
ChineseNER
All about Chinese NER (named entity recognition)
Stars: ✭ 241 (+129.52%)
Mutual labels:  data-augmentation
ccgl
TKDE 22. CCGL: Contrastive Cascade Graph Learning.
Stars: ✭ 20 (-80.95%)
Mutual labels:  data-augmentation
specAugment
Tensor2tensor experiment with SpecAugment
Stars: ✭ 46 (-56.19%)
Mutual labels:  data-augmentation

EPCDepth

EPCDepth is a self-supervised monocular depth estimation model whose supervision comes from the other image in a stereo pair. Details are described in our paper:

Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

Rui Peng, Ronggang Wang, Yawen Lai, Luyang Tang, Yangang Cai

ICCV 2021 (arxiv)

EPCDepth produces the most accurate and sharpest results in our qualitative comparisons. In the last example, the depth of the person in the second red box should be greater than that of the road sign, because the road sign occludes the person. Only our model correctly captures this occlusion cue.

Setup

1. Recommended environment

  • PyTorch 1.1
  • Python 3.6
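As a quick sanity check (a minimal sketch, not part of the original codebase), you can print the versions in your environment and compare them against the ones above:

```python
import sys

import torch

# Print the interpreter and framework versions to compare against the
# recommended environment (Python 3.6, PyTorch 1.1).
print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```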

2. KITTI data

You can download the raw KITTI dataset (about 175GB) by running:

wget -i dataset/kitti_archives_to_download.txt -P <your kitti path>/
cd <your kitti path>
unzip "*.zip"

Then, we recommend converting the PNG images to JPEG with this command:

find <your kitti path>/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'

Alternatively, you can skip this conversion step by manually changing the image suffix from .jpg to .png in dataset/kitti_dataset.py. Note that our pre-trained models were trained on JPEG images, so test performance on PNG images will be slightly lower.
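If GNU parallel or ImageMagick is not available, the same conversion can be done with a small Pillow script (a sketch, not part of this repository; the path is a placeholder, and quality=92 with 4:2:0 chroma subsampling approximates the convert flags above):

```python
import os
from pathlib import Path

from PIL import Image

# Placeholder for <your kitti path>; adjust before running.
KITTI_ROOT = Path("/path/to/kitti")

for png_path in KITTI_ROOT.rglob("*.png"):
    jpg_path = png_path.with_suffix(".jpg")
    # quality=92 and subsampling=2 (4:2:0) roughly mirror
    # `convert -quality 92 -sampling-factor 2x2,1x1,1x1`.
    Image.open(png_path).convert("RGB").save(jpg_path, quality=92, subsampling=2)
    os.remove(png_path)  # delete the original PNG, as the shell pipeline does
```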

3. Prepare depth hint

Once you have downloaded the KITTI dataset as in the previous step, prepare the depth hints by running:

python precompute_depth_hints.py --data_path <your kitti path>

The generated depth hints will be saved to <your kitti path>/depth_hints. Again, pay attention to the image suffix (.jpg vs .png).
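To confirm that the hints were written, a small check such as the following can help (a sketch; it assumes the hints are stored as .npy files mirroring the KITTI folder layout, which you should verify against precompute_depth_hints.py):

```python
from pathlib import Path

import numpy as np

# Placeholder for <your kitti path>/depth_hints; adjust before running.
hint_dir = Path("/path/to/kitti/depth_hints")

hint_files = sorted(hint_dir.rglob("*.npy"))  # assumed file format
print(f"Found {len(hint_files)} depth hint files")
if hint_files:
    print("Example hint shape:", np.load(hint_files[0]).shape)
```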

📊 Evaluation

1. Download models

Download our pretrained models and put them in <your model path>.

| Pre-trained | PP | HxW | Backbone | Output Scale | Abs Rel | Sq Rel | RMSE | δ < 1.25 |
|---|---|---|---|---|---|---|---|---|
| model18_lr | | 192x640 | resnet18 (pt) | d0 | 0.0998 | 0.722 | 4.475 | 0.888 |
| | | | | d2 | 0.100 | 0.712 | 4.462 | 0.886 |
| model18 | | 320x1024 | resnet18 (pt) | d0 | 0.0925 | 0.671 | 4.297 | 0.899 |
| | | | | d2 | 0.0920 | 0.655 | 4.268 | 0.898 |
| model50 | | 320x1024 | resnet50 (pt) | d0 | 0.0905 | 0.646 | 4.207 | 0.901 |
| | | | | d2 | 0.0905 | 0.629 | 4.187 | 0.900 |

Note: pt means the backbone is pre-trained on ImageNet, and the low-resolution results differ slightly from those in the paper.
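For reference, the error columns are the standard monocular depth metrics. The sketch below shows how they are commonly computed from masked ground-truth and predicted depth arrays; it is illustrative and not the evaluation code used by main.py:

```python
import numpy as np

def depth_metrics(gt, pred):
    """Standard metrics: Abs Rel, Sq Rel, RMSE and the delta < 1.25 accuracy.

    gt and pred are 1-D arrays of valid (masked) ground-truth and predicted depths.
    """
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    delta = np.maximum(gt / pred, pred / gt)
    a1 = np.mean(delta < 1.25)
    return abs_rel, sq_rel, rmse, a1
```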

2. KITTI evaluation

This step saves the estimated disparity maps to <your disparity save path>. To recreate the results from our paper, run:

python main.py \
    --val --data_path <your kitti path> --resume <your model path>/model18.pth.tar \
    --use_full_scale --post_process --output_scale 0 --disps_path <your disparity save path>

The saved disparities are stored in NumPy format with shape (N, H, W).
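As a quick check, the saved file can be inspected with NumPy (a sketch; the filename depends on what you passed to --disps_path, e.g. disps50.npy as used in the Visualization section below):

```python
import numpy as np

# Placeholder path; replace with the file written by the evaluation run.
disps = np.load("/path/to/disps/disps50.npy")

print(disps.shape)   # expected (N, H, W): one disparity map per test image
print(disps.dtype, disps.min(), disps.max())
```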

3. NYUv2 evaluation

We validate the generalization ability on the NYU-Depth-V2 dataset using the model trained on the KITTI dataset. Download the testing data nyu_test.tar.gz and unzip it to <your nyuv2 testing data path>. All evaluation code is in the nyuv2Testing folder. Run:

python nyuv2_testing.py \
    --data_path <your nyuv2 testing data path> \
    --resume <your model path>/model50.pth.tar --post_process \
    --save_dir <your nyuv2 disparity save path>

By default, only the visualization results (in PNG format) of the predicted disparities and the ground truth are saved to <your nyuv2 disparity save path>.

📦 KITTI Results

You can download our precomputed disparity predictions from the following links:

| Disparity | PP | HxW | Backbone | Output Scale | Abs Rel | Sq Rel | RMSE | δ < 1.25 |
|---|---|---|---|---|---|---|---|---|
| disps18_lr | | 192x640 | resnet18 (pt) | d0 | 0.0998 | 0.722 | 4.475 | 0.888 |
| disps18 | | 320x1024 | resnet18 (pt) | d0 | 0.0925 | 0.671 | 4.297 | 0.899 |
| disps50 | | 320x1024 | resnet50 (pt) | d0 | 0.0905 | 0.646 | 4.207 | 0.901 |

🖼 Visualization

To visualize the disparity maps saved during the KITTI evaluation (or any other disparities in NumPy format), run:

python main.py --vis --disps_path <your disparity save path>/disps50.npy

The visualized depth maps will be saved to <your disparity save path>/disps_vis in PNG format.
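If you prefer to render a single map yourself instead of using --vis, a short matplotlib snippet works as well (a sketch; the magma colormap is our own choice and not necessarily what the built-in visualization uses):

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder path; replace with your saved disparity file.
disps = np.load("/path/to/disps/disps50.npy")  # shape (N, H, W)

# Save the first disparity map as a colorized PNG.
plt.imsave("disp_000.png", disps[0], cmap="magma")
```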

Training

To train the model from scratch, run:

python main.py \
    --data_path <your kitti path> --model_dir <checkpoint save dir> \
    --logs_dir <tensorboard save dir> --pretrained --post_process \
    --use_depth_hint --use_spp_distillation --use_data_graft \
    --use_full_scale

🔧 Suggestions

  1. Ranking the techniques by the magnitude of their performance improvement: Data Grafting > Full-Scale > Self-Distillation. We noticed that the improvement from self-distillation becomes insignificant when the model capacity is large, so exploring more accurate self-distillation label extraction methods and better self-distillation strategies is a promising direction for future work.
  2. In our experience, self-supervised monocular depth estimation models with larger backbones converge less stably. We suggest verifying your ideas on the small backbone first, then adjusting the learning rate appropriately when training with the larger backbone.
  3. We found that a pure RSU encoder performs better than the traditional ResNet encoder, but unfortunately no RSU encoder pre-trained on ImageNet is available. We therefore believe that pre-training an RSU encoder on ImageNet and swapping it in for the ResNet encoder of this model could bring a large performance improvement.

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{epcdepth,
    title = {Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation},
    author = {Peng, Rui and Wang, Ronggang and Lai, Yawen and Tang, Luyang and Cai, Yangang},
    booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
    year = {2021}
}

Acknowledgements

Our depth hint module is adapted from DepthHints, the NYUv2 pre-processing follows P2Net, and the RSU block comes from U2Net.
