
dvlab-research / DSGN

License: MIT
DSGN: Deep Stereo Geometry Network for 3D Object Detection (CVPR 2020)

Programming Languages

  • Python
  • C++
  • CUDA
  • Shell

Projects that are alternatives of or similar to DSGN

ONNX-HITNET-Stereo-Depth-estimation
Python scripts for performing stereo depth estimation using the HITNET model in ONNX.
Stars: ✭ 21 (-92.39%)
Mutual labels:  stereo-vision, depth-estimation
Semantic-Mono-Depth
Geometry meets semantics for semi-supervised monocular depth estimation - ACCV 2018
Stars: ✭ 98 (-64.49%)
Mutual labels:  stereo-vision, depth-estimation
LUVLi
[CVPR 2020] Re-hosting of the LUVLi Face Alignment codebase. Please download the codebase from the original MERL website by agreeing to all terms and conditions. By using this code, you agree to MERL's research-only licensing terms.
Stars: ✭ 24 (-91.3%)
Mutual labels:  cvpr2020
Visualizing-CNNs-for-monocular-depth-estimation
official implementation of "Visualization of Convolutional Neural Networks for Monocular Depth Estimation"
Stars: ✭ 120 (-56.52%)
Mutual labels:  depth-estimation
zed-openpose
Real-time 3D multi-person with OpenPose and the ZED
Stars: ✭ 37 (-86.59%)
Mutual labels:  stereo-vision
zed-matlab
ZED SDK interface sample for Matlab
Stars: ✭ 23 (-91.67%)
Mutual labels:  stereo-vision
Structured-Light-Laser-Stripe-Reconstruction
Reconstructs a 3D stripe on the area of an object on which a laser falls as seen by the camera
Stars: ✭ 35 (-87.32%)
Mutual labels:  stereo-vision
edlsm pytorch
Pytorch implementation for stereo matching described in the paper: Efficient Deep learning for stereo matching
Stars: ✭ 16 (-94.2%)
Mutual labels:  stereo-vision
UAV-Stereo-Vision
A program for controlling a micro-UAV for obstacle detection and collision avoidance using disparity mapping
Stars: ✭ 30 (-89.13%)
Mutual labels:  stereo-vision
zed-ros2-wrapper
ROS 2 wrapper beta for the ZED SDK
Stars: ✭ 61 (-77.9%)
Mutual labels:  stereo-vision
Calibration-Under-Different-Resolution
Stereo Camera Calibration Under Different Resolution
Stars: ✭ 38 (-86.23%)
Mutual labels:  stereo-vision
FastMVSNet
[CVPR'20] Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement
Stars: ✭ 193 (-30.07%)
Mutual labels:  cvpr2020
BridgeDepthFlow
Bridging Stereo Matching and Optical Flow via Spatiotemporal Correspondence, CVPR 2019
Stars: ✭ 114 (-58.7%)
Mutual labels:  depth-estimation
Normal-Assisted-Stereo
[CVPR 2020] Normal Assisted Stereo Depth Estimation
Stars: ✭ 95 (-65.58%)
Mutual labels:  depth-estimation
zero virus
Zero-VIRUS: Zero-shot VehIcle Route Understanding System for Intelligent Transportation (CVPR 2020 AI City Challenge Track 1)
Stars: ✭ 25 (-90.94%)
Mutual labels:  cvpr2020
nemar
[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
Stars: ✭ 120 (-56.52%)
Mutual labels:  cvpr2020
Dual-CNN-Models-for-Unsupervised-Monocular-Depth-Estimation
Dual CNN Models for Unsupervised Monocular Depth Estimation
Stars: ✭ 36 (-86.96%)
Mutual labels:  depth-estimation
SGDepth
[ECCV 2020] Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance
Stars: ✭ 162 (-41.3%)
Mutual labels:  depth-estimation
3D60
Tools accompanying the 3D60 spherical panoramas dataset
Stars: ✭ 83 (-69.93%)
Mutual labels:  stereo-vision
progressive-coordinate-transforms
Progressive Coordinate Transforms for Monocular 3D Object Detection, NeurIPS 2021
Stars: ✭ 55 (-80.07%)
Mutual labels:  3d-detection

DSGN

Deep Stereo Geometry Network for 3D Object Detection (CVPR 2020)

This is the official implementation of DSGN (CVPR 2020), a strong 3D object detector that jointly estimates scene depth and detects 3D objects in 3D world space, taking only a stereo image pair as input.

DSGN: Deep Stereo Geometry Network for 3D Object Detection
Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia.
[Paper]  [Video] 

Most state-of-the-art 3D object detectors rely heavily on LiDAR sensors, and a large performance gap remains between image-based and LiDAR-based methods, caused by inappropriate representations for prediction in 3D scenarios. Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap by detecting 3D objects on a differentiable volumetric representation – the 3D geometric volume, which effectively encodes 3D geometric structure for 3D regular space. With this representation, we learn depth information and semantic cues simultaneously. For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline that jointly estimates depth and detects 3D objects in an end-to-end learning manner. Our approach outperforms previous stereo-based 3D detectors (by about 10 points in AP) and even achieves comparable performance with a few LiDAR-based methods on the KITTI 3D object detection leaderboard.

Overall Pipeline

DSGN consists of four components: (a) a 2D image feature extractor that captures both pixel-level and high-level features; (b) construction of the plane-sweep volume and the 3D geometric volume; (c) depth estimation on the plane-sweep volume; (d) 3D object detection on the 3D geometric volume.
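
For orientation, the sketch below shows how these four components could be wired together in PyTorch-style pseudocode. It is only an illustration; the class and argument names are hypothetical and do not correspond to the modules in this repository.

import torch
import torch.nn as nn

class DSGNSketch(nn.Module):
    """Hypothetical skeleton mirroring the four components described above."""

    def __init__(self, feature_net, psv_builder, volume_transform, depth_head, det_head):
        super().__init__()
        self.feature_net = feature_net            # (a) shared 2D feature extractor
        self.psv_builder = psv_builder            # (b) plane-sweep volume construction
        self.volume_transform = volume_transform  # (b) warp PSV into the 3D geometric volume
        self.depth_head = depth_head              # (c) depth estimation head
        self.det_head = det_head                  # (d) 3D detection head

    def forward(self, left_img, right_img, calib):
        # (a) extract pixel-level and high-level features from both views
        feat_l = self.feature_net(left_img)
        feat_r = self.feature_net(right_img)

        # (b) build the plane-sweep volume from left/right features at a set of
        #     candidate depths, then resample it into a regular 3D geometric
        #     volume in world coordinates using the calibration
        psv = self.psv_builder(feat_l, feat_r, calib)
        geo_volume = self.volume_transform(psv, calib)

        # (c) depth is supervised on the plane-sweep volume
        depth = self.depth_head(psv)

        # (d) 3D boxes are predicted from the geometric volume
        boxes = self.det_head(geo_volume)
        return depth, boxes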

Reported Results on KITTI Leaderboard

Requirements

All the codes are tested in the following environment:

  • Ubuntu 16.04
  • Python 3.7
  • PyTorch 1.1.0 or 1.2.0 or 1.3.0
  • Torchvision 0.2.2 or 0.4.1

The models reported in the paper are trained with 4 NVIDIA Tesla V100 (32G) GPUs with batch size 4. Training requires close to 29G of GPU memory per image, while testing fits on a normal NVIDIA TITAN (12G) GPU. One full image pair is fed into the network and used to construct the 3D volume. For reference, PSMNet is trained with an input patch size of 512x256. Please keep your GPU memory budget in mind.
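
As a rough illustration of why the 3D volumes dominate memory, the back-of-the-envelope estimate below uses assumed KITTI-like shapes (not DSGN's exact configuration):

# Rough memory estimate for one float32 cost volume of shape (C, D, H/4, W/4),
# using an ASSUMED input of 1248x384 pixels and 192 depth hypotheses.
C, D = 64, 192
H, W = 384 // 4, 1248 // 4           # feature maps downsampled 4x
bytes_per_elem = 4                   # float32

volume_bytes = C * D * H * W * bytes_per_elem
print(f"single volume: {volume_bytes / 1024**3:.2f} GiB")
# ~1.4 GiB for one volume; the intermediate activations of the 3D convolutions
# (kept for backprop) multiply this several times, which is why training needs
# far more memory than inference.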

Installation

(1) Clone this repository.

git clone https://github.com/chenyilun95/DSGN.git && cd DSGN

(2) Setup Python environment.

conda create -n dsgn python=3.7
conda activate dsgn
pip install -r requirements.txt --user

## conda deactivate

(3) Compile the rotated IoU library.

cd dsgn/utils/rotate_iou && bash compile.sh && cd ../../../

(4) Compile and install DSGN library.

# the following will install the lib with symbolic links, so that
# you can modify the file if you want and won't need to re-build it.
python3 setup.py build develop --user

Data Preparation

(1) Please download the KITTI dataset and create the model folders. The KITTI dataset is available here. Download the KITTI point clouds, left images, right images, calibration matrices, and object labels.

ln -s /path/to/KITTI_DATA_PATH ./data/kitti/
ln -s /path/to/OUTPUT_PATH ./outputs/

(2) Generate the depth map from the ground-truth LiDAR point cloud and save them in ./data/kitti/training/depth/.

python3 preprocessing/generate_disp.py --data_path ./data/kitti/training/ --split_file ./data/kitti/trainval.txt 
python3 preprocessing/generate_disp.py --data_path ./data/kitti/training/ --split_file ./data/kitti/trainval.txt --right_calib
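
Conceptually, these scripts project the ground-truth LiDAR points into the camera view and keep the depth of each projected point. Below is a minimal, generic sketch of that projection using the usual KITTI calibration conventions; it is not preprocessing/generate_disp.py itself, and the function name is made up for illustration.

import numpy as np

def lidar_to_depth_map(points, P2, R0_rect, Tr_velo_to_cam, img_h, img_w):
    """Project LiDAR points (N, 3+) into the left camera and build a sparse depth map.

    P2: 3x4 left-camera projection matrix, R0_rect: 3x3 rectification matrix,
    Tr_velo_to_cam: 3x4 velodyne-to-camera transform (standard KITTI calib files).
    """
    pts = np.hstack([points[:, :3], np.ones((points.shape[0], 1))])     # (N, 4) homogeneous

    # velodyne -> rectified camera coordinates
    pts_cam = (R0_rect @ Tr_velo_to_cam @ pts.T).T                       # (N, 3)
    depth = pts_cam[:, 2]
    valid = depth > 0                                                    # keep points in front of the camera

    # rectified camera -> pixel coordinates
    pts_img = (P2 @ np.hstack([pts_cam, np.ones((pts_cam.shape[0], 1))]).T).T
    u = (pts_img[:, 0] / pts_img[:, 2]).astype(np.int32)
    v = (pts_img[:, 1] / pts_img[:, 2]).astype(np.int32)

    depth_map = np.zeros((img_h, img_w), dtype=np.float32)
    inside = valid & (u >= 0) & (u < img_w) & (v >= 0) & (v < img_h)
    depth_map[v[inside], u[inside]] = depth[inside]
    return depth_map

Disparity can then be derived from depth via the standard relation disparity = f_x * baseline / depth.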

(3) Pre-compute the bbox targets in pre-defined grid and save them in ./outputs/temp/.

python3 tools/generate_targets.py --cfg CONFIG_PATH
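
For intuition, "pre-defined grid" targets in this family of detectors usually mean anchors of a fixed object size laid out on a regular bird's-eye-view grid over the detection range, which are later matched against ground-truth boxes. The sketch below uses common KITTI-style ranges and a typical car anchor size; the actual values live in the config files and may differ.

import numpy as np

# ASSUMED detection range (meters, camera frame: x right, y down, z forward)
# and BEV grid resolution; a typical car anchor size (length, width, height).
X_RANGE, Z_RANGE = (-30.0, 30.0), (2.0, 60.0)
GRID_STEP = 0.5
CAR_SIZE = (3.9, 1.6, 1.56)

xs = np.arange(X_RANGE[0], X_RANGE[1], GRID_STEP)
zs = np.arange(Z_RANGE[0], Z_RANGE[1], GRID_STEP)
xv, zv = np.meshgrid(xs, zs, indexing="ij")

# One anchor per BEV cell at an approximate ground height, with two yaw angles.
anchors = []
for yaw in (0.0, np.pi / 2):
    a = np.stack([
        xv.ravel(),                      # x center
        np.full(xv.size, 1.65),          # y center (assumed ground height)
        zv.ravel(),                      # z center
        np.full(xv.size, CAR_SIZE[0]),   # length
        np.full(xv.size, CAR_SIZE[1]),   # width
        np.full(xv.size, CAR_SIZE[2]),   # height
        np.full(xv.size, yaw),           # yaw
    ], axis=1)
    anchors.append(a)
anchors = np.concatenate(anchors, axis=0)    # (num_anchors, 7)
print(anchors.shape)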

After training the models, the overall directory will look like the following:

.                                           (root directory)
|-- dsgn                                    (dsgn library file)
|-- configs                                 (model configurations folder)
|-- ...
|-- data
|   |-- kitti                               (dataset directory)
|       |-- train.txt                       (KITTI train images list (3712 samples))
|       |-- val.txt                         (KITTI val images list (3769 samples))
|       |-- test.txt                        (KITTI test images list (7518 samples))
|       |-- training
|       |   |-- image_2
|       |   |-- image_3
|       |   |-- ...
|       |-- testing
|       |-- depth                           (generated depth map)
|-- outputs
    |-- MODEL_DSGN_v1                       (Model config and snapshots should be saved in the same model folder)
        |-- finetune_53.tar                 (saved model)
        |-- save_config.py                  (saved model configuration file)
        |-- save_config.py.tmp              (automatically generated copy of the previous configuration)
        |-- training.log                    (full training log)
        |-- result_kitti_finetune_53.txt    (kitti evaluated results for the saved model)
        |-- kitti_output                    (kitti detection results folder)
    |-- MODEL_DSGN_v2
    |-- temp                                (temporary folder for saving the pre-computed bbox targets)
        |-- ...                             (pre-computed bbox targets under some specific configurations)

Multi-GPU Training

The training scripts support multi-processing distributed training, which is much faster than the typical PyTorch DataParallel interface.

python3 tools/train_net.py --cfg ./configs/config_xxx.py --savemodel ./outputs/MODEL_NAME -btrain 4 -d 0-3 --multiprocessing-distributed

or

bash scripts/mptrain_xxx.sh

The training models, configuration and logs will be saved in the model folder.
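
For reference, the sketch below shows how multiprocessing distributed training is typically wired up with PyTorch's DistributedDataParallel. It is a generic illustration, not the logic of tools/train_net.py, and make_model / make_loader are placeholder factories.

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size, make_model, make_loader):
    # one process per GPU, each bound to its own device
    dist.init_process_group("nccl", init_method="tcp://127.0.0.1:23456",
                            rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = DDP(make_model().cuda(rank), device_ids=[rank])
    loader = make_loader(rank, world_size)   # should use a DistributedSampler internally
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for batch in loader:
        optimizer.zero_grad()
        loss = model(batch)                  # assumes the model returns its loss
        loss.backward()                      # gradients are all-reduced across GPUs
        optimizer.step()

# Launch one worker per GPU, e.g.:
#   mp.spawn(worker, args=(torch.cuda.device_count(), make_model, make_loader),
#            nprocs=torch.cuda.device_count())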

To load some pretrained model, you can run

python3 tools/train_net.py --cfg xxx/config.py --loadmodel ./outputs/MODEL_NAMEx --start_epoch xxx --savemodel ./outputs/MODEL_NAME -btrain 4 -d 0-3 --multiprocessing-distributed

If you want to continue training from a certain epoch, just set the cfg, loadmodel and start_epoch options to the corresponding model path and epoch.

In addition, you can start a TensorBoard session with

tensorboard --logdir=./outputs/MODEL_NAME/tensorboard --port=6666

and visualize your training process by opening http://localhost:6666 in your browser.
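
If you want to log additional quantities yourself, tensorboardX (see the TODO list) uses the standard SummaryWriter API; for example:

from tensorboardX import SummaryWriter

writer = SummaryWriter("./outputs/MODEL_NAME/tensorboard")
for step, loss in enumerate([2.3, 1.9, 1.5]):   # dummy loss values for illustration
    writer.add_scalar("train/loss", loss, step)
writer.close()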

Inference and Evaluation

Evaluate the models with

python3 tools/test_net.py --loadmodel ./outputs/MODEL_NAME/finetune_xx.tar -btest 8 -d 0-3

KITTI Detection results and evaluation results will be saved in the model folder.

Performance and Model Zoo

We provide several pretrained models from our experiments, evaluated on the KITTI val set.

Methods            Epochs   Train Mem (GB/Img)   Test Mem (GB/Img)   3D AP   BEV AP   2D AP   Models
DSGN(Car)          53       ~29                  6.05                53.95   64.44    84.62   GoogleDrive
DSGN(Pedestrian)   27       ~27                  5.47                31.42   39.35    55.68   GoogleDrive
DSGN(Cyclist)      -        -                    -                   23.16   24.81    32.86   -
DSGN_24g(Car)      53       ~24                  ~6                  51.05   61.04    83.46   TODO
DSGN_12g(Car)      48       10.0                 3.0                 44.61   55.70    78.25   GoogleDrive

Video Demo

We provide a video demo showing the results of DSGN. It shows the predicted depth map and the 3D detection results in both the front view (the left camera view) and the bird's-eye view (the ground-truth point cloud).

TODO List

  • Multiprocessing GPU training
  • TensorboardX
  • Reduce training GPU memory usage
  • Result visualization
  • Still in progress

Troubleshooting

If you have issues running or compiling this code, we have compiled a list of common issues in TROUBLESHOOTING.md. If your issue is not present there, please feel free to open a new issue.

Citations

If you find our work useful in your research, please consider citing:

@inproceedings{chen2020dsgn,
  title={DSGN: Deep Stereo Geometry Network for 3D Object Detection},
  author={Chen, Yilun and Liu, Shu and Shen, Xiaoyong and Jia, Jiaya},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Acknowledgment

This repo borrows code from several repos, like maskrcnn-benchmark, PSMNet, FCOS and kitti-object-eval-python.

Contact

If you have any questions or suggestions about this repo, please feel free to contact me ([email protected]).
