Pose Correction for Highly Accurate Visual Localization in Large-scale Indoor Spaces (ICCV 2021)
Janghun Hyeon1* , JooHyung Kim1* , Nakju Doh1, 2
1 Korea University, 2 TeeLabs
* These authors contributed equally to this work.
Abstract
Indoor visual localization is significant for various applications such as autonomous robots, augmented reality, and mixed reality. Recent advances in visual localization have demonstrated their feasibility in large-scale indoor spaces through coarse-to-fine methods that typically employ three steps: image retrieval, pose estimation, and pose selection. However, further research is needed to improve the accuracy of large-scale indoor visual localization. We demonstrate that the limitations of previous methods can be attributed to the sparsity of image positions in the database, which causes view differences between a query and a retrieved image from the database. In this paper, to address this problem, we propose a novel module, named pose correction, that enables re-estimation of the pose with local feature matching in a similar view by reorganizing the local features. This module enhances the accuracy of the initially estimated pose and assigns more reliable ranks. Furthermore, the proposed method achieves a new state-of-the-art performance, with an accuracy of more than 90% within 1.0 m, on the challenging indoor benchmark dataset InLoc for the first time.
Dependencies
- Python 3
- PyTorch >= 1.1
- TensorFlow >= 1.13
- OpenCV >= 3.4
- Matplotlib >= 3.1
- NumPy >= 1.18
- SciPy >= 1.4.1
- Open3D >= 0.7.0.0
- VLFeat >= 0.9.20
- vlfeat-ctypes >= 0.1.5
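For convenience, the dependencies above can be collected in a `requirements.txt`. The following is a sketch with assumed PyPI package names; note that VLFeat itself is a native library and is usually installed separately rather than through pip:

```
torch>=1.1
tensorflow>=1.13
opencv-python>=3.4
matplotlib>=3.1
numpy>=1.18
scipy>=1.4.1
open3d>=0.7.0.0
vlfeat-ctypes>=0.1.5
```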
Prerequisite: Model Parameters
PCLoc is based on coarse-to-fine localization, which uses NetVLAD, SuperPoint, and SuperGlue. Thus, the model parameters should be downloaded from the original repositories.
Download Model
NetVLAD: Download the parameters from the above URL and unzip the file so that the checkpoints are located at:
./thirdparty/netvlad_tf/checkpoints/vd16_pitts30k_conv5_3_vlad_preL2_intra_white.data-00000-of-00001
./thirdparty/netvlad_tf/checkpoints/vd16_pitts30k_conv5_3_vlad_preL2_intra_white.index
./thirdparty/netvlad_tf/checkpoints/vd16_pitts30k_conv5_3_vlad_preL2_intra_white.meta
Download Model
SuperPoint and SuperGlue: place the pretrained weights at:
./thirdparty/SuperGluePretrainedNetwork/models/weights/superglue_outdoor.pth
./thirdparty/SuperGluePretrainedNetwork/models/weights/superpoint_v1.pth
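After both downloads, the `thirdparty` directory should contain the following files (layout reconstructed from the paths above):

```
thirdparty/
├── netvlad_tf/checkpoints/
│   ├── vd16_pitts30k_conv5_3_vlad_preL2_intra_white.data-00000-of-00001
│   ├── vd16_pitts30k_conv5_3_vlad_preL2_intra_white.index
│   └── vd16_pitts30k_conv5_3_vlad_preL2_intra_white.meta
└── SuperGluePretrainedNetwork/models/weights/
    ├── superglue_outdoor.pth
    └── superpoint_v1.pth
```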
Prerequisite: Dataset
Dataset
To test our model on the InLoc dataset, the dataset must be downloaded first. Downloading takes a while (the dataset is about 1.0 TB). Click here to download the dataset.
Quick Start
- Clone the repository: `git clone --recurse-submodules https://github.com/JanghunHyeon/PCLoc.git`
- Download the InLoc dataset.
- Install the dependencies.
- Download the model parameters and unzip the files at the paths listed above.
- Modify `database_setup.py`:
  - line 21 (`--db_dir`): path to the InLoc dataset
  - line 22 (`--save_dir`): path to the save directory for the database features
- Run `database_setup.py`, which prepares the database features.
- Modify `main_inference.py`:
  - line 39 (`--query_dir`): path to the query directory
  - line 40 (`--db_dir`): path to the database features generated by running `database_setup.py`
- Run `main_inference.py`; results are saved at `--log_dir`.
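The path settings edited in the steps above follow the standard `argparse` pattern. The sketch below illustrates how such arguments can be overridden from the command line instead of editing the script; the argument names (`--db_dir`, `--save_dir`) come from the README, while the default values are placeholders, not the repository's actual defaults:

```python
import argparse

# Minimal sketch of the argument handling in database_setup.py.
# Defaults here are hypothetical placeholders.
parser = argparse.ArgumentParser(description="Prepare PCLoc database features")
parser.add_argument("--db_dir", type=str, default="/path/to/InLoc",
                    help="path to the InLoc dataset")
parser.add_argument("--save_dir", type=str, default="./database_features",
                    help="directory where the database features are saved")

# Overriding the defaults on the command line instead of editing the file:
args = parser.parse_args(["--db_dir", "/data/inloc", "--save_dir", "/data/feats"])
print(args.db_dir)    # /data/inloc
print(args.save_dir)  # /data/feats
```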
Database Description
- `netvlad_feats.npy`: global descriptors (NetVLAD) of the database images.
- `local_feats`: local features and the 3D coordinates corresponding to the keypoints of each database image.
- `pc_feats`: local feature map used for the pose correction.
- `scans_npy.npy`: RGB-D scan data from the InLoc dataset, used for pose verification.
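The database artifacts are NumPy files. Below is a minimal sketch of how such files can be written and read back; the arrays are tiny stand-ins rather than the real feature layout, and `allow_pickle=True` is an assumption needed for dictionary-style entries such as the scan data:

```python
import os
import tempfile
import numpy as np

# Tiny stand-ins for the database files; real NetVLAD descriptors are
# high-dimensional (4,096-D in the paper's best configuration).
save_dir = tempfile.mkdtemp()
netvlad_feats = np.random.rand(5, 4096).astype(np.float32)  # 5 mock db images
scans = {"scan_000": np.zeros((10, 3), dtype=np.float32)}   # mock RGB-D points

np.save(os.path.join(save_dir, "netvlad_feats.npy"), netvlad_feats)
np.save(os.path.join(save_dir, "scans_npy.npy"), scans)

# Loading: plain arrays load directly; dict-like entries need allow_pickle.
db_feats = np.load(os.path.join(save_dir, "netvlad_feats.npy"))
db_scans = np.load(os.path.join(save_dir, "scans_npy.npy"),
                   allow_pickle=True).item()
print(db_feats.shape)         # (5, 4096)
print(list(db_scans.keys()))  # ['scan_000']
```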
Contents
The provided sample code (`06_main_inference.py`) runs pose correction. This code provides three options:

- `--opt_div_matching`: whether to use divided matching
  - `False`: Table 5 (b-1) from the paper
  - `True`: Table 5 (b-2) from the paper
Results
After running the code, results are saved in `--log_dir`.

Example: `./log/202103241833/IMG_0738/mpv`

- `00_query_img.jpg`: image used for the query.
- `01_final_pose.jpg`: rendered image at the final pose.
- `02_final_err_30.837045.jpg`: error image between the query and the rendered image.
- `pred_IMG_0738.txt`: estimated final pose.
- `all/*`: top-k candidates from the pose correction.
| Error [m, 10°] | DUC1 | DUC2 |
| --- | --- | --- |
| InLoc | 40.9 / 58.1 / 70.2 | 35.9 / 54.2 / 69.5 |
| HFNet | 39.9 / 55.6 / 67.2 | 37.4 / 57.3 / 70.2 |
| KAPTURE | 41.4 / 60.1 / 73.7 | 47.3 / 67.2 / 73.3 |
| D2Net | 43.9 / 61.6 / 73.7 | 42.0 / 60.3 / 74.8 |
| Oracle | 43.9 / 66.2 / 78.3 | 43.5 / 63.4 / 76.3 |
| Sparse NCNet | 47.0 / 67.2 / 79.8 | 43.5 / 64.9 / 80.2 |
| RLOCS | 47.0 / 71.2 / 84.8 | 58.8 / 77.9 / 80.9 |
| SuperGlue | 46.5 / 65.7 / 77.8 | 51.9 / 72.5 / 79.4 |
| Baseline (3,000) | 53.0 / 76.8 / 85.9 | 61.8 / 80.9 / 87.0 |
| Ours (3,000) | 59.6 / 78.3 / 89.4 | 71.0 / 93.1 / 93.9 |
| Ours (4,096) | 60.6 / 79.8 / 90.4 | 70.2 / 92.4 / 93.1 |
Every evaluation was conducted with the online visual localization benchmark server: visuallocalization.net/benchmark
BibTeX Citation
If you use any ideas from the paper or code from this repo, please consider citing:
@inproceedings{hyeon2021pose,
title={Pose Correction for Highly Accurate Visual Localization in Large-Scale Indoor Spaces},
author={Hyeon, Janghun and Kim, Joohyung and Doh, Nakju},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={15974--15983},
year={2021}
}