All Projects → felixchenfy → Monocular Visual Odometry

felixchenfy / Monocular Visual Odometry

Licence: mit
A simple monocular visual odometry (part of vSLAM) by ORB keypoints with initialization, tracking, local map and bundle adjustment. (WARNING: Hi, I'm sorry that this project is just tuned for course demo, not for real world applications !!!)

Programming Languages

cpp
1120 projects

Projects that are alternatives of or similar to Monocular Visual Odometry

Jpdaf tracking
A tracker based on joint probabilistic data association filtering.
Stars: ✭ 107 (-27.21%)
Mutual labels:  opencv, tracking, eigen
Person Detection And Tracking
A tensorflow implementation with SSD model for person detection and Kalman Filtering combined for tracking
Stars: ✭ 193 (+31.29%)
Mutual labels:  opencv, tracking
Poseflow
PoseFlow: Efficient Online Pose Tracking (BMVC'18)
Stars: ✭ 330 (+124.49%)
Mutual labels:  opencv, tracking
Multi Camera Live Object Tracking
Multi-camera live traffic and object counting with YOLO v4, Deep SORT, and Flask.
Stars: ✭ 375 (+155.1%)
Mutual labels:  opencv, tracking
Haskell Opencv
Haskell binding to OpenCV-3.x
Stars: ✭ 145 (-1.36%)
Mutual labels:  opencv
Animoji Animate
Facial-Landmarks Detection based animating application similar to Apple-Animoji™
Stars: ✭ 142 (-3.4%)
Mutual labels:  opencv
Bgslibrary
A C++ Background Subtraction Library with wrappers for Python, MATLAB, Java and GUI on QT
Stars: ✭ 1,838 (+1150.34%)
Mutual labels:  opencv
Legocv
Native OpenCV Swift Framework
Stars: ✭ 140 (-4.76%)
Mutual labels:  opencv
Stb Tester
Automated Testing for Set-Top Boxes and Smart TVs
Stars: ✭ 148 (+0.68%)
Mutual labels:  opencv
Face Detect
A Python based tool to extract faces from any picture.
Stars: ✭ 146 (-0.68%)
Mutual labels:  opencv
Sql tracker
Rails SQL Query Tracker
Stars: ✭ 144 (-2.04%)
Mutual labels:  tracking
Arkitexperiments
Quick and dirty experiments with ARKit
Stars: ✭ 142 (-3.4%)
Mutual labels:  opencv
Dice
Digital Image Correlation Engine (DICe): a stereo DIC application that runs on a desktop or high performance computing platform (massively parallel)
Stars: ✭ 144 (-2.04%)
Mutual labels:  tracking
Find face landmarks
C++ \ Matlab library for finding face landmarks and bounding boxes in video\image sequences.
Stars: ✭ 141 (-4.08%)
Mutual labels:  tracking
Jeelizfacefilter
Javascript/WebGL lightweight face tracking library designed for augmented reality webcam filters. Features : multiple faces detection, rotation, mouth opening. Various integration examples are provided (Three.js, Babylon.js, FaceSwap, Canvas2D, CSS3D...).
Stars: ✭ 2,042 (+1289.12%)
Mutual labels:  tracking
Opencv transforms torchvision
opencv reimplement for transforms in torchvision
Stars: ✭ 141 (-4.08%)
Mutual labels:  opencv
Self Driving Car 3d Simulator With Cnn
Implementing a self driving car using a 3D Driving Simulator. CNN will be used for training
Stars: ✭ 143 (-2.72%)
Mutual labels:  opencv
Scene Text Recognition
Scene text detection and recognition based on Extremal Region(ER)
Stars: ✭ 146 (-0.68%)
Mutual labels:  opencv
Img term
Display image, video or USB camera in your ANSI terminal!
Stars: ✭ 143 (-2.72%)
Mutual labels:  opencv
Hololenswithopencvforunityexample
HoloLens With OpenCVforUnity Example
Stars: ✭ 142 (-3.4%)
Mutual labels:  opencv

Monocular Visual Odometry

A monocular visual odometry (VO) with 4 components: initialization, tracking, local map, and bundle adjustment.

I did this project after I read the Slambook.
It's also my final project for the course EESC-432 Advanced Computer Vision in NWU in 2019 March.

A demo:

In the above figure:
Left is a video and the detected key points.
Right is the camera trajectory corresponding to the left video: White line is from VO; Green line is ground truth. Red markers on white line are the keyframes. Points are the map points, where points with red color are newly triangulated.
You can download video here.

Report

My pdf-version course report is here. It has a more clear decription about the algorithms than this README, so I suggest to read it.

Directory

1. Algorithm

This VO is achieved by the following procedures/algorithms:

1.1. Initialization

Estimate relative camera pose:
Given a video, set the 1st frame(image) as reference, and do feature matching with the 2nd frame. Compute the Essential Matrix (E) and Homography Matrix (H) between the two frames. Compute their Symmetric Transfer Error by method in ORB-SLAM paper and choose the better one (i.e., choose H if H/(E+H)>0.45). Decompose E or H into the relative pose between two frames, which is the rotation (R) and translation (t). By using OpenCV, E gives 1 result, and H gives 2 results, satisfying the criteria that points are in front of camera. For E, only single result to choose; For H, choose the one that makes the image plane and world-points plane more parallel.

Keyframe and local map:
Insert both 1st and K_th frame as keyframe. Triangulate their inlier matched keypoints to obtain the points' world positions. These points are called map points and are pushed to local map.

Check Triangulation Result
If the median triangulation angle is smaller than threshold, I will abandon this 2nd frame, and repeat the above process on frame 3, 4, etc. If at frame K, the triangulation angle is large than threshold, the initialization is completed.

Change scale:
Scale the translation t to be the same length as the ground truth, so that I can make comparison with ground truth. Then, scale the map points correspondingly.

1.2. Tracking

Keep on estimating the next camera pose. First, find map points that are in the camera view. Do feature matching to find 2d-3d correspondance between 3d map points and 2d image keypoints. Estimate camera pose by RANSAC and PnP.

1.3. Local Map

Insert keyframe: If the relative pose between current frame and previous keyframe is large enough with a translation or rotation larger than the threshold, insert current frame as a keyframe.
Do feature matching between current and previous keyframe. Get inliers by epipoloar constraint. If a inlier cv::KeyPoint hasn't been triangulated before, then triangulate it and push it to local map.

Clean up local map: Remove map points that are: (1) not in current view, (2) whose view_angle is larger than threshold, (3) rarely be matched as inlier point. (See Slambook Chapter 9.4.)

Graph/Connections between map points and frames:
Graphs are built at two stages of the algorithm:

  1. After PnP, based on the 3d-2d correspondances, I update the connectionts between map points and current keypoints.
  2. During triangulation, I also update the 2d-3d correspondance between current keypoints and triangulated mappoints, by either a direct link or going through previous keypoints that have been triangulated.

1.4. Bundle Adjustment

Since I've built the graph in previous step, I know what the 3d-2d point correspondances are in all frames.

Apply optimization to the previous N frames, where the cost function is the sum of reprojection error of each 3d-2d point pair. By computing the deriviate wrt (1) points 3d pos and (2) camera poses, we can solve the optimization problem using Gauss-Newton Method and its variants. These are done by g2o and its built-in datatypes of VertexSBAPointXYZ, VertexSE3Expmap, and EdgeProjectXYZ2UV. See Slambook Chapter 4 and Chapter 7.8.2 for more details.

1.5. Other details

Image features:
Extract ORB keypoints and features. Then, a simple grid sampling is applied to obtain keypoints uniformly distributed across image.

Feature matching:
Two methods are implemented, where good match is:
(1) Feature's distance is smaller than threshold, described in Slambook.
(2) Ratio of smallest and second smallest distance is smaller than threshold, proposed in Prof. Lowe's 2004 SIFT paper.
The first one is adopted, which is easier to tune the parameters to generate fewer error matches.

2. File Structure

2.1. Folders

  • include/: c++ header files.
  • src/: c++ definitions.
  • test/: Testing scripts for c++ functions.
  • data/: Store images.

Main scripts and classes for VO are in include/my_slam/vo/. I referenced this structure from the Slambook Chapter 9.

2.2. Functions

Functions are declared in include/. Some of its folders contain a README. See the tree structure for overview:

include/
└── my_slam
    ├── basics
    │   ├── basics.h
    │   ├── config.h
    │   ├── yaml.h
    │   ├── eigen_funcs.h
    │   ├── opencv_funcs.h
    │   └── README.md
    ├── common_include.h
    ├── display
    │   ├── pcl_display.h
    │   └── pcl_display_lib.h
    ├── geometry
    │   ├── camera.h
    │   ├── epipolar_geometry.h
    │   ├── feature_match.h
    │   └── motion_estimation.h
    ├── optimization
    │   └── g2o_ba.h
    └── vo
        ├── frame.h
        ├── map.h
        ├── mappoint.h
        ├── README.md
        ├── vo_commons.h
        ├── vo_io.h
        └── vo.h

3. Dependencies

Require: OpenCV, Eigen, Sophus, g2o.
See details below:

(1) OpenCV 4.0
Tutorial for install OpenCV 4.0: link.

You may need a version newer than 3.4.5, because I used this function:
filterHomographyDecompByVisibleRefpoints, which appears in OpenCV 3.4.5.

(2) Eigen 3
It's about matrix arithmetic. See its official page. Install by:

$ sudo apt-get install libeigen3-dev

(Note: Eigen only has header files. No ".so" or ".a" files.)

(3) Sophus

It's based on Eigen, and contains datatypes for Lie Group and Lie Algebra (SE3/SO3/se3/so3).

Download this lib here: https://github.com/strasdat/Sophus. Do cmake and make. Since I failed to make install it, I manually moved “/Sophus/sophus” to “/usr/include/sophus”, and moved “libSophus.so” to “usr/lib”. Then, in my CMakeLists.txt, I add this: set (THIRD_PARTY_LIBS libSophus.so ).

If there is an error of "unit_complex_.real() = 1.;" replace it and its following line with "unit_complex_ = std::complex(1,0);"

(4) g2o

First install either of the following two packages:

$ sudo apt-get install libsuitesparse $ sudo apt-get install libsuitesparse-dev

Download here: https://github.com/RainerKuemmerle/g2o.
Checkout to the last version in year 2017. Do cmake, make, make install.

4. How to Run

$ mkdir build && mkdir lib && mkdir bin
$ cd build && cmake .. && make && cd ..

Then, set up things in config/config.yaml, and run:

$ bin/run_vo config/config.yaml

5. Results

I tested the current implementation on TUM fr1_desk and fr1_xyz dataset, but both performances are bad. I guess its due to too few detected keypoints, which causes too few keypoints matches. The solution I guess is to use the ORB-SLAM's method for extracting enough uniformly destributed keypoints across different scales, and doing guided matching based on the estimated camera motion.

Despite bad performance on fr1 dataset, my program does work well on this New Tsukuba Stereo Database, whose images and scenes are synthetic and have abundant high quality keypoints. The results are shown below.

I tested my VO with 3 different settings: (1) No optimization. (2) Optimize on map points and current camera pose. (3) Optimize on previous 5 camera poses. See videos below:

(1) No optimization:

(2) Optimize on points + current pose:

(2) Optimize on prev 5 poses:

The result shows: (1) Optimization improves accuracy. (2) The estiamted trajectory is close to the ground truth.

6. Reference

(1) Slambook:
I read this Dr. Xiang Gao's Slambook before writing code. The book provides both vSLAM theory as well as easy-to-read code examples in every chapter.

The framework of my program is based on Chapter 9 of Slambook, which is a RGB-D visual odometry project. Classes declared in include/vo/ are based on this Chapter.

These files are mainly copied or built on top of the Slambook's code:

I also borrowed other codes from the slambook. But since they are small pieces and lines, I didn't list them here.

In short, the Slambook provides huge help for me and my this project.

(2) Matlab VO tutorial:
This is a matlab tutorial of monocular visual odometry. Since Slambook doesn't write a lot about monocular VO, I resorted to this Matlab tutorial for solution. It helped me a lot for getting clear the whole workflow.

The dataset I used is also the same as this Matlab tutorial, which is the New Tsukuba Stereo Database.

(3) ORB-SLAM/ORB-SLAM2 papers

I borrowed its code of the criteria for choosing Essential or Homography (for decomposition to obtain relative camera pose.). The copied functions are checkEssentialScore and checkHomographyScore in motion_estimation.h.

7. To Do

Improvements

  • In bundle adjustment, I cannot optimize (1) multiple frames and (b) map points at the same time. It returns huge error. I haven't figure out why.

  • If certain region of the image has only few keypoints, then extract more.

  • Utilize epipolar constraint to do feature matching.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].