mp3guy / Icpcuda

Super fast implementation of ICP in CUDA for devices with compute capability 3.5 or higher

Projects that are alternatives of or similar to Icpcuda

Nvpipe
NVIDIA-accelerated zero latency video compression library for interactive remoting applications
Stars: ✭ 376 (-9.62%)
Mutual labels:  cuda
Ganet
GA-Net: Guided Aggregation Net for End-to-end Stereo Matching
Stars: ✭ 393 (-5.53%)
Mutual labels:  cuda
Warp Ctc
Fast parallel CTC.
Stars: ✭ 3,954 (+850.48%)
Mutual labels:  cuda
Ilgpu
ILGPU JIT Compiler for high-performance .Net GPU programs
Stars: ✭ 374 (-10.1%)
Mutual labels:  cuda
Amgcl
C++ library for solving large sparse linear systems with algebraic multigrid method
Stars: ✭ 390 (-6.25%)
Mutual labels:  cuda
Integral Human Pose
Integral Human Pose Regression
Stars: ✭ 395 (-5.05%)
Mutual labels:  cuda
Mini Caffe
Minimal runtime core of Caffe: forward only, GPU support and memory efficiency.
Stars: ✭ 373 (-10.34%)
Mutual labels:  cuda
Deformable Convolution Pytorch
PyTorch implementation of Deformable Convolution
Stars: ✭ 410 (-1.44%)
Mutual labels:  cuda
Neuralnetwork.net
A TensorFlow-inspired neural network library built from scratch in C# 7.3 for .NET Standard 2.0, with GPU support through cuDNN
Stars: ✭ 392 (-5.77%)
Mutual labels:  cuda
Gocv
Go package for computer vision using OpenCV 4 and beyond.
Stars: ✭ 4,511 (+984.38%)
Mutual labels:  cuda
Hipsycl
Implementation of SYCL for CPUs, AMD GPUs, NVIDIA GPUs
Stars: ✭ 377 (-9.37%)
Mutual labels:  cuda
Cudf
cuDF - GPU DataFrame Library
Stars: ✭ 4,370 (+950.48%)
Mutual labels:  cuda
Cubert
Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL
Stars: ✭ 395 (-5.05%)
Mutual labels:  cuda
Cuda.jl
CUDA programming in Julia.
Stars: ✭ 370 (-11.06%)
Mutual labels:  cuda
Ai Lab
All-in-one AI container for rapid prototyping
Stars: ✭ 406 (-2.4%)
Mutual labels:  cuda
Vuda
VUDA is a header-only library based on Vulkan that provides a CUDA Runtime API interface for writing GPU-accelerated applications.
Stars: ✭ 373 (-10.34%)
Mutual labels:  cuda
Cudanative.jl
Julia support for native CUDA programming
Stars: ✭ 393 (-5.53%)
Mutual labels:  cuda
H2o4gpu
H2Oai GPU Edition
Stars: ✭ 416 (+0%)
Mutual labels:  cuda
Tensorrt tutorial
Stars: ✭ 407 (-2.16%)
Mutual labels:  cuda
Pytorch Pwc
a reimplementation of PWC-Net in PyTorch that matches the official Caffe version
Stars: ✭ 402 (-3.37%)
Mutual labels:  cuda

ICPCUDA

Super fast implementation of ICP in CUDA for devices with compute capability 3.5 or higher. On an NVIDIA GeForce GTX TITAN X it runs at over 750Hz (using projective data association). Last tested with Ubuntu 18.04.2, CUDA 10.1 and NVIDIA drivers 418.39.

Requires CUDA; Pangolin, Eigen, and Sophus are included as third-party submodules. I've built it to take in raw TUM RGB-D datasets and do frame-to-frame dense ICP as an example application.

Install:

sudo apt-get install build-essential cmake libglew-dev libpng-dev
git clone https://github.com/mp3guy/ICPCUDA.git
cd ICPCUDA
git submodule update --init
cd third-party/Pangolin/
mkdir build
cd build/
cmake ../ -DEIGEN_INCLUDE_DIR=<absolute_path_to_Eigen_submodule>
make -j12
cd ../../../
mkdir build
cd build/
cmake ..
make -j12

The particular version of ICP implemented is the one introduced by KinectFusion: a three-level coarse-to-fine registration pyramid, from 160x120 through 320x240 to 640x480 image sizes, with 4, 5 and 10 iterations per level respectively.
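
As a rough illustration of that schedule, here is a minimal host-side sketch; the resolutions and iteration counts come from the paragraph above, while runICPIteration is a hypothetical stand-in for the project's actual CUDA kernels.

#include <array>
#include <cstdio>

// Hypothetical stand-in for one ICP iteration at a given resolution;
// the real project runs this work as CUDA kernels on the GPU.
void runICPIteration(int width, int height) { /* ... */ }

int main() {
    // Coarse-to-fine pyramid as described above, pairing each
    // resolution with its iteration count.
    const std::array<int, 3> widths     = {160, 320, 640};
    const std::array<int, 3> heights    = {120, 240, 480};
    const std::array<int, 3> iterations = {4, 5, 10};

    for (int level = 0; level < 3; ++level) {
        for (int i = 0; i < iterations[level]; ++i) {
            runICPIteration(widths[level], heights[level]);
        }
        std::printf("%dx%d: %d iterations\n",
                    widths[level], heights[level], iterations[level]);
    }
    return 0;
}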

Run like:

./ICP ~/Desktop/rgbd_dataset_freiburg1_desk/ -v

Where ~/Desktop/rgbd_dataset_freiburg1_desk/ contains the depth.txt file; for more information, see the TUM RGB-D benchmark documentation.
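
For reference, depth.txt is a plain-text index in the TUM format: comment lines start with '#', and every other line is a timestamp followed by the path of a depth image. A minimal sketch of a reader (the struct and function names are just for illustration):

#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// One entry of depth.txt: "timestamp filename", e.g.
// "1305031453.374112 depth/1305031453.374112.png"
struct DepthEntry {
    double timestamp;
    std::string filename;
};

std::vector<DepthEntry> readDepthIndex(const std::string& path) {
    std::vector<DepthEntry> entries;
    std::ifstream file(path);
    std::string line;
    while (std::getline(file, line)) {
        if (line.empty() || line[0] == '#') continue;  // skip comments
        std::istringstream ss(line);
        DepthEntry e;
        if (ss >> e.timestamp >> e.filename) entries.push_back(e);
    }
    return entries;
}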

The main idea behind getting the best performance is determining the best thread/block sizes to use. Since these vary between GPUs, I have provided an exhaustive search function to find them. Simply pass the "-v" switch to the program to activate the search. The code will then first search for the best thread/block sizes, then run ICP, and output something like this on an NVIDIA GeForce GTX TITAN X:

GeForce GTX TITAN X
Searching for the best thread/block configuration for your GPU...
Best: 256 threads, 96 blocks (1.3306ms), 100%
ICP: 1.3236ms
ICP speed: 755Hz
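
The search itself boils down to timing a representative kernel at each candidate configuration with CUDA events and keeping the fastest. Below is a minimal sketch of that idea; dummyKernel and the candidate ranges are illustrative assumptions, not the project's actual reduction kernel or search space.

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative stand-in for the kernel being tuned. The grid-stride
// loop lets any thread/block configuration cover all n elements.
__global__ void dummyKernel(float* data, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        data[i] = data[i] * 0.5f + 1.0f;
    }
}

int main() {
    const int n = 640 * 480;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    float bestMs = 1e9f;
    int bestThreads = 0, bestBlocks = 0;

    // Exhaustively try candidate thread/block configurations,
    // timing each launch with CUDA events.
    for (int threads = 32; threads <= 1024; threads += 32) {
        for (int blocks = 8; blocks <= 112; blocks += 8) {
            cudaEventRecord(start);
            dummyKernel<<<blocks, threads>>>(d, n);
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);

            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);
            if (ms < bestMs) {
                bestMs = ms;
                bestThreads = threads;
                bestBlocks = blocks;
            }
        }
    }

    printf("Best: %d threads, %d blocks (%.4fms)\n",
           bestThreads, bestBlocks, bestMs);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}

A real tuner would also warm up the GPU and average several runs per configuration, since single-launch timings are noisy.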

The code will output one file: output.poses. You can evaluate it on the TUM benchmark by using their tools. I get something like this:

python ~/stuff/Kinect_Logs/Freiburg/evaluate_ate.py ~/Desktop/rgbd_dataset_freiburg1_desk/groundtruth.txt output.poses 
0.144041

The difference in values comes down to the fact that each method uses a different reduction scheme, and floating-point operations are not associative.
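
The non-associativity is easy to demonstrate: summing the same three values in a different order can give a different single-precision result.

#include <cstdio>

int main() {
    // 1.0f is smaller than one ulp of 1e8f, so it vanishes when
    // added to b first, but survives when a and b cancel first.
    float a = 1e8f, b = -1e8f, c = 1.0f;
    printf("(a + b) + c = %f\n", (a + b) + c);  // 1.000000
    printf("a + (b + c) = %f\n", a + (b + c));  // 0.000000
    return 0;
}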

Also, if you're using this code in academic work and it would be suitable to do so, please consider referencing some of my possibly relevant research in your literature review/related work section.
