stdgpu: Efficient STL-like Data Structures on the GPU

Features | Examples | Documentation | Building | Integration | Contributing | License | Contact

Features

stdgpu is an open-source library providing several generic GPU data structures for fast and reliable data management. Multiple platforms such as CUDA, OpenMP, and HIP are supported, allowing you to rapidly write highly complex agnostic and native algorithms that look like sequential CPU code but are executed in parallel on the GPU.

  • Productivity. Previous libraries such as thrust, VexCL, ArrayFire or Boost.Compute focus on the fast and efficient implementation of various algorithms for contiguously stored data to enhance productivity. stdgpu follows an orthogonal approach and focuses on fast and reliable data management to enable the rapid development of more general and flexible GPU algorithms just like their CPU counterparts.

  • Interoperability. Instead of providing yet another ecosystem, stdgpu is designed to be a lightweight container library. Therefore, a core feature of stdgpu is its interoperability with previously established frameworks, i.e. the thrust library, to enable a seamless integration into new as well as existing projects.

  • Maintainability. Following the trend in recent C++ standards of providing functionality for safer and more reliable programming, the philosophy of stdgpu is to provide clean and familiar functions with strong guarantees that encourage users to write more robust code while giving them full control to achieve a high performance.

At its heart, stdgpu offers the following GPU data structures and containers:

| Container | Description |
| --- | --- |
| `atomic` & `atomic_ref` | Atomic primitive types and references |
| `bitset` | Space-efficient bit array |
| `deque` | Dynamically sized double-ended queue |
| `queue` & `stack` | Container adapters |
| `unordered_map` & `unordered_set` | Hashed collection of unique keys and key-value pairs |
| `vector` | Dynamically sized contiguous array |

In addition, stdgpu also provides commonly required functionality in `algorithm`, `bit`, `cmath`, `contract`, `cstddef`, `functional`, `iterator`, `limits`, `memory`, `mutex`, `ranges`, and `utility` to complement the GPU data structures and to increase their usability and interoperability.

Examples

In order to reliably perform complex tasks on the GPU, stdgpu offers flexible interfaces that can be used both in agnostic code, e.g. via the algorithms provided by thrust, and in native code, e.g. in custom CUDA kernels.

For instance, stdgpu is extensively used in SLAMCast, a scalable live telepresence system, to implement real-time, large-scale 3D scene reconstruction as well as real-time 3D data streaming between a server and an arbitrary number of remote clients.

Agnostic code. In the context of SLAMCast, a simple task is the integration of a range of updated blocks into the duplicate-free set of queued blocks for data streaming, which can be expressed very conveniently:

#include <stdgpu/cstddef.h>             // stdgpu::index_t
#include <stdgpu/iterator.h>            // stdgpu::make_device
#include <stdgpu/unordered_set.cuh>     // stdgpu::unordered_set

class stream_set
{
public:
    void
    add_blocks(const short3* blocks,
               const stdgpu::index_t n)
    {
        set.insert(stdgpu::make_device(blocks),
                   stdgpu::make_device(blocks + n));
    }

    // Further functions

private:
    stdgpu::unordered_set<short3> set;
    // Further members
};

Native code. More complex operations such as the creation of the duplicate-free set of updated blocks or other algorithms can be implemented natively, e.g. in custom CUDA kernels with stdgpu's CUDA backend enabled:

#include <stdgpu/cstddef.h>             // stdgpu::index_t
#include <stdgpu/unordered_map.cuh>     // stdgpu::unordered_map
#include <stdgpu/unordered_set.cuh>     // stdgpu::unordered_set

__global__ void
compute_update_set(const short3* blocks,
                   const stdgpu::index_t n,
                   const stdgpu::unordered_map<short3, voxel*> tsdf_block_map,
                   stdgpu::unordered_set<short3> mc_update_set)
{
    // Global thread index
    stdgpu::index_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    short3 b_i = blocks[i];

    // Neighboring candidate blocks for the update
    short3 mc_blocks[8]
    = {
        make_short3(b_i.x - 0, b_i.y - 0, b_i.z - 0),
        make_short3(b_i.x - 1, b_i.y - 0, b_i.z - 0),
        make_short3(b_i.x - 0, b_i.y - 1, b_i.z - 0),
        make_short3(b_i.x - 0, b_i.y - 0, b_i.z - 1),
        make_short3(b_i.x - 1, b_i.y - 1, b_i.z - 0),
        make_short3(b_i.x - 1, b_i.y - 0, b_i.z - 1),
        make_short3(b_i.x - 0, b_i.y - 1, b_i.z - 1),
        make_short3(b_i.x - 1, b_i.y - 1, b_i.z - 1),
    };

    for (stdgpu::index_t j = 0; j < 8; ++j)
    {
        // Only consider existing neighbors
        if (tsdf_block_map.contains(mc_blocks[j]))
        {
            mc_update_set.insert(mc_blocks[j]);
        }
    }
}

More examples can be found in the examples directory.

Documentation

A comprehensive introduction into the design and API of stdgpu can be found in the project documentation.

Since a core feature and design goal of stdgpu is its interoperability with thrust, it offers full support for all thrust algorithms instead of reinventing the wheel. More information about the design can be found in the related research paper.

Building

Before building the library, please make sure that all required tools and dependencies are installed on your system. Newer versions are supported as well.

Required

Required for CUDA backend

Required for OpenMP backend

  • OpenMP 2.0
    • GCC 7
      • (Ubuntu 18.04/20.04) Already installed
    • Clang 6
      • (Ubuntu 18.04/20.04) sudo apt install libomp-dev
    • MSVC 19.20
      • (Windows) Already installed

Required for HIP backend (experimental)

The library can be built like any other project that uses the CMake build system.

In addition, we also provide cross-platform scripts to make the build process more convenient. Since these scripts depend on the selected build type, there are scripts for both debug and release builds.

| Command | Effect |
| --- | --- |
| `sh scripts/setup_<build_type>.sh` | Performs a full clean build of the project. Removes old build, configures the project (build path: `./build`), builds the project, and runs the unit tests. |
| `sh scripts/build_<build_type>.sh` | (Re-)Builds the project. Requires that the project is set up. |
| `sh scripts/run_tests_<build_type>.sh` | Runs the unit tests. Requires that the project is built. |
| `sh scripts/install_<build_type>.sh` | Installs the project at the configured install path (default: `./bin`). |
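The `<build_type>` placeholder selects the variant, which we assume here to be named `debug` and `release` as suggested by the build-type distinction above. For example:

```shell
# Pick a build type; the scripts come in debug and release variants
BUILD_TYPE=release

# Full clean build plus unit tests; expands to: sh scripts/setup_release.sh
echo "sh scripts/setup_${BUILD_TYPE}.sh"
```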

Integration

In the following, we show some examples of how the library can be integrated into and used in a project.

CMake Integration. To use the library in your project, you can either install it externally first and then include it using find_package:

find_package(stdgpu 1.0.0 REQUIRED)

add_library(foo ...)

target_link_libraries(foo PUBLIC stdgpu::stdgpu)

Or you can embed it into your project and build it from a subdirectory:

# Exclude the examples from the build
set(STDGPU_BUILD_EXAMPLES OFF CACHE INTERNAL "")

# Exclude the tests from the build
set(STDGPU_BUILD_TESTS OFF CACHE INTERNAL "")

add_subdirectory(stdgpu)

add_library(foo ...)

target_link_libraries(foo PUBLIC stdgpu::stdgpu)
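If you prefer not to vendor the sources manually, the same embedding can be sketched with CMake's standard FetchContent module. The repository URL and tag below are assumptions to adapt to your setup:

```cmake
include(FetchContent)

# Fetch stdgpu at configure time; URL and tag are placeholders to adjust
FetchContent_Declare(
    stdgpu
    GIT_REPOSITORY https://github.com/stotko/stdgpu.git
    GIT_TAG        master
)

# Exclude the examples and tests from the build, as above
set(STDGPU_BUILD_EXAMPLES OFF CACHE INTERNAL "")
set(STDGPU_BUILD_TESTS OFF CACHE INTERNAL "")

FetchContent_MakeAvailable(stdgpu)

add_library(foo ...)

target_link_libraries(foo PUBLIC stdgpu::stdgpu)
```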

CMake Options. To configure the library, two sets of options are provided. The following build options control the build process:

| Build Option | Effect | Default |
| --- | --- | --- |
| `STDGPU_BACKEND` | Device system backend | `STDGPU_BACKEND_CUDA` |
| `STDGPU_BUILD_SHARED_LIBS` | Builds the project as a shared library, if set to `ON`, or as a static library, if set to `OFF` | `BUILD_SHARED_LIBS` |
| `STDGPU_SETUP_COMPILER_FLAGS` | Constructs the compiler flags | `ON` if standalone, `OFF` if included via `add_subdirectory` |
| `STDGPU_TREAT_WARNINGS_AS_ERRORS` | Treats compiler warnings as errors | `OFF` |
| `STDGPU_BUILD_EXAMPLES` | Build the examples | `ON` |
| `STDGPU_BUILD_TESTS` | Build the unit tests | `ON` |
| `STDGPU_BUILD_TEST_COVERAGE` | Build a test coverage report | `OFF` |
| `STDGPU_ANALYZE_WITH_CLANG_TIDY` | Analyzes the code with clang-tidy | `OFF` |
| `STDGPU_ANALYZE_WITH_CPPCHECK` | Analyzes the code with cppcheck | `OFF` |

In addition, the implementation of some functionality can be controlled via configuration options:

| Configuration Option | Effect | Default |
| --- | --- | --- |
| `STDGPU_ENABLE_CONTRACT_CHECKS` | Enable contract checks | `OFF` if `CMAKE_BUILD_TYPE` equals `Release` or `MinSizeRel`, `ON` otherwise |
| `STDGPU_USE_32_BIT_INDEX` | Use 32-bit instead of 64-bit signed integer for `index_t` | `ON` |

Contributing

For detailed information on how to contribute, see CONTRIBUTING.

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

If you use stdgpu in one of your projects, please cite the following publications:

stdgpu: Efficient STL-like Data Structures on the GPU

@UNPUBLISHED{stotko2019stdgpu,
    author = {Stotko, P.},
     title = {{stdgpu: Efficient STL-like Data Structures on the GPU}},
      year = {2019},
     month = aug,
      note = {arXiv:1908.05936},
       url = {https://arxiv.org/abs/1908.05936}
}

SLAMCast: Large-Scale, Real-Time 3D Reconstruction and Streaming for Immersive Multi-Client Live Telepresence

@article{stotko2019slamcast,
    author = {Stotko, P. and Krumpen, S. and Hullin, M. B. and Weinmann, M. and Klein, R.},
     title = {{SLAMCast: Large-Scale, Real-Time 3D Reconstruction and Streaming for Immersive Multi-Client Live Telepresence}},
   journal = {IEEE Transactions on Visualization and Computer Graphics},
    volume = {25},
    number = {5},
     pages = {2102--2112},
      year = {2019},
     month = may
}

Contact

Patrick Stotko - [email protected]
