
eyalroz / cuda-api-wrappers

License: BSD-3-Clause
Thin C++-flavored wrappers for the CUDA Runtime API

Projects that are alternatives to, or similar to, cuda-api-wrappers

Hipsycl
Implementation of SYCL for CPUs, AMD GPUs, NVIDIA GPUs
Stars: ✭ 377 (+4.14%)
Mutual labels:  gpu, gpgpu, cuda, gpu-computing
Ilgpu
ILGPU JIT Compiler for high-performance .Net GPU programs
Stars: ✭ 374 (+3.31%)
Mutual labels:  nvidia, gpu, gpgpu, cuda
Stdgpu
stdgpu: Efficient STL-like Data Structures on the GPU
Stars: ✭ 531 (+46.69%)
Mutual labels:  gpu, gpgpu, cuda, gpu-computing
Parenchyma
An extensible HPC framework for CUDA, OpenCL and native CPU.
Stars: ✭ 71 (-80.39%)
Mutual labels:  nvidia, gpu, gpgpu, cuda
Neanderthal
Fast Clojure Matrix Library
Stars: ✭ 927 (+156.08%)
Mutual labels:  gpu, gpgpu, cuda, gpu-computing
MatX
An efficient C++17 GPU numerical computing library with Python-like syntax
Stars: ✭ 418 (+15.47%)
Mutual labels:  gpu, cuda, gpgpu, gpu-computing
CUDAfy.NET
CUDAfy .NET allows easy development of high performance GPGPU applications completely from the .NET. It's developed in C#.
Stars: ✭ 56 (-84.53%)
Mutual labels:  nvidia, gpgpu, gpu-computing
learn-gpgpu
Algorithms implemented in CUDA + resources about GPGPU
Stars: ✭ 37 (-89.78%)
Mutual labels:  nvidia, gpgpu, gpu-computing
Deep Diamond
A fast Clojure Tensor & Deep Learning library
Stars: ✭ 288 (-20.44%)
Mutual labels:  nvidia, gpu, cuda
Webclgl
GPGPU Javascript library 🐸
Stars: ✭ 313 (-13.54%)
Mutual labels:  gpu, gpgpu, gpu-computing
Genomeworks
SDK for GPU accelerated genome assembly and analysis
Stars: ✭ 215 (-40.61%)
Mutual labels:  nvidia, gpu, cuda
peakperf
Achieve peak performance on x86 CPUs and NVIDIA GPUs
Stars: ✭ 33 (-90.88%)
Mutual labels:  gpu, cuda, nvidia
Bayadera
High-performance Bayesian Data Analysis on the GPU in Clojure
Stars: ✭ 342 (-5.52%)
Mutual labels:  gpu, cuda, gpu-computing
Awesome Cuda
This is a list of useful libraries and resources for CUDA development.
Stars: ✭ 274 (-24.31%)
Mutual labels:  gpu, gpgpu, cuda
Plotoptix
Data visualisation in Python based on OptiX 7.2 ray tracing framework.
Stars: ✭ 252 (-30.39%)
Mutual labels:  nvidia, gpu, cuda
Komputation
Komputation is a neural network framework for the Java Virtual Machine written in Kotlin and CUDA C.
Stars: ✭ 295 (-18.51%)
Mutual labels:  nvidia, gpu, cuda
Nvidia Modded Inf
Modified nVidia .inf files to run drivers on all video cards, research & telemetry free drivers
Stars: ✭ 227 (-37.29%)
Mutual labels:  nvidia, gpu, cuda
cuda memtest
Fork of CUDA GPU memtest 👓
Stars: ✭ 68 (-81.22%)
Mutual labels:  gpu, cuda, gpu-computing
Thrust
The C++ parallel algorithms library.
Stars: ✭ 3,595 (+893.09%)
Mutual labels:  nvidia, gpu, cuda
opencv-cuda-docker
Dockerfiles for OpenCV compiled with CUDA, opencv_contrib modules and Python 3 bindings
Stars: ✭ 55 (-84.81%)
Mutual labels:  gpu, cuda, nvidia

cuda-api-wrappers:
Thin C++-flavored wrappers for the CUDA runtime API

Branch build status badges: Master | Development

NVIDIA's Runtime API for CUDA is intended for use in both C and C++ code. As such, it uses a C-style API, the lowest common denominator (with a few notable exceptions of templated function overloads).

This library of wrappers around the Runtime API is intended to allow us to embrace many of the features of C++ (including some C++11) for using the runtime API - but without reducing expressivity or increasing the level of abstraction (as in, e.g., the Thrust library). Using cuda-api-wrappers, you still have your devices, streams, events and so on - but they will be more convenient to work with in more C++-idiomatic ways.

Key features

  • All functions and methods throw exceptions on failure - no need to check return values (the exceptions carry the status information); see the sketch just after this list.
  • Judicious namespacing (and some internal namespace-like classes) for better clarity and for semantically grouping related functionality together.
  • There are proxy/wrapper objects for devices, streams, events, kernels and so on, using RAII to relieve you of remembering to free or destroy resources.
  • You can mostly forget about numeric IDs and handles; the proxy classes will fit everywhere.
  • Various Plain Old Data structs adorned with convenience methods and operators.
  • Aims for clarity and straightforwardness in naming and semantics, so that you don't need to look concepts up in the official documentation to understand what each class and function does.
  • Thin and lightweight:
    • No work done behind your back, no caches or indices or any such thing.
    • No costly inheritance structure, vtables, virtual methods and so on - vanishes almost entirely on compilation.
    • Doesn't really "hide" any of CUDA's complexity or functionality; it only simplifies use of the Runtime API.
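
For instance, here is a minimal sketch of the exception-centered error handling. The header name is an assumption (adjust it to your version of the library), as is the choice to catch std::runtime_error; check the project's example programs for the authoritative form.

#include <cuda/api_wrappers.hpp>  // header name is an assumption; may differ between versions
#include <iostream>

int main() {
    try {
        auto device = cuda::device::get(0);          // a device proxy object; no raw ID juggling
        auto buffer = device.memory.allocate(1024);  // on failure, throws instead of returning a status
        // ... enqueue work on a stream, synchronize, and so on ...
        (void) buffer;                               // freeing is elided in this sketch
    } catch (std::runtime_error& ex) {
        // assumption: the library's exceptions derive from std::runtime_error,
        // carrying the CUDA status in their message
        std::cerr << "CUDA error: " << ex.what() << '\n';
        return 1;
    }
}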

Detailed documentation

Detailed, nearly-complete Doxygen-generated documentation is available.

Requirements

  • CUDA v8.0 or later is recommended; v7.5 should be supported (but is untested). CUDA 6.x should probably be OK as well.
  • A C++11-capable compiler compatible with your version of CUDA.
  • CMake v3.8 or later - although most of the library will work as simple headers with no building.

Coverage of the Runtime API

Considering the list of runtime API modules, the library currently has the following (w.r.t. CUDA 8.x):

Coverage level | Modules
full | Error Handling, Stream Management, Event Management, Version Management, Peer Device Memory Access, Occupancy, Unified Addressing
almost full | Device Management (no chooseDevice, cudaSetValidDevices), Memory Management, Execution Control (no support for working with parameter buffers)
partial | 2D & 3D Arrays, Texture Object Management, Texture Reference Management
(deprecated) | Thread Management
no coverage | Graph Management, OpenGL Interoperability, Direct3D Interoperability, VDPAU Interoperability, EGL Interoperability, Graphics Interoperability, Surface Reference Management, Surface Object Management

The Milestones indicate some features which aren't yet covered but are slated for future work.

Since I am not currently working on anything graphics-related, there are no short-term plans to extend coverage to any of the graphics-related modules.

A taste of the key features in play

We've all dreamed of being able to type in:

my_stream.enqueue.callback(
	[&foo](cuda::stream_t stream, cuda::status_t status) {
		std::cout << "Hello " << foo << " world!\n";
	}
);

... and have that just work, right? Well, now it does!

On a slightly more serious note, though, let's demonstrate the principles listed above:

Use of namespaces (and internal classes)

With this library, you would do cuda::memory::host::allocate() instead of cudaMallocHost() and cuda::device_t::memory::allocate() instead of setting the current device and then cudaMalloc(). Note, though, that device_t::memory::allocate() is not a freestanding function but a method of an internal class, so a call to it might be cuda::device::get(my_device_id).memory.allocate(my_size). The compiled version of this supposedly complicated construct will be nothing but the sequence of cudaSetDevice() and cudaMalloc() calls.
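
As a side-by-side sketch of the construct just described (my_device_id and my_size are placeholders):

// Raw Runtime API: make the device current, then allocate on it
cudaSetDevice(my_device_id);
void* raw_ptr;
cudaMalloc(&raw_ptr, my_size);

// With the wrappers: one chained expression, compiling down to the same two calls
auto device_buffer = cuda::device::get(my_device_id).memory.allocate(my_size);

// Pinned host memory, replacing cudaMallocHost()
auto host_buffer = cuda::memory::host::allocate(my_size);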

Adorning POD structs with convenience methods

The expression

my_device.properties().compute_capability() >= cuda::make_compute_capability(50)

is a valid comparison, true for all devices with a Maxwell-or-later micro-architecture. This, despite the fact that struct cuda::compute_capability_t is a POD type with two unsigned integer fields, not a scalar. Note that struct cuda::device::properties_t (which is essentially the Runtime API's own struct cudaDeviceProp) does not have a compute_capability field.
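
As a short sketch, such a comparison might serve as a feature guard (the device-obtaining call is borrowed from the previous section; my_device_id is a placeholder):

auto device = cuda::device::get(my_device_id);
if (device.properties().compute_capability() >= cuda::make_compute_capability(50)) {
    // Maxwell or later: this device reports compute capability 5.0 or above
}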

Meaningful naming

Instead of using

cudaError_t cudaEventCreateWithFlags(
    cudaEvent_t* event, 
    unsigned int flags) 

which requires you to remember what you need to specify as flags and how, you create a cuda::event_t proxy object using the function:

cuda::event_t cuda::event::create(
    cuda::device_t  device,
    bool            uses_blocking_sync,
    bool            records_timing      = cuda::event::do_record_timing,
    bool            interprocess        = cuda::event::not_interprocess)

The default values here are enum : bool constants, which you can use yourself when creating events with non-default parameters, to make the call more easily readable than with plain true or false.
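
So a non-default call might look as follows. Here, sync_by_blocking is an assumed name for the blocking-sync constant (only do_record_timing and not_interprocess appear in the signature above), and my_device is a placeholder:

auto event = cuda::event::create(
    my_device,
    cuda::event::sync_by_blocking,   // assumed constant name, standing in for uses_blocking_sync = true
    cuda::event::do_record_timing,
    cuda::event::not_interprocess);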

Example programs

More detailed documentation / feature walk-through is forthcoming. For now I'm providing two kinds of short example programs; browsing their source, you'll know essentially all there is to know about the API wrappers.

To build and run the examples (just as a sanity check), execute the following:

$ cmake -S . -B build -DBUILD_EXAMPLES=ON && cmake --build build/ && find build/examples/bin -type f -exec "{}" ";"

Modified CUDA samples

The CUDA distribution contains sample programs demonstrating various features and concepts. A few of these - which are not focused on device-side work - have been adapted to use the API wrappers, completely forgoing direct use of the CUDA Runtime API itself. You will find them in the modified CUDA samples example programs folder.

'Coverage' test programs - by module of the Runtime API

Gradually, an example program is being added for each one of the CUDA Runtime API modules, demonstrating how use of that module's API calls can be replaced with use of the API wrappers. These per-module example programs can be found here.

Bugs, suggestions, feedback

I would like some help with building up documentation, and perhaps a Wiki here; if you can spare the time, do write me. You can also do so if you're interested in collaborating on some related project, or for general comments, feedback or suggestions.

If you notice a specific issue which needs addressing, especially any sort of bug or compilation error, please file the issue here on GitHub.
