rbaygildin / learn-gpgpu

Licence: other
Algorithms implemented in CUDA + resources about GPGPU


Awesome GPGPU

This is a curated list of examples of general-purpose computing on GPUs, along with libraries and papers.

Examples

CUDA

Linear algebra

Image processing

Clustering

  • K-Means clustering - fast Lloyd's K-Means on the GPU. Shared memory and a two-step reduction (partial, then global) are used to compute the cluster centers [CUDA]

  • Fuzzy C-Means clustering - Fuzzy C-Means on the GPU. Shared memory and a two-step reduction (partial, then global) are used to compute the cluster centers [CUDA]
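The two-step reduction used in both clustering entries above can be sketched in plain Python. This is a CPU-side illustration of the pattern only, not the repository's CUDA code: each "chunk" below plays the role of one thread block accumulating partial sums in shared memory, and the final combine plays the role of the global reduction.

```python
# Illustrative CPU sketch of the partial + global reduction used to
# recompute K-Means cluster centers. On the GPU, each chunk would be
# one thread block summing into shared memory.

def recompute_centers(points, labels, k, n_chunks=4):
    """Two-step reduction: per-chunk partial sums, then a global combine."""
    dim = len(points[0])
    n = len(points)
    step = (n + n_chunks - 1) // n_chunks
    partials = []  # one (sums, counts) pair per chunk (per thread block)
    for start in range(0, n, step):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, c in zip(points[start:start + step], labels[start:start + step]):
            counts[c] += 1
            for d in range(dim):
                sums[c][d] += p[d]
        partials.append((sums, counts))
    # Global reduction: combine the per-chunk partials into final centers.
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for c in range(k):
            total_counts[c] += counts[c]
            for d in range(dim):
                total_sums[c][d] += sums[c][d]
    return [[s / max(total_counts[c], 1) for s in total_sums[c]]
            for c in range(k)]

points = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0), (10.0, 10.0)]
labels = [0, 0, 1, 1]
print(recompute_centers(points, labels, k=2))  # → [[0.5, 0.5], [9.5, 9.5]]
```

Splitting the reduction this way matters on a GPU because a single global accumulator would force every thread to contend for the same memory location; partial sums in fast shared memory followed by one small global pass avoid that bottleneck.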

Simulation

Libraries

  • CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs).

  • Thrust is a powerful library of parallel algorithms and data structures. Thrust provides a flexible, high-level interface for GPU programming that greatly enhances developer productivity. Using Thrust, C++ developers can write just a few lines of code to perform GPU-accelerated sort, scan, transform, and reduction operations orders of magnitude faster than the latest multi-core CPUs. For example, the thrust::sort algorithm delivers 5x to 100x faster sorting performance than STL and TBB.
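As a rough CPU-only analogue of the Thrust pipeline just described (illustrative Python standing in for the C++ calls, not Thrust itself), a transform followed by a reduction, plus a sort, look like this:

```python
# CPU analogue of a Thrust-style pipeline: a transform feeding a
# reduction (sum of squares), plus a sort. The comments name the
# corresponding Thrust algorithms.
from functools import reduce

data = [3, 1, 4, 1, 5]
squared = [x * x for x in data]              # ~ thrust::transform
total = reduce(lambda a, b: a + b, squared)  # ~ thrust::reduce
ordered = sorted(data)                       # ~ thrust::sort
print(squared, total, ordered)
```

In real Thrust these operations run on device vectors (`thrust::device_vector`) and the library fuses or parallelizes them on the GPU; the shape of the code, however, is this same compose-small-algorithms style.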

  • OpenCL is the open, royalty-free standard for cross-platform, parallel programming of diverse processors found in personal computers, servers, mobile devices and embedded platforms. OpenCL greatly improves the speed and responsiveness of a wide spectrum of applications in numerous market categories including gaming and entertainment titles, scientific and medical software, professional creative tools, vision processing, and neural network training and inferencing.

  • Boost.Compute is a GPU/parallel-computing library for C++ based on OpenCL. The core library is a thin C++ wrapper over the OpenCL API and provides access to compute devices, contexts, command queues and memory buffers. On top of the core library is a generic, STL-like interface providing common algorithms (e.g. transform(), accumulate(), sort()) along with common containers (e.g. vector, flat_set). It also features a number of extensions including parallel-computing algorithms (e.g. exclusive_scan(), scatter(), reduce()) and a number of fancy iterators (e.g. transform_iterator<>, permutation_iterator<>, zip_iterator<>).
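The primitives Boost.Compute lists (exclusive_scan, scatter) compose into useful patterns such as stream compaction. A hedged CPU sketch in Python, mirroring how those primitives fit together rather than Boost.Compute's actual API:

```python
# CPU sketch of stream compaction built from Boost.Compute-style
# primitives: a predicate flag array, an exclusive scan to compute
# output positions, then a conditional scatter of the kept elements.
from itertools import accumulate

data = [5, 0, 3, 0, 8, 0, 2]
flags = [1 if x != 0 else 0 for x in data]      # keep non-zero elements
positions = list(accumulate([0] + flags[:-1]))  # exclusive scan of flags
out = [None] * sum(flags)
for x, f, p in zip(data, flags, positions):     # ~ scatter_if
    if f:
        out[p] = x
print(out)  # non-zero elements, order preserved: [5, 3, 8, 2]
```

The exclusive scan is what makes this parallelizable: every kept element learns its output index independently, so the scatter step has no ordering dependency between elements.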

  • PyCUDA lets you access Nvidia's CUDA parallel computation API from Python. Several wrappers of the CUDA API already exist, so what's so special about PyCUDA?

  • PyOpenCL gives you easy, Pythonic access to the OpenCL parallel computation API.

  • OpenACC is a user-driven directive-based performance-portable parallel programming model designed for scientists and engineers interested in porting their codes to a wide-variety of heterogeneous HPC hardware platforms and architectures with significantly less programming effort than required with a low-level model.

  • Hemi simplifies writing portable CUDA C/C++ code. With Hemi, you can write parallel kernels the way you write for loops inline in your CPU code, and run them on your GPU.

  • CUDPP is the CUDA Data Parallel Primitives Library. CUDPP is a library of data-parallel algorithm primitives such as parallel-prefix-sum ("scan"), parallel sort and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables.
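The "scan" primitive at the heart of CUDPP is usually implemented on GPUs with the work-efficient Blelloch algorithm. A serial Python sketch of it, for illustration (on the GPU, every level of the up-sweep and down-sweep runs as one parallel step across threads):

```python
# Illustrative serial version of the work-efficient (Blelloch)
# exclusive prefix sum — the "scan" primitive CUDPP provides.

def exclusive_scan(a):
    n = len(a)
    assert n and (n & (n - 1)) == 0, "power-of-two length for simplicity"
    t = list(a)
    # Up-sweep (reduce) phase: build partial sums up a binary tree.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            t[i] += t[i - d]
        d *= 2
    # Down-sweep phase: propagate prefixes back down the tree.
    t[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            t[i - d], t[i] = t[i], t[i] + t[i - d]
        d //= 2
    return t

print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# → [0, 3, 4, 11, 11, 15, 16, 22]
```

The algorithm does O(n) total additions across O(log n) parallel steps, which is why it is the standard building block for GPU sorting, stream compaction, and summed-area tables mentioned above.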

Other awesome lists and repositories
