rbaygildin / learn-gpgpu

Licence: other
Algorithms implemented in CUDA + resources about GPGPU


Awesome GPGPU

This is a curated list of examples of general-purpose computing on GPUs, along with libraries and papers.

Examples

CUDA

Linear algebra

Image processing

Clustering

  • K-Means clustering - fast Lloyd's K-Means on the GPU. Shared memory and a two-step reduction (partial, then global) are used to compute the cluster centers [CUDA]

  • Fuzzy C-Means clustering - Fuzzy C-Means on the GPU. Shared memory and a two-step reduction (partial, then global) are used to compute the cluster centers [CUDA]
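The two-step reduction used in both clustering entries above can be sketched in plain Python. This is a CPU-side illustration of the pattern only, not the repository's CUDA code: each "chunk" below plays the role of one thread block accumulating partial sums in shared memory, and the final combine plays the role of the global reduction.

```python
# Illustrative CPU sketch of the partial + global reduction used to
# recompute K-Means cluster centers. On the GPU, each chunk would be
# one thread block summing into shared memory.

def recompute_centers(points, labels, k, n_chunks=4):
    """Two-step reduction: per-chunk partial sums, then a global combine."""
    dim = len(points[0])
    n = len(points)
    step = (n + n_chunks - 1) // n_chunks
    partials = []  # one (sums, counts) pair per chunk (per thread block)
    for start in range(0, n, step):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, c in zip(points[start:start + step], labels[start:start + step]):
            counts[c] += 1
            for d in range(dim):
                sums[c][d] += p[d]
        partials.append((sums, counts))
    # Global reduction: combine the per-chunk partials into final centers.
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for c in range(k):
            total_counts[c] += counts[c]
            for d in range(dim):
                total_sums[c][d] += sums[c][d]
    return [[s / max(total_counts[c], 1) for s in total_sums[c]]
            for c in range(k)]

points = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0), (10.0, 10.0)]
labels = [0, 0, 1, 1]
print(recompute_centers(points, labels, k=2))  # → [[0.5, 0.5], [9.5, 9.5]]
```

Splitting the reduction this way matters on a GPU because a single global accumulator would force every thread to contend for the same memory location; partial sums in fast shared memory followed by one small global pass avoid that bottleneck.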

Simulation

Libraries

  • CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs).

  • Thrust is a powerful library of parallel algorithms and data structures. Thrust provides a flexible, high-level interface for GPU programming that greatly enhances developer productivity. Using Thrust, C++ developers can write just a few lines of code to perform GPU-accelerated sort, scan, transform, and reduction operations orders of magnitude faster than the latest multi-core CPUs. For example, the thrust::sort algorithm delivers 5x to 100x faster sorting performance than STL and TBB.
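As a rough CPU-only analogue of the Thrust pipeline just described (illustrative Python standing in for the C++ calls, not Thrust itself), a transform followed by a reduction, plus a sort, look like this:

```python
# CPU analogue of a Thrust-style pipeline: a transform feeding a
# reduction (sum of squares), plus a sort. The comments name the
# corresponding Thrust algorithms.
from functools import reduce

data = [3, 1, 4, 1, 5]
squared = [x * x for x in data]              # ~ thrust::transform
total = reduce(lambda a, b: a + b, squared)  # ~ thrust::reduce
ordered = sorted(data)                       # ~ thrust::sort
print(squared, total, ordered)
```

In real Thrust these operations run on device vectors (`thrust::device_vector`) and the library fuses or parallelizes them on the GPU; the shape of the code, however, is this same compose-small-algorithms style.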

  • OpenCL is the open, royalty-free standard for cross-platform, parallel programming of diverse processors found in personal computers, servers, mobile devices and embedded platforms. OpenCL greatly improves the speed and responsiveness of a wide spectrum of applications in numerous market categories including gaming and entertainment titles, scientific and medical software, professional creative tools, vision processing, and neural network training and inferencing.

  • Boost.Compute is a GPU/parallel-computing library for C++ based on OpenCL. The core library is a thin C++ wrapper over the OpenCL API and provides access to compute devices, contexts, command queues and memory buffers. On top of the core library is a generic, STL-like interface providing common algorithms (e.g. transform(), accumulate(), sort()) along with common containers (e.g. vector, flat_set). It also features a number of extensions including parallel-computing algorithms (e.g. exclusive_scan(), scatter(), reduce()) and a number of fancy iterators (e.g. transform_iterator<>, permutation_iterator<>, zip_iterator<>).
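The primitives Boost.Compute lists (exclusive_scan, scatter) compose into useful patterns such as stream compaction. A hedged CPU sketch in Python, mirroring how those primitives fit together rather than Boost.Compute's actual API:

```python
# CPU sketch of stream compaction built from Boost.Compute-style
# primitives: a predicate flag array, an exclusive scan to compute
# output positions, then a conditional scatter of the kept elements.
from itertools import accumulate

data = [5, 0, 3, 0, 8, 0, 2]
flags = [1 if x != 0 else 0 for x in data]      # keep non-zero elements
positions = list(accumulate([0] + flags[:-1]))  # exclusive scan of flags
out = [None] * sum(flags)
for x, f, p in zip(data, flags, positions):     # ~ scatter_if
    if f:
        out[p] = x
print(out)  # non-zero elements, order preserved: [5, 3, 8, 2]
```

The exclusive scan is what makes this parallelizable: every kept element learns its output index independently, so the scatter step has no ordering dependency between elements.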

  • PyCUDA lets you access Nvidia's CUDA parallel computation API from Python. Several wrappers of the CUDA API already exist, so what's so special about PyCUDA?

  • PyOpenCL gives you easy, Pythonic access to the OpenCL parallel computation API.

  • OpenACC is a user-driven directive-based performance-portable parallel programming model designed for scientists and engineers interested in porting their codes to a wide-variety of heterogeneous HPC hardware platforms and architectures with significantly less programming effort than required with a low-level model.

  • Hemi simplifies writing portable CUDA C/C++ code. With Hemi, you can write parallel kernels the way you write for loops inline in your CPU code, and run them on your GPU.

  • CUDPP is the CUDA Data Parallel Primitives Library. CUDPP is a library of data-parallel algorithm primitives such as parallel-prefix-sum ("scan"), parallel sort and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables.
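The "scan" primitive at the heart of CUDPP is usually implemented on GPUs with the work-efficient Blelloch algorithm. A serial Python sketch of it, for illustration (on the GPU, every level of the up-sweep and down-sweep runs as one parallel step across threads):

```python
# Illustrative serial version of the work-efficient (Blelloch)
# exclusive prefix sum — the "scan" primitive CUDPP provides.

def exclusive_scan(a):
    n = len(a)
    assert n and (n & (n - 1)) == 0, "power-of-two length for simplicity"
    t = list(a)
    # Up-sweep (reduce) phase: build partial sums up a binary tree.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            t[i] += t[i - d]
        d *= 2
    # Down-sweep phase: propagate prefixes back down the tree.
    t[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            t[i - d], t[i] = t[i], t[i] + t[i - d]
        d //= 2
    return t

print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# → [0, 3, 4, 11, 11, 15, 16, 22]
```

The algorithm does O(n) total additions across O(log n) parallel steps, which is why it is the standard building block for GPU sorting, stream compaction, and summed-area tables mentioned above.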

Other awesome lists and repositories
