
tugrul512bit / Cekirdekler

Licence: gpl-3.0

Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated fast while using the same kernel on all devices (for simplicity).

Labels

gpu dynamic opencl gpgpu load-balancer gpu-computing pool parallelism gpu-acceleration batch-processing

Projects that are alternatives of or similar to Cekirdekler

Hipsycl

Implementation of SYCL for CPUs, AMD GPUs, NVIDIA GPUs

Stars: ✭ 377 (+396.05%)

Mutual labels: gpu, gpgpu, opencl, gpu-computing

Vuh

Vulkan compute for people

Stars: ✭ 264 (+247.37%)

Mutual labels: gpu, gpgpu, gpu-acceleration, gpu-computing

Neanderthal

Fast Clojure Matrix Library

Stars: ✭ 927 (+1119.74%)

Mutual labels: gpu, gpgpu, opencl, gpu-computing

Emu

The write-once-run-anywhere GPGPU library for Rust

Stars: ✭ 1,350 (+1676.32%)

Mutual labels: gpu, gpgpu, gpu-acceleration, gpu-computing

Bayadera

High-performance Bayesian Data Analysis on the GPU in Clojure

Stars: ✭ 342 (+350%)

Mutual labels: gpu, opencl, gpu-acceleration, gpu-computing

Stdgpu

stdgpu: Efficient STL-like Data Structures on the GPU

Stars: ✭ 531 (+598.68%)

Mutual labels: gpu, gpgpu, gpu-acceleration, gpu-computing

Parenchyma

An extensible HPC framework for CUDA, OpenCL and native CPU.

Stars: ✭ 71 (-6.58%)

Mutual labels: gpu, gpgpu, opencl

Aparapi

The New Official Aparapi: a framework for executing native Java and Scala code on the GPU.

Stars: ✭ 352 (+363.16%)

Mutual labels: gpu, gpgpu, opencl

Arrayfire Python

Python bindings for ArrayFire: A general purpose GPU library.

Stars: ✭ 358 (+371.05%)

Mutual labels: gpu, gpgpu, opencl

Ilgpu

ILGPU JIT Compiler for high-performance .Net GPU programs

Stars: ✭ 374 (+392.11%)

Mutual labels: gpu, gpgpu, opencl

Cuda Api Wrappers

Thin C++-flavored wrappers for the CUDA Runtime API

Stars: ✭ 362 (+376.32%)

Mutual labels: gpu, gpgpu, gpu-computing

Arraymancer

A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends

Stars: ✭ 793 (+943.42%)

Mutual labels: gpgpu, opencl, gpu-computing

Arrayfire

ArrayFire: a general purpose GPU library.

Stars: ✭ 3,693 (+4759.21%)

Mutual labels: gpu, gpgpu, opencl

Webclgl

GPGPU Javascript library 🐸

Stars: ✭ 313 (+311.84%)

Mutual labels: gpu, gpgpu, gpu-computing

Heteroflow

Concurrent CPU-GPU Programming using Task Models

Stars: ✭ 57 (-25%)

Mutual labels: gpu, gpu-acceleration, gpu-computing

John

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs

Stars: ✭ 5,656 (+7342.11%)

Mutual labels: gpu, gpgpu, opencl

Arrayfire Rust

Rust wrapper for ArrayFire

Stars: ✭ 525 (+590.79%)

Mutual labels: gpu, gpgpu, opencl

MatX

An efficient C++17 GPU numerical computing library with Python-like syntax

Stars: ✭ 418 (+450%)

Mutual labels: gpu, gpgpu, gpu-computing

rindow-neuralnetworks

Neural networks library for machine learning on PHP

Stars: ✭ 37 (-51.32%)

Mutual labels: gpu, opencl, gpgpu

Bitcracker

BitCracker is the first open source password cracking tool for memory units encrypted with BitLocker

Stars: ✭ 463 (+509.21%)

Mutual labels: gpu, gpgpu, opencl


Cekirdekler

C# multi-device GPGPU (OpenCL) compute API with an iterative inter-device load-balancing feature that pipelines read/write/compute operations for developers' custom OpenCL kernels. The main idea is to treat N devices as a single device when possible, easily taking advantage of the entire platform through a shared-distributed memory model under the hood.

64-bit only: set "project settings -> build -> platform target -> x64". The solution's configuration manager must also target x64.

Needs an extra C++ dll, built in 64-bit (x86_64) from https://github.com/tugrul512bit/CekirdeklerCPP, which must be named KutuphaneCL.dll.

The other required dll is Microsoft's System.Threading.dll and its xml helper for .Net 2.0. Alternatively, adjust the "using" directives and target .Net 3.5+ in your own project, in which case System.Threading.dll is not needed.

In total, Cekirdekler.dll, KutuphaneCL.dll and .Net 3.5 should be enough.

Usage: add only Cekirdekler.dll and System.Threading.dll as references to your C# projects. The other files need to be in the same folder as Cekirdekler.dll or the main project's executable.

This project is being enhanced using ZenHub.

Features

Implicit multi-device control: from CPUs to any number of GPUs and accelerators. Device handling is explicit on the library side for compatibility and performance, and implicit for the client coder, who can concentrate on the OpenCL kernel code. Devices can be selected implicitly for ease of setup or explicitly through detailed device queries; computing on the selected devices is always implicit. Explicitly chosen devices can be combined with a simple + operator.
Iterative load balancing between devices: done separately for each compute, identified by a user-given compute id. Devices receive increasingly fair workloads until the work distribution ratio converges. Partitioning the workload lets a kernel complete with less latency, which suits hot-spot loops and simple embarrassingly parallel algorithms, and works even better for streaming data with the pipelining option enabled.
Pipelining for reads, computes and writes (host - device link): either left to the device drivers or managed explicitly with event-based queues. Hides the latency of the less time-consuming parts (such as writes) behind the most time-consuming part (such as compute). GPUs can run buffer copies and OpenCL kernels concurrently.
Pipelining between devices (device - host - device): runs multiple stages concurrently so that they overlap in the timeline, taking advantage of multiple GPUs (and FPGAs, CPUs) even for the non-separable kernels (because of atomics and low-level optimizations) of a time-consuming pipeline. Each device runs a different kernel at the same time as the other devices, and double buffering overlaps even the data movement between pipeline stages.
Batch computing using task pools and device pools: uses every async pipeline of every GPU in the system for a pool of non-separable kernels (tasks to compute later). A greedy scheduling algorithm keeps all GPUs busy.
Working with different numeric arrays: either C# arrays such as float[], int[], byte[], ... or C++ array wrappers such as ClFloatArray, ClArray<float>, ClByteArray, ClArray<byte>.
Automatic buffer copy optimizations for devices: by default, all devices have their own buffers. If a device shares RAM with the CPU, map/unmap commands are used instead of read/write to reduce the number of array copies. If that device is also given a C++ wrapper array (such as ClArray<float>), the buffer is created with the CL_MEM_USE_HOST_PTR flag for zero-copy access, also known as "streaming".
Two different usage types: the first lets the developer pass all kernel parameters as arrays explicitly, which makes the execution easier to read; the second produces the same result from a much shorter definition, taking fewer lines of code and changing only the necessary flags instead of all of them.
Automatic resource disposal: C++ array wrappers release their resources when they are finalized (out of scope and garbage collected). The dispose method can also be called explicitly by the developer.
Uses OpenCL 1.2: built on the C++ bindings from Khronos.org. Developers are expected to know C99 and the OpenCL kernel constraints to write their own GPGPU kernels. The CekirdeklerCPP project produces the OpenCL 1.2 backend dll file.
Uses OpenCL 2.0: built on the C++ bindings from Khronos.org. Developers are expected to know C99 and the OpenCL kernel constraints to write their own GPGPU kernels. The CekirdeklerCPP2 project produces the OpenCL 2.0 backend dll file (it needs to be renamed to KutuphaneCL.dll).
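
The iterative load balancing described in the feature list can be pictured with a small stand-alone simulation. This is only a sketch under stated assumptions, not the balancer Cekirdekler actually uses: the class LoadBalanceSketch, the methods ToGroups and Step, the 50% damping factor and the simulated device speeds are all hypothetical and chosen for illustration. It does not call the library at all.

```csharp
using System;

// Hypothetical sketch; not Cekirdekler's internal balancer.
public static class LoadBalanceSketch
{
    // Turn fractional ratios into whole work-group counts that sum to totalGroups.
    // Assumes totalGroups >= ratios.Length so every device can keep one group.
    public static int[] ToGroups(double[] ratios, int totalGroups)
    {
        int n = ratios.Length;
        int[] groups = new int[n];
        int assigned = 0, largest = 0;
        for (int i = 0; i < n; i++)
        {
            // Keep at least one group per device so its speed stays measurable.
            groups[i] = Math.Max(1, (int)Math.Floor(ratios[i] * totalGroups));
            assigned += groups[i];
            if (ratios[i] > ratios[largest]) largest = i;
        }
        groups[largest] += totalGroups - assigned; // rounding remainder
        return groups;
    }

    // One balancing step: move each ratio halfway toward the measured throughput share.
    public static double[] Step(double[] ratios, int[] groups, double[] elapsedMs)
    {
        int n = ratios.Length;
        double[] speed = new double[n];
        double total = 0;
        for (int i = 0; i < n; i++)
        {
            speed[i] = groups[i] / elapsedMs[i]; // groups per millisecond
            total += speed[i];
        }
        double[] next = new double[n];
        for (int i = 0; i < n; i++)
            next[i] = 0.5 * ratios[i] + 0.5 * (speed[i] / total);
        return next;
    }

    public static void Main()
    {
        double[] trueSpeed = { 2.0, 5.0, 3.0 };          // simulated devices (assumption)
        double[] ratios = { 1.0 / 3, 1.0 / 3, 1.0 / 3 }; // start with an equal split
        const int totalGroups = 10;
        for (int iter = 0; iter < 8; iter++)
        {
            int[] groups = ToGroups(ratios, totalGroups);
            double[] elapsed = new double[groups.Length];
            for (int i = 0; i < groups.Length; i++)
                elapsed[i] = groups[i] / trueSpeed[i];   // stand-in for a real timing
            Console.WriteLine("iteration " + iter + ": "
                + string.Join(", ", Array.ConvertAll(groups, g => g.ToString())));
            ratios = Step(ratios, groups, elapsed);
        }
    }
}
```

With these simulated speeds the split starts at 4, 3 and 3 groups and settles at 2, 5 and 3 groups from the second iteration on. The damping keeps a single noisy timing from swinging the distribution, which is one plausible reason a balancer would need several iterations to converge.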

Documentation

Details and a tutorial are available in the Cekirdekler-wiki.

Known Issues

C++ array wrappers such as ClArray<float> have no out-of-bounds check; do not index past the array boundaries.
Do not use C++ array wrappers after they are disposed. These checks were left out to keep array indexing fast.
Do not use ClNumberCruncher or Core instances after they are disposed.
Pay attention to the "number of array elements used" per workitem in the kernel and to how it is given as a parameter to the API compute() method.
Pay attention to the "partial read"/"read"/"write" array copy modifiers when your kernel alters (or reads) the whole array or just a part of it.
No performance output on the first iteration. The load balancer needs at least several iterations to distribute work fairly, and the performance report needs at least 2 iterations before it prints to the console.

Example that computes 1000 workitems across all GPUs in a PC: GPU1 computes the global id range from 0 to M, GPU2 computes from M+1 to K, and GPU_N computes the global id range from Y to Z.

        Cekirdekler.ClNumberCruncher cr = new Cekirdekler.ClNumberCruncher(
            Cekirdekler.AcceleratorType.GPU, @"
                __kernel void hello(__global char * arr)
                {
                    printf(""hello world"");
                }
            ");

        Cekirdekler.ClArrays.ClArray<byte> array = new Cekirdekler.ClArrays.ClArray<byte>(1000);
        // Cekirdekler.ClArrays.ClArray<byte> array = new byte[1000]; // host arrays are usable too!
        array.compute(cr, 1, "hello", 1000, 100); 
        // The local id range is 100 here, so this example spawns 10 workgroups that all GPUs share: for example GPU1
        // computes 2 groups, GPU2 computes 5 groups and another GPU computes 3 groups. Global id values are continuous
        // across all global workitems, and local id values are also safe to use.
        // Faster GPUs get a larger work share over iterations; balancing is performance-aware across repetitions of a work.

        // No need to dispose anything at the end: they do it themselves when out of scope or garbage collected.
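
To make the "continuous global id" remark concrete, the following sketch works out which global id range each GPU would cover for the 2 / 5 / 3 workgroup split mentioned in the comments. The class RangeSketch and the method Partition are hypothetical names, not part of the Cekirdekler API; the library performs this partitioning internally, and the sketch only reproduces the arithmetic.

```csharp
using System;

// Hypothetical helper, not part of the Cekirdekler API.
public static class RangeSketch
{
    // Returns, per device, { first global id, number of work-items }.
    public static int[][] Partition(int[] groupsPerDevice, int localSize)
    {
        int[][] ranges = new int[groupsPerDevice.Length][];
        int offset = 0;
        for (int d = 0; d < groupsPerDevice.Length; d++)
        {
            int count = groupsPerDevice[d] * localSize;
            ranges[d] = new int[] { offset, count };
            offset += count; // the next device continues where this one stopped
        }
        return ranges;
    }

    public static void Main()
    {
        // 10 workgroups of 100 work-items, split 2 / 5 / 3 as in the example comments.
        int[][] r = Partition(new int[] { 2, 5, 3 }, 100);
        for (int d = 0; d < r.Length; d++)
            Console.WriteLine("GPU" + (d + 1) + ": global ids " + r[d][0]
                + " to " + (r[d][0] + r[d][1] - 1));
        // GPU1: global ids 0 to 199
        // GPU2: global ids 200 to 699
        // GPU3: global ids 700 to 999
    }
}
```

Because every share is a whole number of workgroups, each device's range starts on a multiple of the local size, which is why local id values stay valid on every device.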


Stars: ✭ 76
