
JuliaFolds / FoldsCUDA.jl

License: MIT
Data-parallelism on CUDA using Transducers.jl and for loops (FLoops.jl)

Programming languages: Julia, Makefile

Projects that are alternatives to or similar to FoldsCUDA.jl

data-parallelism
juliafolds.github.io/data-parallelism/
Stars: ✭ 22 (-54.17%)
Mutual labels:  high-performance, parallel, map-reduce, iterators, transducers
hack parallel
The core parallel and shared memory library used by Hack, Flow, and Pyre
Stars: ✭ 39 (-18.75%)
Mutual labels:  parallel, map-reduce
MultiHttp
A high-performance, very useful multi-curl tool written in PHP. A super easy-to-use concurrent cURL tool! (httpful, restful, concurrency)
Stars: ✭ 79 (+64.58%)
Mutual labels:  high-performance, parallel
Corium
Corium is a modern scripting language which combines simple, safe and efficient programming.
Stars: ✭ 18 (-62.5%)
Mutual labels:  high-performance, parallel
Transducers.jl
Efficient transducers for Julia
Stars: ✭ 226 (+370.83%)
Mutual labels:  high-performance, parallel
HiGHS
Linear optimization software
Stars: ✭ 107 (+122.92%)
Mutual labels:  high-performance, parallel
hatrack
Fast, multi-reader, multi-writer, lockless data structures for parallel programming
Stars: ✭ 55 (+14.58%)
Mutual labels:  high-performance, parallel
FLoops.jl
Fast sequential, threaded, and distributed for-loops for Julia—fold for humans™
Stars: ✭ 96 (+100%)
Mutual labels:  high-performance, parallel
ThreadsX.jl
Parallelized Base functions
Stars: ✭ 126 (+162.5%)
Mutual labels:  high-performance, parallel
streaming
Fast, safe and composable streaming abstractions.
Stars: ✭ 104 (+116.67%)
Mutual labels:  iterators, transducers
quickstep
Quickstep project
Stars: ✭ 22 (-54.17%)
Mutual labels:  high-performance
visual-heatmap
Open source javascript module for high performance, large scale heatmap rendering.
Stars: ✭ 21 (-56.25%)
Mutual labels:  high-performance
optuna-examples
Examples for https://github.com/optuna/optuna
Stars: ✭ 238 (+395.83%)
Mutual labels:  parallel
BeLibnids
A platform that uses multiple processes to combine DPDK and libnids for analyzing packets on a 10G port.
Stars: ✭ 36 (-25%)
Mutual labels:  high-performance
netty-in-action-cn
Netty in Action, Chinese edition
Stars: ✭ 1,389 (+2793.75%)
Mutual labels:  high-performance
dbaTDPMon
dbaTDPMon - Troubleshoot Database Performance and Monitoring
Stars: ✭ 20 (-58.33%)
Mutual labels:  parallel
workerman
An asynchronous event driven PHP socket framework. Supports HTTP, Websocket, SSL and other custom protocols. PHP>=5.4.
Stars: ✭ 10,005 (+20743.75%)
Mutual labels:  high-performance
ProtoPromise
Robust and efficient library for management of asynchronous operations in C#/.Net.
Stars: ✭ 20 (-58.33%)
Mutual labels:  parallel
cephgeorep
An efficient unidirectional remote backup daemon for CephFS.
Stars: ✭ 27 (-43.75%)
Mutual labels:  parallel
parallel
Lwt-enabled distributed computing library
Stars: ✭ 36 (-25%)
Mutual labels:  parallel

FoldsCUDA


FoldsCUDA.jl provides a Transducers.jl-compatible fold (reduce) implemented using CUDA.jl. This brings the transducers and reducing-function combinators implemented in Transducers.jl to the GPU. Furthermore, using FLoops.jl, you can write parallel for loops that run on the GPU.

API

FoldsCUDA exports CUDAEx, a parallel loop executor. It can be used with parallel for loops written with FLoops.@floop, with the Base-like high-level parallel API in Folds.jl, and with the extensible transducers provided by Transducers.jl.
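In Folds.jl the executor is passed as the last positional argument, so the same call can be pointed at different hardware just by swapping it. The sketch below is CPU-only so it runs without a GPU: it assumes Folds.jl and Transducers.jl are installed and uses Transducers' `SequentialEx()` and `ThreadedEx()` executors where one would otherwise pass `CUDAEx()` together with a `CuArray`.

```julia
using Folds, Transducers  # Transducers provides Map, SequentialEx and ThreadedEx

xs = collect(1:1000)

# The same two reductions, run with two different CPU executors.
# On a GPU one would instead pass CUDAEx() (from FoldsCUDA) and a CuArray.
for ex in (SequentialEx(), ThreadedEx())
    s = Folds.sum(xs, ex)                      # Base-like API from Folds.jl
    d = Folds.reduce(+, Map(x -> 2x)(xs), ex)  # transducer applied before reducing
    println(nameof(typeof(ex)), ": sum = ", s, ", doubled sum = ", d)
end
```

Only the executor argument changes between runs; the reducing function and the transducer stay the same.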

Examples

findmax using FLoops.jl

You can pass the CUDA executor FoldsCUDA.CUDAEx() to @floop to run a parallel for loop on the GPU:

julia> using FoldsCUDA, CUDA, FLoops

julia> using GPUArrays: @allowscalar

julia> xs = CUDA.rand(10^8);

julia> @allowscalar xs[100] = 2;

julia> @allowscalar xs[200] = 2;

julia> @floop CUDAEx() for (x, i) in zip(xs, eachindex(xs))
           @reduce() do (imax = -1; i), (xmax = -Inf32; x)
               if xmax < x
                   xmax = x
                   imax = i
               end
           end
       end

julia> xmax
2.0f0

julia> imax  # the *first* position for the largest value
100
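Since the loop body does not depend on the executor, the same `@reduce` logic can be cross-checked on the CPU against `Base.findmax`. This is a hedged sketch assuming FLoops.jl is installed; `floop_findmax` is a hypothetical helper name, and `ThreadedEx()` stands in for `CUDAEx()` so no GPU is needed. The strict `<` is what makes ties resolve to the first position: when two partial results with equal values are combined, the left (earlier) one is kept.

```julia
using FLoops  # @floop, @reduce and ThreadedEx

# Hypothetical helper: the README's findmax loop with a pluggable executor.
function floop_findmax(xs::AbstractVector{T}, ex = ThreadedEx()) where {T<:AbstractFloat}
    @floop ex for (x, i) in zip(xs, eachindex(xs))
        @reduce() do (imax = -1; i), (xmax = typemin(T); x)
            if xmax < x  # strict `<`: on ties the earlier candidate wins
                xmax = x
                imax = i
            end
        end
    end
    return (xmax, imax)
end

xs = rand(10^6)  # values in [0, 1)
xs[100] = 2
xs[200] = 2
@show floop_findmax(xs)  # (2.0, 100)
@show findmax(xs)        # Base reference: (2.0, 100)
```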

extrema using Transducers.TeeRF

julia> using Transducers, Folds

julia> @allowscalar xs[300] = -0.5;

julia> Folds.reduce(TeeRF(min, max), xs, CUDAEx())
(-0.5f0, 2.0f0)

julia> Folds.reduce(TeeRF(min, max), (2x for x in xs), CUDAEx())  # a generator expression also works
(-1.0f0, 4.0f0)

julia> Folds.reduce(TeeRF(min, max), Map(x -> 2x)(xs), CUDAEx())  # equivalent, using a transducer
(-1.0f0, 4.0f0)
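`TeeRF(min, max)` feeds every element to both `min` and `max` and returns their results as a tuple, which is why it reproduces `extrema`. The same calls can be run on a plain `Vector` to cross-check them against `Base.extrema`; this sketch assumes Folds.jl and Transducers.jl are installed and calls `Folds.reduce` without an executor argument, so it runs on the CPU.

```julia
using Folds, Transducers  # TeeRF and Map come from Transducers.jl

xs = [5.0, 2.0, 6.0, 8.0, 3.0]

@show Folds.reduce(TeeRF(min, max), xs)                # (2.0, 8.0)
@show extrema(xs)                                      # Base reference: (2.0, 8.0)
@show Folds.reduce(TeeRF(min, max), (2x for x in xs))  # (4.0, 16.0)
@show Folds.reduce(TeeRF(min, max), Map(x -> 2x)(xs))  # (4.0, 16.0)
```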

More examples

For more examples, see the examples section in the documentation.
