JuliaFolds / FoldsCUDA.jl

License: MIT

Data-parallelism on CUDA using Transducers.jl and for loops (FLoops.jl)

Labels

gpu high-performance parallel cuda map-reduce iterators transducers

FoldsCUDA

FoldsCUDA.jl provides a Transducers.jl-compatible fold (reduce) implemented using CUDA.jl. This brings the transducers and reducing function combinators implemented in Transducers.jl to the GPU. Furthermore, using FLoops.jl, you can write parallel for loops that run on the GPU.

API

FoldsCUDA exports CUDAEx, a parallel loop executor. It can be used with parallel for loops created with FLoops.@floop, the Base-like high-level parallel API in Folds.jl, and the extensible transducers provided by Transducers.jl.
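
Because the executor is an ordinary argument to @floop, a loop can be written once and run with different executors. The following is a minimal sketch of that idea using the CPU executors SequentialEx and ThreadedEx from Transducers.jl, so it runs without a GPU. The function name sum_of_squares is hypothetical and chosen only for illustration.

```julia
using FLoops
using Transducers: SequentialEx, ThreadedEx

# Hypothetical helper: the executor `ex` decides where and how the loop runs.
function sum_of_squares(xs, ex)
    @floop ex for x in xs
        @reduce(acc += x^2)  # parallel-safe accumulation
    end
    return acc
end

sum_of_squares(1:10, SequentialEx())  # 385
sum_of_squares(1:10, ThreadedEx())    # 385

# Assumption: with a CUDA-capable device, the same function should accept
# a CuArray and the GPU executor, e.g.
#     using FoldsCUDA, CUDA
#     sum_of_squares(CUDA.rand(10^6), CUDAEx())
```

Keeping the executor as a parameter makes it easy to check a loop on the CPU before moving it to the GPU.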

Examples

`findmax` using FLoops.jl

You can pass the CUDA executor FoldsCUDA.CUDAEx() to @floop to run a parallel for loop on the GPU:

julia> using FoldsCUDA, CUDA, FLoops

julia> using GPUArrays: @allowscalar

julia> xs = CUDA.rand(10^8);

julia> @allowscalar xs[100] = 2;

julia> @allowscalar xs[200] = 2;

julia> @floop CUDAEx() for (x, i) in zip(xs, eachindex(xs))
           @reduce() do (imax = -1; i), (xmax = -Inf32; x)
               if xmax < x
                   xmax = x
                   imax = i
               end
           end
       end

julia> xmax
2.0f0

julia> imax  # the *first* position for the largest value
100

`extrema` using `Transducers.TeeRF`

julia> using Transducers, Folds

julia> @allowscalar xs[300] = -0.5;

julia> Folds.reduce(TeeRF(min, max), xs, CUDAEx())
(-0.5f0, 2.0f0)

julia> Folds.reduce(TeeRF(min, max), (2x for x in xs), CUDAEx())  # iterator comprehension works
(-1.0f0, 4.0f0)

julia> Folds.reduce(TeeRF(min, max), Map(x -> 2x)(xs), CUDAEx())  # equivalent, using a transducer
(-1.0f0, 4.0f0)

More examples

For more examples, see the examples section in the documentation.

