rxwei / cuda-swift

Licence: MIT license
Parallel Computing Library for Linux and macOS & NVIDIA CUDA Wrapper

cuda-swift

This project provides a native Swift interface to CUDA with the following modules:

  • CUDA Driver API: import CUDADriver
  • CUDA Runtime API: import CUDARuntime
  • NVRTC - CUDA Runtime Compiler: import NVRTC
  • cuBLAS - CUDA Basic Linear Algebra Subprograms: import CuBLAS
  • Warp - GPU Acceleration Library (Thrust counterpart): import Warp

Any machine with CUDA 7.0+ and a CUDA-capable GPU is supported. Xcode Playground is supported as well, with the limitations noted under Components. Please refer to Package Information and Components.

Quick look

Value types

CUDA Driver, Runtime, cuBLAS, and NVRTC (the runtime compiler) are wrapped in native Swift types. Warp provides higher-level value types, DeviceArray and DeviceValue, with copy-on-write semantics.

import Warp

/// Initialize two arrays on device
var x: DeviceArray<Float> = [1.0, 2.0, 3.0, 4.0, 5.0]
let y: DeviceArray<Float> = [1.0, 2.0, 3.0, 4.0, 5.0]

/// Scalar map operations
x.incrementElements(by: 2) // x => [3.0, 4.0, 5.0, 6.0, 7.0] on device
x.multiplyElements(by: 2) // x => [6.0, 8.0, 10.0, 12.0, 14.0] on device

/// Addition
x.formElementwise(.addition, with: y) // x => [7.0, 10.0, 13.0, 16.0, 19.0] on device

/// Dot product
x • y // => 225.0

/// Sum
x.sum() // => 65.0

/// Absolute sum
x.sumOfAbsoluteValues() // => 65.0

/// Transform by 1-place math functions
x.transform(by: .sin)
x.transform(by: .tanh)
x.transform(by: .ceil)

/// Elementwise operation
x.formElementwise(.addition, with: y)
x.formElementwise(.subtraction, with: y)
x.formElementwise(.multiplication, with: y)
x.formElementwise(.division, with: y)

/// Fill with the same value
var z = y
z.fill(with: 10.0)

/// Composite assignment
x.assign(from: .subtraction, left: y, multipliedBy: 100.0, right: z)
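
The copy-on-write semantics attributed to DeviceArray and DeviceValue can be pictured with a host-only analogue. The sketch below is not Warp's implementation: HostBuffer and CowArray are hypothetical names, a plain Swift array stands in for device memory, and only the standard library is assumed.

```swift
/// Reference-type storage standing in for a device allocation (hypothetical).
final class HostBuffer {
    var elements: [Float]
    init(_ elements: [Float]) { self.elements = elements }
}

/// Value type that shares its buffer until one copy is mutated (hypothetical).
struct CowArray {
    private var buffer: HostBuffer

    init(_ elements: [Float]) { buffer = HostBuffer(elements) }

    var elements: [Float] { return buffer.elements }

    /// True when both values still point at the same underlying buffer.
    func sharesStorage(with other: CowArray) -> Bool {
        return buffer === other.buffer
    }

    /// Overwrite every element; copy the buffer first if it is shared.
    mutating func fill(with value: Float) {
        if !isKnownUniquelyReferenced(&buffer) {
            buffer = HostBuffer(buffer.elements)
        }
        buffer.elements = Array(repeating: value, count: buffer.elements.count)
    }
}

let y = CowArray([1, 2, 3, 4, 5])
var z = y                                   // no copy yet: z and y share one buffer
let sharedBeforeWrite = z.sharesStorage(with: y)
z.fill(with: 10)                            // the first write triggers the copy
let sharedAfterWrite = z.sharesStorage(with: y)
```

Assigning z = y is cheap because only a reference is copied; the buffer is duplicated the first time one of the two values is written to, so y keeps its original contents.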

Runtime compilation

Compile source string to PTX

import NVRTC
import CUDADriver
import Warp

let source: String =
    "extern \"C\" __global__ void saxpy(float a, float *x, float *y, float *out, int n) {"
  + "    size_t tid = blockIdx.x * blockDim.x + threadIdx.x;"
  + "    if (tid < n) out[tid] = a * x[tid] + y[tid];"
  + "}"
let ptx = try Compiler.compile(source)
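
As an alternative way to write the kernel source, a multi-line string literal keeps the CUDA code readable and preserves its line breaks. This is only a sketch: it assumes a Swift 4 or later toolchain, which this README does not state. The resulting string would be passed to Compiler.compile(source) in the same way as the concatenated version.

```swift
// Kernel source as one multi-line string literal (requires Swift 4 or later).
// Line breaks are preserved, so `//` comments or preprocessor directives
// inside the CUDA code would not swallow the lines that follow them.
let source = """
extern "C" __global__ void saxpy(float a, float *x, float *y, float *out, int n) {
    size_t tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) out[tid] = a * x[tid] + y[tid];
}
"""
```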

JIT-compile and load PTX using Driver API within a device context

try Device.main.withContext { context in
    let module = try Module(ptx: ptx)
    let function = module.function(named: "saxpy")!
    
    let x: DeviceArray<Float> = [1, 2, 3, 4, 5, 6, 7, 8]
    let y: DeviceArray<Float> = [2, 3, 4, 5, 6, 7, 8, 9]
    var result = DeviceArray<Float>(capacity: 8)

    try function<<<(1, 8)>>>[.float(1.0), .constPointer(to: x), .constPointer(to: y), .pointer(to: &result), .int(8)]
    /// result => [3, 5, 7, 9, 11, 13, 15, 17] on device
}
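
To check a kernel like saxpy, it helps to have a host-side reference and a way to pick a launch size. The sketch below uses only the Swift standard library; saxpyReference and blockCount are hypothetical helper names, not part of this project. It also assumes the (1, 8) tuple in the launch follows CUDA's usual grid-then-block order, that is, one block of eight threads.

```swift
/// CPU reference for SAXPY: out[i] = a * x[i] + y[i].
func saxpyReference(_ a: Float, _ x: [Float], _ y: [Float]) -> [Float] {
    precondition(x.count == y.count, "x and y must have the same length")
    return zip(x, y).map { a * $0 + $1 }
}

/// Number of blocks needed so that blocks * threadsPerBlock >= elementCount.
func blockCount(for elementCount: Int, threadsPerBlock: Int) -> Int {
    precondition(threadsPerBlock > 0, "threadsPerBlock must be positive")
    return (elementCount + threadsPerBlock - 1) / threadsPerBlock
}

let expected = saxpyReference(1.0, [1, 2, 3, 4, 5, 6, 7, 8], [2, 3, 4, 5, 6, 7, 8, 9])
// expected == [3, 5, 7, 9, 11, 13, 15, 17], the same values as the device result

let blocks = blockCount(for: 1000, threadsPerBlock: 256)
// blocks == 4; the kernel's `tid < n` check masks off the 24 surplus threads
```

Rounding the block count up is why the kernel needs its bounds check: the last block usually contains threads whose index falls past the end of the arrays.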

Package Information

Add a dependency:

.Package(url: "https://github.com/rxwei/cuda-swift", majorVersion: 1)
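
For context, the dependency line belongs in the dependencies array of a package manifest. The sketch below uses the legacy (Swift 3 era) PackageDescription syntax that matches the .Package(url:majorVersion:) form; the package name MyCUDAApp is a placeholder, not something this project defines.

```swift
// Package.swift (legacy Swift 3 manifest format)
import PackageDescription

let package = Package(
    name: "MyCUDAApp",  // placeholder name for your own package
    dependencies: [
        .Package(url: "https://github.com/rxwei/cuda-swift", majorVersion: 1)
    ]
)
```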

You may use the Makefile in this repository for your own project. No extra path configuration is needed.

Otherwise, specify the paths to your CUDA headers and libraries when running swift build.

macOS

swift build -Xcc -I/usr/local/cuda/include -Xlinker -L/usr/local/cuda/lib

Linux

swift build -Xcc -I/usr/local/cuda/include -Xlinker -L/usr/local/cuda/lib64

Components

Core

  • CUDADriver - CUDA Driver API
    • Context
    • Device
    • Function
    • PTX
    • Module
    • Stream
    • Unsafe(Mutable)DevicePointer<T>
    • DriverError (all error codes from CUDA C API)
  • CUDARuntime - CUDA Runtime API
    • Unsafe(Mutable)DevicePointer<T>
    • Device
    • Stream
    • RuntimeError (all error codes from CUDA C API)
  • NVRTC - CUDA Runtime Compiler
    • Compiler
  • CuBLAS - GPU Basic Linear Algebra Subprograms (in progress)
    • Level 1 BLAS operations
    • Level 2 BLAS operations (GEMV)
    • Level 3 BLAS operations (GEMM)
  • Warp - GPU Acceleration Library (Thrust counterpart)
    • DeviceArray<T> (generic array in device memory)
    • DeviceValue<T> (generic value in device memory)
    • Accelerated vector operations
    • Type-safe kernel argument helpers

Optional

  • Swift Playground
    • CUDADriver works in the playground. Other modules currently fail with the "couldn't lookup symbols" error, for which we have no workaround until Xcode is fixed.
    • To use the playground, open the Xcode workspace file, and add a library for every modulemap under Frameworks.

Dependencies

License

MIT License

CUDA is a registered trademark of NVIDIA Corporation.
