
NVIDIA / Multi Gpu Programming Models

Licence: other
Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Projects that are alternatives of or similar to Multi Gpu Programming Models

Remotery
Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer
Stars: ✭ 1,908 (+1056.36%)
Mutual labels:  cuda
Jetson
Helmut Hoffer von Ankershoffen experimenting with arm64 based NVIDIA Jetson (Nano and AGX Xavier) edge devices running Kubernetes (K8s) for machine learning (ML) including Jupyter Notebooks, TensorFlow Training and TensorFlow Serving using CUDA for smart IoT.
Stars: ✭ 151 (-8.48%)
Mutual labels:  cuda
Xmrminer
🐜 A CUDA based miner for Monero
Stars: ✭ 158 (-4.24%)
Mutual labels:  cuda
Volumetric Path Tracer
☁️ Volumetric path tracer using cuda
Stars: ✭ 145 (-12.12%)
Mutual labels:  cuda
Ginkgo
Numerical linear algebra software package
Stars: ✭ 149 (-9.7%)
Mutual labels:  cuda
Dsmnet
Domain-invariant Stereo Matching Networks
Stars: ✭ 153 (-7.27%)
Mutual labels:  cuda
Libgdf
[ARCHIVED] C GPU DataFrame Library
Stars: ✭ 142 (-13.94%)
Mutual labels:  cuda
Cx db8
a contextual, biasable, word-or-sentence-or-paragraph extractive summarizer powered by the latest in text embeddings (Bert, Universal Sentence Encoder, Flair)
Stars: ✭ 164 (-0.61%)
Mutual labels:  cuda
Lantern
Stars: ✭ 150 (-9.09%)
Mutual labels:  cuda
3dunderworld Sls Gpu cpu
A structured light scanner
Stars: ✭ 157 (-4.85%)
Mutual labels:  cuda
Optical Flow Filter
A real time optical flow algorithm implemented on GPU
Stars: ✭ 146 (-11.52%)
Mutual labels:  cuda
Cuda Cnn
CNN accelerated by CUDA. Tested on MNIST, finally reaching 99.76%
Stars: ✭ 148 (-10.3%)
Mutual labels:  cuda
Cumf als
CUDA Matrix Factorization Library with Alternating Least Square (ALS)
Stars: ✭ 154 (-6.67%)
Mutual labels:  cuda
Gpurir
Python library for Room Impulse Response (RIR) simulation with GPU acceleration
Stars: ✭ 145 (-12.12%)
Mutual labels:  cuda
Clojurecuda
Clojure library for CUDA development
Stars: ✭ 158 (-4.24%)
Mutual labels:  cuda
Hoomd Blue
Molecular dynamics and Monte Carlo soft matter simulation on GPUs.
Stars: ✭ 143 (-13.33%)
Mutual labels:  cuda
Compactcnncascade
A binary library for very fast face detection using compact CNNs.
Stars: ✭ 152 (-7.88%)
Mutual labels:  cuda
Primitiv
A Neural Network Toolkit.
Stars: ✭ 164 (-0.61%)
Mutual labels:  cuda
Khiva
An open-source library of algorithms to analyse time series in GPU and CPU.
Stars: ✭ 161 (-2.42%)
Mutual labels:  cuda
Rmm
RAPIDS Memory Manager
Stars: ✭ 154 (-6.67%)
Mutual labels:  cuda

Multi GPU Programming Models

This project implements the well-known multi-GPU Jacobi solver with different multi-GPU programming models:

  • single_threaded_copy Single Threaded using cudaMemcpy for inter GPU communication
  • multi_threaded_copy Multi Threaded with OpenMP using cudaMemcpy for inter GPU communication
  • multi_threaded_copy_overlapp Multi Threaded with OpenMP using cudaMemcpy for inter GPU communication, overlapping communication with computation
  • multi_threaded_p2p Multi Threaded with OpenMP using GPUDirect P2P mappings for inter GPU communication
  • multi_threaded_p2p_opt Multi Threaded with OpenMP using GPUDirect P2P mappings for inter GPU communication with delayed norm execution
  • multi_threaded_um Multi Threaded with OpenMP relying on transparent peer mappings with Unified Memory for inter GPU communication
  • mpi Multi Process with MPI using CUDA-aware MPI for inter GPU communication
  • mpi_overlapp Multi Process with MPI using CUDA-aware MPI for inter GPU communication with overlapping communication
  • nvshmem Multi Process with MPI and NVSHMEM using NVSHMEM for inter GPU communication. The alternative approach, nvshmem_opt, might offer better performance portability.
  • nvshmem_opt Multi Process with MPI and NVSHMEM using NVSHMEM for inter GPU communication with NVSHMEM extension API
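All of the variants above run the same underlying numerical method. As an illustrative sketch (not code from this repo), the following plain C++ reference shows the Jacobi relaxation for the 2D Laplace equation and marks, in comments, the row-wise domain decomposition that the multi-GPU variants apply; the returned L2 norm of the last update is the quantity used for the convergence check.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Reference Jacobi relaxation for the 2D Laplace equation (illustrative
// sketch, not code from this repo). Returns the L2 norm of the last
// update, which serves as the convergence check.
double jacobi(int nx, int ny, int niter) {
    std::vector<double> a(static_cast<size_t>(ny) * nx, 0.0);
    std::vector<double> a_new(a);
    // Dirichlet boundary: 1.0 on the left and right columns.
    for (int iy = 0; iy < ny; ++iy) {
        a[iy * nx] = a_new[iy * nx] = 1.0;
        a[iy * nx + nx - 1] = a_new[iy * nx + nx - 1] = 1.0;
    }
    double l2 = 0.0;
    for (int iter = 0; iter < niter; ++iter) {
        l2 = 0.0;
        // The multi-GPU variants partition this iy loop into contiguous
        // row chunks, one per GPU; the rows adjacent to each chunk
        // boundary are the "halo" rows that must be exchanged every
        // iteration (via cudaMemcpy, P2P mappings, MPI, or NVSHMEM).
        for (int iy = 1; iy < ny - 1; ++iy) {
            for (int ix = 1; ix < nx - 1; ++ix) {
                const double v = 0.25 * (a[iy * nx + ix - 1] + a[iy * nx + ix + 1] +
                                         a[(iy - 1) * nx + ix] + a[(iy + 1) * nx + ix]);
                const double diff = v - a[iy * nx + ix];
                a_new[iy * nx + ix] = v;
                l2 += diff * diff;
            }
        }
        std::swap(a, a_new);
    }
    return std::sqrt(l2);
}
```

The "overlap" variants additionally compute the halo rows first and exchange them while the bulk of the interior is still being updated, hiding the communication latency.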

Each variant is a stand-alone Makefile project, and all variants are described in the GTC 2019 talk Multi GPU Programming Models.

Requirements

  • CUDA: version 11.0 (9.2 if built with DISABLE_CUB=1) or later is required by all variants.
  • OpenMP capable compiler: Required by the Multi Threaded variants. The examples have been developed and tested with gcc.
  • CUDA-aware MPI: Required by the MPI and NVSHMEM variants. The examples have been developed and tested with OpenMPI.
  • NVSHMEM (version 0.4.1 or later): Required by the NVSHMEM variant.

Building

Each variant comes with a Makefile and can be built by simply issuing make, e.g.

multi-gpu-programming-models$ cd multi_threaded_copy
multi_threaded_copy$ make
nvcc -DHAVE_CUB -Xcompiler -fopenmp -lineinfo -DUSE_NVTX -lnvToolsExt -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80 -std=c++14 jacobi.cu -o jacobi
multi_threaded_copy$ ls jacobi
jacobi

Run instructions

All variants support the following command line options:

  • -niter: How many iterations to carry out (default 1000)
  • -nccheck: How often to check for convergence (default 1)
  • -nx: Size of the domain in x direction (default 16384)
  • -ny: Size of the domain in y direction (default 16384)
  • -csv: Print performance results as CSV
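Options of this "-flag value" style can be parsed with a few lines of host code. The sketch below is illustrative only; the helper names get_arg and has_flag are hypothetical and not necessarily what the repo's jacobi.cu uses.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: scan argv for "-name value" and return the value
// as an integer, falling back to a default when the flag is absent.
int get_arg(int argc, char** argv, const char* name, int def) {
    for (int i = 1; i < argc - 1; ++i)
        if (std::strcmp(argv[i], name) == 0) return std::atoi(argv[i + 1]);
    return def;
}

// Hypothetical helper: true if a bare flag such as "-csv" is present.
bool has_flag(int argc, char** argv, const char* name) {
    for (int i = 1; i < argc; ++i)
        if (std::strcmp(argv[i], name) == 0) return true;
    return false;
}
```

A main function would then read, e.g., `int nx = get_arg(argc, argv, "-nx", 16384);` and `bool csv = has_flag(argc, argv, "-csv");`, matching the defaults listed above.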

The provided script bench.sh contains some examples executing all the benchmarks presented in the GTC 2019 Talk Multi GPU Programming Models.

Developers guide

The code applies the style guide implemented in the .clang-format file. clang-format version 7 or later should be used to format the code prior to submitting it, e.g. with

multi-gpu-programming-models$ cd multi_threaded_copy
multi_threaded_copy$ clang-format -style=file -i jacobi.cu