
NVIDIA / NCCL Tests

License: other

Projects that are alternatives to or similar to NCCL Tests

Optical Flow Filter
A real-time optical flow algorithm implemented on GPU
Stars: ✭ 146 (-12.05%)
Mutual labels:  cuda
Cumf als
CUDA Matrix Factorization Library with Alternating Least Square (ALS)
Stars: ✭ 154 (-7.23%)
Mutual labels:  cuda
Cx db8
A contextual, biasable, word-or-sentence-or-paragraph extractive summarizer powered by the latest in text embeddings (BERT, Universal Sentence Encoder, Flair)
Stars: ✭ 164 (-1.2%)
Mutual labels:  cuda
Cuda Cnn
A CNN accelerated by CUDA, tested on MNIST and finally reaching 99.76% accuracy
Stars: ✭ 148 (-10.84%)
Mutual labels:  cuda
Compactcnncascade
A binary library for very fast face detection using compact CNNs.
Stars: ✭ 152 (-8.43%)
Mutual labels:  cuda
3dunderworld Sls Gpu cpu
A structured light scanner
Stars: ✭ 157 (-5.42%)
Mutual labels:  cuda
Gpurir
Python library for Room Impulse Response (RIR) simulation with GPU acceleration
Stars: ✭ 145 (-12.65%)
Mutual labels:  cuda
Opencuda
Stars: ✭ 164 (-1.2%)
Mutual labels:  cuda
Dsmnet
Domain-invariant Stereo Matching Networks
Stars: ✭ 153 (-7.83%)
Mutual labels:  cuda
Khiva
An open-source library of algorithms to analyse time series on GPU and CPU.
Stars: ✭ 161 (-3.01%)
Mutual labels:  cuda
Ginkgo
Numerical linear algebra software package
Stars: ✭ 149 (-10.24%)
Mutual labels:  cuda
Jetson
Helmut Hoffer von Ankershoffen experimenting with arm64 based NVIDIA Jetson (Nano and AGX Xavier) edge devices running Kubernetes (K8s) for machine learning (ML) including Jupyter Notebooks, TensorFlow Training and TensorFlow Serving using CUDA for smart IoT.
Stars: ✭ 151 (-9.04%)
Mutual labels:  cuda
Xmrminer
🐜 A CUDA-based miner for Monero
Stars: ✭ 158 (-4.82%)
Mutual labels:  cuda
Sketchgraphs
A dataset of 15 million CAD sketches with geometric constraint graphs.
Stars: ✭ 148 (-10.84%)
Mutual labels:  cuda
Primitiv
A Neural Network Toolkit.
Stars: ✭ 164 (-1.2%)
Mutual labels:  cuda
Volumetric Path Tracer
☁️ Volumetric path tracer using CUDA
Stars: ✭ 145 (-12.65%)
Mutual labels:  cuda
Rmm
RAPIDS Memory Manager
Stars: ✭ 154 (-7.23%)
Mutual labels:  cuda
Jcuda
JCuda - Java bindings for CUDA
Stars: ✭ 165 (-0.6%)
Mutual labels:  cuda
Multi Gpu Programming Models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Stars: ✭ 165 (-0.6%)
Mutual labels:  cuda
Clojurecuda
Clojure library for CUDA development
Stars: ✭ 158 (-4.82%)
Mutual labels:  cuda

NCCL Tests

These tests check both the performance and the correctness of NCCL operations.

Build

To build the tests, just type make:
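
$ make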

If CUDA is not installed in /usr/local/cuda, you may specify CUDA_HOME. Similarly, if NCCL is not installed in /usr, you may specify NCCL_HOME.

$ make CUDA_HOME=/path/to/cuda NCCL_HOME=/path/to/nccl

NCCL tests rely on MPI to run across multiple processes and, hence, multiple nodes. To compile the tests with MPI support, set MPI=1 and point MPI_HOME to the path where MPI is installed.

$ make MPI=1 MPI_HOME=/path/to/mpi CUDA_HOME=/path/to/cuda NCCL_HOME=/path/to/nccl

Usage

NCCL tests can run on multiple processes, multiple threads, and multiple CUDA devices per thread. The number of processes is managed by MPI and is therefore not passed to the tests as an argument. The total number of ranks (= CUDA devices) equals (number of processes) * (number of threads) * (number of GPUs per thread); see the examples below.

Quick examples

Run on 8 GPUs (-g 8), scanning from 8 bytes to 128 MB (doubling the size at each step with -f 2):

$ ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 8

Run with MPI on 40 processes (potentially on multiple nodes) with 4 GPUs each, for a total of 160 ranks:

$ mpirun -np 40 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 4
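
As a further illustration of the rank arithmetic above, a hypothetical layout with 2 MPI processes, 2 threads per process (-t 2) and 2 GPUs per thread (-g 2) gives 2*2*2 = 8 ranks (adjust the numbers to your machine):

$ mpirun -np 2 ./build/all_reduce_perf -t 2 -g 2 -b 8 -e 128M -f 2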

Performance

See the Performance page for an explanation of the reported numbers, in particular the "busbw" column.
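
As a rough sketch of how those numbers are derived (following the definitions on that page; n denotes the number of ranks): the algorithm bandwidth divides the data size by the elapsed time, and the bus bandwidth rescales it by a collective-specific factor so that results are comparable across collectives. For AllReduce:

  algbw = size / time
  busbw = algbw * 2 * (n - 1) / n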

Arguments

All tests support the same set of arguments (a combined example follows the list):

  • Number of GPUs
    • -t,--nthreads <num threads> number of threads per process. Default: 1.
    • -g,--ngpus <GPUs per thread> number of GPUs per thread. Default: 1.
  • Sizes to scan
    • -b,--minbytes <min size in bytes> minimum size to start with. Default: 32M.
    • -e,--maxbytes <max size in bytes> maximum size to end at. Default: 32M.
    • Increments can be either a fixed step or a multiplication factor; only one of the two should be used.
      • -i,--stepbytes <increment size> fixed increment between sizes. Default: (max-min)/10.
      • -f,--stepfactor <increment factor> multiplication factor between sizes. Default: disabled.
  • NCCL operations arguments
    • -o,--op <sum/prod/min/max/all> reduction operation to perform. Only relevant for reduction operations like AllReduce, Reduce or ReduceScatter. Default: Sum.
    • -d,--datatype <nccltype/all> datatype to use. Default: Float.
    • -r,--root <root/all> root to use. Only for operations with a root, like broadcast or reduce. Default: 0.
  • Performance
    • -n,--iters <iteration count> number of iterations. Default: 20.
    • -w,--warmup_iters <warmup iteration count> number of warmup iterations (not timed). Default: 5.
    • -m,--agg_iters <aggregation count> number of operations to aggregate together in each iteration. Default: 1.
  • Test operation
    • -p,--parallel_init <0/1> use threads to initialize NCCL in parallel. Default: 0.
    • -c,--check <0/1> check correctness of results. This can be quite slow on large numbers of GPUs. Default: 1.
    • -z,--blocking <0/1> make NCCL collectives blocking, i.e. have the CPU wait and sync after each collective. Default: 0.
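
For illustration, a hypothetical invocation combining several of the flags above (the values are arbitrary, not a recommended configuration):

$ ./build/all_reduce_perf -b 8 -e 256M -f 2 -g 4 -o sum -d float -n 50 -w 10 -c 1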

Copyright

NCCL tests are provided under the BSD license. All source code and accompanying documentation is copyright (c) 2016-2019, NVIDIA CORPORATION. All rights reserved.
