
NVIDIA / nvbench

License: Apache-2.0
CUDA Kernel Benchmarking Library

Programming Languages

Cuda
1817 projects
C++
36643 projects - #6 most used programming language
CMake
9771 projects
Python
139335 projects - #7 most used programming language
Shell
77523 projects

Projects that are alternatives of or similar to nvbench

Deeppicar
Deep Learning Autonomous Car based on Raspberry Pi, SunFounder PiCar-V Kit, TensorFlow, and Google's EdgeTPU Co-Processor
Stars: ✭ 242 (+13.62%)
Mutual labels:  nvidia
fahclient
Dockerized Folding@home client with NVIDIA GPU support to help battle COVID-19
Stars: ✭ 38 (-82.16%)
Mutual labels:  nvidia
CUDAfy.NET
CUDAfy .NET allows easy development of high-performance GPGPU applications entirely from .NET. It's developed in C#.
Stars: ✭ 56 (-73.71%)
Mutual labels:  nvidia
F1-demo
Real-time vehicle telematics analytics demo using OmniSci
Stars: ✭ 27 (-87.32%)
Mutual labels:  nvidia
nvidia-vaapi-driver
A VA-API implementation using NVIDIA's NVDEC
Stars: ✭ 789 (+270.42%)
Mutual labels:  nvidia
isaac ros dnn inference
Hardware-accelerated DNN model inference ROS2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU
Stars: ✭ 67 (-68.54%)
Mutual labels:  nvidia
Nvidia Modded Inf
Modified nVidia .inf files to run drivers on all video cards, research & telemetry free drivers
Stars: ✭ 227 (+6.57%)
Mutual labels:  nvidia
nix-install-vendor-gl
Ensure that a system-compatible OpenGL driver is available for `nix-shell`-encapsulated programs.
Stars: ✭ 22 (-89.67%)
Mutual labels:  nvidia
gl dynamic lod
GPU classifies how to render millions of particles
Stars: ✭ 63 (-70.42%)
Mutual labels:  nvidia
nvidia-auto-installer-for-fedora-linux
A CLI tool which lets you install proprietary NVIDIA drivers and much more easily on Fedora Linux (32 or above and Rawhide)
Stars: ✭ 270 (+26.76%)
Mutual labels:  nvidia
nvidia-docker-bootstrap
For those times when nvidia-docker is not possible (like AWS ECS)
Stars: ✭ 19 (-91.08%)
Mutual labels:  nvidia
xnxpilot
Openpilot on Jetson Xavier NX
Stars: ✭ 81 (-61.97%)
Mutual labels:  nvidia
cucim
No description or website provided.
Stars: ✭ 218 (+2.35%)
Mutual labels:  nvidia
Plotoptix
Data visualisation in Python based on OptiX 7.2 ray tracing framework.
Stars: ✭ 252 (+18.31%)
Mutual labels:  nvidia
rtx-voice-script
A python script that takes an input MP3/FLAC and outputs an acapella/background noise stripped WAV using the power of NVIDIA's RTX Voice
Stars: ✭ 50 (-76.53%)
Mutual labels:  nvidia
Jetson Containers
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
Stars: ✭ 223 (+4.69%)
Mutual labels:  nvidia
RTX-Mesh-Shaders
Different mesh shading techniques using the NVIDIA RTX (Turing) technology.
Stars: ✭ 84 (-60.56%)
Mutual labels:  nvidia
Geforce-Kepler-patcher
Install Nvidia binaries files on Snapshot disk for macOS Monterey 12
Stars: ✭ 285 (+33.8%)
Mutual labels:  nvidia
ros-docker-gui
ROS Docker Containers with X11 (GUI) support [Linux]
Stars: ✭ 137 (-35.68%)
Mutual labels:  nvidia
NVOC
No description or website provided.
Stars: ✭ 26 (-87.79%)
Mutual labels:  nvidia

Overview

This project is a work in progress. Everything is subject to change.

NVBench is a C++17 library designed to simplify CUDA kernel benchmarking. It features:

  • Parameter sweeps: A powerful and flexible "axis" system explores a kernel's configuration space. Parameters may be dynamic numbers/strings or static types.
  • Runtime customization: A rich command-line interface allows redefinition of parameter axes, CUDA device selection, locking GPU clocks (Volta+), changing output formats, and more.
  • Throughput calculations: Compute and report:
    • Item throughput (elements/second)
    • Global memory bandwidth usage (bytes/second and per-device %-of-peak-bw)
  • Multiple output formats: Currently supports markdown (default) and CSV output.
  • Manual timer mode: (optional) Explicitly start/stop timing in a benchmark implementation.
  • Multiple measurement types:
    • Cold Measurements:
      • Each sample runs the benchmark once with a clean device L2 cache.
      • GPU and CPU times are reported.
    • Batch Measurements:
      • Executes the benchmark multiple times back-to-back and records total time.
      • Reports the average execution time (total time / number of executions).
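The throughput calculations above only require telling NVBench how much work each sample performs. A minimal sketch, assuming a simple copy kernel (the kernel, problem size, and launch configuration here are illustrative, not part of NVBench):

```cuda
#include <nvbench/nvbench.cuh>
#include <thrust/device_vector.h>

// Hypothetical copy kernel, used only for illustration.
__global__ void copy_kernel(const int *in, int *out, std::size_t n)
{
  const std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) { out[i] = in[i]; }
}

void throughput_bench(nvbench::state &state)
{
  const std::size_t n = 16 * 1024 * 1024; // illustrative problem size

  thrust::device_vector<int> input(n);
  thrust::device_vector<int> output(n);

  // Declare the work per sample so NVBench can report item throughput
  // (elements/second) and global memory bandwidth (bytes/second and
  // %-of-peak) alongside the timings:
  state.add_element_count(n);
  state.add_global_memory_reads<int>(n);
  state.add_global_memory_writes<int>(n);

  state.exec([&](nvbench::launch &launch) {
    copy_kernel<<<(n + 255) / 256, 256, 0, launch.get_stream()>>>(
      thrust::raw_pointer_cast(input.data()),
      thrust::raw_pointer_cast(output.data()),
      n);
  });
}
NVBENCH_BENCH(throughput_bench);
```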

Getting Started

Minimal Benchmark

A basic kernel benchmark can be created with just a few lines of CUDA C++:

void my_benchmark(nvbench::state& state) {
  state.exec([](nvbench::launch& launch) { 
    my_kernel<<<num_blocks, 256, 0, launch.get_stream()>>>();
  });
}
NVBENCH_BENCH(my_benchmark);

See Benchmarks for information on customizing benchmarks and implementing parameter sweeps.
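As a rough sketch of the axis system (the axis name, value range, and kernel are illustrative; see the Benchmarks documentation for the full API):

```cuda
#include <nvbench/nvbench.cuh>

void sweep_bench(nvbench::state &state)
{
  // Read the current value of the "NumElements" axis for this invocation:
  const auto n = static_cast<std::size_t>(state.get_int64("NumElements"));

  state.exec([n](nvbench::launch &launch) {
    // my_kernel as in the minimal example above:
    my_kernel<<<(n + 255) / 256, 256, 0, launch.get_stream()>>>();
  });
}
// The benchmark runs once per axis value: 2^16, 2^20, and 2^24 elements.
NVBENCH_BENCH(sweep_bench)
  .add_int64_power_of_two_axis("NumElements", nvbench::range(16, 24, 4));
```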

Command Line Interface

Each benchmark executable produced by NVBench provides a rich set of command-line options for configuring benchmark execution at runtime. See the CLI overview and CLI axis specification for more information.
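For illustration, a few common invocations of a benchmark executable (flag spellings follow the CLI overview; run the executable with --help for the authoritative list):

```shell
# List available benchmarks, axes, and devices:
./my_benchmark --list

# Run only on device 0 and override an axis value from the command line:
./my_benchmark --device 0 --axis "NumElements=1048576"

# Write results as CSV instead of markdown:
./my_benchmark --csv results.csv
```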

Examples

This repository provides a number of examples that demonstrate various NVBench features and use cases.

Building Examples

To build the examples:

mkdir -p build
cd build
cmake -DNVBench_ENABLE_EXAMPLES=ON -DCMAKE_CUDA_ARCHITECTURES=70 .. && make

Be sure to set CMAKE_CUDA_ARCHITECTURES based on the GPU you are running on.

Examples are built by default into build/bin and are prefixed with nvbench.example.

Example output from `nvbench.example.throughput`
# Devices

## [0] `Quadro GV100`
* SM Version: 700 (PTX Version: 700)
* Number of SMs: 80
* SM Default Clock Rate: 1627 MHz
* Global Memory: 32163 MiB Free / 32508 MiB Total
* Global Memory Bus Peak: 870 GiB/sec (4096-bit DDR @850MHz)
* Max Shared Memory: 96 KiB/SM, 48 KiB/Block
* L2 Cache Size: 6144 KiB
* Maximum Active Blocks: 32/SM
* Maximum Active Threads: 2048/SM, 1024/Block
* Available Registers: 65536/SM, 65536/Block
* ECC Enabled: No

# Log

Run:  throughput_bench [Device=0]
Warn: Current measurement timed out (15.00s) while over noise threshold (1.26% > 0.50%)
Pass: Cold: 0.262392ms GPU, 0.267860ms CPU, 7.19s total GPU, 27393x
Pass: Batch: 0.261963ms GPU, 7.18s total GPU, 27394x

# Benchmark Results

## throughput_bench

### [0] Quadro GV100

| NumElements |  DataSize  | Samples |  CPU Time  | Noise |  GPU Time  | Noise | Elem/s  | GlobalMem BW  | BWPeak | Batch GPU  | Batch  |
|-------------|------------|---------|------------|-------|------------|-------|---------|---------------|--------|------------|--------|
|    16777216 | 64.000 MiB |  27393x | 267.860 us | 1.25% | 262.392 us | 1.26% | 63.940G | 476.387 GiB/s | 58.77% | 261.963 us | 27394x |

Demo Project

To get started using NVBench with your own kernels, consider trying out the NVBench Demo Project.

nvbench_demo provides a simple CMake project that uses NVBench to build an example benchmark. It's a great way to experiment with the library without a lot of investment.
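If you would rather wire NVBench into an existing CMake project directly, a sketch using FetchContent (the target names assume those exported by NVBench's CMake package; nvbench::main supplies a main() entry point, while nvbench::nvbench is for executables that provide their own):

```cmake
cmake_minimum_required(VERSION 3.20)
project(my_bench CUDA CXX)

include(FetchContent)
FetchContent_Declare(nvbench
  GIT_REPOSITORY https://github.com/NVIDIA/nvbench.git
  GIT_TAG        main)
FetchContent_MakeAvailable(nvbench)

add_executable(my_benchmark my_benchmark.cu)
target_link_libraries(my_benchmark PRIVATE nvbench::main)
```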

Contributing

Contributions are welcome!

For current issues, see the issue board. Issues labeled good first issue are well suited to first-time contributors.

Tests

To build nvbench tests:

mkdir -p build
cd build
cmake -DNVBench_ENABLE_TESTING=ON .. && make

Tests are built by default into build/bin and prefixed with nvbench.test.

To run all tests:

make test

or

ctest

License

NVBench is released under the Apache 2.0 License with LLVM exceptions. See LICENSE.

Scope and Related Projects

NVBench will measure the CPU and CUDA GPU execution time of a single host-side critical region per benchmark. It is intended for regression testing and parameter tuning of individual kernels. For in-depth analysis of end-to-end performance of multiple applications, the NVIDIA Nsight tools are more appropriate.

NVBench is focused on evaluating the performance of CUDA kernels and is not optimized for CPU microbenchmarks. This may change in the future, but for now, consider using Google Benchmark for high resolution CPU benchmarks.
