

UoB-HPC / Babelstream

Licence: other

STREAM, for lots of devices written in many programming models

Labels

cuda benchmark opencl gpgpu openmp parallel-processing

Projects that are alternatives to or similar to Babelstream

Occa

JIT Compilation for Multiple Architectures: C++, OpenMP, CUDA, HIP, OpenCL, Metal

Stars: ✭ 230 (+90.08%)

Mutual labels: openmp, gpgpu, opencl, cuda

Amgcl

C++ library for solving large sparse linear systems with algebraic multigrid method

Stars: ✭ 390 (+222.31%)

Mutual labels: openmp, gpgpu, opencl, cuda

Arraymancer

A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends

Stars: ✭ 793 (+555.37%)

Mutual labels: openmp, gpgpu, opencl, cuda

Ilgpu

ILGPU JIT Compiler for high-performance .Net GPU programs

Stars: ✭ 374 (+209.09%)

Mutual labels: gpgpu, opencl, cuda

Neanderthal

Fast Clojure Matrix Library

Stars: ✭ 927 (+666.12%)

Mutual labels: gpgpu, opencl, cuda

Arrayfire Python

Python bindings for ArrayFire: A general purpose GPU library.

Stars: ✭ 358 (+195.87%)

Mutual labels: gpgpu, opencl, cuda

Spoc

Stream Processing with OCaml

Stars: ✭ 115 (-4.96%)

Mutual labels: gpgpu, opencl, cuda

Hipsycl

Implementation of SYCL for CPUs, AMD GPUs, NVIDIA GPUs

Stars: ✭ 377 (+211.57%)

Mutual labels: gpgpu, opencl, cuda

Parenchyma

An extensible HPC framework for CUDA, OpenCL and native CPU.

Stars: ✭ 71 (-41.32%)

Mutual labels: gpgpu, opencl, cuda

Arrayfire Rust

Rust wrapper for ArrayFire

Stars: ✭ 525 (+333.88%)

Mutual labels: gpgpu, opencl, cuda

John

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs

Stars: ✭ 5,656 (+4574.38%)

Mutual labels: openmp, gpgpu, opencl

Stdgpu

stdgpu: Efficient STL-like Data Structures on the GPU

Stars: ✭ 531 (+338.84%)

Mutual labels: openmp, gpgpu, cuda

Arrayfire

ArrayFire: a general purpose GPU library.

Stars: ✭ 3,693 (+2952.07%)

Mutual labels: gpgpu, opencl, cuda

crowdsource-video-experiments-on-android

Crowdsourcing video experiments (such as collaborative benchmarking and optimization of DNN algorithms) using Collective Knowledge Framework across diverse Android devices provided by volunteers. Results are continuously aggregated in the open repository:

Stars: ✭ 29 (-76.03%)

Mutual labels: opencl, openmp, cuda

HeCBench

software.intel.com/content/www/us/en/develop/articles/repo-evaluating-performance-productivity-oneapi.html

Stars: ✭ 85 (-29.75%)

Mutual labels: benchmark, openmp, cuda

Futhark

💥💻💥 A data-parallel functional programming language

Stars: ✭ 1,641 (+1256.2%)

Mutual labels: gpgpu, opencl, cuda

Mixbench

A GPU benchmark tool for evaluating GPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL)

Stars: ✭ 130 (+7.44%)

Mutual labels: opencl, cuda, benchmark

Bitcracker

BitCracker is the first open source password cracking tool for memory units encrypted with BitLocker

Stars: ✭ 463 (+282.64%)

Mutual labels: gpgpu, opencl, cuda

Vexcl

VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP

Stars: ✭ 626 (+417.36%)

Mutual labels: gpgpu, opencl, cuda

Hashcat

World's fastest and most advanced password recovery utility

Stars: ✭ 11,014 (+9002.48%)

Mutual labels: gpgpu, opencl, cuda


BabelStream

Measure memory transfer rates to/from global device memory on GPUs. This benchmark is similar in spirit to, and based on, the STREAM benchmark [1] for CPUs.

Unlike other GPU memory bandwidth benchmarks, it does not include the PCIe transfer time.
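A rough sketch of how a STREAM-style bandwidth figure can be derived, assuming double-precision arrays of n elements: each kernel's bandwidth is the number of bytes it reads plus writes, divided by the time of the kernel alone, so any host-device copies stay outside the timed region. The helper names (copy_bytes, triad_bytes, bandwidth_gbs, time_kernel) are hypothetical and are not BabelStream's own code; reporting in GB/s (10^9 bytes per second) is likewise a choice made for this sketch.

```cpp
#include <chrono>
#include <cstddef>

// Bytes moved by one pass of a kernel over arrays of n doubles (hypothetical helpers).
// Copy reads one array and writes one; Triad reads two arrays and writes one.
constexpr std::size_t copy_bytes(std::size_t n)  { return 2 * n * sizeof(double); }
constexpr std::size_t triad_bytes(std::size_t n) { return 3 * n * sizeof(double); }

// Bandwidth in GB/s (10^9 bytes per second) from bytes moved and elapsed seconds.
inline double bandwidth_gbs(std::size_t bytes, double seconds) {
  return static_cast<double>(bytes) / seconds * 1.0e-9;
}

// Times only the kernel call itself. Any transfers to or from device memory
// are assumed to happen outside this function, so they are not counted.
template <typename Kernel>
double time_kernel(Kernel&& kernel) {
  auto start = std::chrono::steady_clock::now();
  kernel();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

For example, a triad over 2^25 doubles moves 3 * 2^25 * 8 = 805,306,368 bytes; if that pass took 0.01 s, bandwidth_gbs would report about 80.5 GB/s.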

There are multiple implementations of this benchmark in a variety of programming models. Currently implemented are:

OpenCL
CUDA
OpenACC
OpenMP 3 and 4.5
C++ Parallel STL
Kokkos
RAJA
SYCL

This code was previously called GPU-STREAM.

How is this different to STREAM?

BabelStream implements the four main kernels of the STREAM benchmark (along with a dot product), but by using different programming models it extends the range of platforms the code can run on beyond CPUs.
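For reference, the five kernels can be written as plain serial C++ loops. This sketch assumes the array roles of the original STREAM (copy c = a, mul b = scalar * c, which STREAM calls scale, add c = a + b, triad a = b + scalar * c) plus a dot product of a and b. Each BabelStream model parallelises equivalent loops in its own way; the namespace and function names here are illustrative only.

```cpp
#include <cstddef>
#include <vector>

// Serial reference versions of the STREAM kernels plus a dot product.
// Names are illustrative; they only show what each kernel computes.
namespace kernels {

using Array = std::vector<double>;

void copy(const Array& a, Array& c) {                         // c = a
  for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i];
}

void mul(Array& b, const Array& c, double scalar) {           // b = scalar * c
  for (std::size_t i = 0; i < c.size(); ++i) b[i] = scalar * c[i];
}

void add(const Array& a, const Array& b, Array& c) {          // c = a + b
  for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] + b[i];
}

void triad(Array& a, const Array& b, const Array& c, double scalar) {  // a = b + scalar * c
  for (std::size_t i = 0; i < b.size(); ++i) a[i] = b[i] + scalar * c[i];
}

double dot(const Array& a, const Array& b) {                  // sum of a[i] * b[i]
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}

}  // namespace kernels
```

Every kernel touches each element once with no reuse, so for large arrays the run time is bounded by memory bandwidth rather than arithmetic.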

The key differences from STREAM are that:

the arrays are allocated on the heap
the problem size is unknown at compile time
it supports a wider range of platforms and programming models

With stack arrays whose size is known at compile time, the compiler is able to align data, issue optimal instructions (such as non-temporal stores) and remove peel/remainder vectorisation loops. This information is typically not available in real HPC codes today, where the problem size is read from the user at runtime.

BabelStream therefore measures the memory bandwidth that can be attained with a particular programming model when following today's parallel programming best practice.
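To illustrate the difference between compile-time and runtime problem sizes, here is a minimal sketch using the triad kernel (function names are hypothetical). In triad_fixed the length N is a template parameter known at compile time; in triad_runtime the length comes from heap-allocated std::vector objects whose size is only set when the program runs, which is the situation BabelStream is designed to measure.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Fixed size: the trip count N is a compile-time constant, so the compiler may
// specialise the loop (e.g. drop remainder handling if N is a multiple of the
// vector width).
template <std::size_t N>
void triad_fixed(std::array<double, N>& a, const std::array<double, N>& b,
                 const std::array<double, N>& c, double scalar) {
  for (std::size_t i = 0; i < N; ++i) a[i] = b[i] + scalar * c[i];
}

// Runtime size: n is only known when the program runs and the data lives on
// the heap, so the compiler must generate code that handles any length,
// typically with peel/remainder loops around the vectorised body.
void triad_runtime(std::vector<double>& a, const std::vector<double>& b,
                   const std::vector<double>& c, double scalar) {
  const std::size_t n = a.size();
  for (std::size_t i = 0; i < n; ++i) a[i] = b[i] + scalar * c[i];
}
```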

BabelStream also includes the nstream kernel from the Parallel Research Kernels (PRK) project, available on GitHub. Details about PRK can be found in the following references:

Van der Wijngaart, Rob F., and Timothy G. Mattson. The parallel research kernels. IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2014.

R. F. Van der Wijngaart, A. Kayi, J. R. Hammond, G. Jost, T. St. John, S. Sridharan, T. G. Mattson, J. Abercrombie, and J. Nelson. Comparing runtime systems with exascale ambitions using the Parallel Research Kernels. ISC 2016, DOI: 10.1007/978-3-319-41321-1_17.

Jeff R. Hammond and Timothy G. Mattson. Evaluating data parallelism in C++ using the Parallel Research Kernels. IWOCL 2019, DOI: 10.1145/3318170.3318192.
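The nstream kernel can be sketched as follows. The update a[i] += b[i] + scalar * c[i] is our reading of the PRK nstream operation and should be treated as an assumption; the function names are illustrative. Because a is both read and written, one pass moves four arrays' worth of data, compared with three for triad.

```cpp
#include <cstddef>
#include <vector>

// nstream-style update (assumed form): a[i] += b[i] + scalar * c[i].
// Reads a, b and c and writes a back, i.e. four array streams per pass.
void nstream(std::vector<double>& a, const std::vector<double>& b,
             const std::vector<double>& c, double scalar) {
  for (std::size_t i = 0; i < a.size(); ++i) a[i] += b[i] + scalar * c[i];
}

// Bytes moved by one nstream pass over n doubles.
constexpr std::size_t nstream_bytes(std::size_t n) { return 4 * n * sizeof(double); }
```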

Website

uob-hpc.github.io/BabelStream/

Usage

Building any implementation requires the drivers, compilers and software for the programming model you would like to build against.

We have supplied a series of Makefiles, one for each programming model, to assist with building. The Makefiles contain common build options, and should be simple to customise for your needs too.

General usage is make -f <Model>.make. Common compiler flags and names can be set by passing a COMPILER option to Make, e.g. make COMPILER=GNU. Some models allow specifying a CPU- or GPU-style target, which can be set by passing a TARGET option to Make, e.g. make TARGET=GPU.

Pass in extra flags via the EXTRA_FLAGS option.

The binaries are named in the form <model>-stream.

Building Kokkos

Kokkos version >= 3 requires setting the KOKKOS_PATH flag to the source directory of a distribution. For example:

cd 
wget https://github.com/kokkos/kokkos/archive/3.1.01.tar.gz
tar -xvf 3.1.01.tar.gz # should end up with ~/kokkos-3.1.01
cd BabelStream
make -f Kokkos.make KOKKOS_PATH=~/kokkos-3.1.01

See make output for more information on supported flags.

Building RAJA

We use the following command to build RAJA using the Intel Compiler.

cmake .. -DCMAKE_INSTALL_PREFIX=<prefix> -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DRAJA_PTR="RAJA_USE_RESTRICT_ALIGNED_PTR" -DCMAKE_BUILD_TYPE=ICCBuild -DRAJA_ENABLE_TESTS=Off

For building with CUDA support, we use the following command.

cmake .. -DCMAKE_INSTALL_PREFIX=<prefix> -DRAJA_PTR="RAJA_USE_RESTRICT_ALIGNED_PTR" -DRAJA_ENABLE_CUDA=1 -DRAJA_ENABLE_TESTS=Off

Results

Sample results can be found in the results subdirectory. If you would like to submit updated results, please submit a Pull Request.

Contributing

As of v4.0, the main branch of this repository will hold the latest released version.

The develop branch will contain unreleased features due for the next (major and/or minor) release of BabelStream. Pull Requests should be made against the develop branch.

Citing

Please cite BabelStream via this reference:

Deakin T, Price J, Martineau M, McIntosh-Smith S. GPU-STREAM v2.0: Benchmarking the achievable memory bandwidth of many-core processors across diverse parallel programming models. 2016. Paper presented at P^3MA Workshop at ISC High Performance, Frankfurt, Germany.

Other BabelStream publications:

Deakin T, McIntosh-Smith S. GPU-STREAM: Benchmarking the achievable memory bandwidth of Graphics Processing Units. 2015. Poster session presented at IEEE/ACM SuperComputing, Austin, United States. You can view the Poster and Extended Abstract.

Deakin T, Price J, Martineau M, McIntosh-Smith S. GPU-STREAM: Now in 2D!. 2016. Poster session presented at IEEE/ACM SuperComputing, Salt Lake City, United States. You can view the Poster and Extended Abstract.

Raman K, Deakin T, Price J, McIntosh-Smith S. Improving achieved memory bandwidth from C++ codes on Intel Xeon Phi Processor (Knights Landing). IXPUG Spring Meeting, Cambridge, UK, 2017.

Deakin T, Price J, Martineau M, McIntosh-Smith S. Evaluating attainable memory bandwidth of parallel programming models via BabelStream. International Journal of Computational Science and Engineering. Special issue (in press). 2017.

Deakin T, Price J, McIntosh-Smith S. Portable methods for measuring cache hierarchy performance. 2017. Poster session presented at IEEE/ACM SuperComputing, Denver, United States. You can view the Poster and Extended Abstract.

[1]: McCalpin, John D., 1995: "Memory Bandwidth and Machine Balance in Current High Performance Computers", IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995.


Stars: ✭ 121
