LLNL / mpiBench

Licence: other
MPI benchmark to test and measure collective performance

Programming Languages

C, Perl, Makefile

Projects that are alternatives to or similar to mpiBench

Mpi.jl
MPI wrappers for Julia
Stars: ✭ 197 (+405.13%)
Mutual labels:  mpi
api-spec
API Specifications
Stars: ✭ 30 (-23.08%)
Mutual labels:  mpi
ravel
Ravel MPI trace visualization tool
Stars: ✭ 26 (-33.33%)
Mutual labels:  mpi
Dmtcp
DMTCP: Distributed MultiThreaded CheckPointing
Stars: ✭ 229 (+487.18%)
Mutual labels:  mpi
hp2p
Heavy Peer To Peer: an MPI-based benchmark for network diagnostics
Stars: ✭ 17 (-56.41%)
Mutual labels:  mpi
ParMmg
Distributed parallelization of 3D volume mesh adaptation
Stars: ✭ 19 (-51.28%)
Mutual labels:  mpi
Timemory
Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template API is essentially a framework for creating tools: it is designed to provide a unifying interface for recording various performance measurements alongside data logging and interfaces to other tools.
Stars: ✭ 192 (+392.31%)
Mutual labels:  mpi
arbor
The Arbor multi-compartment neural network simulation library.
Stars: ✭ 87 (+123.08%)
Mutual labels:  mpi
Foundations of HPC 2021
This repository collects the materials from the course "Foundations of HPC", 2021, at the Data Science and Scientific Computing Department, University of Trieste
Stars: ✭ 22 (-43.59%)
Mutual labels:  mpi
Theano-MPI
MPI Parallel framework for training deep learning models built in Theano
Stars: ✭ 55 (+41.03%)
Mutual labels:  mpi
Batch Shipyard
Simplify HPC and Batch workloads on Azure
Stars: ✭ 240 (+515.38%)
Mutual labels:  mpi
azurehpc
This repository provides easy automation scripts for building an HPC environment in Azure. It also includes examples to build an end-to-end environment and run some of the key HPC benchmarks and applications.
Stars: ✭ 102 (+161.54%)
Mutual labels:  mpi
az-hop
The Azure HPC On-Demand Platform provides an HPC Cluster Ready solution
Stars: ✭ 33 (-15.38%)
Mutual labels:  mpi
Abyss
🔬 Assemble large genomes using short reads
Stars: ✭ 219 (+461.54%)
Mutual labels:  mpi
Singularity-tutorial
Singularity 101
Stars: ✭ 31 (-20.51%)
Mutual labels:  mpi
Raxml Ng
RAxML Next Generation: faster, easier-to-use and more flexible
Stars: ✭ 191 (+389.74%)
Mutual labels:  mpi
GenomicsDB
Highly performant data storage in C++ for importing, querying and transforming variant data with C/C++/Java/Spark bindings. Used in gatk4.
Stars: ✭ 77 (+97.44%)
Mutual labels:  mpi
scr
SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability for MPI codes.
Stars: ✭ 84 (+115.38%)
Mutual labels:  mpi
alsvinn
The fast Finite Volume simulator with UQ support.
Stars: ✭ 22 (-43.59%)
Mutual labels:  mpi
t8code
Parallel algorithms and data structures for tree-based AMR with arbitrary element shapes.
Stars: ✭ 37 (-5.13%)
Mutual labels:  mpi

mpiBench

Times MPI collectives over a series of message sizes

What is mpiBench?

mpiBench.c

This program measures MPI collective performance for a range of message sizes. The user may specify:

  • the collective to perform,
  • the message size limits,
  • the number of iterations to perform,
  • the maximum memory a process may allocate for MPI buffers,
  • the maximum time permitted for a given test,
  • and the number of Cartesian dimensions to divide processes into.

By default, mpiBench runs all supported collectives on MPI_COMM_WORLD over message sizes from 0 to 256K bytes with a 1G buffer limit. Each test executes as many iterations as will fit within a default time limit of 50000 usecs.

crunch_mpiBench

This is a Perl script which can be used to filter data and generate reports from mpiBench output files. It can merge data from multiple mpiBench output files into a single report. It can also filter output to a subset of collectives. By default, it reports the operation duration time (i.e., how long the collective took to complete). For some collectives, it can also report the effective bandwidth. If provided two datasets, it computes a speedup factor.

What is measured

mpiBench measures the total time required to iterate through a loop of back-to-back invocations of the same collective (optionally separated by a barrier), and divides by the number of iterations. In other words, the timing kernel looks like the following:

time_start = timer();
for (i = 0; i < iterations; i++) {
  collective(msg_size);
  barrier();
}
time_end = timer();
time = (time_end - time_start) / iterations;

Each participating MPI process performs this measurement, and all report their times. The average, minimum, and maximum across this set of times are then reported.
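
A minimal sketch of this pattern is shown below. It is not code from mpiBench.c: the collective (MPI_Allreduce), message size, and iteration count are placeholder choices for illustration only.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, ranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ranks);

    /* Placeholder test parameters (mpiBench derives these from its options). */
    const int iterations = 100;
    const int count = 1024;                       /* elements per message */
    double *sendbuf = malloc(count * sizeof(double));
    double *recvbuf = malloc(count * sizeof(double));
    for (int i = 0; i < count; i++) sendbuf[i] = 1.0;

    /* Timing kernel: back-to-back collectives separated by a barrier. */
    double start = MPI_Wtime();
    for (int i = 0; i < iterations; i++) {
        MPI_Allreduce(sendbuf, recvbuf, count, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        MPI_Barrier(MPI_COMM_WORLD);
    }
    double local = (MPI_Wtime() - start) / iterations;   /* seconds per call */

    /* Each rank reports its time; rank 0 gathers the min, max, and average. */
    double tmin, tmax, tsum;
    MPI_Reduce(&local, &tmin, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce(&local, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&local, &tsum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("avg %g  min %g  max %g (seconds per call)\n",
               tsum / ranks, tmin, tmax);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}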

Before the timing kernel is started, the collective is invoked once to prime it, since the initial call may be subject to overhead that later calls are not. Then, the collective is timed across a small set of iterations (~5) to get a rough estimate of the time required for a single invocation. If the user specifies a time limit using the -t option, this estimate is used to reduce the number of iterations made in the timing kernel loop, as necessary, so that it can execute within the time limit.
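
The sketch below illustrates that warm-up and calibration step. The helper name, pilot count, and parameters are hypothetical and are not taken from mpiBench.c.

#include <mpi.h>

/* Hypothetical helper: prime the collective once, time a short pilot run,
 * and shrink the iteration count so the full timed loop should fit within
 * the -t limit (time_limit_usec; 0 means no limit). */
static int choose_iterations(int max_iterations, long time_limit_usec,
                             void *sendbuf, void *recvbuf, int count)
{
    /* Prime the collective; the first call may pay one-time setup costs. */
    MPI_Allreduce(sendbuf, recvbuf, count, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    /* Short pilot run (~5 iterations) to estimate the time per invocation. */
    const int pilot = 5;
    double start = MPI_Wtime();
    for (int i = 0; i < pilot; i++) {
        MPI_Allreduce(sendbuf, recvbuf, count, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    }
    double per_call = (MPI_Wtime() - start) / pilot;   /* seconds per call */

    int iterations = max_iterations;
    if (time_limit_usec > 0 && per_call > 0.0) {
        /* Reduce the iteration count so the timed loop fits in the limit. */
        int fit = (int)((time_limit_usec * 1.0e-6) / per_call);
        if (fit < iterations) iterations = fit;
        if (iterations < 1)   iterations = 1;
    }
    return iterations;
}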

Basic Usage

Build:

make

Run:

srun -n <procs> ./mpiBench > output.txt

Analyze:

crunch_mpiBench output.txt

Build Instructions

There are several make targets available:

  • make -- simple build
  • make nobar -- build without barriers between consecutive collective invocations
  • make debug -- build with "-g -O0" for debugging purposes
  • make clean -- clean the build

If you'd like to build manually without the makefiles, there are some compile-time options that you should be aware of:

  -D NO_BARRIER        - drop barrier between consecutive collective invocations
  -D USE_GETTIMEOFDAY  - use gettimeofday() instead of MPI_Wtime() for timing info
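
For example, a manual build might look roughly like the following (the exact flags used by the makefiles may differ; an MPI compiler wrapper such as mpicc is assumed):

mpicc -O2 -D NO_BARRIER -o mpiBench mpiBench.c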

Usage Syntax

Usage:  mpiBench [options] [operations]

Options:
  -b <byte>  Beginning message size in bytes (default 0)
  -e <byte>  Ending message size in bytes (default 1K)
  -i <itrs>  Maximum number of iterations for a single test
             (default 1000)
  -m <byte>  Process memory buffer limit (send+recv) in bytes
             (default 1G)
  -t <usec>  Time limit for any single test in microseconds
             (default 0 = infinity)
  -d <ndim>  Number of dimensions to split processes in
             (default 0 = MPI_COMM_WORLD only)
  -c         Check receive buffer for expected data in last
             iteration (default disabled)
  -C         Check receive buffer for expected data every
             iteration (default disabled)
  -h         Print this help screen and exit
  where <byte> = [0-9]+[KMG], e.g., 32K or 64M

Operations:
  Barrier
  Bcast
  Alltoall, Alltoallv
  Allgather, Allgatherv
  Gather, Gatherv
  Scatter
  Allreduce
  Reduce

Examples

mpiBench

Run the default set of tests:

srun -n2 -ppdebug mpiBench

Run the default message size range and iteration count for Alltoall, Allreduce, and Barrier:

srun -n2 -ppdebug mpiBench Alltoall Allreduce Barrier

Run from 32-256 bytes and time across 100 iterations of Alltoall:

srun -n2 -ppdebug mpiBench -b 32 -e 256 -i 100 Alltoall

Run from 0-2K bytes and default iteration count for Gather, but reduce the iteration count, as necessary, so each message size test finishes within 100,000 usecs:

srun -n2 -ppdebug mpiBench -e 2K -t 100000 Gather

crunch_mpiBench

Show data for just Alltoall:

crunch_mpiBench -op Alltoall out.txt

Merge data from several files into a single report:

crunch_mpiBench out1.txt out2.txt out3.txt

Display effective bandwidth for Allgather and Alltoall:

crunch_mpiBench -bw -op Allgather,Alltoall out.txt

Compare times in output files in dir1 with those in dir2:

crunch_mpiBench -data DIR1_DATA dir1/* -data DIR2_DATA dir2/*

Additional Notes

Rank 0 always acts as the root process for collectives which involve a root.

If the minimum and maximum are quite different, then some processes may be escaping ahead to start later iterations before the last one has completely finished. In this case, one may use the maximum time reported or insert a barrier between consecutive invocations (build with "make" instead of "make nobar") to synchronize the processes.

For Reduce and Allreduce, vectors of doubles are added, so message sizes of 1, 2, and 4-bytes are skipped.
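
The skipped sizes follow from simple arithmetic: an 8-byte double does not fit in a 1-, 2-, or 4-byte message, so the element count (message size divided by sizeof(double)) would be zero. A standalone illustration, not code from mpiBench.c:

#include <stdio.h>

int main(void)
{
    /* Element counts for Reduce/Allreduce at small message sizes, assuming
     * 8-byte doubles: 1, 2, and 4 bytes cannot hold a single element. */
    for (unsigned long bytes = 1; bytes <= 16; bytes *= 2) {
        printf("%2lu bytes -> %lu double(s)\n",
               bytes, (unsigned long)(bytes / sizeof(double)));
    }
    return 0;
}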

Two available make commands build mpiBench with test kernels like the following:

   "make"              "make nobar"
start=timer()        start=timer()
for(i=0;i<N;i++)     for(i=0;i<N;i++)
{                    {
  MPI_Gather()         MPI_Gather()
  MPI_Barrier()
}                    }
end=timer()          end=timer()
time=(end-start)/N   time=(end-start)/N

"make nobar" may allow processes to escape ahead, but does not include cost of barrier.
