Licence: MIT license
ROCmValidationSuite

The ROCm Validation Suite (RVS) is a system administrator’s and cluster manager's tool for detecting and troubleshooting common problems affecting AMD GPU(s) running in a high-performance computing environment, enabled using the ROCm software stack on a compatible platform.

RVS is a collection of tests, benchmarks and qualification tools, each targeting a specific sub-system of the ROCm platform. All of the tools are implemented in software and share a common command line interface. Each set of tests is implemented in a “module”, which is a library encapsulating the functionality specific to the tool. The CLI can specify the directory containing modules to use when searching for libraries to load. Each module may define its own set of options and may have a configuration file that supports its execution.

For the function of each module, see this link.

For examples and details about config files, see this link.

Prerequisites

Install the following packages before compiling RVS or installing the pre-compiled package.

Ubuntu :

    sudo apt-get -y update && sudo apt-get install -y libpci3 libpci-dev doxygen unzip cmake git

CentOS :

    sudo yum install -y cmake3 doxygen pciutils-devel rpm rpm-build git gcc-c++ 

RHEL :

    sudo yum install -y cmake3 doxygen rpm rpm-build git gcc-c++

    wget http://mirror.centos.org/centos/7/os/x86_64/Packages/pciutils-devel-3.5.1-3.el7.x86_64.rpm

    sudo rpm -ivh pciutils-devel-3.5.1-3.el7.x86_64.rpm

SLES :

    sudo SUSEConnect -p sle-module-desktop-applications/15.1/x86_64

    sudo SUSEConnect --product sle-module-development-tools/15.1/x86_64

    sudo zypper install -y cmake doxygen pciutils-devel libpci3 rpm git rpm-build gcc-c++
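
The per-distribution commands above can be selected by a small helper. The following is only a sketch: the function name `rvs_prereq_cmd` is hypothetical, and it assumes the distribution is identified by the `ID` field of `/etc/os-release` (`ubuntu`, `centos`, `rhel`, `sles`). It prints the commands listed above rather than running them.

```shell
# Print the prerequisite install commands for a distribution ID.
# Assumption: IDs follow the ID= field of /etc/os-release.
rvs_prereq_cmd() {
    case "$1" in
        ubuntu)
            echo "sudo apt-get -y update && sudo apt-get install -y libpci3 libpci-dev doxygen unzip cmake git" ;;
        centos)
            echo "sudo yum install -y cmake3 doxygen pciutils-devel rpm rpm-build git gcc-c++" ;;
        rhel)
            echo "sudo yum install -y cmake3 doxygen rpm rpm-build git gcc-c++"
            echo "wget http://mirror.centos.org/centos/7/os/x86_64/Packages/pciutils-devel-3.5.1-3.el7.x86_64.rpm"
            echo "sudo rpm -ivh pciutils-devel-3.5.1-3.el7.x86_64.rpm" ;;
        sles)
            echo "sudo SUSEConnect -p sle-module-desktop-applications/15.1/x86_64"
            echo "sudo SUSEConnect --product sle-module-development-tools/15.1/x86_64"
            echo "sudo zypper install -y cmake doxygen pciutils-devel libpci3 rpm git rpm-build gcc-c++" ;;
        *)
            echo "unsupported distribution: $1" >&2
            return 1 ;;
    esac
}
```

On a supported host, `. /etc/os-release && rvs_prereq_cmd "$ID"` would print the matching commands for review before they are run by hand.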

Install ROCm stack, rocblas and rocm-smi-lib

Install the ROCm stack for Ubuntu/CentOS/SLES/RHEL. Refer to https://github.com/RadeonOpenCompute/ROCm for instructions.

Note:

The rocm_smi64 package was renamed to rocm-smi-lib64 starting with ROCm 3.0. If you are using a ROCm release earlier than 3.0, install the package as "rocm_smi64". The rocm-smi-lib64 package was in turn renamed to rocm-smi-lib starting with ROCm 4.1.
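
The renaming rule in this note can be expressed as a version check. This is a minimal sketch; the function name `smi_pkg_name` is hypothetical, and it assumes the ROCm version is given as a dotted string such as `3.5` or `4.1.0`.

```shell
# Print the SMI library package name that matches a ROCm version.
# < 3.0 -> rocm_smi64, 3.0 up to 4.0.x -> rocm-smi-lib64, >= 4.1 -> rocm-smi-lib
smi_pkg_name() (
    major=${1%%.*}                      # text before the first dot
    case "$1" in
        *.*) rest=${1#*.}; minor=${rest%%.*} ;;   # second version component
        *)   minor=0 ;;                           # no minor given, treat as .0
    esac
    if [ "$major" -lt 3 ]; then
        echo "rocm_smi64"
    elif [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 1 ]; }; then
        echo "rocm-smi-lib64"
    else
        echo "rocm-smi-lib"
    fi
)
```

For example, `smi_pkg_name 4.0.0` prints `rocm-smi-lib64`, while `smi_pkg_name 4.1` prints `rocm-smi-lib`.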

Install rocBLAS and rocm-smi-lib :

Ubuntu :

       sudo apt-get install rocblas rocm-smi-lib

CentOS & RHEL :

       sudo yum install --nogpgcheck rocblas rocm-smi-lib

SUSE :

       sudo zypper install rocblas rocm-smi-lib

Note: If rocm-smi-lib is already installed but the "/opt/rocm/rocm_smi/" path doesn't exist, reinstall the package as follows:

Ubuntu : sudo dpkg -r rocm-smi-lib && sudo apt install rocm-smi-lib

CentOS & RHEL : sudo rpm -e rocm-smi-lib && sudo yum install rocm-smi-lib

SUSE : sudo rpm -e rocm-smi-lib && sudo zypper install rocm-smi-lib
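
The condition in the note above can be checked before deciding to reinstall. A small sketch, where the function name `smi_dir_missing` is hypothetical and the ROCm root defaults to `/opt/rocm` as in the note:

```shell
# Succeed (exit status 0) when <rocm_root>/rocm_smi is not a directory.
# The ROCm root defaults to /opt/rocm; pass another root as the first argument.
smi_dir_missing() {
    [ ! -d "${1:-/opt/rocm}/rocm_smi" ]
}
```

A typical use would be `if smi_dir_missing; then echo "reinstall rocm-smi-lib"; fi`, followed by the reinstall command for your distribution from the list above.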

Building from Source

This section explains how to get and compile the current development stream of RVS.

Clone repository

git clone https://github.com/ROCm-Developer-Tools/ROCmValidationSuite.git

Configure and build RVS:

cd ROCmValidationSuite

If the OS is Ubuntu or SLES, use cmake:

    cmake -DROCM_PATH=<rocm_installed_path> -DCMAKE_INSTALL_PREFIX=<rocm_installed_path> -DCMAKE_PACKAGING_INSTALL_PREFIX=<rocm_installed_path> ./ -B./build

For example:

    cmake -DROCM_PATH=/opt/rocm-4.0.0 -DCMAKE_INSTALL_PREFIX=/opt/rocm-4.0.0 -DCMAKE_PACKAGING_INSTALL_PREFIX=/opt/rocm-4.0.0 ./ -B./build

    make -C ./build

If the OS is CentOS or RHEL, use cmake3:

    cmake3 -DROCM_PATH=<rocm_installed_path> -DCMAKE_INSTALL_PREFIX=<rocm_installed_path> -DCMAKE_PACKAGING_INSTALL_PREFIX=<rocm_installed_path> ./ -B./build

    make -C ./build
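
The choice between cmake and cmake3 can be folded into one helper that builds the configure command. This is a sketch only: `rvs_configure_cmd` is a hypothetical name, and the distribution IDs are assumed to match the `ID` field of `/etc/os-release`. It prints the command instead of running it.

```shell
# Print the configure command for a distribution ID and a ROCm install path.
rvs_configure_cmd() (
    case "$1" in
        ubuntu|sles) cmake_bin=cmake ;;     # Ubuntu and SLES ship cmake
        centos|rhel) cmake_bin=cmake3 ;;    # CentOS and RHEL use cmake3
        *) echo "unsupported distribution: $1" >&2; exit 1 ;;
    esac
    rocm=$2
    echo "$cmake_bin -DROCM_PATH=$rocm -DCMAKE_INSTALL_PREFIX=$rocm -DCMAKE_PACKAGING_INSTALL_PREFIX=$rocm ./ -B./build"
)
```

For instance, `rvs_configure_cmd centos /opt/rocm-4.0.0` prints the cmake3 form of the example command shown earlier.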

Build package:

 cd ./build
 
 make package

Note: Based on your OS, only a DEB or an RPM package will be built. You may ignore an error for the unrelated configuration.

Install package:

Ubuntu : sudo dpkg -i rocm-validation-suite*.deb
CentOS & RHEL & SUSE : sudo rpm -i --replacefiles --nodeps rocm-validation-suite*.rpm
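
Because only one package type is built per OS, the install command can be chosen from the file extension of whatever `make package` produced. A minimal sketch, with `rvs_install_cmd` as a hypothetical helper name; it prints the command rather than running it.

```shell
# Print the install command that matches a built package file.
rvs_install_cmd() {
    case "$1" in
        *.deb) echo "sudo dpkg -i $1" ;;
        *.rpm) echo "sudo rpm -i --replacefiles --nodeps $1" ;;
        *)     echo "not a DEB or RPM package: $1" >&2; return 1 ;;
    esac
}
```

From the build directory, a loop such as `for f in rocm-validation-suite*; do rvs_install_cmd "$f"; done` would print the command for each package file found.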

Note: RVS has been packaged as part of the ROCm release since 3.0. You can install the pre-compiled package as shown below. Please make sure the prerequisites, ROCm stack, rocblas and rocm-smi-lib64 are already installed.

Ubuntu : sudo apt install rocm-validation-suite
CentOS & RHEL : sudo yum install rocm-validation-suite
SUSE : sudo zypper install rocm-validation-suite

Running RVS

Running version built from source code:

cd ./build/bin
sudo ./rvs -d 3
sudo ./rvsqa.new.sh    # runs the complete RVS test suite

Running the version pre-compiled and packaged with the ROCm release:

sudo /opt/rocm/rvs/rvs -d 3

Similarly, you can run all of the tests listed in the "rvsqa.new.sh" script, located at "testscripts/rvsqa.new.sh".
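
Since the binary lives in a different place for a source build and a packaged install, a wrapper can look in both. The sketch below uses a hypothetical function name `find_rvs`; its default directories are the two locations named in this section, and the source build is checked first.

```shell
# Print the path of the first executable rvs binary found.
# Arguments (optional): source build bin directory, packaged install directory.
find_rvs() (
    build_dir=${1:-./build/bin}
    rocm_dir=${2:-/opt/rocm/rvs}
    for candidate in "$build_dir/rvs" "$rocm_dir/rvs"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
            exit 0
        fi
    done
    echo "rvs not found" >&2
    exit 1
)
```

With this in place, `sudo "$(find_rvs)" -d 3` would run whichever copy is available.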

Regression

A simple regression test has been implemented. You can find more about it at this link.

Reporting

Test-based reporting has been enabled since the beginning. JSON-based reporting has been added to the gst and iet modules. To enable JSON logging, use the "-j" command line option:

    ./rvs -c conf/gst_single.conf -d 3 -j

The JSON file is written to the /var/log folder, and the name of the file is printed to stdout. The output structure is as shown below:


{
{"module-name":{
  "action-name":[
    {
      "target" : "<flops/power>"
    },
    {
      "dtype" : "optype"
    },
    {
      "gpu_id" : "63217",
      "GFLOPS" : "11433.352136"
    },
    {
      "gpu_id" : "63217",
      "GFLOPS" : "11436.291718"
    },
    ....]
  }
}
....
}

An example for gst is:


{"gst":{
  "gpustress-9000-sgemm-false":[
    {
      "target" : "9000.000000"
    },
    {
      "dtype" : "sgemm"
    },
    {
      "gpu_id" : "63217",
      "GFLOPS" : "11433.352136"
    },
    {
      "gpu_id" : "63217",
      "GFLOPS" : "11436.291718"
    }]
  }
}
{"gst":{
  "gpustress-8000-sgemm-true":[
    ,
    {
      "target" : "8000.000000"
    },
    {
      "dtype" : "sgemm"
    },
    {
      "gpu_id" : "63217",
      "GFLOPS" : "11657.886019"
    },
    {
      "gpu_id" : "63217",
      "GFLOPS" : "11675.718793"
    },
    {
      "gpu_id" : "63217",
      "GFLOPS" : "11687.461158"
    } ]
  }
}
}
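
To summarize a log like the gst example, the GFLOPS readings can be pulled out with plain text tools. This sketch does not parse the file as JSON, because the output as shown here (several top-level objects in sequence) may not load in a strict JSON parser; it assumes the `"GFLOPS" : "<number>"` spelling seen in the example. The function name `mean_gflops` is hypothetical.

```shell
# Read an RVS JSON log on stdin and print the mean of all "GFLOPS" values.
# Exits with status 1 if no GFLOPS entries are found.
mean_gflops() {
    grep -o '"GFLOPS" : "[0-9.]*"' |
        awk -F'"' '{ sum += $4; n++ }
                   END { if (n == 0) exit 1; printf "%.6f\n", sum / n }'
}
```

Usage would look like `mean_gflops < <json_log_file>`, where the file is the one whose name rvs prints to stdout.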
