CK-NNTest - testing and benchmarking common Neural Network operations via Collective Knowledge

All CK components can be found at cKnowledge.io and in one GitHub repository!

This project is hosted by the cTuning foundation (non-profit R&D organization).

This Collective Knowledge project contains CK workflows, automation actions, and reusable artifacts to test and benchmark common Neural Network operations.

Installation

CK

$ python -m pip install ck --user

CK-NNTest

$ ck pull repo:ck-nntest
$ ck install package --tags=lib,nntest

Arm Compute Library with Neon

To use Arm Compute Library (Neon) tests with the latest development branch:

$ ck install package --tags=lib,armcl,neon,dev

To install a specific release of the library (e.g. 20.05):

$ ck install package --tags=lib,armcl,neon,rel.20.05

Arm Compute Library with OpenCL

To use Arm Compute Library (OpenCL) tests with the latest development branch:

$ ck install package --tags=lib,armcl,opencl,dev

To install a specific release of the library (e.g. 20.05):

$ ck install package --tags=lib,armcl,opencl,rel.20.05

[DEPRECATED] TensorFlow

To use TensorFlow CPU tests, install a third-party TensorFlow_CC package:

$ ck pull repo:ck-tensorflow
$ ck install package:lib-tensorflow_cc-shared-1.7.0 [--env.CK_HOST_CPU_NUMBER_OF_PROCESSORS=2]

To install, follow the instructions in the Readme.

NB: You may want to limit the number of build threads on a memory-constrained platform (e.g. to 2 as above).

[DEPRECATED] Caffe

To use Caffe tests, get the public CK-Caffe repository:

$ ck pull repo --url=https://github.com/dividiti/ck-caffe

To use Caffe CPU tests, install:

$ ck install package:lib-caffe-bvlc-master-cpu-universal

To use Caffe OpenCL tests, install one or more of the packages listed by:

$ ck list package:lib-caffe-bvlc-opencl-*-universal

For example, install Caffe with CLBlast:

$ ck install package:lib-caffe-bvlc-opencl-clblast-universal

Usage

View all operator tests

To view all available NNTest test programs and data sets:

$ ck search program --tags=nntest | sort
$ ck search dataset --tags=nntest | sort

Test a single operator

To compile and run a single test listed above, use e.g.:

$ ck run nntest:softmax-armcl-opencl

To view all the tests that would be performed, run this command with the list_tests or dry_run option. The list_tests option only lists all combinations of library, dataset and tensor shape to be processed. The dry_run option prepares a pipeline for each test but doesn't run it.

$ ck run nntest:softmax-armcl-opencl --list_tests
$ ck run nntest:softmax-armcl-opencl --dry_run

Run experiments using Arm Compute Library with OpenCL

CK-NNTest supports the following operators for the Arm Compute Library:

average pool (fp32, uint8)
convolution (fp32, uint8)
depthwise convolution (fp32, uint8)
direct convolution (fp32, uint8)
fully connected (fp32, uint8)
gemm (fp32)
reshape (fp32, uint8)
resize bilinear (fp32, uint8)
softmax (fp32, uint8)
winograd convolution (fp32)

NB: Not all operators are supported for all libraries.

average pool fp32

Kernel profiling:

$ ck run nntest:avgpool-armcl-opencl --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:avgpool-armcl-opencl --repetitions=10 --timestamp=<platform>-validation

average pool uint8

Kernel profiling:

$ ck run nntest:avgpool-armcl-opencl-uint8 --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:avgpool-armcl-opencl-uint8 --repetitions=10 --timestamp=<platform>-validation

convolution fp32

Kernel profiling:

$ ck run nntest:conv-armcl-opencl --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:conv-armcl-opencl --repetitions=10 --timestamp=<platform>-validation

convolution uint8

Kernel profiling:

$ ck run nntest:conv-armcl-opencl-uint8 --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:conv-armcl-opencl-uint8 --repetitions=10 --timestamp=<platform>-validation

depthwise convolution fp32

Kernel profiling:

$ ck run nntest:depthwiseconv-armcl-opencl --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:depthwiseconv-armcl-opencl --repetitions=10 --timestamp=<platform>-validation

depthwise convolution uint8

Kernel profiling:

$ ck run nntest:depthwiseconv-armcl-opencl-uint8 --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:depthwiseconv-armcl-opencl-uint8 --repetitions=10 --timestamp=<platform>-validation

direct convolution fp32

Kernel profiling:

$ ck run nntest:directconv-armcl-opencl --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:directconv-armcl-opencl --repetitions=10 --timestamp=<platform>-validation

direct convolution uint8

Kernel profiling:

$ ck run nntest:directconv-armcl-opencl-uint8 --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:directconv-armcl-opencl-uint8 --repetitions=10 --timestamp=<platform>-validation

fully connected fp32

Kernel profiling:

$ ck run nntest:fullyconnected-armcl-opencl --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:fullyconnected-armcl-opencl --repetitions=10 --timestamp=<platform>-validation

fully connected uint8

Kernel profiling:

$ ck run nntest:fullyconnected-armcl-opencl-uint8 --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:fullyconnected-armcl-opencl-uint8 --repetitions=10 --timestamp=<platform>-validation

gemm [TO FIX]

Kernel profiling:

$ ck run nntest:gemm-armcl-opencl --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:gemm-armcl-opencl --repetitions=10 --timestamp=<platform>-validation

reshape fp32

Kernel profiling:

$ ck run nntest:reshape-armcl-opencl --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:reshape-armcl-opencl --repetitions=10 --timestamp=<platform>-validation

reshape uint8

Kernel profiling:

$ ck run nntest:reshape-armcl-opencl-uint8 --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:reshape-armcl-opencl-uint8 --repetitions=10 --timestamp=<platform>-validation

resize bilinear fp32

Kernel profiling:

$ ck run nntest:resizebilinear-armcl-opencl --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:resizebilinear-armcl-opencl --repetitions=10 --timestamp=<platform>-validation

resize bilinear uint8

Kernel profiling:

$ ck run nntest:resizebilinear-armcl-opencl-uint8 --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:resizebilinear-armcl-opencl-uint8 --repetitions=10 --timestamp=<platform>-validation

softmax fp32

Kernel profiling:

$ ck run nntest:softmax-armcl-opencl --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:softmax-armcl-opencl --repetitions=10 --timestamp=<platform>-validation

softmax uint8

Kernel profiling:

$ ck run nntest:softmax-armcl-opencl-uint8 --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:softmax-armcl-opencl-uint8 --repetitions=10 --timestamp=<platform>-validation

winograd convolution fp32

Kernel profiling:

$ ck run nntest:winogradconv-armcl-opencl --dvdt_prof --timestamp=<platform>-profiling

Validation:

$ ck run nntest:winogradconv-armcl-opencl --repetitions=10 --timestamp=<platform>-validation
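
The per-operator commands above all follow one pattern, so they can be generated rather than typed one by one. The following is a minimal sketch, not part of CK: the function name print_nntest_commands is hypothetical, and it only prints the commands (it does not invoke ck). The operator names and flags are those listed above; note that gemm is marked [TO FIX] above and is included only for completeness.

```shell
# Hypothetical helper (not part of CK): print the profiling and validation
# commands for every Arm Compute Library OpenCL test listed in this section.
print_nntest_commands() {
    platform="$1"   # substituted for <platform> in the --timestamp value
    both="avgpool conv depthwiseconv directconv fullyconnected reshape resizebilinear softmax"
    fp32_only="gemm winogradconv"
    tests=""
    # Operators with both fp32 and uint8 variants.
    for op in $both; do
        tests="$tests ${op}-armcl-opencl ${op}-armcl-opencl-uint8"
    done
    # Operators with an fp32 variant only.
    for op in $fp32_only; do
        tests="$tests ${op}-armcl-opencl"
    done
    for t in $tests; do
        printf 'ck run nntest:%s --dvdt_prof --timestamp=%s-profiling\n' "$t" "$platform"
        printf 'ck run nntest:%s --repetitions=10 --timestamp=%s-validation\n' "$t" "$platform"
    done
}

# Example: print the commands for a platform called "odroid".
print_nntest_commands odroid
```

After reviewing the printed commands, they can be pasted into a shell or saved to a script.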

Choose a dataset

If more than one dataset suitable for the operator under test (softmax above) is found, choose one of them.

To test with a particular dataset, use e.g.:

$ ck run program:softmax-armcl-opencl --dataset_file=shape-1024-1-1

To test with all suitable datasets, use e.g.:

$ ck run nntest:softmax-armcl-opencl --iterations=1

NB: By default, ck run nntest:* iterates over the batch sizes ranging from 1 to 16; --iterations=1 stops the test after the first iteration (for the batch size of 1).

Override dataset values

To override one or more keys in a dataset, use e.g.:

$ ck run program:softmax-armcl-opencl --env.CK_IN_SHAPE_C=256

Record test results locally

By default, test results are recorded in a local repository. To just print test results to the standard output, use the --no_record option, e.g.:

$ ck run nntest:softmax-armcl-opencl --iterations=1 --no_record

To list the results saved locally, use e.g.:

$ ck list local:experiment:nntest-softmax-armcl-opencl-*

Resume an interrupted test session

If not all the dataset shapes have been processed during a test session (e.g. due to a user interrupt or the platform going offline), the session can be resumed later by running the same command with the --resume option, e.g.:

$ ck run nntest:conv-armcl-opencl --iterations=1 --repetitions=1 --timestamp=odroid-conv-0001 --resume

NB: It's essential to pass exactly the same --timestamp flag to correctly identify the test session to be resumed.
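
One way to avoid a mismatch is to keep the session identifier in one place and derive both commands from it. This is only a sketch: session_commands is a hypothetical helper, not a CK command, and it prints the commands instead of running them.

```shell
# Hypothetical helper (not part of CK): print the initial and the resume
# command for one test session so that both share the same --timestamp value.
session_commands() {
    test_name="$1"    # e.g. conv-armcl-opencl
    session_id="$2"   # e.g. odroid-conv-0001
    base="ck run nntest:${test_name} --iterations=1 --repetitions=1 --timestamp=${session_id}"
    printf '%s\n' "$base"            # first attempt
    printf '%s\n' "$base --resume"   # rerun after an interruption
}

session_commands conv-armcl-opencl odroid-conv-0001
```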

Output validation

When a test is invoked with a particular dataset for the first time, CK saves its output as reference (e.g. a vector of floating-point values). In subsequent invocations of this test with the same dataset, CK validates its output against the reference.

To skip output validation, use e.g.:

$ ck run program:softmax-armcl-opencl --skip_output_validation

To replace the reference output, use e.g.:

$ ck run program:softmax-armcl-opencl --overwrite_reference_output

Output validation is performed within a certain threshold specified via the CK_ABS_DIFF_THRESHOLD key in the run_vars dictionary in the program metadata. That is, any differences smaller than the threshold are ignored.
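
To illustrate what an absolute-difference threshold means, here is a minimal sketch for a single pair of values. It is not CK's implementation: the helper name within_threshold is hypothetical, and the strict "smaller than" comparison is an assumption based on the wording above.

```shell
# Hypothetical helper (not part of CK): succeed if |actual - reference| is
# smaller than the threshold, i.e. the difference would be ignored.
within_threshold() {
    awk -v a="$1" -v b="$2" -v t="$3" 'BEGIN {
        d = a - b
        if (d < 0) d = -d   # absolute difference
        exit !(d < t)       # exit status 0 when within the threshold
    }'
}

# A difference of 0.005 is ignored with a threshold of 0.01.
if within_threshold 0.500 0.505 0.01; then
    echo "match"
else
    echo "mismatch"
fi
```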

To override the threshold value at run-time, use e.g.:

$ ck run program:softmax-armcl-opencl --env.CK_ABS_DIFF_THRESHOLD=0.01

Output naming convention

Each reference output gets a unique name e.g. default-3cda82464112173d-1000-1-1-2-42. Here:

default is the command key of the given test program;
3cda82464112173d is the unique id of the dataset (ck-nntest:dataset:tensor-0001);
1000-1-1 are the dash-separated values of the keys in the dataset file (shape-1000-1-1.json), listed in alphabetical order of the keys (i.e. CK_IN_SHAPE_C, CK_IN_SHAPE_H, CK_IN_SHAPE_W);
2-42 are the dash-separated values of selected keys in the run_vars dictionary in the program metadata file, listed in alphabetical order of the keys (i.e. CK_IN_SHAPE_N, CK_SEED).
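
The convention above can be sketched as follows. Both helpers are hypothetical and not part of CK; they only reproduce the naming scheme as described here, using the values from the example.

```shell
# Hypothetical helper (not part of CK): sort KEY=VALUE pairs by key and
# join the values with dashes.
values_in_key_order() {
    printf '%s\n' "$@" | LC_ALL=C sort -t= -k1,1 | cut -d= -f2 | paste -sd- -
}

# Hypothetical helper (not part of CK): join the four name components.
reference_output_name() {
    cmd_key="$1"       # command key of the test program
    dataset_uid="$2"   # unique id of the dataset
    shape_vals="$3"    # dash-separated dataset values
    run_vals="$4"      # dash-separated run_vars values
    printf '%s-%s-%s-%s\n' "$cmd_key" "$dataset_uid" "$shape_vals" "$run_vals"
}

shape=$(values_in_key_order CK_IN_SHAPE_W=1 CK_IN_SHAPE_C=1000 CK_IN_SHAPE_H=1)
run=$(values_in_key_order CK_SEED=42 CK_IN_SHAPE_N=2)
reference_output_name default 3cda82464112173d "$shape" "$run"
# prints: default-3cda82464112173d-1000-1-1-2-42
```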

Visualise test results

To visualize test results in a web browser, run:

$ ck dashboard nntest

and select "Raw results".

It is possible to run this dashboard on a different host and port:

$ ck dashboard nntest --host=192.168.99.1 --port=3355

It is also possible to specify an external host and port, which is useful for Docker instances:

$ ck dashboard nntest --wfe_host=192.168.99.1 --wfe_port=3355

Replay test results

You can replay individual tests (to validate performance or fix bugs).

The simplest way is to select a given experiment from the above nntest dashboard, and then click on a button "Copy to clipboard" in the Reproduce field.

You can then paste and run a command in your shell. It will look similar to

$ ck replay experiment:186380dfcd98cd7a --point=4e9e9476bab09b2c

Alternatively, you can see all available raw nntest experiments on your machine as follows:

$ ck search experiment --tags=nntest

Test outputs of all tensor shapes and batch sizes:

$ ck run nntest:*softmax* --iterations=4 --repetitions=1 --pause_if_fail

Run on other platforms

You can run some of the tests directly on Android devices connected to your host machine via ADB as follows (you need to have the Android NDK and SDK installed):

$ ck compile program:softmax-armcl-opencl --speed --target_os=android23-arm64
$ ck run program:softmax-armcl-opencl --speed --target_os=android23-arm64

We plan to add support to compile and run ArmCL-based clients on Android too (there are some minor issues at this stage):

$ ck install package --tags=armcl,vopencl,vavgpool --env.USE_EMBEDDED_KERNELS=ON --target_os=android23-arm64
$ ck compile program:avgpool-armcl-opencl --speed --target_os=android23-arm64
$ ck run program:avgpool-armcl-opencl --speed --target_os=android23-arm64

Notes

Extra environment variables for development/debugging:

--env.CK_ADD_RAW_NNTEST_OUTPUT=yes - add vector output to the CK pipeline
--env.CK_ADD_RAW_DVDT_PROF=yes - add raw dvdt_prof profile to the CK pipeline
--env.CK_ADD_RAW_MALI_HWC=yes - add Mali hardware performance counters to the CK pipeline

To record the hostname to the meta of all experimental entries:

$ ck set kernel var.record_nntest_hostname=yes

To turn off recording the hostname, use either of:

$ ck set kernel var.record_nntest_hostname=no

$ ck set kernel var.record_nntest_hostname=

Native validation of Arm OpenCL kernels

The Arm Compute Library includes a validation suite which tests all internal Arm routines. It can be compiled for any ArmCL package as follows:

$ ck compile program:validation-armcl-opencl

It is possible to customize this build via --env.KEY=val. For example, you can add CXX flags as follows:

$ ck compile program:validation-armcl-opencl --env.EXTRA_CXX_FLAGS="-DDVDT_DEBUG"

You can now run validation as follows (select the run command):

$ ck run program:validation-armcl-opencl

You can also filter tests such as for softmax as follows:

$ ck run program:validation-armcl-opencl --env.FILTER=CL/.*Softmax
