License: MIT
Optimizing Mobile Deep Learning on ARM GPU with TVM


Note: The data and scripts here are all stale. Please go to https://github.com/dmlc/tvm/wiki/Benchmark#mobile-gpu for the latest results.
Benchmarking Deep Neural Networks on ARM CPU/GPU

This repo contains the supporting material for the blog post Optimizing Mobile Deep Learning on ARM GPU with TVM.

Inference Speed on ImageNet

Tested on

Firefly-RK3399 4G, CPU: dual-core Cortex-A72 + quad-core Cortex-A53, GPU: Mali-T860MP4
Arm Compute Library: v17.12,  MXNet: v1.0.1,  Openblas: v0.2.18

(Results figure; the raw numbers are in the Result section below.)

Set Test Environment

sudo /etc/init.d/lightdm stop
sudo -i
echo performance > /sys/class/misc/mali0/device/devfreq/ff9a0000.gpu/governor

These settings make the benchmark environment more stable.

Note: You need more than 2.5GB of memory to run the following test. Otherwise, skip the vgg16 test by replacing --model all with --model resnet18 or --model mobilenet in the command.
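The memory constraint above can be encoded in a small helper. A minimal sketch (pick_models is a hypothetical function, not part of this repo's scripts):

```python
# Hypothetical helper: choose which models to benchmark given the
# device's available memory, per the note above (vgg16 needs > 2.5 GB).
def pick_models(available_gb):
    models = ["resnet18", "mobilenet"]
    if available_gb > 2.5:
        models.insert(0, "vgg16")  # only run vgg16 when memory allows
    return models
```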

Run Test for TVM/NNVM

In TVM, we use RPC for testing, so you should build the TVM runtime and start an RPC server on your device.

python -m tvm.exec.rpc_server --host 0.0.0.0 --port=9090
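Before launching the host-side benchmark, it can help to confirm the device's RPC port accepts connections. A standard-library sketch (rpc_reachable is a hypothetical helper, not a TVM API):

```python
import socket

# Hypothetical helper: check that a TCP port (e.g. the RPC server's
# port 9090 above) accepts connections before starting the benchmark.
def rpc_reachable(host, port, timeout=1.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```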

Then, on your host machine, run the test command

python mali_imagenet_bench.py --target-host TARGET_HOST --host HOST --port PORT --model all

Replace TARGET_HOST, HOST, and PORT with the values for your environment.

For example, on my Firefly-RK3399, the command is

python mali_imagenet_bench.py --target-host 'llvm -target=aarch64-linux-gnu -mattr=+neon' --host 10.42.0.96 --port 9090 --model all
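The --target-host value is an LLVM target string. A sketch that assembles it (target_host_str is a hypothetical convenience; the defaults are taken from the RK3399 command above):

```python
# Hypothetical helper: build the LLVM target-host string passed to
# mali_imagenet_bench.py (defaults match the aarch64 RK3399 example).
def target_host_str(triple="aarch64-linux-gnu", attrs=("+neon",)):
    return "llvm -target=%s -mattr=%s" % (triple, ",".join(attrs))
```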

Run Test for MXNet + Openblas

This test runs locally on your device, so you need to install MXNet with OpenBLAS on the device first.

python mxnet_test.py --model all

Run Test for Arm Compute Library

Build ACL by cross-compiling on the host system.

scons Werror=1 neon=1 opencl=1 examples=1 benchmark_tests=1 os=linux arch=arm64-v8a embed_kernels=1 -j$(nproc)

Copy acl_test.cc to the root directory of ACL and build acl_test with

aarch64-linux-gnu-g++ acl_test.cc build/utils/*.o -O2 -std=c++11\
    -I. -Iinclude -Lbuild -Lbuild/opencl-1.2-stubs/\
     -larm_compute -larm_compute_graph -larm_compute_core -lOpenCL -o acl_test

Copy the acl_test binary to your device and run

./acl_test all
cat result-acl.txt

The results are recorded in result-acl.txt.

Note: Some test cases (e.g. resnet) are missing because the Arm Compute Library (v17.12) does not support skip connections in its graph runtime. Some other test cases are skipped because they are too slow.

Result

The outputs from my board are pasted below.

TVM/NNVM

============================================================
model: vgg16, dtype: float32
warm up..
test..
cost per image: 1.2926s
============================================================
model: vgg16, dtype: float16
warm up..
test..
cost per image: 0.6896s
============================================================
model: resnet18, dtype: float32
warm up..
test..
cost per image: 0.2041s
============================================================
model: resnet18, dtype: float16
warm up..
test..
cost per image: 0.1183s
============================================================
model: mobilenet, dtype: float32
warm up..
test..
cost per image: 0.0767s
============================================================
model: mobilenet, dtype: float16
warm up..
test..
cost per image: 0.0479s

MXNet + Openblas

============================================================
model: vgg16, dtype: float32
warm up...
test..
cost per image: 3.0250s
============================================================
model: resnet18, dtype: float32
warm up...
test..
cost per image: 0.3977s
============================================================
model: mobilenet, dtype: float32
warm up...
test..
cost per image: 0.2914s
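
The TVM/NNVM and MXNet logs above share one format; a sketch of parsing them into a (model, dtype) → seconds mapping (parse_costs is a hypothetical helper, not part of the repo's scripts):

```python
import re

# Hypothetical parser for the benchmark logs above:
# maps (model, dtype) -> cost per image in seconds.
def parse_costs(log_text):
    costs, key = {}, None
    for line in log_text.splitlines():
        m = re.match(r"model: (\w+), dtype: (\w+)", line)
        if m:
            key = (m.group(1), m.group(2))
        m = re.match(r"cost per image: ([0-9.]+)s", line)
        if m and key:
            costs[key] = float(m.group(1))
    return costs
```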

ACL

backend: cl    model: vgg16      conv_method: gemm     dtype: float32   cost: 1.64456
backend: cl    model: vgg16      conv_method: gemm     dtype: float16   cost: 0.969372
backend: cl    model: vgg16      conv_method: direct   dtype: float32   cost: 3.90031
backend: cl    model: vgg16      conv_method: direct   dtype: float16   cost: 1.61179
backend: cl    model: mobilenet  conv_method: gemm     dtype: float32   cost: 0.170934
backend: cl    model: mobilenet  conv_method: direct   dtype: float32   cost: 0.173883
backend: neon  model: vgg16      conv_method: gemm     dtype: float32   cost: 4.10269
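
Reading the float32 numbers above side by side, TVM's speedup over the CPU baseline can be computed directly (costs copied from the logs; the dict names are just for illustration):

```python
# Per-image costs in seconds, float32, copied from the logs above.
tvm   = {"vgg16": 1.2926, "resnet18": 0.2041, "mobilenet": 0.0767}
mxnet = {"vgg16": 3.0250, "resnet18": 0.3977, "mobilenet": 0.2914}

# Speedup of TVM/NNVM over MXNet + OpenBLAS per model.
speedup = {m: round(mxnet[m] / tvm[m], 2) for m in tvm}
```

On this board, TVM/NNVM is roughly 1.9-3.8x faster than the MXNet + OpenBLAS baseline, with the largest gap on mobilenet.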