jizhuoran / caffe-android-opencl-fp16

Licence: other
Optimised Caffe with OpenCL support for less powerful devices such as mobile phones

Caffe on Mobile Devices (V2 has moved to https://github.com/jizhuoran/HyperTea, which focuses on being extremely lightweight)

Caffe optimized for memory usage, speed and energy efficiency, with OpenCL support for less powerful devices such as mobile phones (NO_BACKWARD, NO_BOOST, NO_HDF5, NO_LEVELDB).

This project is under active development. Watch the repository if you want to get the latest news.

Features

  • The double data type is removed; scalar data is stored in float, and all other data is stored in Dtype (half or float)
  • OpenCL support (mobile GPU) (partially finished)
  • FP16 inference support
    • BatchNorm shifting to avoid overflow and underflow
    • Support for all layers
    • FP16 caffemodel load and save
    • Model converter (from FP32 to FP16)
  • As few dependencies as possible (Protobuf, OpenBLAS, CLBlast)
  • Optimized memory usage
  • Forward only (in the original implementation, forward-only mode also performs unnecessary copies)
  • Zero copy (memory shared between host and GPU)
  • Backward (I changed my mind; this will stay a pure forward-only library)
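
To make the FP16 items above more concrete, here is a minimal sketch of what an FP32-to-FP16 weight conversion involves and why overflow and underflow matter. It is not this project's converter: `float_to_half_bits`, `ConversionReport` and `convert_weights` are hypothetical names introduced only for illustration. The sketch assumes IEEE 754 binary16 with round-to-nearest-even, where the largest finite value is 65504 and magnitudes at or below 2^-25 round to zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Convert one IEEE 754 binary32 value to binary16 bits (round to nearest even).
// Hypothetical helper for illustration; not taken from this project.
std::uint16_t float_to_half_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);      // well-defined type punning

    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t mag = bits & 0x7FFFFFFFu; // magnitude without the sign
    std::uint32_t half = 0u;

    if (mag > 0x7F800000u) {
        half = 0x7E00u;                           // NaN -> quiet NaN
    } else if (mag >= 0x477FF000u) {
        half = 0x7C00u;                           // >= 65520 (or Inf) -> Inf
    } else if (mag >= 0x38800000u) {
        // Result is a normal half: rebias the exponent (127 -> 15), then
        // drop 13 mantissa bits with round-to-nearest-even.
        const std::uint32_t rebased = mag - 0x38000000u;
        half = (rebased + 0x0FFFu + ((rebased >> 13) & 1u)) >> 13;
    } else if (mag > 0x33000000u) {
        // Result is a subnormal half: shift the mantissa (with its implicit
        // leading 1) down to units of 2^-24, rounding to nearest even.
        const std::uint32_t shift = 126u - (mag >> 23);
        assert(shift >= 14u && shift <= 24u);
        const std::uint32_t mant = (mag & 0x007FFFFFu) | 0x00800000u;
        const std::uint32_t rem = mant & ((1u << shift) - 1u);
        const std::uint32_t halfway = 1u << (shift - 1u);
        half = mant >> shift;
        if (rem > halfway || (rem == halfway && (half & 1u))) ++half;
    }
    // Otherwise the magnitude is <= 2^-25 and rounds to zero.
    return static_cast<std::uint16_t>(sign | half);
}

// Result of converting a weight buffer, with counts of lossy edge cases.
struct ConversionReport {
    std::vector<std::uint16_t> half_bits;
    std::size_t overflowed = 0;  // finite FP32 values that became +/-Inf
    std::size_t flushed = 0;     // non-zero FP32 values that became +/-0
};

// Convert a buffer of FP32 weights and count values that left the FP16 range.
ConversionReport convert_weights(const std::vector<float>& weights) {
    ConversionReport report;
    report.half_bits.reserve(weights.size());
    for (float w : weights) {
        const std::uint16_t h = float_to_half_bits(w);
        if (std::isfinite(w) && (h & 0x7FFFu) == 0x7C00u) ++report.overflowed;
        if (w != 0.0f && (h & 0x7FFFu) == 0u) ++report.flushed;
        report.half_bits.push_back(h);
    }
    return report;
}
```

Calling `convert_weights` on a layer's weights and inspecting `overflowed` and `flushed` gives a quick indication of whether that layer's values fit the FP16 range; non-zero counts suggest the values would need rescaling or shifting before conversion.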

Peak Memory Usage Reduction

Testing is ongoing. I am waiting for a device with enough memory to measure the peak memory usage with the memory usage optimization.

Layers with OpenCL:

  • Convolution Layer (libdnn)
  • Deconvolution Layer (libdnn)
  • Batch Norm Layer (with shift)
  • Others

On-going

  1. Modify the test cases to support half-precision testing
  2. Check for unnecessary data copies in forward-only mode
  3. Tune for Android devices
  4. Change the structure of the project (move the tests out of src)
  5. Refactor: OpenCL kernel launch method, redundant code in math_fuctions_cl.cpp
  6. Doc

For Android

The project has been tested on:

  • Snapdragon 820 development board
  • HUAWEI P9
  • Hikey 970

Build libcaffe.so

# Set the NDK_HOME path in ./tools/build_android.sh to your NDK_HOME
# Set the DEVICE_OPENCL_DIR path in ./tools/build_android.sh to the directory that contains include/CL/cl.h and lib64/libOpenCL.so
$ ./tools/build_android.sh
# (You may want to choose your own make -j value)

Build Android App with Android Studio

Create a directory on your device.

$ adb shell
$ mkdir -p /sdcard/caffe
$ cd /sdcard/caffe

As with Caffe, you need the proto file and the weights. Use the commands below to push the required files to your device.

$ adb push $CAFFE/examples/style_transfer/style.protobin /sdcard/caffe/
$ adb push $CAFFE/examples/style_transfer/a1.caffemodel /sdcard/caffe/
$ adb push $CAFFE/examples/style_transfer/HKU.jpg /sdcard/caffe/

Open the Android Studio project in the $CAFFE_MOBILE/examples/android/android-caffe/ folder and run it on your connected device.

For Ubuntu

Test Environment

CPU: Intel(R) Xeon(R) CPU E5-2630 v4
GPU: NVIDIA 2080
OS: Ubuntu 16.04
OpenCL Version: 1.2
C++ Version: 5.4.0

For an art style transfer neural network, the single-inference time drops from 7.9 s on the E5 CPU to 2.0 s on the NVIDIA 2080, roughly a 4x speedup.

Step 1: Install dependencies

$ sudo apt install libprotobuf-dev protobuf-compiler libatlas-dev # Ubuntu

Step 2: Build Caffe-Mobile Lib with cmake

$ git clone https://github.com/jizhuoran/caffe-android-opencl.git
$ cd caffe-android-opencl
$ mkdir build
$ cd build
$ cmake ..
$ make -j 40

Thanks
