fujitsu / dnnl_aarch64

License: Apache-2.0

Deep Neural Network Library for AArch64 (DNNL_aarch64)

  • An open-source performance library for deep learning applications running on ARM(R)v8-A architecture CPUs
  • Optimized for the ARMv8-A architecture with the Scalable Vector Extension (SVE)
  • The key components are Xbyak, Xbyak_aarch64, and Xbyak_Translator
    • Xbyak : A JIT-assembler for x86 and x64 architectures developed by Shigeo MITSUNARI (Cybozu Labs Inc.)
    • Xbyak_aarch64 : A JIT-assembler for the ARMv8-A architecture, based on Xbyak
    • Xbyak_Translator : A translator that generates JIT functions for ARMv8 with SVE from JIT functions for x86
  • Developed based on version 0.21.2 of Deep Neural Network Library (DNNL) by Intel(R)

Development status

DNNL_aarch64 generates JIT functions for FP32 operations on ARMv8 processors with SVE in two ways, using Xbyak, Xbyak_aarch64, and Xbyak_Translator.

  • Direct method: JIT functions for AArch64 are generated directly with Xbyak_aarch64. The following operations use this method.
    • Convolution
    • Reorder
  • Indirect method: JIT functions written for x64 are translated into JIT functions for AArch64 with Xbyak, Xbyak_aarch64, and Xbyak_Translator. The following operations use this method.
    • Batch normalization
    • Eltwise
    • Pooling
    • Concat
    • Softmax
    • Sum
    • RNN operations

Operations other than those listed above, and unsupported parameter sets, run on reference implementations written in C++. These produce correct results but run more slowly.

Bfloat16 support : Currently, DNNL_aarch64 does not support bfloat16.
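
As a way to summarize the FP32 cases described above, the following minimal sketch maps an operation name to the code path this README says it takes. The table and the `code_path` function are hypothetical illustrations written for this README. They are not part of DNNL_aarch64, and the library does not expose such an API.

```python
# Hypothetical lookup table built from the operation lists in this README.
# It is an illustration only, not code from DNNL_aarch64.
DIRECT = {"convolution", "reorder"}
INDIRECT = {
    "batch normalization",
    "eltwise",
    "pooling",
    "concat",
    "softmax",
    "sum",
    "rnn",  # stands for "RNN operations"
}


def code_path(operation: str, params_supported: bool = True) -> str:
    """Return 'direct', 'indirect', or 'reference' for an FP32 operation."""
    op = operation.strip().lower()
    if not params_supported:
        # Unsupported parameter sets fall back to the C++ reference code.
        return "reference"
    if op in DIRECT:
        return "direct"
    if op in INDIRECT:
        return "indirect"
    # Anything not listed above also runs on the reference implementation.
    return "reference"


if __name__ == "__main__":
    for name in ("Convolution", "Softmax", "some_other_op"):
        print(name, "->", code_path(name))
```

The name `some_other_op` is a placeholder for any operation that is not in the two lists.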

Validated Configurations

CPU       Fujitsu FX1000 / FX700
OS        Red Hat 8.1 / CentOS 8.1
Compiler  Fujitsu compiler / GCC 8.3.1 20190507

Requirements

Currently, DNNL_aarch64 is intended to run on ARMv8-A CPUs with SVE. If you run DNNL_aarch64 on a CPU without SVE, it aborts with an undefined instruction exception.
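
Because a missing SVE unit only shows up as a crash, it can help to check for it before building or running. The sketch below assumes a Linux system where the kernel lists CPU features on the `Features` lines of `/proc/cpuinfo` and reports SVE as the token `sve`. The helper name `has_sve` is hypothetical and is not part of DNNL_aarch64.

```python
from pathlib import Path


def has_sve(cpuinfo_text: str) -> bool:
    """Return True if a 'Features' line in /proc/cpuinfo-style text lists 'sve'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match the whole token so that e.g. 'sve2' alone does not count.
        if key.strip().lower() == "features" and "sve" in value.split():
            return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # assumption: Linux only
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    if has_sve(text):
        print("SVE is reported by the kernel.")
    else:
        print("SVE is not reported; DNNL_aarch64 is expected to abort on this CPU.")
```

The parsing is kept separate from the file access so that it can be exercised with sample text on any machine.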

Installation

  1. Download DNNL_aarch64 from the repository.
git clone https://github.com/fujitsu/dnnl_aarch64.git
  2. Update the submodules.
cd dnnl_aarch64/
git submodule update --init --recursive
  3. Build the xed library.
mkdir third_party/build_xed_aarch64
pushd third_party/build_xed_aarch64/
../xbyak_translator_aarch64/translator/third_party/xed/mfile.py --shared examples install
cd kits/
ln -sf xed-install-base-* xed
popd
  4. Build DNNL_aarch64.
mkdir build_aarch64
cd build_aarch64/
cmake ..
make -j40
  • Using BLAS (optional)

    1. Add the path to the BLAS library in your environment to LD_LIBRARY_PATH.
    2. Add one of the following options to the cmake command.
    BLAS      Option
    SSL2      -DWITH_BLAS=ssl2 (only with the Fujitsu compiler)
    OpenBLAS  -DWITH_BLAS=openblas
  5. Test DNNL_aarch64 (optional).
cd tests/gtests
MKLDNN_VERBOSE=1 MKLDNN_JIT_DUMP=1 ./test_reorder
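
For scripted builds, the configure and build commands above, together with the optional BLAS flag, could be assembled as in the following sketch. The helper `build_commands` and its `jobs` parameter are hypothetical conveniences. Only the `cmake ..`, `make -j40`, and `-DWITH_BLAS=...` strings come from this README.

```python
import shlex

# BLAS choices and flags taken from the BLAS option list in this README.
BLAS_OPTIONS = {
    "ssl2": "-DWITH_BLAS=ssl2",  # only with the Fujitsu compiler
    "openblas": "-DWITH_BLAS=openblas",
}


def build_commands(blas=None, jobs=40):
    """Return the cmake and make commands as argument lists."""
    cmake = ["cmake", ".."]
    if blas is not None:
        key = blas.lower()
        if key not in BLAS_OPTIONS:
            raise ValueError(f"unsupported BLAS: {blas!r}")
        cmake.append(BLAS_OPTIONS[key])
    return [cmake, ["make", f"-j{jobs}"]]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed.
    for cmd in build_commands(blas="openblas"):
        print(shlex.join(cmd))
```

The commands are meant to be run from the build_aarch64/ directory, as in the build step above.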

License

Copyright FUJITSU LIMITED 2019-2020

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Notice

  • Arm is a registered trademark of Arm Limited (or its subsidiaries) in the US and/or elsewhere.
  • Intel is a registered trademark of Intel Corporation (or its subsidiaries) in the US and/or elsewhere.

History

Date               Version            Remarks
December 11, 2019  0.9.0_base_0.19    First public release version.
May 31, 2020       1.0.0_base_0.21.2  Update

Copyright

Copyright FUJITSU LIMITED 2019-2020
