larq / Compute Engine

Licence: apache-2.0
Highly optimized inference engine for Binarized Neural Networks

Projects that are alternatives to or similar to Compute Engine

Computelibrary
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
Stars: ✭ 2,123 (+1438.41%)
Mutual labels:  simd, aarch64, armv7, armv8
Arm Assembly Cheat
MOVED TO: https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly SEE README. ARMv7 and ARMv8 assembly userland minimal examples tutorial. Runnable asserts on x86 hosts with QEMU user mode or natively on ARM targets. Nice GDB step debug setup. Tested on Ubuntu 18.04 host and Raspberry Pi 2 and 3 targets.
Stars: ✭ 159 (+15.22%)
Mutual labels:  raspberry-pi, armv7, armv8
Ubuntu64 Rpi
A 64-bit system for the Raspberry Pi 3B/3B+.
Stars: ✭ 652 (+372.46%)
Mutual labels:  raspberry-pi, aarch64, armv8
lsp-dsp-lib
DSP library for signal processing
Stars: ✭ 37 (-73.19%)
Mutual labels:  simd, armv7, aarch64
Archstrike
An Arch Linux repository for security professionals and enthusiasts. Done the Arch Way and optimized for i686, x86_64, ARMv6, ARMv7 and ARMv8.
Stars: ✭ 401 (+190.58%)
Mutual labels:  raspberry-pi, armv7, armv8
Pieman
Script for creating custom OS images for single-board computers
Stars: ✭ 149 (+7.97%)
Mutual labels:  raspberry-pi, armv7, armv8
Rappel
A linux-based assembly REPL for x86, amd64, armv7, and armv8
Stars: ✭ 818 (+492.75%)
Mutual labels:  aarch64, armv7, armv8
Unisimd Assembler
SIMD macro assembler unified for ARM, MIPS, PPC and x86
Stars: ✭ 63 (-54.35%)
Mutual labels:  simd, aarch64, armv7
simonpi
A quick & dirty script to emulate Raspberry PI family devices on your laptop.
Stars: ✭ 61 (-55.8%)
Mutual labels:  armv7, aarch64, armv8
tensorflow-serving-arm
TensorFlow Serving ARM - A project for cross-compiling TensorFlow Serving targeting popular ARM cores
Stars: ✭ 75 (-45.65%)
Mutual labels:  armv7, aarch64, armv8
Swift On Balena
Docker images for Swift on Raspberry Pi and other ARM devices from balena's base images.
Stars: ✭ 153 (+10.87%)
Mutual labels:  aarch64, armv7, armv8
Sse2neon
A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation
Stars: ✭ 316 (+128.99%)
Mutual labels:  simd, aarch64, armv8
Debian Pi Aarch64
This is the first 64-bit system in the world to support all Raspberry Pi 64-bit hardware!!! (Include: PI400,4B,3B+,3B,3A+,Zero2W)
Stars: ✭ 2,505 (+1715.22%)
Mutual labels:  raspberry-pi, aarch64, armv8
alpine-qbittorrent-openvpn
qBittorrent docker container with OpenVPN client running as unprivileged user on alpine linux
Stars: ✭ 230 (+66.67%)
Mutual labels:  armv7, aarch64, armv8
TensorFlow Lite SSD RPi 64-bits
TensorFlow Lite SSD on bare Raspberry Pi 4 with 64-bit OS at 24 FPS
Stars: ✭ 25 (-81.88%)
Mutual labels:  armv7, aarch64, armv8
Rust Raspberrypi Os Tutorials
📚 Learn to write an embedded OS in Rust 🦀
Stars: ✭ 7,275 (+5171.74%)
Mutual labels:  raspberry-pi, aarch64, armv8
Tensorflow Bin
Prebuilt binary with Tensorflow Lite enabled (native build). For RaspberryPi / Jetson Nano. And, solved Tensorflow issues #15062,#21574,#21855,#23082,#25120,#25748,#29617,#29704,#30359. Support for custom operations in MediaPipe.
Stars: ✭ 349 (+152.9%)
Mutual labels:  aarch64, armv8
Sleef
SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
Stars: ✭ 353 (+155.8%)
Mutual labels:  simd, aarch64
Opensmalltalk Vm
Cross-platform virtual machine for Squeak, Pharo, Cuis, and Newspeak.
Stars: ✭ 345 (+150%)
Mutual labels:  raspberry-pi, armv7
Raspberry Pi Pcie Devices
Raspberry Pi PCI Express device compatibility database
Stars: ✭ 444 (+221.74%)
Mutual labels:  raspberry-pi, aarch64

Larq Compute Engine

Larq Compute Engine (LCE) is a highly optimized inference engine for deploying extremely quantized neural networks, such as Binarized Neural Networks (BNNs). It currently supports various mobile platforms and has been benchmarked on a Pixel 1 phone and a Raspberry Pi. LCE provides a collection of hand-optimized TensorFlow Lite custom operators for supported instruction sets, developed in inline assembly or in C++ using compiler intrinsics. LCE leverages optimization techniques such as tiling to maximize the number of cache hits, vectorization to maximize the computational throughput, and multi-threading parallelization to take advantage of multi-core modern desktop and mobile CPUs.
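
To make the appeal of binarization concrete, here is a minimal sketch of the general bit-packing idea behind BNN inference: when weights and activations are restricted to +1/-1, a dot product reduces to an XOR followed by a bit count. This is a plain-Python illustration of the technique in general, not LCE's actual kernels (which the text above describes as hand-optimized assembly and intrinsics); the helper names `pack_signs` and `binary_dot` are hypothetical.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int, one bit per value (bit set means -1)."""
    bits = 0
    for i, v in enumerate(values):
        if v < 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n."""
    # XOR marks the positions where the signs differ. Each mismatch
    # contributes -1 to the dot product and each match contributes +1.
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches


a = [1, -1, -1, 1, 1, -1, 1, 1]
b = [1, 1, -1, -1, 1, -1, -1, 1]

packed_result = binary_dot(pack_signs(a), pack_signs(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
assert packed_result == reference
```

On real hardware the same idea is applied to whole machine words or SIMD registers at once, which is where vectorization pays off.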

Larq Compute Engine is part of a family of libraries for BNN development; you can also check out Larq for building and training BNNs and Larq Zoo for pre-trained models.

Key Features

  • Effortless end-to-end integration from training to deployment:

    • Tight integration of LCE with Larq and TensorFlow provides a smooth end-to-end training and deployment experience.

    • A collection of Larq pre-trained BNN models for common machine learning tasks is available in Larq Zoo and can be used out-of-the-box with LCE.

    • LCE provides a custom MLIR-based model converter which is fully compatible with TensorFlow Lite and performs additional network level optimizations for Larq models.

  • Lightning fast deployment on a variety of mobile platforms:

    • LCE enables high performance, on-device machine learning inference by providing hand-optimized kernels and network level optimizations for BNN models.

    • LCE currently supports 64-bit ARM-based mobile platforms such as Android phones and Raspberry Pi boards.

    • Thread parallelism support in LCE is essential for modern mobile devices with multi-core CPUs.

Performance

The table below presents the single-threaded performance of Larq Compute Engine on different versions of a novel BNN model called QuickNet (trained on the ImageNet dataset, released on Larq Zoo) on a Pixel 1 phone (2016) and a Raspberry Pi 4 Model B (BCM2711) board:

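Model                | Top-1 Accuracy | RPi 4 B, ms (1 thread) | Pixel 1, ms (1 thread)
---------------------|----------------|------------------------|-----------------------
QuickNet (.h5)       | 58.6 %         | 31.4                   | 16.8
QuickNet-Large (.h5) | 62.7 %         | 48.7                   | 25.5
QuickNet-XL (.h5)    | 67.0 %         | 82.9                   | 44.2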

For reference, dabnn (the other main BNN library) reports an inference time of 61.3 ms for Bi-RealNet (56.4% accuracy) on the Pixel 1 phone, while LCE achieves an inference time of 41.6 ms for Bi-RealNet on the same device. The dabnn authors furthermore present a modified version, BiRealNet-Stem, which achieves the same accuracy of 56.4% in 43.2 ms.

The following table presents multi-threaded performance of Larq Compute Engine on a Pixel 1 phone and a Raspberry Pi 4 Model B (BCM2711) board:

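Model                | Top-1 Accuracy | RPi 4 B, ms (4 threads) | Pixel 1, ms (4 threads)
---------------------|----------------|-------------------------|------------------------
QuickNet (.h5)       | 58.6 %         | 16.1                    | 8.9
QuickNet-Large (.h5) | 62.7 %         | 24.7                    | 12.6
QuickNet-XL (.h5)    | 67.0 %         | 37.9                    | 22.8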

Benchmarked on August 21st, 2020 with the LCE custom TFLite Model Benchmark Tool (see here) and BNN models with randomized inputs.
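
As a quick reading aid, the snippet below works out the speedups implied by the numbers above: four threads versus one thread for each QuickNet variant, and LCE versus dabnn on Bi-RealNet. The latencies are copied from the tables and the dabnn comparison in this section; the `speedup` helper and the dictionary layout are only illustrative.

```python
# Latencies in milliseconds, copied from the tables above: (1 thread, 4 threads).
latency_ms = {
    "QuickNet": {"RPi 4 B": (31.4, 16.1), "Pixel 1": (16.8, 8.9)},
    "QuickNet-Large": {"RPi 4 B": (48.7, 24.7), "Pixel 1": (25.5, 12.6)},
    "QuickNet-XL": {"RPi 4 B": (82.9, 37.9), "Pixel 1": (44.2, 22.8)},
}


def speedup(baseline_ms, improved_ms):
    """Return how many times faster the improved latency is than the baseline."""
    return baseline_ms / improved_ms


for model, devices in latency_ms.items():
    for device, (one_thread, four_threads) in devices.items():
        ratio = speedup(one_thread, four_threads)
        print(f"{model:15s} {device:8s} {ratio:.2f}x with 4 threads")

# Bi-RealNet on the Pixel 1: dabnn (61.3 ms) versus LCE (41.6 ms).
lce_vs_dabnn = speedup(61.3, 41.6)
print(f"LCE vs dabnn on Bi-RealNet: {lce_vs_dabnn:.2f}x")
```

With these figures, four threads give roughly a 1.9x to 2.2x speedup over one thread, and LCE runs Bi-RealNet about 1.47x faster than the reported dabnn time.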

Getting started

Follow these steps to deploy a BNN with LCE:

  1. Pick a Larq model

    You can use Larq to build and train your own model or pick a pre-trained model from Larq Zoo.

  2. Convert the Larq model

    LCE is built on top of TensorFlow Lite and uses the TensorFlow Lite FlatBuffer format to convert and serialize Larq models for inference. We provide an LCE Converter with additional optimization passes to increase the speed of execution of Larq models on supported target platforms.

  3. Build LCE

    The LCE documentation provides the build instructions for Android and 64-bit ARM-based boards such as Raspberry Pi. Please follow the provided instructions to create a native LCE build or cross-compile for one of the supported targets.

  4. Run inference

    LCE uses the TensorFlow Lite Interpreter to perform an inference. In addition to the already available built-in TensorFlow Lite operators, optimized LCE operators are registered to the interpreter to execute the Larq specific subgraphs of the model. An example to create and build an LCE compatible TensorFlow Lite interpreter for your own applications is provided here.
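
Steps 1 and 2 can be sketched in Python as follows. This is a minimal sketch, assuming the `larq-zoo` and `larq-compute-engine` packages are installed and that pre-trained ImageNet weights can be downloaded; it uses `larq_zoo.sota.QuickNet` and `larq_compute_engine.convert_keras_model` from those packages. The wrapper name `convert_quicknet` and the output file name `quicknet.tflite` are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def convert_quicknet(output_path="quicknet.tflite"):
    """Convert a pre-trained QuickNet from Larq Zoo to a TFLite FlatBuffer file."""
    # Imported lazily so this module can be loaded even where the
    # Larq packages (and TensorFlow) are not installed.
    import larq_compute_engine as lce
    import larq_zoo

    # Step 1: pick a pre-trained Larq model (downloads ImageNet weights).
    model = larq_zoo.sota.QuickNet(weights="imagenet")

    # Step 2: run the LCE converter; it returns the serialized
    # FlatBuffer as a bytes object.
    flatbuffer_bytes = lce.convert_keras_model(model)

    # Write the FlatBuffer to disk so it can be loaded on the target device.
    path = Path(output_path)
    path.write_bytes(flatbuffer_bytes)
    return path
```

Calling `convert_quicknet()` on a development machine should produce a `.tflite` file that can then be copied to the Android or Raspberry Pi target for steps 3 and 4.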

About

Larq Compute Engine is being developed by a team of deep learning researchers and engineers at Plumerai to help accelerate both our own research and the general adoption of Binarized Neural Networks.
