
google / CFU-Playground

Licence: Apache-2.0 license
Want a faster ML processor? Do it yourself! -- A framework for playing with custom opcodes to accelerate TensorFlow Lite for Microcontrollers (TFLM). Online tutorial: https://google.github.io/CFU-Playground/ For reference docs, see the link below.

Programming Languages

Verilog
C++
Python
C
Makefile
Jupyter Notebook

Projects that are alternatives of or similar to CFU-Playground

mtomo
Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.
Stars: ✭ 24 (-93.35%)
Mutual labels:  tflite
FAT-fast-adjustable-threshold
This is the code for the FAT method, with links to quantized tflite models. (CC BY-NC-ND)
Stars: ✭ 20 (-94.46%)
Mutual labels:  tflite
Yolov5
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Stars: ✭ 19,914 (+5416.34%)
Mutual labels:  tflite
glDelegateBenchmark
quick and dirty benchmark for TFLite gles delegate on iOS
Stars: ✭ 13 (-96.4%)
Mutual labels:  tflite
LIGHT-SERNET
Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition
Stars: ✭ 20 (-94.46%)
Mutual labels:  tflite
Tensorflow-lite-kotlin-samples
📌This repo contains the kotlin implementation of TensorflowLite Example Android Apps🚀
Stars: ✭ 17 (-95.29%)
Mutual labels:  tflite
android tflite
GPU Accelerated TensorFlow Lite applications on Android NDK. Higher accuracy face detection, Age and gender estimation, Human pose estimation, Artistic style transfer
Stars: ✭ 105 (-70.91%)
Mutual labels:  tflite
TF2DeepFloorplan
TF2 Deep FloorPlan Recognition using a Multi-task Network with Room-boundary-Guided Attention. Enable tensorboard, quantization, flask, tflite, docker, github actions and google colab.
Stars: ✭ 98 (-72.85%)
Mutual labels:  tflite
TFLite-Mobile-Generic-Object-Localizer
Python TFLite scripts for detecting objects of any class in an image without knowing their label.
Stars: ✭ 42 (-88.37%)
Mutual labels:  tflite
Yolov3
YOLOv3 in PyTorch > ONNX > CoreML > TFLite
Stars: ✭ 8,159 (+2160.11%)
Mutual labels:  tflite
YOLOv5-Lite
🍅🍅🍅YOLOv5-Lite: lighter, faster and easier to deploy. Evolved from yolov5 and the size of model is only 930+kb (int8) and 1.7M (fp16). It can reach 10+ FPS on the Raspberry Pi 4B when the input size is 320×320~
Stars: ✭ 1,230 (+240.72%)
Mutual labels:  tflite
Selfie2Anime-with-TFLite
How to create Selfie2Anime from tflite model to Android.
Stars: ✭ 70 (-80.61%)
Mutual labels:  tflite
TFLite-Object-Detection-with-TFLite-Model-Maker
Custom object detection with the TFLite Model Maker
Stars: ✭ 13 (-96.4%)
Mutual labels:  tflite
Mobile Image-Video Enhancement
Sensifai image and video enhancement module on mobiles
Stars: ✭ 39 (-89.2%)
Mutual labels:  tflite
Tensorflow Yolov4 Tflite
YOLOv4, YOLOv4-tiny, YOLOv3, YOLOv3-tiny Implemented in Tensorflow 2.0, Android. Convert YOLO v4 .weights tensorflow, tensorrt and tflite
Stars: ✭ 1,881 (+421.05%)
Mutual labels:  tflite
glDelegateBench
quick and dirty inference time benchmark for TFLite gles delegate
Stars: ✭ 17 (-95.29%)
Mutual labels:  tflite
MobileQA
Offline reading-comprehension app: QA for mobile, Android & iPhone
Stars: ✭ 49 (-86.43%)
Mutual labels:  tflite
react-native-camera-tflite
Real time image classification with React Native and Tensorflow lite.
Stars: ✭ 52 (-85.6%)
Mutual labels:  tflite
Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
Stars: ✭ 2,382 (+559.83%)
Mutual labels:  tflite
Aidlearning Framework
🔥🔥AidLearning is a powerful mobile development platform, AidLearning builds a linux env supporting GUI, deep learning and visual IDE on Android...Now Aid supports OpenCL (GPU+NPU) for high performance acceleration...Linux on Android or HarmonyOS
Stars: ✭ 4,537 (+1156.79%)
Mutual labels:  tflite

CFU Playground

Want a faster ML processor? Do it yourself!

This project provides a framework that an engineer, intern, or student can use to design and evaluate enhancements to an FPGA-based “soft” processor, specifically to increase the performance of machine learning (ML) tasks. The goal is to abstract away most infrastructure details so that the user can get up to speed quickly and focus solely on adding new processor instructions, exploiting them in the computation, and measuring the results.
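For illustration, here is how a new instruction can be carved out of the RISC-V encoding space that a VexRiscv-based SoC decodes. The RISC-V ISA reserves major opcodes such as custom-0 (0b0001011) for non-standard extensions, and an R-type word carries two source registers, one destination register, and the funct3/funct7 fields that can select among several custom operations. This C sketch assumes, without confirmation from this README, that the CFU is reached through custom-0; the register numbers used with it are arbitrary.

```c
#include <stdint.h>

/* RISC-V "custom-0" major opcode (0b0001011), reserved by the ISA for
 * non-standard extensions. ASSUMPTION: the CFU is reached through this
 * opcode; check the project's own headers for the encoding it really uses. */
#define RISCV_OPCODE_CUSTOM0 0x0Bu

/* Pack a RISC-V R-type instruction word:
 *   funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
 * Each field is masked to its width so out-of-range inputs cannot spill
 * into neighbouring fields. */
uint32_t encode_rtype(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                      uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return ((funct7 & 0x7Fu) << 25) |
           ((rs2 & 0x1Fu) << 20) |
           ((rs1 & 0x1Fu) << 15) |
           ((funct3 & 0x07u) << 12) |
           ((rd & 0x1Fu) << 7) |
           (opcode & 0x7Fu);
}
```

For example, encode_rtype(0, 11, 10, 0, 10, RISCV_OPCODE_CUSTOM0) yields 0x00B5050B: a custom-0 instruction that reads x10 and x11 and writes x10. In practice the assembler emits this word (for instance via the GNU assembler's `.insn` directive) and kernel code reaches it through a C intrinsic, so nobody packs bits by hand.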

This project enables rapid iteration on processor improvements -- multiple iterations per day.

This is how it works:

  • Choose a TensorFlow Lite model; a quantized person detection model is provided, or bring your own.
  • Execute the inference on the Arty FPGA board to get cycle counts per layer.
  • Choose a TFLite operator to accelerate, and dig into its code.
  • Design new instruction(s) that can replace multiple basic operations.
  • Build a custom function unit (a small amount of hardware) that performs the new instruction(s).
  • Modify the TFLite/Micro library kernel to use the new instruction(s), which are available as intrinsics with function call syntax.
  • Rebuild the FPGA SoC, recompile the TFLM library, and rerun to measure the improvement.
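To make the "design an instruction, then use it from the kernel" steps concrete, here is a host-runnable C sketch. It assumes a hypothetical custom instruction, called mac4 here, that treats each 32-bit operand as four packed signed 8-bit lanes, multiplies them pairwise, and returns the sum of the four products, the kind of int8 dot product that dominates quantized models. The instruction is modelled in plain C so the example runs anywhere; on the board, cfu_mac4_model would be replaced by the project's intrinsic for the real instruction. All names (pack4, cfu_mac4_model, dot_baseline, dot_with_mac4) are illustrative, not the project's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline kernel: one multiply and one add per element. */
int32_t dot_baseline(const int8_t *a, const int8_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Pack four int8 values into one 32-bit word, element 0 in the low byte. */
uint32_t pack4(const int8_t *p) {
    return (uint32_t)(uint8_t)p[0] |
           ((uint32_t)(uint8_t)p[1] << 8) |
           ((uint32_t)(uint8_t)p[2] << 16) |
           ((uint32_t)(uint8_t)p[3] << 24);
}

/* Extract byte `lane` of `word` and reinterpret it as a signed 8-bit value. */
static int32_t lane_s8(uint32_t word, int lane) {
    int32_t v = (int32_t)((word >> (8 * lane)) & 0xFFu);
    return v > 127 ? v - 256 : v;
}

/* Software model of the HYPOTHETICAL mac4 instruction. In hardware this
 * would be a single instruction executed by the custom function unit. */
int32_t cfu_mac4_model(uint32_t packed_a, uint32_t packed_b) {
    int32_t sum = 0;
    for (int lane = 0; lane < 4; lane++) {
        sum += lane_s8(packed_a, lane) * lane_s8(packed_b, lane);
    }
    return sum;
}

/* "Accelerated" kernel: four elements per mac4 call, plus a scalar tail
 * for lengths that are not a multiple of four. */
int32_t dot_with_mac4(const int8_t *a, const int8_t *b, size_t n) {
    int32_t acc = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc += cfu_mac4_model(pack4(a + i), pack4(b + i));
    }
    for (; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

Each mac4 call stands in for roughly four multiplies and four adds in the baseline loop, which is the kind of instruction-count reduction the per-layer cycle counts are meant to expose. A real kernel would more likely load 32-bit words straight from memory than call pack4, which is here only to make the lane layout explicit.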

The focus here is performance, not demos. The inputs to the ML inference are canned/faked, and the only output is cycle counts. It would be possible to export the improvements made here to an actual demo, but currently no pathway is set up for doing so.

With the exception of Vivado, everything used by this project is open source.

Disclaimer: This is not an officially supported Google project. Support and/or new releases may be limited.

This is an early prototype of an ML exploration framework; expect a lack of documentation and occasional breakage. If you want to collaborate on building out this framework, reach out to [email protected]! See "Contribution guidelines" below.

Required hardware/OS

  • One of the boards supported by LiteX Boards. Most LiteX Boards targets should work.
    It has been tested on the Arty A7-35T/100T, iCEBreaker, Fomu, OrangeCrab, ULX3S, and Nexys Video boards.
  • The only supported host OS is Linux (Debian / Ubuntu).

You don't need a board if you want to run Renode or Verilator simulations.

Assumed software

  • FPGA toolchain: this depends on the chosen board. If you already have a toolchain installed for your board, you can use that.

For a board with a Xilinx XC7 part, you can use either Vivado, which must be installed manually (here's our guide), or the open-source SymbiFlow toolchain, which can be easily installed using Conda (see the Setup Guide).

For boards with Lattice iCE40, ECP5, or Nexus FPGAs, you can install the appropriate set of open source tools either via Conda (see the Setup Guide) or on your own by building from source. Or, you can use the Lattice toolchain (Radiant/Diamond).

If you want to try things out using Renode simulation, you need neither the board nor the toolchain. You can also perform Verilog-level, cycle-accurate simulation with Verilator, but this is much slower. Renode is installed by the setup script.

Other required packages will be checked for and, if on a Debian-based system, automatically installed by the setup script below.

Setup

Clone this repo, cd into it, then run:

scripts/setup

Use with board

The default board is Arty. To use a different board, you must specify its target, e.g. TARGET=digilent_nexys_video.

  1. Build the SoC and load the bitstream onto Arty:
cd proj/proj_template
make prog

This builds the SoC with the default CFU from proj/proj_template. Later you'll copy this and modify it to make your own project.

  2. Build a RISC-V program and execute it on the SoC that you just loaded onto the Arty:
make load

Use without board

If you don't have a board supported by LiteX Boards, you can use Renode or Verilator to simulate the SoC.

To run on the Renode simulator on the host machine (no Vivado or Arty board required), execute:

make renode

To run on Verilator, a cycle-accurate RTL-level simulator (no Vivado or Arty board required), execute:

make PLATFORM=sim load

Most useful make flags

Option        | Explanation                                                                                                | Example                                     | Default
--------------|------------------------------------------------------------------------------------------------------------|---------------------------------------------|-------------
PLATFORM      | Choose which SoC platform to build: hps, sim, or common_soc                                                | make bitstream PLATFORM=hps                 | common_soc
TARGET        | Choose one of the many targets from the LiteX Boards repository; common_soc takes BaseSoC from the specified target's .py file | make bitstream TARGET=digilent_nexys_video  | digilent_arty
USE_VIVADO    | Use the Vivado toolchain                                                                                   | make bitstream USE_VIVADO=1                 | 0
USE_SYMBIFLOW | Use the SymbiFlow toolchain                                                                                | make bitstream USE_SYMBIFLOW=1              | 0
UART_SPEED    | Choose the UART baud rate                                                                                  | make bitstream UART_SPEED=115200            | 3686400
IGNORE_TIMING | Ignore timing constraints (Vivado only)                                                                    | make bitstream USE_VIVADO=1 IGNORE_TIMING=1 | 0

Underlying open-source technology

  • LiteX: Open-source framework for assembling the SoC (CPU + peripherals)
  • VexRiscv: Open-source RISC-V soft CPU optimized for FPGAs
  • Amaranth: Python toolbox for building digital hardware

Licensed under Apache-2.0 license

See the file LICENSE.

Contribution guidelines

If you want to contribute to CFU Playground, be sure to review the contribution guidelines. This project adheres to Google's code of conduct. By participating, you are expected to uphold this code.
