
WalkerLau / Accelerating Cnn With Fpga

Licence: other
This project accelerates CNN computation on an FPGA, achieving a speed-up of more than 50x over a CPU.


<Super Detailed Tutorial> Accelerate CNN computation with FPGA

For the Chinese version of this tutorial, click here; it is more detailed!

This is an original tutorial. Please cite the source when reposting it: https://github.com/WalkerLau/Accelerating-CNN-with-FPGA

For a GPU version of this project, please refer to: https://github.com/WalkerLau/GPU-CNN

The purpose of this project is to speed up convolutional neural network (CNN) processing with an FPGA, which has a great advantage in parallel computation. It is also my bachelor's graduation project, and I am glad to show you step by step how the work was done.

Final Performance

Let's first look at how fast the FPGA accelerator runs. The system accelerates only the convolution layers. The screenshots below show the clock cycles spent with and without the FPGA accelerator applied to the convolution layers. VIPLFaceNet, a face recognition algorithm with 7 convolution layers, serves as the evaluation application for this project. Compared with a quad-core ARM Cortex-A53 CPU alone, the CPU+FPGA system runs VIPLFaceNet 45x to 75x faster.
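The comparison above is expressed in clock cycles. As a minimal sketch of how such a measurement could be taken, the helper below times a callable and derives a speed-up ratio. The names `measure_ticks` and `speedup` are hypothetical and do not come from this repository. The sketch uses `std::chrono` so that it compiles on a host machine; on the board, SDSoC's `sds_clock_counter()` from `sds_lib.h` would be the usual cycle source.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: run a callable once and return the elapsed time in ticks
// (nanoseconds here). On the board, sds_clock_counter() from sds_lib.h could be
// sampled before and after the call instead of the chrono clock.
template <typename Fn>
std::uint64_t measure_ticks(Fn&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
}

// Speed-up = ticks of the CPU-only run / ticks of the CPU+FPGA run.
double speedup(std::uint64_t cpu_only_ticks, std::uint64_t accelerated_ticks) {
    assert(accelerated_ticks > 0);
    return static_cast<double>(cpu_only_ticks) /
           static_cast<double>(accelerated_ticks);
}
```

Measuring the same convolution layer once in software and once with the hardware function, then passing both counts to `speedup`, gives a ratio that can be compared against the 45x to 75x range reported here.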

Description & Features

VIPLFaceNet, as mentioned above, is part of SeetaFaceEngine which is an open source face recognition engine developed by Visual Information Processing and Learning (VIPL) group, Institute of Computing Technology, Chinese Academy of Sciences.

This project is developed with Xilinx SDSOC, an efficient embedded development tool for individual developers and small teams. With SDSOC, you can program FPGA hardware even with little knowledge of HDL.

Below are some features of the acceleration system:

  • Easy porting. SDSOC automatically translates C/C++ to HDL and then creates the FPGA bitstream. You can therefore migrate this system to other CNN algorithms, especially those written in C/C++, by adjusting the accelerator structure (ifmap size, stride, filter size, etc.) in the source code so that it fits different convolution layers.

  • Good performance. The acceleration system combines several optimization strategies, listed below.

    • ifmap volume reuse architecture

    • convert data to lower precision

    • 16-channel parallel processing unit & adder tree

    • pipelining

    • on-chip BRAM partition & BRAM's cross-layer sharing

    • multi-layer acceleration strategy
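Several of the strategies listed above can be illustrated together. The following is a minimal sketch, not the project's actual kernel: it converts floats to 16-bit fixed point (lower precision), multiplies 16 channels in parallel, and reduces the products with an adder tree. The Q8 format, the names `to_fixed` and `mac16`, and the accumulator width are assumptions made for illustration. The `#pragma HLS` directives are standard Vivado HLS pragmas; an ordinary host compiler ignores them.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

const int kLanes = 16;    // channels processed in parallel
const int kFracBits = 8;  // assumed Q8 fixed-point format (illustrative only)

// Convert a float to 16-bit fixed point, saturating at the int16 range.
std::int16_t to_fixed(float x) {
    float scaled = std::round(x * (1 << kFracBits));
    if (scaled > 32767.0f) scaled = 32767.0f;
    if (scaled < -32768.0f) scaled = -32768.0f;
    return static_cast<std::int16_t>(scaled);
}

// Multiply 16 ifmap values by 16 weights and sum the products with an adder tree.
// The result carries 2 * kFracBits fractional bits. A 64-bit accumulator is used
// here for portability; a real HLS design would likely choose a narrower width.
std::int64_t mac16(const std::int16_t ifmap[kLanes],
                   const std::int16_t weight[kLanes]) {
#pragma HLS PIPELINE II=1
    std::int64_t level[kLanes];
#pragma HLS ARRAY_PARTITION variable=level complete dim=1
    for (int i = 0; i < kLanes; ++i) {
#pragma HLS UNROLL
        level[i] = static_cast<std::int64_t>(ifmap[i]) * weight[i];
    }
    // Adder tree: 16 -> 8 -> 4 -> 2 -> 1 partial sums.
    for (int width = kLanes / 2; width >= 1; width /= 2) {
        for (int i = 0; i < width; ++i) {
#pragma HLS UNROLL
            level[i] = level[2 * i] + level[2 * i + 1];
        }
    }
    return level[0];
}
```

The tree shape matters in hardware: a chain of 16 additions has a depth of 15 adders, while the tree has a depth of 4, which makes a one-cycle initiation interval easier to reach.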

What should I prepare before getting started?

  • Hardware

    • Xilinx UltraScale+ MPSoC ZCU102 (also works on ZCU104 or other Xilinx devices, depending on your performance needs)

  • Software

    • Ubuntu 16.04 (for installing and running SDSOC. The acceleration system requires an embedded Linux OS, so all development work should be done on a Linux host machine in a Linux environment)

    • Xilinx SDSOC 2018.2 (see the SDSOC installation and configuration tutorial (Chinese). SDSOC must be installed and configured properly before you continue, so reading through Xilinx's official document UG1294 is strongly recommended)

    • Xilinx reVISION platform (the main reason for installing the reVISION platform is to use the xfopencv library, because SeetaFace uses OpenCV to load and preprocess images. For more information about the reVISION platform and xfopencv configuration, see the reVISION-Getting-Started-Guide and the xfopencv tutorial)

    • [optional, but recommended] CodeBlocks (for off-board debugging)

    • [optional, but recommended] OpenCV 2.4.13.6 (for off-board debugging)

  • Some basic knowledge

    • Please read through the SDSOC Tutorials before going deeper. They are a good guide to the basic operations of SDSOC.

    • Basic C/C++ programming skills.

Installation

  1. First, download this repository.

  2. Create an empty SDSOC project. Be sure to select the reVISION platform if you have installed it.

  3. Adjust the C/C++ build options.

  4. Add all the source files in the src folder to the newly created SDSOC project. Most source files are unchanged from SeetaFaceEngine. The FPGA-accelerated code is in conv_net.cpp.

  5. Locate convolute1.cpp in the project explorer and expand it. Right-click the green dot labelled convolute1, then click "Toggle HW/SW".

    In the same way, locate math_functions.cpp and toggle matrix_procuct. Toggle HW/SW marks a function as a hardware function, which runs on the FPGA after synthesis.

  6. Select Generate SD card image in the Application Project Settings window and then build the project. The build takes 1 to 3 hours, depending on your computer's performance.

  7. After the build, go to the folder in the SDSOC project directory that has the same name as your build configuration. Find the sd_card folder there and copy all the files inside it to the root directory of the SD card.

  8. Open the model folder of this repository and extract the two compressed packages inside. This produces a file named seeta_fr_v1.0.bin of about 110 MB. Make sure this file sits directly in the model folder.

  9. Copy the two folders, model and data, to the root directory of the SD card.

  10. Configure the UART settings (as described in the SDSOC Tutorials) and run the application on the board. The executable Seeta-Accel-Test.elf is located in /media/card, so change to that directory before running it.
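For readers new to SDSOC, the sketch below shows roughly what a function marked with Toggle HW/SW in step 5 can look like. It is a hypothetical example, not the project's convolute1 or matrix_procuct: the function `scale_hw`, its parameters, and the chosen pragmas are assumptions for illustration. The `#pragma SDS data` directives tell SDSOC how data moves between the CPU and the FPGA, and a normal host compiler ignores them, so the same code also runs as plain software.

```cpp
#include <cassert>

// Hypothetical hardware-function candidate (not taken from this repository).
// access_pattern: the arrays are read and written in order, so they can be streamed.
// copy: the range of each array that is transferred to and from the FPGA.
#pragma SDS data access_pattern(in:SEQUENTIAL, out:SEQUENTIAL)
#pragma SDS data copy(in[0:n], out[0:n])
void scale_hw(const float* in, float* out, int n, float gain) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = in[i] * gain;  // one multiply per element, pipelined in hardware
    }
}
```

Because the pragmas are ignored off the board, a function written this way can first be checked on the host and then toggled to hardware without changing its body.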

Off-board Debug

The steps above cover installation, and all the code mentioned so far runs on the FPGA evaluation board. When you want to change the code or migrate it to another algorithm, you may need to debug off-board before moving on-board. Off-board debugging should also be done in a Linux environment.

  1. Download and install OpenCV 2.4.13.6 (see the OpenCV installation tutorial).

  2. Install CodeBlocks, create an empty project, and configure OpenCV for the build environment.

  3. Enable C++11 standard support in the build options.

  4. Copy the two .cpp files under the off-board debug folder of this repository to the src folder. Overwrite the original files with the same name.

  5. Import the files in src into the CodeBlocks project.

  6. Build and run the project.
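A common way to use an off-board build is to compare the output of modified accelerator code against a plain software reference. The sketch below assumes a single-channel convolution without padding. The helpers `conv2d_ref` and `max_abs_diff` are hypothetical and are not part of SeetaFaceEngine or this repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference single-channel convolution without padding (row-major layout).
// The output side length is (in_size - k) / stride + 1.
std::vector<float> conv2d_ref(const std::vector<float>& ifmap, int in_size,
                              const std::vector<float>& filter, int k, int stride) {
    assert(stride > 0 && k > 0 && in_size >= k);
    assert(ifmap.size() == static_cast<std::size_t>(in_size) * in_size);
    assert(filter.size() == static_cast<std::size_t>(k) * k);
    int out_size = (in_size - k) / stride + 1;
    std::vector<float> out(static_cast<std::size_t>(out_size) * out_size, 0.0f);
    for (int oy = 0; oy < out_size; ++oy) {
        for (int ox = 0; ox < out_size; ++ox) {
            float acc = 0.0f;
            for (int ky = 0; ky < k; ++ky) {
                for (int kx = 0; kx < k; ++kx) {
                    acc += ifmap[(oy * stride + ky) * in_size + (ox * stride + kx)] *
                           filter[ky * k + kx];
                }
            }
            out[oy * out_size + ox] = acc;
        }
    }
    return out;
}

// Largest element-wise absolute difference between two equally sized buffers.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

If the accelerated path uses lower-precision data, an exact match is not expected, so compare `max_abs_diff` against a tolerance suited to the chosen precision rather than against zero.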

Acknowledgement

I would like to express my special thanks to my teachers, Shulong WANG and Quanxue GAO of Xidian University, for their support on this project.

