
tirumalnaidu / opencl-hls-cnn-accelerator

Licence: GPL-3.0 license
OpenCL HLS-based CNN accelerator on the Intel DE10-Nano FPGA.

Programming Languages

C
Jupyter Notebook

Projects that are alternatives of or similar to opencl-hls-cnn-accelerator

spector
Spector: An OpenCL FPGA Benchmark Suite
Stars: ✭ 38 (-22.45%)
Mutual labels:  fpga, opencl, altera-opencl-sdk
fpga caffe
No description or website provided.
Stars: ✭ 116 (+136.73%)
Mutual labels:  fpga, opencl
Sdaccel examples
SDAccel Examples
Stars: ✭ 325 (+563.27%)
Mutual labels:  fpga, opencl
dcurl
Hardware-accelerated Multi-threaded IOTA PoW, drop-in replacement for ccurl
Stars: ✭ 39 (-20.41%)
Mutual labels:  fpga, opencl
Tf2
An Open Source Deep Learning Inference Engine Based on FPGA
Stars: ✭ 113 (+130.61%)
Mutual labels:  fpga, opencl
Trisycl
Generic system-wide modern C++ for heterogeneous platforms with SYCL from Khronos Group
Stars: ✭ 354 (+622.45%)
Mutual labels:  fpga, opencl
Tornadovm
TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
Stars: ✭ 479 (+877.55%)
Mutual labels:  fpga, opencl
John
John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
Stars: ✭ 5,656 (+11442.86%)
Mutual labels:  fpga, opencl
Pipecnn
An OpenCL-based FPGA Accelerator for Convolutional Neural Networks
Stars: ✭ 775 (+1481.63%)
Mutual labels:  fpga, opencl
tiny-tpu
Small-scale Tensor Processing Unit built on an FPGA
Stars: ✭ 61 (+24.49%)
Mutual labels:  fpga, fpga-accelerator
tensor stream-opencl
An OpenCL backend for TensorStream
Stars: ✭ 26 (-46.94%)
Mutual labels:  opencl
KiCad-Schematic-Symbol-Libraries
Schematic symbol libraries for FPGAs & microcontrollers.
Stars: ✭ 70 (+42.86%)
Mutual labels:  fpga
SpinalCrypto
SpinalHDL - Cryptography libraries
Stars: ✭ 36 (-26.53%)
Mutual labels:  fpga
no2muacm
Drop In USB CDC ACM core for iCE40 FPGA
Stars: ✭ 26 (-46.94%)
Mutual labels:  fpga
LVDS-7-to-1-Serializer
An Verilog implementation of 7-to-1 LVDS Serializer. Which can be used for comunicating FPGAs with LVDS TFT Screens.
Stars: ✭ 33 (-32.65%)
Mutual labels:  fpga
clash-compucolor2
Clash implementation of the Compucolor II home computer
Stars: ✭ 25 (-48.98%)
Mutual labels:  fpga
penguinV
Simple and fast C++ image processing library with focus on heterogeneous systems
Stars: ✭ 110 (+124.49%)
Mutual labels:  opencl
verilog-sid-mos6581
MOS6581 SID chip emulator in SystemVerilog
Stars: ✭ 22 (-55.1%)
Mutual labels:  fpga
bandicoot-code
Bandicoot: GPU accelerator add-on for the Armadillo C++ linear algebra library
Stars: ✭ 21 (-57.14%)
Mutual labels:  opencl
memalloy
Memory consistency modelling using Alloy
Stars: ✭ 23 (-53.06%)
Mutual labels:  opencl

About

We designed a neural network accelerator for image classification on the ImageNet dataset using the Darknet Reference Model. This model is 2.9 times faster than AlexNet and attains the same top-1 and top-5 accuracy as AlexNet with one tenth of the parameters.


Board

Requirements

Files

  • pytorch_model - Our CNN is based on the Darknet framework, so we reimplemented the model in PyTorch to check the results and extract the model parameters.
  • pyopencl_model - To simulate and verify the OpenCL kernels, we used the PyOpenCL package. This model reached the same accuracy as the PyTorch model and ran about 20x faster.
  • model - Pre-trained parameters of the Darknet Reference Model, with each layer's parameters in a separate txt file.
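The parameter files in the model folder could be read with a small loader like the one below. This is a minimal sketch, not the project's code: it assumes each txt file holds whitespace-separated decimal values, and the names load_params_txt and conv_weight_count are hypothetical. Comparing the number of values read with the expected weight count (filters x input channels x kernel height x kernel width) catches truncated or mismatched files early.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical loader for one parameter file from model/.
 * ASSUMPTION: values are whitespace-separated decimals (spaces or newlines);
 * the real file layout may differ.
 * Reads at most `capacity` floats into `out`; returns how many were read,
 * or -1 if the file cannot be opened. */
long load_params_txt(const char *path, float *out, long capacity)
{
    assert(out != NULL && capacity >= 0);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    long n = 0;
    float v;
    /* "%f" skips leading whitespace, so line breaks need no special handling. */
    while (n < capacity && fscanf(fp, "%f", &v) == 1)
        out[n++] = v;

    fclose(fp);
    return n;
}

/* Expected number of weights in a conv layer:
 * filters x input channels x kernel height x kernel width. */
long conv_weight_count(int filters, int in_channels, int k)
{
    return (long)filters * in_channels * k * k;
}
```

For example, layer 1 in the architecture table has conv_weight_count(16, 3, 3) = 432 weights. If the real files also store biases or batch-norm statistics, the expected count changes accordingly.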

CNN Architecture

Layer  Type  Filters  Kernel Size  Stride  Pad  Input Size       Output Size
1      conv  16       3 x 3        1       1    256 x 256 x 3    256 x 256 x 16
2      max   -        2 x 2        2       0    256 x 256 x 16   128 x 128 x 16
3      conv  32       3 x 3        1       1    128 x 128 x 16   128 x 128 x 32
4      max   -        2 x 2        2       0    128 x 128 x 32   64 x 64 x 32
5      conv  64       3 x 3        1       1    64 x 64 x 32     64 x 64 x 64
6      max   -        2 x 2        2       0    64 x 64 x 64     32 x 32 x 64
7      conv  128      3 x 3        1       1    32 x 32 x 64     32 x 32 x 128
8      max   -        2 x 2        2       0    32 x 32 x 128    16 x 16 x 128
9      conv  256      3 x 3        1       1    16 x 16 x 128    16 x 16 x 256
10     max   -        2 x 2        2       0    16 x 16 x 256    8 x 8 x 256
11     conv  512      3 x 3        1       1    8 x 8 x 256      8 x 8 x 512
12     max   -        2 x 2        2       0    8 x 8 x 512      4 x 4 x 512
13     conv  1024     3 x 3        1       1    4 x 4 x 512      4 x 4 x 1024
14     avg   -        4 x 4        1       0    4 x 4 x 1024     1 x 1 x 1024
15     conv  1000     1 x 1        1       0    1 x 1 x 1024     1 x 1 x 1000
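Each output size in the table follows the usual formula out = (in + 2 x pad - kernel) / stride + 1, rounded down. The sketch below walks a 256 x 256 x 3 input through all 15 layers to check the shapes. The darknet_ref array is a hypothetical transcription of the table, and the sketch covers only shape arithmetic, not the kernels themselves.

```c
#include <assert.h>
#include <stdio.h>

/* Spatial output size of a conv or pooling layer:
 *   out = (in + 2*pad - k) / stride + 1
 * Integer division rounds down here because the numerator is non-negative. */
int out_size(int in, int k, int stride, int pad)
{
    assert(stride > 0 && in + 2 * pad >= k);
    return (in + 2 * pad - k) / stride + 1;
}

/* One row of the architecture table. Pooling layers keep the channel
 * count, so `filters` is 0 for them. */
struct layer {
    const char *type;
    int filters, k, stride, pad;
};

/* Hypothetical transcription of the 15-layer table. */
static const struct layer darknet_ref[15] = {
    {"conv", 16, 3, 1, 1},   {"max", 0, 2, 2, 0},
    {"conv", 32, 3, 1, 1},   {"max", 0, 2, 2, 0},
    {"conv", 64, 3, 1, 1},   {"max", 0, 2, 2, 0},
    {"conv", 128, 3, 1, 1},  {"max", 0, 2, 2, 0},
    {"conv", 256, 3, 1, 1},  {"max", 0, 2, 2, 0},
    {"conv", 512, 3, 1, 1},  {"max", 0, 2, 2, 0},
    {"conv", 1024, 3, 1, 1}, {"avg", 0, 4, 1, 0},
    {"conv", 1000, 1, 1, 0},
};

/* Starts from a 256 x 256 x 3 input, prints each layer's output shape,
 * and writes the final width and channel count to *w and *c. */
void trace_shapes(int *w, int *c)
{
    int size = 256, ch = 3;
    for (int i = 0; i < 15; i++) {
        const struct layer *l = &darknet_ref[i];
        size = out_size(size, l->k, l->stride, l->pad);
        if (l->filters > 0)
            ch = l->filters;
        printf("%2d %-4s -> %d x %d x %d\n", i + 1, l->type, size, size, ch);
    }
    *w = size;
    *c = ch;
}
```

The printed shapes should match the Output Size column, ending at 1 x 1 x 1000, one score per ImageNet class.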

Results

Layer indices in the timing log start at 0: Conv 0 is layer 1 in the architecture table, Maxpool 13 is the 4 x 4 average pool (layer 14), and Conv 14 is the final 1 x 1 convolution (layer 15).

Conv 0  time: 35.898 ms
Conv 2  time: 79.748 ms
Conv 4  time: 79.439 ms
Conv 6  time: 79.442 ms
Conv 8  time: 79.418 ms
Conv 10 time: 79.411 ms
Conv 12 time: 79.404 ms
Conv 14 time: 17.319 ms
Total Convolution time: 530.079 ms

Batchnorm 0   time: 143.092 ms
Batchnorm 2   time: 73.007 ms
Batchnorm 4   time: 21.486 ms
Batchnorm 6   time: 5.504 ms
Batchnorm 8   time: 2.479 ms
Batchnorm 10  time: 1.259 ms
Batchnorm 12  time: 0.641 ms
Batchnorm 14  time: 0.052 ms
Total Batchnorm time: 247.520 ms

Maxpool 1  time: 78.848 ms
Maxpool 3  time: 31.823 ms
Maxpool 5  time: 8.991 ms
Maxpool 7  time: 2.890 ms
Maxpool 9  time: 1.486 ms
Maxpool 11  time: 0.719 ms
Maxpool 13  time: 0.286 ms
Total Pooling time: 125.042 ms

Total Time: 902.642 ms

Label   : Egyptian cat
Accuracy: 35.796 %
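Convolution time tracks the number of multiply-accumulate (MAC) operations per layer. Layers 3 through 13 each require 75,497,472 MACs (for layer 3, 128 x 128 x 32 x 16 x 3 x 3), which is consistent with their nearly identical times of about 79.4 ms, roughly 0.95 GMAC/s. The sketch below recomputes these figures from the architecture table and the timing log; conv_macs, gmacs_per_s, and report_conv_throughput are hypothetical helper names. It also shows that the 1 x 1 convolution (Conv 14) reaches far lower throughput, about 0.06 GMAC/s, though the log alone does not explain why.

```c
#include <assert.h>
#include <stdio.h>

/* MACs for a conv layer given its output size:
 * out_h * out_w * filters * in_channels * k * k. */
long long conv_macs(int out_hw, int in_ch, int filters, int k)
{
    return (long long)out_hw * out_hw * filters * in_ch * k * k;
}

/* Effective throughput in GMAC/s for `macs` operations taking `ms` ms. */
double gmacs_per_s(long long macs, double ms)
{
    assert(ms > 0.0);
    return (double)macs / (ms * 1e-3) / 1e9;
}

void report_conv_throughput(void)
{
    /* Output size, input channels, filters, kernel size for each conv layer,
     * from the architecture table; times copied from the timing log. */
    static const int cfg[8][4] = {
        {256, 3, 16, 3}, {128, 16, 32, 3}, {64, 32, 64, 3},
        {32, 64, 128, 3}, {16, 128, 256, 3}, {8, 256, 512, 3},
        {4, 512, 1024, 3}, {1, 1024, 1000, 1},
    };
    static const double ms[8] = {35.898, 79.748, 79.439, 79.442,
                                 79.418, 79.411, 79.404, 17.319};
    for (int i = 0; i < 8; i++) {
        long long m = conv_macs(cfg[i][0], cfg[i][1], cfg[i][2], cfg[i][3]);
        /* 2 * i reproduces the 0-based "Conv N" labels used in the log. */
        printf("Conv %-2d %11lld MACs  %.3f GMAC/s\n",
               2 * i, m, gmacs_per_s(m, ms[i]));
    }
}
```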

Resource Usage

Kernel      ALUTs        FFs          RAMs       DSPs
conv        27822        28705        144        58
batch norm  9949         12211        93         10
pool        8247         10211        36         24
conv1x1     12184        15087        102        17
Total       61872 (56%)  73190 (33%)  405 (79%)  109 (97%)
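The ALUT, FF, and RAM totals are larger than the sum of the four kernel rows (by 3,670 ALUTs, 6,976 FFs, and 30 RAMs), while the DSP total equals the row sum. The difference is presumably logic outside the kernels, such as the board interface, but the table does not say. The short check below reproduces these sums; kernel_use, reported_total, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Per-kernel rows of the resource table: ALUTs, FFs, RAMs, DSPs. */
static const long kernel_use[4][4] = {
    {27822, 28705, 144, 58}, /* conv */
    {9949, 12211, 93, 10},   /* batch norm */
    {8247, 10211, 36, 24},   /* pool */
    {12184, 15087, 102, 17}, /* conv1x1 */
};
/* The Total row, without the utilization percentages. */
static const long reported_total[4] = {61872, 73190, 405, 109};

/* Sum of the four kernel rows for resource column `col`. */
long kernel_sum(int col)
{
    assert(col >= 0 && col < 4);
    long s = 0;
    for (int r = 0; r < 4; r++)
        s += kernel_use[r][col];
    return s;
}

/* Prints, per resource, the kernel sum, the reported total, and the gap. */
void report_resource_gaps(void)
{
    static const char *name[4] = {"ALUTs", "FFs", "RAMs", "DSPs"};
    for (int c = 0; c < 4; c++)
        printf("%-5s kernels %6ld  total %6ld  outside kernels %5ld\n",
               name[c], kernel_sum(c), reported_total[c],
               reported_total[c] - kernel_sum(c));
}
```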

Planned Improvements

The accelerator's throughput could be improved further by converting the model to 8-bit or 16-bit fixed point and by pipelining the kernels with the Intel channels extension or OpenCL pipes, so that layers pass data to each other on chip instead of through global memory.
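As a rough illustration of the planned fixed-point conversion, the sketch below quantizes floats to 16-bit values and computes a dot product, the inner loop of a convolution, with integer arithmetic only. The Q8.8 format (8 fractional bits) and all function names are assumptions for illustration; the planned design might use a different split or 8-bit values, and the sketch does not cover channels or pipes.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: Q8.8 format, i.e. int16_t values with 8 fractional bits. */
#define FRAC_BITS 8
#define Q_ONE (1 << FRAC_BITS) /* 1.0 in Q8.8 = 256 */

/* Float -> Q8.8: saturate to the int16_t range, then round half away from
 * zero (done by hand so the sketch does not need libm). */
int16_t to_q8_8(float x)
{
    float scaled = x * (float)Q_ONE;
    if (scaled >= (float)INT16_MAX)
        return INT16_MAX;
    if (scaled <= (float)INT16_MIN)
        return INT16_MIN;
    long r = (scaled >= 0.0f) ? (long)(scaled + 0.5f) : (long)(scaled - 0.5f);
    return (int16_t)r;
}

/* Q8.8 -> float, e.g. for comparing against the floating-point model. */
float from_q8_8(int16_t q)
{
    return (float)q / (float)Q_ONE;
}

/* Integer dot product of two Q8.8 vectors. Each product carries
 * 2 * FRAC_BITS = 16 fractional bits, so the result is in Q.16;
 * divide by Q_ONE * Q_ONE to recover the real value. */
int64_t dot_q8_8(const int16_t *a, const int16_t *b, int n)
{
    assert(n >= 0);
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}
```

With 8 fractional bits the rounding error per value is at most 1/512; for example, 0.1234 becomes 32/256 = 0.125. Accumulating in 64 bits keeps the sum exact, and a hardware datapath would typically narrow it again before the next layer.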

License

MIT License
