tirumalnaidu / opencl-hls-cnn-accelerator

Licence: GPL-3.0 license

OpenCL HLS based CNN Accelerator on Intel DE10 Nano FPGA.

Programming Languages

Jupyter Notebook

About

We designed a neural network accelerator for the Darknet Reference Model, targeting image classification on the ImageNet dataset. The Darknet Reference Model is 2.9 times faster than AlexNet and attains the same top-1 and top-5 performance with 1/10th the parameters.

Board

Terasic DE10-Nano Development Kit (Cyclone V SoC FPGA)

Requirements

Files

pytorch_model - The CNN we used is based on the Darknet framework, so we had to implement the model in PyTorch to check the results and collect the model parameters.
pyopencl_model - To simulate and verify the kernels we wrote in OpenCL, we used the PyOpenCL package. It matched the accuracy of the PyTorch model and ran about 20x faster than the PyTorch model.
model - This folder contains the pre-trained parameters of the Darknet reference model, with each layer in a separate txt file.
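
When checking kernel output against a software model, a small golden reference can be handy. The sketch below is not taken from the repository: `conv2d_ref` and `max_abs_diff` are hypothetical helper names, and the code handles only a single channel with stride 1, using the 3 x 3, pad 1 setting from the architecture. It computes cross-correlation, which is what deep learning frameworks call convolution.

```python
def conv2d_ref(image, kernel, pad=1):
    """Single-channel, stride-1 cross-correlation with zero padding.

    Hypothetical reference helper, not the project's OpenCL kernel.
    image and kernel are lists of lists of floats; kernel is square.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    # Build a zero-padded copy of the input.
    padded = [[0.0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for y in range(h):
        for x in range(w):
            padded[y + pad][x + pad] = image[y][x]
    out_h = h + 2 * pad - k + 1
    out_w = w + 2 * pad - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            out[y][x] = sum(
                padded[y + i][x + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 2-D lists."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


ones = [[1.0] * 3 for _ in range(3)]
result = conv2d_ref(ones, ones, pad=1)
```

With an all-ones 3 x 3 input and kernel, the center output sums nine products while a corner sees only four, which makes border handling easy to check. `max_abs_diff` can then compare a device result against this reference within a chosen tolerance.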

CNN Architecture

No.	Layer	Filters	Kernel Size	Stride	Pad	Input Size	Output Size
1	conv	16	3 x 3	1	1	256 x 256 x 3	256 x 256 x 16
2	max	-	2 x 2	2	0	256 x 256 x 16	128 x 128 x 16
3	conv	32	3 x 3	1	1	128 x 128 x 16	128 x 128 x 32
4	max	-	2 x 2	2	0	128 x 128 x 32	64 x 64 x 32
5	conv	64	3 x 3	1	1	64 x 64 x 32	64 x 64 x 64
6	max	-	2 x 2	2	0	64 x 64 x 64	32 x 32 x 64
7	conv	128	3 x 3	1	1	32 x 32 x 64	32 x 32 x 128
8	max	-	2 x 2	2	0	32 x 32 x 128	16 x 16 x 128
9	conv	256	3 x 3	1	1	16 x 16 x 128	16 x 16 x 256
10	max	-	2 x 2	2	0	16 x 16 x 256	8 x 8 x 256
11	conv	512	3 x 3	1	1	8 x 8 x 256	8 x 8 x 512
12	max	-	2 x 2	2	0	8 x 8 x 512	4 x 4 x 512
13	conv	1024	3 x 3	1	1	4 x 4 x 512	4 x 4 x 1024
14	avg	-	4 x 4	1	0	4 x 4 x 1024	1 x 1 x 1024
15	conv	1000	1 x 1	1	0	1 x 1 x 1024	1 x 1 x 1000
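
The output sizes in the table follow from the usual relation out = floor((in + 2*pad - kernel) / stride) + 1. The sketch below replays that formula over the 15 layers as a sanity check; `out_size`, `LAYERS`, and `trace_shapes` are illustrative names, not identifiers from the repository.

```python
def out_size(in_size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer."""
    return (in_size + 2 * pad - kernel) // stride + 1


# (layer type, filters, kernel, stride, pad), copied from the table above.
# Pooling layers keep the channel count, so their filter entry is None.
LAYERS = [
    ("conv", 16, 3, 1, 1), ("max", None, 2, 2, 0),
    ("conv", 32, 3, 1, 1), ("max", None, 2, 2, 0),
    ("conv", 64, 3, 1, 1), ("max", None, 2, 2, 0),
    ("conv", 128, 3, 1, 1), ("max", None, 2, 2, 0),
    ("conv", 256, 3, 1, 1), ("max", None, 2, 2, 0),
    ("conv", 512, 3, 1, 1), ("max", None, 2, 2, 0),
    ("conv", 1024, 3, 1, 1), ("avg", None, 4, 1, 0),
    ("conv", 1000, 1, 1, 0),
]


def trace_shapes(size=256, channels=3):
    """Return the (height, width, channels) output shape of every layer."""
    shapes = []
    for _kind, filters, kernel, stride, pad in LAYERS:
        size = out_size(size, kernel, stride, pad)
        if filters is not None:
            channels = filters
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes()
```

Starting from a 256 x 256 x 3 input, the final entry of `shapes` is the 1 x 1 x 1000 class-score vector listed in the last row.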

Results

Conv 0  time: 35.898 ms                                                         
Conv 2  time: 79.748 ms                                                         
Conv 4  time: 79.439 ms                                                         
Conv 6  time: 79.442 ms                                                         
Conv 8  time: 79.418 ms                                                         
Conv 10 time: 79.411 ms                                                         
Conv 12 time: 79.404 ms                                                         
Conv 14 time: 17.319 ms                                                         
Total Convolution time: 530.079 ms

Batchnorm 0   time: 143.092 ms                                                  
Batchnorm 2   time: 73.007 ms                                                   
Batchnorm 4   time: 21.486 ms                                                   
Batchnorm 6   time: 5.504 ms                                                    
Batchnorm 8   time: 2.479 ms                                                    
Batchnorm 10  time: 1.259 ms                                                    
Batchnorm 12  time: 0.641 ms                                                    
Batchnorm 14  time: 0.052 ms                                                    
Total Batchnorm time: 247.520 ms   

Maxpool 1  time: 78.848 ms                                                      
Maxpool 3  time: 31.823 ms                                                      
Maxpool 5  time: 8.991 ms                                                       
Maxpool 7  time: 2.890 ms                                                       
Maxpool 9  time: 1.486 ms                                                       
Maxpool 11  time: 0.719 ms                                                      
Maxpool 13  time: 0.286 ms                                                      
Total Pooling time: 125.042 ms                                                  
                                                                                
Total Time: 902.642 ms                                                            
                                                                                
Label   : Egyptian cat                                                          
Accuracy: 35.796 %
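
To see where the time goes, the per-kernel totals from the log above can be turned into percentage shares. This is only arithmetic on the reported numbers; `breakdown` is an illustrative helper name.

```python
# Per-kernel totals in milliseconds, copied from the log above.
totals_ms = {
    "convolution": 530.079,
    "batchnorm": 247.520,
    "pooling": 125.042,
}


def breakdown(totals):
    """Percentage share of each kernel type, rounded to one decimal."""
    total = sum(totals.values())
    return {name: round(100.0 * t / total, 1) for name, t in totals.items()}


shares = breakdown(totals_ms)
for name, pct in shares.items():
    print(f"{name}: {pct} %")
```

On these figures convolution accounts for a little under 59 % of the run, with batch normalization at roughly 27 % and pooling at roughly 14 %.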

Resource Usage

Kernel	ALUTs	FFs	RAMs	DSPs
conv	27822	28705	144	58
batch norm	9949	12211	93	10
pool	8247	10211	36	24
conv1x1	12184	15087	102	17
Total	61872 (56%)	73190 (33%)	405 (79%)	109 (97%)

Planned Improvements

Throughput could be improved further by converting the model to fixed point (8-bit or 16-bit) and by pipelining the accelerator with the Intel channels and pipes extensions.
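
As a rough illustration of the fixed-point idea, the sketch below quantizes values to signed 8-bit integers and accumulates a dot product in a wider integer. It is a host-side Python model under assumed settings, not the planned kernel: the 6 fractional bits, the saturation behavior, and the names `to_fixed`, `to_float`, and `fixed_dot` are all assumptions chosen for the example.

```python
def to_fixed(x, frac_bits=6, total_bits=8):
    """Quantize a float to a signed fixed-point integer, saturating on overflow.

    frac_bits=6 is an assumed format; a real design would pick it per layer.
    Note that Python's round() rounds halves to the nearest even integer.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = int(round(x * scale))
    return max(lo, min(hi, q))


def to_float(q, frac_bits=6):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_dot(weights, activations, frac_bits=6, total_bits=8):
    """Dot product on quantized operands with a wide integer accumulator."""
    acc = 0
    for w, a in zip(weights, activations):
        acc += to_fixed(w, frac_bits, total_bits) * to_fixed(a, frac_bits, total_bits)
    # Each product carries 2 * frac_bits fractional bits.
    return acc / (1 << (2 * frac_bits))


approx = fixed_dot([0.5, -0.25], [1.0, 0.5])
```

Values that are exact multiples of 1/64 survive this format unchanged, while anything outside roughly [-2, 2) saturates, which is why the choice of fractional bits matters in practice.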

License

MIT License
