
xuqiantong / Cuda Winograd

Fast CUDA Kernels for ResNet Inference.

Projects that are alternatives of or similar to Cuda Winograd

Minhashcuda
Weighted MinHash implementation on CUDA (multi-gpu).
Stars: ✭ 88 (-15.38%)
Mutual labels:  cuda
Elasticfusion
Real-time dense visual SLAM system
Stars: ✭ 1,298 (+1148.08%)
Mutual labels:  cuda
Supra
SUPRA: Software Defined Ultrasound Processing for Real-Time Applications - An Open Source 2D and 3D Pipeline from Beamforming to B-Mode
Stars: ✭ 96 (-7.69%)
Mutual labels:  cuda
Deep Learning With Cats
Deep learning with cats (^._.^)
Stars: ✭ 1,290 (+1140.38%)
Mutual labels:  cuda
Aurora
A minimal deep learning library written in Python/Cython/C++ on top of NumPy/CUDA/cuDNN.
Stars: ✭ 90 (-13.46%)
Mutual labels:  cuda
Numer
Numeric Erlang - vector and matrix operations with CUDA. Heavily inspired by Pteracuda - https://github.com/kevsmith/pteracuda
Stars: ✭ 91 (-12.5%)
Mutual labels:  cuda
Python Opencv Cuda
custom opencv_contrib module which exposes opencv cuda optical flow methods with python bindings
Stars: ✭ 86 (-17.31%)
Mutual labels:  cuda
Deepnet
Deep.Net machine learning framework for F#
Stars: ✭ 99 (-4.81%)
Mutual labels:  cuda
Matconvnet
MatConvNet: CNNs for MATLAB
Stars: ✭ 1,299 (+1149.04%)
Mutual labels:  cuda
Pynvvl
A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
Stars: ✭ 95 (-8.65%)
Mutual labels:  cuda
Weighted softmax loss
Weighted Softmax Loss Layer for Caffe
Stars: ✭ 89 (-14.42%)
Mutual labels:  cuda
Deeppipe2
Deep Learning library using GPU(CUDA/cuBLAS)
Stars: ✭ 90 (-13.46%)
Mutual labels:  cuda
Fbtt Embedding
A Tensor Train based compression library for the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. This library can reduce the total model size by up to 100x in Facebook's open-sourced DLRM model while achieving the same model quality, and the implementation is faster than state-of-the-art alternatives. Existing state-of-the-art libraries decompress the whole embedding table on the fly, so they provide no memory reduction during training; this library decompresses only the requested rows, which can reduce the memory footprint per embedding table by 10,000x. It also includes a software cache that keeps a portion of the table entries in decompressed format for faster lookup and processing.
Stars: ✭ 92 (-11.54%)
Mutual labels:  cuda
Thundersvm
ThunderSVM: A Fast SVM Library on GPUs and CPUs
Stars: ✭ 1,282 (+1132.69%)
Mutual labels:  cuda
Extending Jax
Extending JAX with custom C++ and CUDA code
Stars: ✭ 98 (-5.77%)
Mutual labels:  cuda
Deep Learning Boot Camp
A community run, 5-day PyTorch Deep Learning Bootcamp
Stars: ✭ 1,270 (+1121.15%)
Mutual labels:  cuda
Tutorial Ubuntu 18.04 Install Nvidia Driver And Cuda And Cudnn And Build Tensorflow For Gpu
Ubuntu 18.04 How to install Nvidia driver + CUDA + CUDNN + build tensorflow for gpu step by step command line
Stars: ✭ 91 (-12.5%)
Mutual labels:  cuda
Pygraphistry
PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
Stars: ✭ 1,365 (+1212.5%)
Mutual labels:  cuda
Dpp
Detail-Preserving Pooling in Deep Networks (CVPR 2018)
Stars: ✭ 99 (-4.81%)
Mutual labels:  cuda
Region Conv
Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade
Stars: ✭ 95 (-8.65%)
Mutual labels:  cuda

Introduction

This code implements fast CUDA kernels for DNN inference, in particular for the convolution layers and residual blocks in ResNet. Specifically, each kernel fuses three operations into one:

  • Convolution
  • Batch Normalization (BN + Scale)
  • Activation (ReLU)
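
At inference time, BN and Scale are per-channel affine transforms, so they can be folded into the convolution's weights and bias, with ReLU applied to the single fused result. A minimal NumPy sketch of this equivalence (illustrative only, using a 1 × 1 convolution for brevity; not code from this repo):

```python
import numpy as np

def fused_conv_bn_relu(x, w, gamma, beta, mean, var, eps=1e-5):
    """Conv (1x1) -> BN + Scale -> ReLU in one pass, with BN folded into the conv.

    x: input of shape (C_in, H, W); w: weights of shape (C_out, C_in);
    gamma/beta/mean/var: per-output-channel BN parameters, shape (C_out,).
    """
    scale = gamma / np.sqrt(var + eps)          # per-channel BN scale
    w_folded = w * scale[:, None]               # fold BN scale into the weights
    b_folded = beta - mean * scale              # fold BN shift into a bias
    y = np.tensordot(w_folded, x, axes=([1], [0])) + b_folded[:, None, None]
    return np.maximum(y, 0.0)                   # ReLU

def unfused_conv_bn_relu(x, w, gamma, beta, mean, var, eps=1e-5):
    """Reference pipeline: separate conv, BN, and ReLU stages."""
    y = np.tensordot(w, x, axes=([1], [0]))
    inv_std = 1.0 / np.sqrt(var + eps)
    y = gamma[:, None, None] * (y - mean[:, None, None]) * inv_std[:, None, None] \
        + beta[:, None, None]
    return np.maximum(y, 0.0)
```

Both functions compute the same result; the fused form touches the output tensor once instead of three times, which is the point of fusing these stages into a single kernel.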

For implementation details, please refer to the technical report included in this repo. The Winograd algorithm is used for 3 × 3 convolutional kernels.
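
The 2-D Winograd transform used for 3 × 3 layers nests the 1-D F(2,3) transform, which produces two outputs of a 3-tap filter with four multiplications instead of six. A self-contained NumPy sketch of the 1-D case, using the standard Winograd transform matrices (illustrative, not this repo's CUDA implementation):

```python
import numpy as np

# Standard F(2,3) transform matrices: two convolution outputs from four inputs
# and three filter taps, using only four elementwise multiplies.
BT = np.array([[1,  0, -1,  0],     # input transform
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],     # filter transform
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],      # output transform
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4 input values, g: 3 filter taps -> 2 outputs via 4 multiplies."""
    return AT @ ((G @ g) * (BT @ d))

def direct_conv(d, g):
    """Direct 3-tap correlation for comparison: 6 multiplies for 2 outputs."""
    return np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
```

The 2-D F(2×2, 3×3) case applies the same transforms along both axes, reducing a 3 × 3 convolution over each 2 × 2 output tile from 36 multiplications to 16.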

Usage

mkdir data
python data_generator.py
make
./Test 0
  • Set the layer parameters in data_generator.py
  • Run the six test cases by passing an argument from 0 to 5 to ./Test

Results

3 × 3 Kernels

Kernels     Operations             128 / 128   256 / 256
cuDNN       GEMM + BN + ReLU       214 us      384 us
cuDNN       Winograd + BN + ReLU   95 us       155 us
Our kernel  Winograd + BN + ReLU   59 us       117 us

1 × 1 Kernels [BUGGY NUMBERS]

Kernels      512 / 128          128 / 512   1024 / 256         256 / 1024
Operations   GEMM + BN + ReLU   GEMM + BN   GEMM + BN + ReLU   GEMM + BN + ReLU
cuDNN        119 us             115 us      219 us             214 us
Our kernel   58 us              55 us       186 us             181 us
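
For context on the 1 × 1 results: a 1 × 1 convolution is exactly a GEMM between the (C_out × C_in) weight matrix and the input flattened to (C_in, H·W), which is why the cuDNN baseline here is its GEMM path. A hypothetical NumPy illustration:

```python
import numpy as np

def conv1x1_as_gemm(x, w):
    """1x1 convolution expressed as a single matrix multiply.

    x: input of shape (C_in, H, W); w: weights of shape (C_out, C_in).
    Returns output of shape (C_out, H, W).
    """
    c_in, h, wd = x.shape
    y = w @ x.reshape(c_in, h * wd)   # (C_out, C_in) @ (C_in, H*W)
    return y.reshape(-1, h, wd)
```

Because the whole layer collapses to one GEMM, the fused kernel's remaining advantage over cuDNN comes from folding BN and ReLU into the same pass rather than from the matrix multiply itself.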