All Projects → godweiyang → NN-CUDA-Example

godweiyang / NN-CUDA-Example

Licence: Apache-2.0 License
Several simple examples for popular neural network toolkits calling custom CUDA operators.

Programming Languages

python
139335 projects - #7 most used programming language
CMake
9771 projects
C++
36643 projects - #6 most used programming language
Cuda
1817 projects
c
50402 projects - #5 most used programming language

Projects that are alternatives of or similar to NN-CUDA-Example

libelas-gpu
Implementation of LIBELAS in cuda.
Stars: ✭ 41 (-93.1%)
Mutual labels:  cuda
dbcsr
DBCSR: Distributed Block Compressed Sparse Row matrix library
Stars: ✭ 65 (-89.06%)
Mutual labels:  cuda
gproshan
geometry processing and shape analysis framework
Stars: ✭ 48 (-91.92%)
Mutual labels:  cuda
lane detection
Lane detection for the Nvidia Jetson TX2 using OpenCV4Tegra
Stars: ✭ 15 (-97.47%)
Mutual labels:  cuda
ONNX-Runtime-with-TensorRT-and-OpenVINO
Docker scripts for building ONNX Runtime with TensorRT and OpenVINO in manylinux environment
Stars: ✭ 15 (-97.47%)
Mutual labels:  cuda
ctuning-programs
Collective Knowledge extension with unified and customizable benchmarks (with extensible JSON meta information) to be easily integrated with customizable and portable Collective Knowledge workflows. You can easily compile and run these benchmarks using different compilers, environments, hardware and OS (Linux, MacOS, Windows, Android). More info:
Stars: ✭ 41 (-93.1%)
Mutual labels:  cuda
Aresdb
A GPU-powered real-time analytics storage and query engine.
Stars: ✭ 2,814 (+373.74%)
Mutual labels:  cuda
gpubootcamp
This repository consists for gpu bootcamp material for HPC and AI
Stars: ✭ 227 (-61.78%)
Mutual labels:  cuda
euler2d kokkos
Simple 2d finite volume solver for Euler equations using c++ kokkos library
Stars: ✭ 27 (-95.45%)
Mutual labels:  cuda
FGPU
No description or website provided.
Stars: ✭ 30 (-94.95%)
Mutual labels:  cuda
Social-Distancing-and-Face-Mask-Detection
Social Distancing and Face Mask Detection using TensorFlow. Install all required Libraries and GPU drivers as well. Refer to README.md or REPORT for know to installation requirement
Stars: ✭ 39 (-93.43%)
Mutual labels:  cuda
How to write cuda extensions in pytorch
How to write cuda kernels or c functions in pytorch, especially for former caffe users.
Stars: ✭ 51 (-91.41%)
Mutual labels:  cuda
NewsMTSC
Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k sentences and a state-of-the-art classification model.
Stars: ✭ 54 (-90.91%)
Mutual labels:  cuda
HeCBench
software.intel.com/content/www/us/en/develop/articles/repo-evaluating-performance-productivity-oneapi.html
Stars: ✭ 85 (-85.69%)
Mutual labels:  cuda
GOSH
An ultra-fast, GPU-based large graph embedding algorithm utilizing a novel coarsening algorithm requiring not more than a single GPU.
Stars: ✭ 12 (-97.98%)
Mutual labels:  cuda
Plotoptix
Data visualisation in Python based on OptiX 7.2 ray tracing framework.
Stars: ✭ 252 (-57.58%)
Mutual labels:  cuda
HiSpatialCluster
Clustering spatial points with algorithm of Fast Search, high performace computing implements of CUDA or parallel in CPU, and runnable implements on python standalone or arcgis.
Stars: ✭ 31 (-94.78%)
Mutual labels:  cuda
k-means
Code accompanying my blog post on k-means in Python, C++ and CUDA
Stars: ✭ 56 (-90.57%)
Mutual labels:  cuda
peakperf
Achieve peak performance on x86 CPUs and NVIDIA GPUs
Stars: ✭ 33 (-94.44%)
Mutual labels:  cuda
allgebra
Base container for developing C++ and Fortran HPC applications
Stars: ✭ 14 (-97.64%)
Mutual labels:  cuda

Neural Network CUDA Example

logo

Several simple examples for neural network toolkits (PyTorch, TensorFlow, etc.) calling custom CUDA operators.

We provide several ways to compile the CUDA kernels and their cpp wrappers, including jit, setuptools and cmake.

We also provide several python codes to call the CUDA kernels, including kernel time statistics and model training.

For more accurate time statistics, you'd best use nvprof or nsys to run the code.

Environments

  • NVIDIA Driver: 418.116.00
  • CUDA: 11.0
  • Python: 3.7.3
  • PyTorch: 1.7.0+cu110
  • TensorFlow: 2.4.1
  • CMake: 3.16.3
  • Ninja: 1.10.0
  • GCC: 8.3.0

Cannot ensure successful running in other environments.

Code structure

├── include
│   └── add2.h # header file of add2 cuda kernel
├── kernel
│   └── add2_kernel.cu # add2 cuda kernel
├── pytorch
│   ├── add2_ops.cpp # torch wrapper of add2 cuda kernel
│   ├── time.py # time comparison of cuda kernel and torch
│   ├── train.py # training using custom cuda kernel
│   ├── setup.py
│   └── CMakeLists.txt
├── tensorflow
│   ├── add2_ops.cpp # tensorflow wrapper of add2 cuda kernel
│   ├── time.py # time comparison of cuda kernel and tensorflow
│   ├── train.py # training using custom cuda kernel
│   └── CMakeLists.txt
├── LICENSE
└── README.md

PyTorch

Compile cpp and cuda

JIT
Directly run the python code.

Setuptools

python3 pytorch/setup.py install

CMake

mkdir build
cd build
cmake ../pytorch
make

Run python

Compare kernel running time

python3 pytorch/time.py --compiler jit
python3 pytorch/time.py --compiler setup
python3 pytorch/time.py --compiler cmake

Train model

python3 pytorch/train.py --compiler jit
python3 pytorch/train.py --compiler setup
python3 pytorch/train.py --compiler cmake

TensorFlow

Compile cpp and cuda

CMake

mkdir build
cd build
cmake ../tensorflow
make

Run python

Compare kernel running time

python3 tensorflow/time.py --compiler cmake

Train model

python3 tensorflow/train.py --compiler cmake

Implementation details (in Chinese)

PyTorch自定义CUDA算子教程与运行时间分析
详解PyTorch编译并调用自定义CUDA算子的三种方式
三分钟教你如何PyTorch自定义反向传播

F.A.Q

Q. ImportError: libc10.so: cannot open shared object file: No such file or directory
A. You must do import torch before import add2.

Q. tensorflow.python.framework.errors_impl.NotFoundError: build/libadd2.so: undefined symbol: _ZTIN10tensorflow8OpKernelE
A. Check if ${TF_LFLAGS} in CmakeLists.txt is correct.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].