artyom-beilis / dlprimitives

Licence: MIT license

Deep Learning Primitives and Mini-Framework for OpenCL

Labels

deep-neural-networks deep-learning gpu opencl convolutional-neural-networks gpu-computing open-standard

DLPrimitives

This project aims to provide cross-platform OpenCL tools for deep learning and inference.

Today, most deep learning training is done on NVidia GPUs using the closed-source CUDA and cuDNN libraries. Using AMD or Intel GPUs is either challenging or virtually impossible. For example, AMD provides the ROCm platform, but it does not yet support RDNA GPUs (more than a year after release), does not support APUs, and does not support any operating system other than Linux.

Goals

Create an open-source, cross-platform deep learning primitives library, similar to cuDNN or MIOpen, that supports multiple GPU architectures.
Create an inference library with minimal dependencies for efficient inference on any modern GPU, similar to TensorRT or MIGraphX.
Create a minimalistic deep learning framework as a proof of concept (POC) of capabilities and performance.
Integrate with existing large-scale deep learning projects such as PyTorch, TensorFlow, and MXNet, so that the vendor-independent, open-source OpenCL API becomes a first-class citizen for deep learning.

Please note that this is a work in progress, still in its first, preliminary stages.

Initial Framework Integration

Integration with existing frameworks:

PyTorch, (almost) out-of-tree OpenCL backend project: https://github.com/artyom-beilis/pytorch_dlprim
Caffe-OpenCL, performance improvements by using dlprimitives: https://github.com/artyom-beilis/caffe/tree/opencl_dlprim

Integration With ONNX

ONNX model loading and inference have been tested on the following ImageNet networks:

PyTorch, opsets 9, 11, 13: alexnet, vgg16, resnet18, resnext50_32x4d, wide_resnet50_2, efficientnet_b0, efficientnet_b4, regnet_y_400mf, squeezenet1_0, mobilenet_v2, densenet121
MXNet: vgg11_bn, alexnet, mobilenetv2_0.25, mobilenet0.25, densenet121, resnet18_v1, squeezenet1.0
TensorFlow, limited initial support, channels first: resnet50, densenet121

Documentation

Published at http://dlprimitives.org/docs/

Features Matrix

Operator	Features	Comment
Softmax	Softmax, LogSoftmax
NLLLoss
MSELoss
SoftmaxWithLoss
Elementwise	ax+by, max(ax,by), ax*y, broadcasting
Concat
Slice
Pooling2D	max, average
GlobalPooling	max, average	2D only
GlobalAvgPool2d
InnerProduct
BatchNorm
Reshape
Squeeze
Flatten
Threshold
Hardtanh
Abs
Parameter		Utility
Reduction	Sum, Mean, Sum Squares, L1
Convolution2D	GEMM, Winograd, Depthwise Separable
TransposedConvolution2D	GEMM, Winograd, Depthwise Separable
Activation	relu, sigmoid, tanh, relu6
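
To make the relationship between the loss-related rows of the table concrete, here is a minimal pure-Python sketch. It assumes the conventional (Caffe/PyTorch-style) definitions of these operators for a single sample; it is not DLPrimitives code, and the function names are illustrative only. How DLPrimitives reduces the loss over a batch is not covered here.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def softmax(logits):
    """Softmax obtained by exponentiating the stable log-softmax."""
    return [math.exp(v) for v in log_softmax(logits)]

def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class (single sample)."""
    return -log_probs[target]

def softmax_with_loss(logits, target):
    """Assumed convention: log-softmax followed by NLL loss."""
    return nll_loss(log_softmax(logits), target)

probs = softmax([1.0, 2.0, 3.0])
loss = softmax_with_loss([1.0, 2.0, 3.0], target=2)
print(probs, loss)
```

Subtracting the maximum before exponentiating keeps the computation finite for large logits, where a naive `math.exp(1000.0)` would raise `OverflowError`.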

Solvers: SGD, Adam
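
For readers unfamiliar with the two solvers, the sketch below shows the textbook update rules in pure Python. It assumes plain SGD without momentum and the standard bias-corrected Adam with its commonly used default hyperparameters; the `sgd_step` function and `Adam` class are hypothetical names for illustration and do not reflect the DLPrimitives API or its defaults.

```python
import math

def sgd_step(params, grads, lr):
    """Plain SGD: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

class Adam:
    """Textbook Adam with bias correction (illustrative, not DLPrimitives)."""

    def __init__(self, size, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * size  # running mean of gradients
        self.v = [0.0] * size  # running mean of squared gradients
        self.t = 0             # step counter used for bias correction

    def step(self, params, grads):
        self.t += 1
        c1 = 1.0 - self.beta1 ** self.t
        c2 = 1.0 - self.beta2 ** self.t
        out = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            m_hat = self.m[i] / c1
            v_hat = self.v[i] / c2
            out.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out

# Minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x = [0.0]
for _ in range(100):
    x = sgd_step(x, [2.0 * (x[0] - 3.0)], lr=0.1)

y, opt = [0.0], Adam(1, lr=0.1)
for _ in range(500):
    y = opt.step(y, [2.0 * (y[0] - 3.0)])
print(x, y)
```

Both loops drive the parameter toward 3. Adam's first step has a magnitude close to `lr` regardless of the gradient's scale, which is one reason it is less sensitive to gradient magnitude than plain SGD.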

Tested GPUs

Device	Vendor	Notes
RX 6600XT	AMD	ROCr
RX 560	AMD	16cu model, ROCm, PAL, Clover
HD 530	Intel	i5-6600, NEO driver
GTX 960	NVidia
GTX 1080	NVidia
RTX 2060S	NVidia
Mali-G52 MC2	ARM	performance not optimised yet
M1 Max	Apple	32-core model

Devices Tested on Windows: AMD RX 560, NVidia GTX 960.

Devices Tested on macOS: Apple M1 Max.

Other features

Network object for inference
ONNX to DLPrimitives model converter