sony / Nnabla Ext Cuda

Licence: apache-2.0

A CUDA Extension of Neural Network Libraries

Labels

cuda

A CUDA Extension of Neural Network Libraries

This repository provides an official CUDA/cuDNN-accelerated extension of the Neural Network Libraries deep learning framework.

To use it, change the default context from 'cpu' to 'cudnn':

import nnabla as nn
from nnabla.ext_utils import get_extension_context

ctx = get_extension_context('cudnn', device_id='0')
nn.set_default_context(ctx)

16-bit floating-point precision (fp16, half) can also be used by setting the type_config option as follows:

ctx = get_extension_context('cudnn', device_id='0', type_config='half')

See the Mixed precision training tutorial for a stable training technique with fp16.

Currently, the binary package installation manual and the usage documentation are integrated into NNabla's documentation. For build instructions, see below.

Build CUDA extension

Performance notes

Automatic Convolution algorithm selection

If CUDNN is enabled, the extension library uses the specific Convolution algorithms pre-optimized by CUDNN.

Optionally, this library can automatically select the fastest algorithm for your own network under a given parameter configuration (filter size, stride, dilation, pad, etc.) by exhaustively running and timing each candidate algorithm (cudnnFindConvolution*Algorithm). The best algorithm is cached and then reused whenever an identical configuration is passed to the Convolution interface. This can improve speed considerably, even for non-static (dynamic) neural networks. To enable this mode, set the environment variable NNABLA_CUDNN_ALGORITHM_BY_HEURISTIC to 0.
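As a minimal sketch, the variable can also be set from Python rather than from the shell. This assumes it is safest to set it before nnabla and its CUDA extension are imported; the README does not state exactly when the variable is read, so treat the ordering as a precaution rather than a documented requirement.

```python
import os

# Assumption: set the variable before importing nnabla so the extension
# sees it whenever it reads its configuration.
os.environ["NNABLA_CUDNN_ALGORITHM_BY_HEURISTIC"] = "0"

# The nnabla import and the context setup shown earlier would follow here.
```

Setting the variable in the launching shell has the same effect and avoids any dependence on import order.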

However, the automatically selected algorithms often require a large workspace, so this mode can consume a lot of memory and sometimes does not work on a GPU with little memory. To avoid this, you can cap the workspace size by setting the environment variable NNABLA_CUDNN_WORKSPACE_LIMIT (in bytes), which is read at runtime (not at compile time). For example, NNABLA_CUDNN_WORKSPACE_LIMIT=134217728 limits the workspace size to 128 MB. The default value is -1, which means the workspace size is unlimited.
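The byte value in the example is 128 × 1024 × 1024. The sketch below computes such a limit and exports it from Python; `mib_to_bytes` is a hypothetical helper written for this illustration, not part of nnabla.

```python
import os


def mib_to_bytes(mib: int) -> int:
    """Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes)."""
    return mib * 1024 * 1024


# 128 MiB, matching the example value in the text.
limit_bytes = mib_to_bytes(128)

# Environment variables are strings, so the integer is converted explicitly.
os.environ["NNABLA_CUDNN_WORKSPACE_LIMIT"] = str(limit_bytes)

print(limit_bytes)  # 134217728
```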

In some cases it may be desirable to restrict the automatic search for CUDNN Convolution algorithms to those that give deterministic (reproducible) results. This can be achieved by setting the environment variable NNABLA_CUDNN_DETERMINISTIC to some value other than 0.
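For instance, a script that needs reproducible runs might combine the automatic search with the deterministic restriction. The value "1" is just one choice of "some value other than 0", and setting both variables before importing nnabla is an assumption, not something the README specifies.

```python
import os

# Enable the exhaustive algorithm search described above.
os.environ["NNABLA_CUDNN_ALGORITHM_BY_HEURISTIC"] = "0"

# Any value other than "0" requests deterministic algorithms; "1" is used here.
os.environ["NNABLA_CUDNN_DETERMINISTIC"] = "1"
```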
