
torch / Cunn

Licence: other

Projects that are alternatives to or similar to Cunn

Deformable Kernels
Deforming kernels to adapt towards object deformation. In ICLR 2020.
Stars: ✭ 166 (-19.02%)
Mutual labels:  cuda
Hybridizer Basic Samples
Examples of C# code compiled to GPU by hybridizer
Stars: ✭ 186 (-9.27%)
Mutual labels:  cuda
Viseron
Self-hosted NVR with object detection
Stars: ✭ 192 (-6.34%)
Mutual labels:  cuda
Cuda programming
Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch
Stars: ✭ 169 (-17.56%)
Mutual labels:  cuda
Ssd Gpu Dma
Build userspace NVMe drivers and storage applications with CUDA support
Stars: ✭ 172 (-16.1%)
Mutual labels:  cuda
Macos Egpu Cuda Guide
Set up CUDA for machine learning (and gaming) on macOS using a NVIDIA eGPU
Stars: ✭ 187 (-8.78%)
Mutual labels:  cuda
Quda
QUDA is a library for performing calculations in lattice QCD on GPUs.
Stars: ✭ 166 (-19.02%)
Mutual labels:  cuda
Pine
🌲 Aimbot powered by real-time object detection with neural networks, GPU accelerated with Nvidia. Optimized for use with CS:GO.
Stars: ✭ 202 (-1.46%)
Mutual labels:  cuda
Cuml
cuML - RAPIDS Machine Learning Library
Stars: ✭ 2,504 (+1121.46%)
Mutual labels:  cuda
Timemory
Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template API is essentially a framework for creating tools: it is designed to provide a unifying interface for recording various performance measurements alongside data logging and interfaces to other tools.
Stars: ✭ 192 (-6.34%)
Mutual labels:  cuda
Cuda freshman
Stars: ✭ 168 (-18.05%)
Mutual labels:  cuda
Gmonitor
gmonitor is a GPU monitor (Nvidia only at the moment)
Stars: ✭ 169 (-17.56%)
Mutual labels:  cuda
Pytorch Spynet
a reimplementation of Optical Flow Estimation using a Spatial Pyramid Network in PyTorch
Stars: ✭ 190 (-7.32%)
Mutual labels:  cuda
Dragon
Dragon: A Computation Graph Virtual Machine Based Deep Learning Framework.
Stars: ✭ 168 (-18.05%)
Mutual labels:  cuda
Msn Point Cloud Completion
Morphing and Sampling Network for Dense Point Cloud Completion (AAAI2020)
Stars: ✭ 196 (-4.39%)
Mutual labels:  cuda
Floor
A C++ Compute/Graphics Library and Toolchain enabling same-source CUDA/Host/Metal/OpenCL/Vulkan C++ programming and execution.
Stars: ✭ 166 (-19.02%)
Mutual labels:  cuda
Nvidia Docker
Build and run Docker containers leveraging NVIDIA GPUs
Stars: ✭ 13,961 (+6710.24%)
Mutual labels:  cuda
Oneflow
OneFlow is a performance-centered and open-source deep learning framework.
Stars: ✭ 2,868 (+1299.02%)
Mutual labels:  cuda
Simplegpuhashtable
A simple GPU hash table implemented in CUDA using lock free techniques
Stars: ✭ 198 (-3.41%)
Mutual labels:  cuda
Ck Caffe
Collective Knowledge workflow for Caffe to automate installation across diverse platforms and to collaboratively evaluate and optimize Caffe-based workloads across diverse hardware, software and data sets (compilers, libraries, tools, models, inputs).
Stars: ✭ 192 (-6.34%)
Mutual labels:  cuda
# CUDA backend for the Neural Network Package #

This package provides a CUDA implementation for many of the modules in the base nn package.

  • Modules: There are also additional GPU-related modules not found in the nn package.

Installing from source

git clone https://github.com/torch/cunn
cd cunn
luarocks make rocks/cunn-scm-1.rockspec
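
After the build completes, a quick sanity check is to load the package from the interpreter (a minimal check, assuming luajit is on your PATH and a CUDA-capable GPU is available):

luajit -e "require 'cunn'; print('cunn loaded')"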

To use

Simply convert your network model to CUDA by calling :cuda():

local model = nn.Sequential()
model:add(nn.Linear(2,2))
model:add(nn.LogSoftMax())

model:cuda()  -- convert model to CUDA

... and similarly for your tensors:

local input = torch.Tensor(32,2):uniform()
input = input:cuda()
local output = model:forward(input)

... or create them directly as CudaTensors:

local input = torch.CudaTensor(32,2):uniform()
local output = model:forward(input)
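
To inspect or save results from Lua, copy them back to main memory first; calling :float() on a CudaTensor returns a CPU copy (a minimal sketch):

local cpuOutput = output:float()  -- copy the result back to main memory
print(cpuOutput)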

To run unit-tests

luajit -l cunn -e 'cunn.test()'

GPU Training Concepts

Performance

  • data should be transferred between main memory and the GPU in batches; otherwise the transfer time is dominated by fixed per-transfer latency and execution overheads rather than by bandwidth
  • therefore, train and predict using mini-batches
  • allocating GPU memory causes a sync-point, which will noticeably affect performance
    • therefore, allocate any CudaTensors once, at the start of the program, and then copy data back and forth between main memory and the existing CudaTensors (see the copy sketch at the end of this section)
  • similarly, try to avoid any operations that implicitly allocate new tensors. For example, if you write:
require 'cutorch'

local a = torch.CudaTensor(1000):uniform()
for it=1,1000 do
  local b = torch.add(a, 1)
end

... this will allocate one thousand new CudaTensors, one for each call to torch.add(a, 1).

Instead, use this form:

require 'cutorch'

local a = torch.CudaTensor(1000):uniform()
local b = torch.CudaTensor(1000):uniform()
for it=1,1000 do
  b:add(a, 1)
end

In this form, b is allocated only once, before the loop. The b:add(a, 1) operation then performs the addition inside a GPU kernel and stores the result into the existing b CudaTensor. This generally runs noticeably faster, is much less likely to eat up arbitrary amounts of memory, and reduces the need for frequent collectgarbage(); collectgarbage() calls.
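
The same pre-allocation advice applies to host-device transfers: allocate the CudaTensor once and refill it with :copy(), rather than creating a fresh GPU tensor per mini-batch. A minimal sketch (the batch shape and loop count are illustrative):

require 'cutorch'

local cpuBatch = torch.FloatTensor(32, 2)  -- staging buffer in main memory
local gpuBatch = torch.CudaTensor(32, 2)   -- allocated once, before the loop
for it = 1, 1000 do
  cpuBatch:uniform()       -- stands in for loading the next mini-batch
  gpuBatch:copy(cpuBatch)  -- reuses the existing GPU allocation
end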

Benchmarking

  • GPU operations run asynchronously: a call typically returns before the work it issues has completed
  • e.g., if you do:
require 'cutorch'
local a = torch.CudaTensor(1000,1000):uniform()
a:add(1)

... the GPU kernel that adds 1 is only scheduled for launch by a:add(1); it might not have completed, or even reached the GPU, by the time a:add(1) returns.

  • therefore, for wall-clock timings, call cutorch.synchronize() before each time-check point:
require 'cutorch'
require 'sys'

local a = torch.CudaTensor(1000,1000):uniform()
cutorch.synchronize()
sys.tic()
a:add(1)
cutorch.synchronize()
print(sys.toc())
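
For contrast, omitting the synchronization points (an illustrative sketch) measures only the time to queue the kernel, not to run it:

require 'cutorch'
require 'sys'

local a = torch.CudaTensor(1000,1000):uniform()
sys.tic()
a:add(1)          -- returns as soon as the kernel is queued
print(sys.toc())  -- misleadingly small: the kernel may still be running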