CARLsim is an efficient, easy-to-use, GPU-accelerated software framework for simulating large-scale spiking neural network (SNN) models with a high degree of biological detail.

Stars: ✭ 52 (-58.4%)

Mutual labels: cuda

Knn cuda

PyTorch kNN (k-nearest neighbors) [CUDA version]

Stars: ✭ 86 (-31.2%)

Mutual labels: cuda

Cs344

Introduction to Parallel Programming class code

Stars: ✭ 1,051 (+740.8%)

Mutual labels: cuda

Pytorch spn

Extension package for spatial propagation networks in PyTorch.

Stars: ✭ 114 (-8.8%)

Mutual labels: cuda

Singularity Tutorial

Tutorial for using Singularity containers

Stars: ✭ 46 (-63.2%)

Mutual labels: cuda

Pytorch Emdloss

PyTorch 1.0 implementation of the approximate Earth Mover's Distance

Stars: ✭ 82 (-34.4%)

Mutual labels: cuda

Lyra

Stars: ✭ 43 (-65.6%)

Mutual labels: cuda

Deepnet

Deep.Net machine learning framework for F#

Stars: ✭ 99 (-20.8%)

Mutual labels: cuda

Cuda Convnet2.torch

Torch7 bindings for cuda-convnet2 kernels!

Stars: ✭ 42 (-66.4%)

Mutual labels: cuda

Nnabla Ext Cuda

A CUDA Extension of Neural Network Libraries

Stars: ✭ 79 (-36.8%)

Mutual labels: cuda

Sixtyfour

How fast can we brute force a 64-bit comparison?

Stars: ✭ 41 (-67.2%)

Mutual labels: cuda

Warp Rnnt

CUDA-Warp RNN-Transducer

Stars: ✭ 122 (-2.4%)

Mutual labels: cuda

Nbody

N-body gravitational attraction problem solver

Stars: ✭ 40 (-68%)

Mutual labels: cuda

Cuda Design Patterns

Some CUDA design patterns and a bit of template magic for CUDA

Stars: ✭ 78 (-37.6%)

Mutual labels: cuda

Soul Engine

Physically based renderer and simulation engine for real-time applications.

Stars: ✭ 37 (-70.4%)

Mutual labels: cuda

Extending Jax

Extending JAX with custom C++ and CUDA code

Stars: ✭ 98 (-21.6%)

Mutual labels: cuda

Nvidia libs test

Tests and benchmarks for cuDNN (and, in the future, other NVIDIA libraries)

Stars: ✭ 36 (-71.2%)

Mutual labels: cuda

Cudart.jl

Julia wrapper for CUDA runtime API

Stars: ✭ 75 (-40%)

Mutual labels: cuda

Object Detection And Location Realsensed435

Uses the Intel RealSense D435 depth camera to perform object detection with YOLOv3 under the OpenCV DNN module, then computes the 3D position of each detected object from the depth information and displays its coordinates in the camera coordinate system in real time. Added: real-time object detection with a YOLOv5 TensorRT model on the AGX Xavier.

Stars: ✭ 36 (-71.2%)

Mutual labels: cuda

Pytorch Unflow

A reimplementation of UnFlow in PyTorch that matches the official TensorFlow version

Stars: ✭ 113 (-9.6%)

Mutual labels: cuda

Deformable Convolution V2 Pytorch

Deformable ConvNets V2 (DCNv2) in PyTorch

Stars: ✭ 963 (+670.4%)

Mutual labels: cuda

Parenchyma

An extensible HPC framework for CUDA, OpenCL and native CPU.

Stars: ✭ 71 (-43.2%)

Mutual labels: cuda

Cuda

Experiments with CUDA and Rust

Stars: ✭ 31 (-75.2%)

Mutual labels: cuda

Pynvvl

A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python

Stars: ✭ 95 (-24%)

Mutual labels: cuda

Cuda Utilities

Utilities for CUDA programming

Stars: ✭ 30 (-76%)

Mutual labels: cuda

Deepjointfilter

Source code for the ECCV 2016 paper 'Deep Joint Image Filtering'.

Stars: ✭ 68 (-45.6%)

Mutual labels: cuda

Des Cuda

DES cracking using a brute-force algorithm and CUDA

Stars: ✭ 21 (-83.2%)

Mutual labels: cuda

Tensorflow Optimized Wheels

TensorFlow wheels built for the latest CUDA/cuDNN with performance flags enabled: SSE, AVX, FMA, and XLA

Stars: ✭ 118 (-5.6%)

Mutual labels: cuda

Imagenet Classifier Tensorflow

Image recognition and classification using Convolutional Neural Networks with TensorFlow

Stars: ✭ 13 (-89.6%)

Mutual labels: cuda

Alenka

GPU database engine

Stars: ✭ 1,150 (+820%)

Mutual labels: cuda

Uammd

A CUDA project for molecular dynamics, Brownian dynamics, hydrodynamics, and more, intended to simulate very generic systems by assembling a simulation from modules.

Stars: ✭ 11 (-91.2%)

Mutual labels: cuda

Fbtt Embedding

This is a Tensor Train (TT) based compression library for the sparse embedding tables used in large-scale machine learning models, such as recommendation and natural language processing models. We showed that this library can reduce the total model size by up to 100x in Facebook's open-source DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. The existing state-of-the-art library also decompresses the whole embedding table on the fly, so it provides no memory reduction while training runs. Our library decompresses only the requested rows and can therefore provide a 10,000-times memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table's entries in decompressed format for faster lookup and processing.

Stars: ✭ 92 (-26.4%)

Mutual labels: cuda
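The idea of decompressing only the requested rows can be illustrated with a minimal pure-Python sketch of a generic Tensor Train row lookup. This is not FBTT-Embedding's API or CUDA kernel; the function names (`row_digits`, `tt_row`, `random_core`), the core layout `(r_{k-1}, I_k, J_k, r_k)` and the mixed-radix index ordering are all assumptions made for illustration. A table of shape `(I_1*...*I_d, J_1*...*J_d)` is stored as `d` small cores, and one row is rebuilt by contracting the core slices selected by that row's digits, so the full table is never materialized.

```python
import random

def row_digits(row, dims):
    """Split a flat row index into mixed-radix digits (first factor most significant)."""
    digits = []
    for d in reversed(dims):
        digits.append(row % d)
        row //= d
    return digits[::-1]

def tt_row(cores, row_dims, row):
    """Rebuild one embedding row from TT cores without building the whole table.

    Assumed layout (hypothetical): cores[k][a][i][j][b] has shape
    (r_{k-1}, I_k, J_k, r_k) with r_0 = r_d = 1, and row_dims = [I_1, ..., I_d].
    The returned row has length J_1 * ... * J_d.
    """
    idx = row_digits(row, row_dims)
    # Maps the column digits chosen so far to a partial-product vector of length r_k.
    partial = {(): [1.0]}
    for k, core in enumerate(cores):
        r_prev, r_next = len(core), len(core[0][0][0])
        n_cols = len(core[0][0])
        nxt = {}
        for cols, vec in partial.items():
            for j in range(n_cols):
                out = [0.0] * r_next
                # Contract the running vector with the slice core[:, idx[k], j, :].
                for a in range(r_prev):
                    for b in range(r_next):
                        out[b] += vec[a] * core[a][idx[k]][j][b]
                nxt[cols + (j,)] = out
        partial = nxt
    # The final rank is 1; lexicographic order of column digits is the flat column order.
    return [partial[cols][0] for cols in sorted(partial)]

def random_core(r_prev, n_rows, n_cols, r_next, rng):
    """Hypothetical helper: a random core of shape (r_prev, n_rows, n_cols, r_next)."""
    return [[[[rng.uniform(-1, 1) for _ in range(r_next)]
              for _ in range(n_cols)]
             for _ in range(n_rows)]
            for _ in range(r_prev)]

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 24 x 12 table: rows factored as 2*3*4, columns as 2*2*3, TT ranks (1, 3, 3, 1).
    row_dims, col_dims, ranks = [2, 3, 4], [2, 2, 3], [1, 3, 3, 1]
    cores = [random_core(ranks[k], row_dims[k], col_dims[k], ranks[k + 1], rng)
             for k in range(3)]
    print(len(tt_row(cores, row_dims, 17)))
```

With these illustrative sizes the cores hold 12 + 54 + 36 = 102 numbers instead of the 288 in the dense table, and a lookup touches only the slices for one row. A real implementation would batch many rows and run the contractions on the GPU.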

Gpu badmm mt

Bregman ADMM for mass transportation on GPU

Stars: ✭ 10 (-92%)

Mutual labels: cuda

Autodock Gpu

AutoDock for GPUs and other accelerators

Stars: ✭ 65 (-48%)

Mutual labels: cuda

Presentations

Slides and demo code for past presentations