Top 527 cuda open source projects

Compactcnncascade
A binary library for very fast face detection using compact CNNs.
Jetson
Helmut Hoffer von Ankershoffen experimenting with arm64 based NVIDIA Jetson (Nano and AGX Xavier) edge devices running Kubernetes (K8s) for machine learning (ML) including Jupyter Notebooks, TensorFlow Training and TensorFlow Serving using CUDA for smart IoT.
Lantern
✭ 150
cuda
Ginkgo
Numerical linear algebra software package
Cuda Cnn
CNN accelerated by CUDA. Tested on MNIST, finally reaching 99.76%.
✭ 148
cuda
Sketchgraphs
A dataset of 15 million CAD sketches with geometric constraint graphs.
✭ 148
cuda
Optical Flow Filter
A real time optical flow algorithm implemented on GPU
Volumetric Path Tracer
☁️ Volumetric path tracer using cuda
Gpurir
Python library for Room Impulse Response (RIR) simulation with GPU acceleration
Remotery
Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer
Hoomd Blue
Molecular dynamics and Monte Carlo soft matter simulation on GPUs.
Libgdf
[ARCHIVED] C GPU DataFrame Library
✭ 142
cuda
Forward
A library for high performance deep learning inference on NVIDIA GPUs.
Nsimd
Agenium Scale vectorization library for CPUs and GPUs
Marian Dev
Fast Neural Machine Translation in C++ - development repository
Spanet
Spatial Attentive Single-Image Deraining with a High Quality Real Rain Dataset (CVPR'19)
Partial Order Pruning
Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
✭ 135
cuda
Mixbench
A GPU benchmark tool for evaluating GPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL)
Py Faster Rcnn Windows
py-faster-rcnn that compiles directly on Windows
✭ 126
cuda
Kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Waveglow inference in cuda
C++ code to run WaveGlow inference in CUDA
✭ 125
cuda
Warp Rnnt
CUDA-Warp RNN-Transducer
✭ 122
python cuda
Onemkl
oneAPI Math Kernel Library (oneMKL) Interfaces
Babelstream
STREAM, for lots of devices written in many programming models
Knn cuda
Fast K-Nearest Neighbor search with GPU
✭ 119
cuda
Tensorflow Optimized Wheels
TensorFlow wheels built for latest CUDA/CuDNN and enabled performance flags: SSE, AVX, FMA; XLA
Spoc
Stream Processing with OCaml
Mtensor
A C++ Cuda Tensor Lazy Computing Library
Cltune
CLTune: An automatic OpenCL & CUDA kernel tuner
✭ 114
cuda opencl
Pytorch spn
Extension package for spatial propagation network in pytorch.
✭ 114
cuda
Tensorflow Object Detection Tutorial
The purpose of this tutorial is to learn how to install and prepare TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch
Pytorch Unflow
A reimplementation of UnFlow in PyTorch that matches the official TensorFlow version
Adacof Pytorch
Official source code for our paper "AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation" (CVPR 2020)
Futhark
💥💻💥 A data-parallel functional programming language
Cuhe
CUDA Homomorphic Encryption Library
✭ 109
cuda
Hashcat
World's fastest and most advanced password recovery utility
Dace
DaCe - Data Centric Parallel Programming
Chamferdistancepytorch
Chamfer Distance in Pytorch with f-score
✭ 105
cuda
Cuda Winograd
Fast CUDA Kernels for ResNet Inference.
✭ 104
cuda
Pygraphistry
PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
Dpp
Detail-Preserving Pooling in Deep Networks (CVPR 2018)
✭ 99
cuda
Extending Jax
Extending JAX with custom C++ and CUDA code
✭ 98
python cuda
Supra
SUPRA: Software Defined Ultrasound Processing for Real-Time Applications - An Open Source 2D and 3D Pipeline from Beamforming to B-Mode
Pynvvl
A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
Region Conv
Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade
Fbtt Embedding
A Tensor Train based compression library for the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding table on the fly, so they provide no memory reduction at runtime during training. Our library decompresses only the requested rows and can therefore provide a 10,000x reduction in memory footprint per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed form for faster lookup and processing.
✭ 92
cuda
Numer
Numeric Erlang - vector and matrix operations with CUDA. Heavily inspired by Pteracuda - https://github.com/kevsmith/pteracuda
Elasticfusion
Real-time dense visual SLAM system
Matconvnet
MatConvNet: CNNs for MATLAB
✭ 1,299
cuda
Aurora
Minimal deep learning library written in Python/Cython/C++ and NumPy/CUDA/cuDNN.