Pycuda
CUDA integration for Python, plus shiny features
Stars: ✭ 1,112 (+789.6%)
Deeppipe2
Deep Learning library using GPU (CUDA/cuBLAS)
Stars: ✭ 90 (-28%)
Mtensor
A C++ CUDA Tensor Lazy Computing Library
Stars: ✭ 115 (-8%)
Flattened Cnn
Flattened convolutional neural networks (1D convolution modules for Torch nn)
Stars: ✭ 59 (-52.8%)
Dokai
Collection of Docker images for ML/DL and video processing projects
Stars: ✭ 58 (-53.6%)
Dace
DaCe - Data Centric Parallel Programming
Stars: ✭ 106 (-15.2%)
Heteroflow
Concurrent CPU-GPU Programming using Task Models
Stars: ✭ 57 (-54.4%)
Thundersvm
ThunderSVM: A Fast SVM Library on GPUs and CPUs
Stars: ✭ 1,282 (+925.6%)
Nvbio Gpl
NVBIO is a library of reusable components designed to accelerate bioinformatics applications using CUDA.
Stars: ✭ 56 (-55.2%)
Babelstream
STREAM, for lots of devices written in many programming models
Stars: ✭ 121 (-3.2%)
Dink
Point cloud deep learning framework
Stars: ✭ 56 (-55.2%)
Cuda Winograd
Fast CUDA Kernels for ResNet Inference.
Stars: ✭ 104 (-16.8%)
Carlsim3
CARLsim is an efficient, easy-to-use, GPU-accelerated software framework for simulating large-scale spiking neural network (SNN) models with a high degree of biological detail.
Stars: ✭ 52 (-58.4%)
Knn cuda
PyTorch kNN [CUDA version]
Stars: ✭ 86 (-31.2%)
Cs344
Introduction to Parallel Programming class code
Stars: ✭ 1,051 (+740.8%)
Pytorch spn
Extension package for spatial propagation network in PyTorch.
Stars: ✭ 114 (-8.8%)
Pytorch Emdloss
PyTorch 1.0 implementation of the approximate Earth Mover's Distance
Stars: ✭ 82 (-34.4%)
Lyra
Stars: ✭ 43 (-65.6%)
Deepnet
Deep.Net machine learning framework for F#
Stars: ✭ 99 (-20.8%)
Nnabla Ext Cuda
A CUDA Extension of Neural Network Libraries
Stars: ✭ 79 (-36.8%)
Sixtyfour
How fast can we brute force a 64-bit comparison?
Stars: ✭ 41 (-67.2%)
Warp Rnnt
CUDA-Warp RNN-Transducer
Stars: ✭ 122 (-2.4%)
Nbody
N-body gravity attraction problem solver
Stars: ✭ 40 (-68%)
Cuda Design Patterns
Some CUDA design patterns and a bit of template magic for CUDA
Stars: ✭ 78 (-37.6%)
Soul Engine
Physically based renderer and simulation engine for real-time applications.
Stars: ✭ 37 (-70.4%)
Extending Jax
Extending JAX with custom C++ and CUDA code
Stars: ✭ 98 (-21.6%)
Nvidia libs test
Tests and benchmarks for cuDNN (and in the future, other NVIDIA libraries)
Stars: ✭ 36 (-71.2%)
Cudart.jl
Julia wrapper for CUDA runtime API
Stars: ✭ 75 (-40%)
Object Detection And Location Realsensed435
Uses the Intel RealSense D435 depth camera for object detection with YOLOv3 under the OpenCV DNN framework, computes the 3D position of each detected object from the depth information, and displays its coordinates in the camera coordinate system in real time. Added: real-time object detection with a YOLOv5 TensorRT model on AGX Xavier.
Stars: ✭ 36 (-71.2%)
Pytorch Unflow
A reimplementation of UnFlow in PyTorch that matches the official TensorFlow version
Stars: ✭ 113 (-9.6%)
Parenchyma
An extensible HPC framework for CUDA, OpenCL and native CPU.
Stars: ✭ 71 (-43.2%)
Cuda
Experiments with CUDA and Rust
Stars: ✭ 31 (-75.2%)
Pynvvl
A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
Stars: ✭ 95 (-24%)
Deepjointfilter
The source code of ECCV16 'Deep Joint Image Filtering'.
Stars: ✭ 68 (-45.6%)
Des Cuda
DES cracking using brute force algorithm and CUDA
Stars: ✭ 21 (-83.2%)
Tensorflow Optimized Wheels
TensorFlow wheels built for the latest CUDA/cuDNN with performance flags enabled: SSE, AVX, FMA; XLA
Stars: ✭ 118 (-5.6%)
Alenka
GPU database engine
Stars: ✭ 1,150 (+820%)
Uammd
A CUDA project for Molecular Dynamics, Brownian Dynamics, Hydrodynamics... intended to simulate a very generic system by constructing a simulation from modules.
Stars: ✭ 11 (-91.2%)
Fbtt Embedding
This is a Tensor Train based compression library for compressing sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding tables on the fly, so they do not provide memory reduction at training runtime. Our library decompresses only the requested rows and can therefore provide a 10,000-fold memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.
Stars: ✭ 92 (-26.4%)
Gpu badmm mt
Bregman ADMM for mass transportation on GPU
Stars: ✭ 10 (-92%)
Autodock Gpu
AutoDock for GPUs and other accelerators
Stars: ✭ 65 (-48%)
Presentations
Slides and demo code for past presentations
Stars: ✭ 7 (-94.4%)
Futhark
💥💻💥 A data-parallel functional programming language
Stars: ✭ 1,641 (+1212.8%)
Cudadrv.jl
A Julia wrapper for the CUDA driver API.
Stars: ✭ 64 (-48.8%)
Fcis
Fully Convolutional Instance-aware Semantic Segmentation
Stars: ✭ 1,563 (+1150.4%)
Onemkl
oneAPI Math Kernel Library (oneMKL) Interfaces
Stars: ✭ 122 (-2.4%)
Spoc
Stream Processing with OCaml
Stars: ✭ 115 (-8%)
Cuhe
CUDA Homomorphic Encryption Library
Stars: ✭ 109 (-12.8%)
Elasticfusion
Real-time dense visual SLAM system
Stars: ✭ 1,298 (+938.4%)
Cutlass
CUDA Templates for Linear Algebra Subroutines
Stars: ✭ 1,123 (+798.4%)