TutorialsSome basic programming tutorials
Stars: ✭ 353 (+115.24%)
Cs344Introduction to Parallel Programming class code
Stars: ✭ 1,051 (+540.85%)
VisionarayA C++-based, cross platform ray tracing library
Stars: ✭ 342 (+108.54%)
Extending JaxExtending JAX with custom C++ and CUDA code
Stars: ✭ 98 (-40.24%)
CudahandbookSource code that accompanies The CUDA Handbook.
Stars: ✭ 345 (+110.37%)
3GPU-accelerated micromagnetic simulator
Stars: ✭ 324 (+97.56%)
CudppCUDA Data Parallel Primitives Library
Stars: ✭ 333 (+103.05%)
Lyra
Stars: ✭ 43 (-73.78%)
JitifyA single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
Stars: ✭ 314 (+91.46%)
PynvvlA Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
Stars: ✭ 95 (-42.07%)
ThrustThe C++ parallel algorithms library.
Stars: ✭ 3,595 (+2092.07%)
Knn CudaFast k nearest neighbor search using GPU
Stars: ✭ 310 (+89.02%)
LibcudacxxThe C++ Standard Library for your entire system.
Stars: ✭ 1,861 (+1034.76%)
Person Reid ganICCV2017 Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro
Stars: ✭ 301 (+83.54%)
SixtyfourHow fast can we brute force a 64-bit comparison?
Stars: ✭ 41 (-75%)
Deep High Resolution Net.pytorchThe project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"
Stars: ✭ 3,521 (+2046.95%)
Fbtt EmbeddingThis is a Tensor Train based compression library for the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress whole embedding tables on the fly, so they provide no memory reduction at training runtime. Our library decompresses only the requested rows and can therefore provide a 10,000-fold memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed form for faster lookup and processing.
Stars: ✭ 92 (-43.9%)
Ffmpeg Build ScriptThe FFmpeg build script provides an easy way to build a static FFmpeg on OSX and Linux with non-free codecs included.
Stars: ✭ 290 (+76.83%)
NbodyN-body gravitational attraction problem solver
Stars: ✭ 40 (-75.61%)
SketchgraphsA dataset of 15 million CAD sketches with geometric constraint graphs.
Stars: ✭ 148 (-9.76%)
Cuarrays.jlA Curious Cumulation of CUDA Cuisine
Stars: ✭ 283 (+72.56%)
Soul EnginePhysically based renderer and simulation engine for real-time applications.
Stars: ✭ 37 (-77.44%)
Tensor StreamA library for real-time video stream decoding to CUDA memory
Stars: ✭ 277 (+68.9%)
FbcudaFacebook's CUDA extensions.
Stars: ✭ 275 (+67.68%)
Nvidia libs testTests and benchmarks for cudnn (and in the future, other nvidia libraries)
Stars: ✭ 36 (-78.05%)
AgencyExecution primitives for C++
Stars: ✭ 127 (-22.56%)
GprmaxgprMax is open source software that simulates electromagnetic wave propagation using the Finite-Difference Time-Domain (FDTD) method for numerical modelling of Ground Penetrating Radar (GPR)
Stars: ✭ 268 (+63.41%)
Object Detection And Location Realsensed435Uses the Intel RealSense D435 depth camera for object detection with YOLOv3 under the OpenCV DNN framework, and computes the 3D position of each detected object from the depth information. Coordinates in the camera coordinate system are displayed in real time. Added: real-time object detection using a YOLOv5 TensorRT model on AGX Xavier.
Stars: ✭ 36 (-78.05%)
Kinectfusionlib Implementation of the KinectFusion approach in modern C++14 and CUDA
Stars: ✭ 261 (+59.15%)
MatconvnetMatConvNet: CNNs for MATLAB
Stars: ✭ 1,299 (+692.07%)
PopsiftPopSift is an implementation of the SIFT algorithm in CUDA.
Stars: ✭ 259 (+57.93%)
gpu-monitorScript to remotely check GPU servers for free GPUs
Stars: ✭ 85 (-48.17%)
RmmRAPIDS Memory Manager
Stars: ✭ 154 (-6.1%)
CudaExperiments with CUDA and Rust
Stars: ✭ 31 (-81.1%)
Torch-TensorRTPyTorch/TorchScript compiler for NVIDIA GPUs using TensorRT
Stars: ✭ 1,216 (+641.46%)
Deeppipe2Deep Learning library using GPU(CUDA/cuBLAS)
Stars: ✭ 90 (-45.12%)
crowdsource-video-experiments-on-androidCrowdsourcing video experiments (such as collaborative benchmarking and optimization of DNN algorithms) using Collective Knowledge Framework across diverse Android devices provided by volunteers. Results are continuously aggregated in the open repository:
Stars: ✭ 29 (-82.32%)
desertA fast (?) random sampling drawing library
Stars: ✭ 61 (-62.8%)
CPP-ProgrammingVarious C/C++ examples. DirectX, OpenGL, CUDA, Vulkan, OpenCL.
Stars: ✭ 30 (-81.71%)
Des CudaDES cracking using brute force algorithm and CUDA
Stars: ✭ 21 (-87.2%)
hipaccA domain-specific language and compiler for image processing
Stars: ✭ 72 (-56.1%)
Multi Gpu Programming ModelsExamples demonstrating available options to program multiple GPUs in a single node or a cluster
Stars: ✭ 165 (+0.61%)
Cx db8A contextual, biasable, word-or-sentence-or-paragraph extractive summarizer powered by the latest in text embeddings (BERT, Universal Sentence Encoder, Flair)
Stars: ✭ 164 (+0%)
Cuda CnnCNN accelerated by CUDA. Tested on MNIST, finally reaching 99.76%
Stars: ✭ 148 (-9.76%)
NnvmNo description or website provided.
Stars: ✭ 1,639 (+899.39%)
SupraSUPRA: Software Defined Ultrasound Processing for Real-Time Applications - An Open Source 2D and 3D Pipeline from Beamforming to B-Mode
Stars: ✭ 96 (-41.46%)
Slic cudaSuperpixel SLIC for GPU (CUDA)
Stars: ✭ 45 (-72.56%)
ArrayfireArrayFire: a general purpose GPU library.
Stars: ✭ 3,693 (+2151.83%)