Parenchyma
An extensible HPC framework for CUDA, OpenCL and native CPU.
Stars: ✭ 71 (+97.22%)
Arrayfire Python
Python bindings for ArrayFire: A general purpose GPU library.
Stars: ✭ 358 (+894.44%)
Ilgpu
ILGPU JIT Compiler for high-performance .NET GPU programs
Stars: ✭ 374 (+938.89%)
Graphvite
GraphVite: A General and High-performance Graph Embedding System
Stars: ✭ 865 (+2302.78%)
Cekirdekler
Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated quickly while running the same kernel on all devices (for simplicity).
Stars: ✭ 76 (+111.11%)
Nyuziprocessor
GPGPU microprocessor architecture
Stars: ✭ 1,351 (+3652.78%)
Deeppipe2
Deep Learning library using GPU (CUDA/cuBLAS)
Stars: ✭ 90 (+150%)
Speedtorch
Library for faster pinned CPU <-> GPU transfer in PyTorch
Stars: ✭ 615 (+1608.33%)
Thundergbm
ThunderGBM: Fast GBDTs and Random Forests on GPUs
Stars: ✭ 586 (+1527.78%)
Forward
A library for high performance deep learning inference on NVIDIA GPUs.
Stars: ✭ 136 (+277.78%)
Primitiv
A Neural Network Toolkit.
Stars: ✭ 164 (+355.56%)
Hoomd Blue
Molecular dynamics and Monte Carlo soft matter simulation on GPUs.
Stars: ✭ 143 (+297.22%)
Nvidia Docker
Build and run Docker containers leveraging NVIDIA GPUs
Stars: ✭ 13,961 (+38680.56%)
Mpr
Reference implementation for "Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces" (SIGGRAPH 2020)
Stars: ✭ 84 (+133.33%)
H2o4gpu
H2Oai GPU Edition
Stars: ✭ 416 (+1055.56%)
Open3d
Open3D: A Modern Library for 3D Data Processing
Stars: ✭ 5,860 (+16177.78%)
Cudasift
A CUDA implementation of SIFT for NVIDIA GPUs (1.2 ms on a GTX 1060)
Stars: ✭ 555 (+1441.67%)
Babelstream
STREAM, for lots of devices written in many programming models
Stars: ✭ 121 (+236.11%)
Caer
High-performance Vision library in Python. Scale your research, not boilerplate.
Stars: ✭ 452 (+1155.56%)
Senet
Squeeze-and-Excitation Networks
Stars: ✭ 2,850 (+7816.67%)
LuisaRender
High-Performance Multiple-Backend Renderer Based on LuisaCompute
Stars: ✭ 47 (+30.56%)
Tutorials
Some basic programming tutorials
Stars: ✭ 353 (+880.56%)
Cudf
cuDF - GPU DataFrame Library
Stars: ✭ 4,370 (+12038.89%)
Picongpu
Particle-in-Cell Simulations for the Exascale Era ✨
Stars: ✭ 452 (+1155.56%)
Wheels
Performance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)
Stars: ✭ 891 (+2375%)
Gmatrix
R package for unleashing the power of NVIDIA GPUs
Stars: ✭ 16 (-55.56%)
Dll
Fast Deep Learning Library (DLL) for C++ (ANNs, CNNs, RBMs, DBNs...)
Stars: ✭ 605 (+1580.56%)
Webgl Wind
Wind power visualization with WebGL particles
Stars: ✭ 601 (+1569.44%)
Uammd
A CUDA project for Molecular Dynamics, Brownian Dynamics, Hydrodynamics... intended to simulate a very generic system by constructing a simulation from modules.
Stars: ✭ 11 (-69.44%)
Ddsh Tip2018
Source code for the paper "Deep Discrete Supervised Hashing"
Stars: ✭ 16 (-55.56%)
Benchmark
A microbenchmark support library
Stars: ✭ 5,987 (+16530.56%)
Compute Runtime
Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
Stars: ✭ 593 (+1547.22%)
Cudadbclustering
Clustering via Graphics Processor, using the NVIDIA CUDA SDK to perform database clustering on the massively parallel graphics card processor
Stars: ✭ 6 (-83.33%)
Celero
C++ Benchmark Authoring Library/Framework
Stars: ✭ 593 (+1547.22%)
Glchaos.p
3D GPUs Strange Attractors and Hypercomplex Fractals explorer - up to 256 Million particles in RealTime
Stars: ✭ 590 (+1538.89%)
Drlkit
A High Level Python Deep Reinforcement Learning library. Great for beginners, prototyping, and quickly comparing algorithms
Stars: ✭ 29 (-19.44%)
Theano Roi Align
An implementation of the RoiAlign operation for Theano
Stars: ✭ 11 (-69.44%)
Turbotransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT2, Decoders, etc.) on CPU and GPU.
Stars: ✭ 826 (+2194.44%)
Ocbarrage
OCBarrage: a high-performance barrage (bullet-comment) rendering engine for iOS. It renders 5,000 barrages at the same time smoothly, and is lightweight, extensible, highly customizable in its animations, and simple and easy to use.
Stars: ✭ 589 (+1536.11%)
Tensorflow.jl
A Julia wrapper for TensorFlow
Stars: ✭ 822 (+2183.33%)
Taskflow
A General-purpose Parallel and Heterogeneous Task Programming System
Stars: ✭ 6,128 (+16922.22%)
Trtorch
PyTorch/TorchScript compiler for NVIDIA GPUs using TensorRT
Stars: ✭ 583 (+1519.44%)
Gpu badmm mt
Bregman ADMM for mass transportation on GPU
Stars: ✭ 10 (-72.22%)
Libcudarange
An interval arithmetic and affine arithmetic library for NVIDIA CUDA
Stars: ✭ 5 (-86.11%)
Asv
Airspeed Velocity: A simple Python benchmarking tool with web-based reporting
Stars: ✭ 570 (+1483.33%)
Scanner
Efficient video analysis at scale
Stars: ✭ 569 (+1480.56%)
Pytorch Loss
Label-smooth, amsoftmax, focal-loss, triplet-loss, lovasz-softmax. Maybe useful
Stars: ✭ 812 (+2155.56%)
Gpu Gems Book Source Code
💿 Collection of the CD content (source code) accompanying the books GPU Gems 1-3
Stars: ✭ 567 (+1475%)
Esnext Benchmarks
Benchmarks comparing ESNext features to their ES5 and various pre-processor equivalents
Stars: ✭ 28 (-22.22%)
Xmrig Nvidia
Monero (XMR) NVIDIA miner
Stars: ✭ 560 (+1455.56%)
Clblast
Tuned OpenCL BLAS
Stars: ✭ 559 (+1452.78%)
Alphapose
Real-Time and Accurate Full-Body Multi-Person Pose Estimation & Tracking System
Stars: ✭ 5,697 (+15725%)
Huststore
High-performance Distributed Storage
Stars: ✭ 806 (+2138.89%)
Pixels
A tiny hardware-accelerated pixel frame buffer. 🦀
Stars: ✭ 555 (+1441.67%)
Gpusorting
Implementation of a few sorting algorithms in OpenCL
Stars: ✭ 9 (-75%)