WheelsPerformance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)
Stars: ✭ 891 (+555.15%)
Scikit CudaPython interface to GPU-powered libraries
Stars: ✭ 803 (+490.44%)
H2o4gpuH2Oai GPU Edition
Stars: ✭ 416 (+205.88%)
Cudart.jlJulia wrapper for CUDA runtime API
Stars: ✭ 75 (-44.85%)
LightseqLightSeq: A High Performance Inference Library for Sequence Processing and Generation
Stars: ✭ 501 (+268.38%)
LibcudacxxThe C++ Standard Library for your entire system.
Stars: ✭ 1,861 (+1268.38%)
ChainerA flexible framework of neural networks for deep learning
Stars: ✭ 5,656 (+4058.82%)
HeteroflowConcurrent CPU-GPU Programming using Task Models
Stars: ✭ 57 (-58.09%)
ArboretumGradient Boosting powered by GPU (NVIDIA CUDA)
Stars: ✭ 64 (-52.94%)
TrainyourownyoloTrain a state-of-the-art YOLOv3 object detector from scratch!
Stars: ✭ 399 (+193.38%)
DeepnetDeep.Net machine learning framework for F#
Stars: ✭ 99 (-27.21%)
CubertFast implementation of BERT inference directly on NVIDIA (CUDA, cuBLAS) and Intel MKL
Stars: ✭ 395 (+190.44%)
BitcrackerBitCracker is the first open source password cracking tool for memory units encrypted with BitLocker
Stars: ✭ 463 (+240.44%)
Tensorflow CmakeTensorFlow examples in C, C++, Go and Python without Bazel but with CMake and FindTensorFlow.cmake
Stars: ✭ 418 (+207.35%)
OnemkloneAPI Math Kernel Library (oneMKL) Interfaces
Stars: ✭ 122 (-10.29%)
ThundergbmThunderGBM: Fast GBDTs and Random Forests on GPUs
Stars: ✭ 586 (+330.88%)
PyopenclOpenCL integration for Python, plus shiny features
Stars: ✭ 790 (+480.88%)
Stdgpustdgpu: Efficient STL-like Data Structures on the GPU
Stars: ✭ 531 (+290.44%)
MixbenchA GPU benchmark tool for evaluating GPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL)
Stars: ✭ 130 (-4.41%)
Qualia2.0Qualia is a deep learning framework deeply integrated with automatic differentiation and dynamic graphing with CUDA acceleration. Qualia was built from scratch.
Stars: ✭ 41 (-69.85%)
GgnnGGNN: State of the Art Graph-based GPU Nearest Neighbor Search
Stars: ✭ 63 (-53.68%)
Deeppipe2Deep Learning library using GPU (CUDA/cuBLAS)
Stars: ✭ 90 (-33.82%)
NumerNumeric Erlang - vector and matrix operations with CUDA. Heavily inspired by Pteracuda - https://github.com/kevsmith/pteracuda
Stars: ✭ 91 (-33.09%)
CudfcuDF - GPU DataFrame Library
Stars: ✭ 4,370 (+3113.24%)
Futhark💥💻💥 A data-parallel functional programming language
Stars: ✭ 1,641 (+1106.62%)
HipsyclImplementation of SYCL for CPUs, AMD GPUs, NVIDIA GPUs
Stars: ✭ 377 (+177.21%)
Gpu Rest EngineA REST API for Caffe using Docker and Go
Stars: ✭ 412 (+202.94%)
IlgpuILGPU JIT Compiler for high-performance .NET GPU programs
Stars: ✭ 374 (+175%)
CaerHigh-performance Vision library in Python. Scale your research, not boilerplate.
Stars: ✭ 452 (+232.35%)
Open3dOpen3D: A Modern Library for 3D Data Processing
Stars: ✭ 5,860 (+4208.82%)
RustacudaRusty wrapper for the CUDA Driver API
Stars: ✭ 511 (+275.74%)
Cuda.jlCUDA programming in Julia.
Stars: ✭ 370 (+172.06%)
CudasiftA CUDA implementation of SIFT for NVIDIA GPUs (1.2 ms on a GTX 1060)
Stars: ✭ 555 (+308.09%)
CupyNumPy & SciPy for GPU
Stars: ✭ 5,625 (+4036.03%)
SpeedtorchLibrary for faster pinned CPU <-> GPU transfer in PyTorch
Stars: ✭ 615 (+352.21%)
Lighthouse2Lighthouse 2 framework for real-time ray tracing
Stars: ✭ 542 (+298.53%)
MarianFast Neural Machine Translation in C++
Stars: ✭ 777 (+471.32%)
GunrockHigh-Performance Graph Primitives on GPUs
Stars: ✭ 718 (+427.94%)
TurbotransformersA fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
Stars: ✭ 826 (+507.35%)
Cuda Api WrappersThin C++-flavored wrappers for the CUDA Runtime API
Stars: ✭ 362 (+166.18%)
Nvidia libs testTests and benchmarks for cuDNN (and in the future, other NVIDIA libraries)
Stars: ✭ 36 (-73.53%)
CudaExperiments with CUDA and Rust
Stars: ✭ 31 (-77.21%)
Carlsim3CARLsim is an efficient, easy-to-use, GPU-accelerated software framework for simulating large-scale spiking neural network (SNN) models with a high degree of biological detail.
Stars: ✭ 52 (-61.76%)
CubCooperative primitives for CUDA C++.
Stars: ✭ 883 (+549.26%)
Tsne CudaGPU Accelerated t-SNE for CUDA with Python bindings
Stars: ✭ 1,120 (+723.53%)
PycudaCUDA integration for Python, plus shiny features
Stars: ✭ 1,112 (+717.65%)
ParenchymaAn extensible HPC framework for CUDA, OpenCL and native CPU.
Stars: ✭ 71 (-47.79%)
GraphviteGraphVite: A General and High-performance Graph Embedding System
Stars: ✭ 865 (+536.03%)
ThundersvmThunderSVM: A Fast SVM Library on GPUs and CPUs
Stars: ✭ 1,282 (+842.65%)
PynvvlA Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
Stars: ✭ 95 (-30.15%)
MprReference implementation for "Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces" (SIGGRAPH 2020)
Stars: ✭ 84 (-38.24%)
BayaderaHigh-performance Bayesian Data Analysis on the GPU in Clojure
Stars: ✭ 342 (+151.47%)
Arrayfire PythonPython bindings for ArrayFire: A general purpose GPU library.
Stars: ✭ 358 (+163.24%)
NeanderthalFast Clojure Matrix Library
Stars: ✭ 927 (+581.62%)
Cuda Design PatternsSome CUDA design patterns and a bit of template magic for CUDA
Stars: ✭ 78 (-42.65%)
PygraphistryPyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
Stars: ✭ 1,365 (+903.68%)
Tensorflow Object Detection TutorialThe purpose of this tutorial is to learn how to install and prepare the TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch
Stars: ✭ 113 (-16.91%)