gslib
sparse communication library
Stars: ✭ 22 (-90.31%)
Ucx
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
Stars: ✭ 471 (+107.49%)
Ktt
Kernel Tuning Toolkit
Stars: ✭ 33 (-85.46%)
euler2d kokkos
Simple 2D finite volume solver for the Euler equations using the C++ Kokkos library
Stars: ✭ 27 (-88.11%)
Kernels
This is a set of simple programs that can be used to explore the features of a parallel platform.
Stars: ✭ 287 (+26.43%)
Arraymancer
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
Stars: ✭ 793 (+249.34%)
crowdsource-video-experiments-on-android
Crowdsourcing video experiments (such as collaborative benchmarking and optimization of DNN algorithms) using Collective Knowledge Framework across diverse Android devices provided by volunteers. Results are continuously aggregated in the open repository:
Stars: ✭ 29 (-87.22%)
hpc
Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)
Stars: ✭ 39 (-82.82%)
Ctranslate2
Fast inference engine for OpenNMT models
Stars: ✭ 140 (-38.33%)
Torsten
library of C++ functions that support applications of Stan in Pharmacometrics
Stars: ✭ 38 (-83.26%)
FLAMEGPU2
FLAME GPU 2 is a GPU accelerated agent based modelling framework for C++ and Python
Stars: ✭ 25 (-88.99%)
bifrost
A stream processing framework for high-throughput applications.
Stars: ✭ 48 (-78.85%)
warp
Continuous energy Monte Carlo neutron transport in general geometries on GPUs
Stars: ✭ 27 (-88.11%)
briefmatch
BriefMatch real-time GPU optical flow
Stars: ✭ 36 (-84.14%)
hipacc
A domain-specific language and compiler for image processing
Stars: ✭ 72 (-68.28%)
tiny-cuda-nn
Lightning fast & tiny C++/CUDA neural network framework
Stars: ✭ 908 (+300%)
gpu-monitor
Script to remotely check GPU servers for free GPUs
Stars: ✭ 85 (-62.56%)
hpdbscan
Highly parallel DBSCAN (HPDBSCAN)
Stars: ✭ 19 (-91.63%)
Deep Diamond
A fast Clojure Tensor & Deep Learning library
Stars: ✭ 288 (+26.87%)
Awesome Cuda
This is a list of useful libraries and resources for CUDA development.
Stars: ✭ 274 (+20.7%)
HeCBench
software.intel.com/content/www/us/en/develop/articles/repo-evaluating-performance-productivity-oneapi.html
Stars: ✭ 85 (-62.56%)
Hemi
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
Stars: ✭ 275 (+21.15%)
Bayadera
High-performance Bayesian Data Analysis on the GPU in Clojure
Stars: ✭ 342 (+50.66%)
Thrust
The C++ parallel algorithms library.
Stars: ✭ 3,595 (+1483.7%)
Cudf
cuDF - GPU DataFrame Library
Stars: ✭ 4,370 (+1825.11%)
Hipsycl
Implementation of SYCL for CPUs, AMD GPUs, NVIDIA GPUs
Stars: ✭ 377 (+66.08%)
Caer
High-performance Vision library in Python. Scale your research, not boilerplate.
Stars: ✭ 452 (+99.12%)
Ilgpu
ILGPU JIT Compiler for high-performance .Net GPU programs
Stars: ✭ 374 (+64.76%)
Ompi
Open MPI main development repository
Stars: ✭ 1,221 (+437.89%)
Compute
A C++ GPU Computing Library for OpenCL
Stars: ✭ 1,192 (+425.11%)
Pymapd
Python client for OmniSci GPU-accelerated SQL engine and analytics platform
Stars: ✭ 109 (-51.98%)
az-hop
The Azure HPC On-Demand Platform provides an HPC Cluster Ready solution
Stars: ✭ 33 (-85.46%)
Timemory
Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template API is essentially a framework for creating tools: it is designed to provide a unifying interface for recording various performance measurements alongside data logging and interfaces to other tools.
Stars: ✭ 192 (-15.42%)
ParMmg
Distributed parallelization of 3D volume mesh adaptation
Stars: ✭ 19 (-91.63%)
Speedtorch
Library for faster pinned CPU <-> GPU transfer in Pytorch
Stars: ✭ 615 (+170.93%)
Gunrock
High-Performance Graph Primitives on GPUs
Stars: ✭ 718 (+216.3%)
Thundergbm
ThunderGBM: Fast GBDTs and Random Forests on GPUs
Stars: ✭ 586 (+158.15%)
Scikit Cuda
Python interface to GPU-powered libraries
Stars: ✭ 803 (+253.74%)
Cudasift
A CUDA implementation of SIFT for NVidia GPUs (1.2 ms on a GTX 1060)
Stars: ✭ 555 (+144.49%)
hp2p
Heavy Peer To Peer: an MPI-based benchmark for network diagnostics
Stars: ✭ 17 (-92.51%)
Cuda
Experiments with CUDA and Rust
Stars: ✭ 31 (-86.34%)
Qualia2.0
Qualia is a deep learning framework deeply integrated with automatic differentiation and dynamic graphing with CUDA acceleration. Qualia was built from scratch.
Stars: ✭ 41 (-81.94%)
Cub
Cooperative primitives for CUDA C++.
Stars: ✭ 883 (+288.99%)
Nsimd
Agenium Scale vectorization library for CPUs and GPUs
Stars: ✭ 138 (-39.21%)
Dash
DASH, the C++ Template Library for Distributed Data Structures with Support for Hierarchical Locality for HPC and Data-Driven Science
Stars: ✭ 134 (-40.97%)
Ginkgo
Numerical linear algebra software package
Stars: ✭ 149 (-34.36%)
Cupy
NumPy & SciPy for GPU
Stars: ✭ 5,625 (+2377.97%)
Mpr
Reference implementation for "Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces" (SIGGRAPH 2020)
Stars: ✭ 84 (-63%)
Cuda Design Patterns
Some CUDA design patterns and a bit of template magic for CUDA
Stars: ✭ 78 (-65.64%)
Thundersvm
ThunderSVM: A Fast SVM Library on GPUs and CPUs
Stars: ✭ 1,282 (+464.76%)
Forward
A library for high performance deep learning inference on NVIDIA GPUs.
Stars: ✭ 136 (-40.09%)
Hpcinfo
Information about many aspects of high-performance computing. Wiki content moved to ~/docs.
Stars: ✭ 171 (-24.67%)
Core
parallel finite element unstructured meshes
Stars: ✭ 124 (-45.37%)
Umpire
An application-focused API for memory management on NUMA & GPU architectures
Stars: ✭ 154 (-32.16%)
peakperf
Achieve peak performance on x86 CPUs and NVIDIA GPUs
Stars: ✭ 33 (-85.46%)
Galaxy
Galaxy is an asynchronous parallel visualization ray tracer for performant rendering in distributed computing environments. Galaxy builds upon Intel OSPRay and Intel Embree, including ray queueing and sending logic inspired by TACC GraviT.
Stars: ✭ 18 (-92.07%)
amh-code
Complete implementations from "Algorithms for Modern Hardware"
Stars: ✭ 247 (+8.81%)
waldur-mastermind
Waldur MasterMind is a hybrid cloud orchestrator.
Stars: ✭ 37 (-83.7%)