Ctranslate2
Fast inference engine for OpenNMT models
Stars: ✭ 140 (+833.33%)
gardenia
GARDENIA: Graph Analytics Repository for Designing Efficient Next-generation Accelerators
Stars: ✭ 22 (+46.67%)
Laser
The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers
Stars: ✭ 191 (+1173.33%)
Optim
OptimLib: a lightweight C++ library of numerical optimization methods for nonlinear functions
Stars: ✭ 411 (+2640%)
URT
Fast Unit Root Tests and OLS regression in C++ with wrappers for R and Python
Stars: ✭ 70 (+366.67%)
Ytk Mp4j
Ytk-mp4j is a fast, user-friendly, cross-platform, multi-process, multi-thread collective message passing Java library which includes gather, scatter, allgather, reduce-scatter, broadcast, reduce, and allreduce communications for distributed machine learning.
Stars: ✭ 102 (+580%)
HDR-imaging
An implementation of "Paul E. Debevec, Jitendra Malik, Recovering High Dynamic Range Radiance Maps from Photographs, SIGGRAPH 1997."
Stars: ✭ 55 (+266.67%)
Kratos
Kratos Multiphysics (a.k.a. Kratos) is a framework for building parallel multi-disciplinary simulation software. Modularity, extensibility and HPC are the main objectives. Kratos has a BSD license and is written in C++ with an extensive Python interface.
Stars: ✭ 558 (+3620%)
Foundations of HPC 2021
This repository collects the materials from the course "Foundations of HPC", 2021, at the Data Science and Scientific Computing Department, University of Trieste
Stars: ✭ 22 (+46.67%)
Onednn
oneAPI Deep Neural Network Library (oneDNN)
Stars: ✭ 2,600 (+17233.33%)
Weave
A state-of-the-art multithreading runtime: message-passing based, fast, scalable, ultra-low overhead
Stars: ✭ 305 (+1933.33%)
saccade
A sophisticated scientific image viewer for Linux with OpenGL support and synchronized viewports
Stars: ✭ 38 (+153.33%)
Rawspeed
Fast raw decoding library
Stars: ✭ 179 (+1093.33%)
sparse-som
Efficient Self-Organizing Map for Sparse Data
Stars: ✭ 17 (+13.33%)
Corrfunc
⚡️⚡️⚡️ Blazing fast correlation functions on the CPU.
Stars: ✭ 114 (+660%)
cpu-gbfilter
♨️ Optimized Gaussian blur filter on CPU.
Stars: ✭ 14 (-6.67%)
Training Material
A collection of code examples as well as presentations for training purposes
Stars: ✭ 85 (+466.67%)
yask
YASK--Yet Another Stencil Kit: a domain-specific language and framework to create high-performance stencil code for implementing finite-difference methods and similar applications.
Stars: ✭ 81 (+440%)
Edge
Extreme-scale Discontinuous Galerkin Environment (EDGE)
Stars: ✭ 18 (+20%)
ISP-pipeline-hdrplus
Denoise, HDR, ISP pipeline, image processing, camera, ISP, HDRplus
Stars: ✭ 412 (+2646.67%)
Stdgpu
stdgpu: Efficient STL-like Data Structures on the GPU
Stars: ✭ 531 (+3440%)
nanox
Nanos++ is a runtime designed to serve as runtime support in parallel environments. It is mainly used to support OmpSs, an extension to OpenMP developed at BSC.
Stars: ✭ 37 (+146.67%)
Amgcl
C++ library for solving large sparse linear systems with algebraic multigrid method
Stars: ✭ 390 (+2500%)
Occa
JIT Compilation for Multiple Architectures: C++, OpenMP, CUDA, HIP, OpenCL, Metal
Stars: ✭ 230 (+1433.33%)
Bvh
A modern C++ BVH construction and traversal library
Stars: ✭ 216 (+1340%)
Stats
A C++ header-only library of statistical distribution functions.
Stars: ✭ 292 (+1846.67%)
matrix multiplication
Parallel Matrix Multiplication Using OpenMP, Pthreads, and MPI
Stars: ✭ 41 (+173.33%)
Primecount
🚀 Fast prime counting function implementations
Stars: ✭ 193 (+1186.67%)
GOMC
GOMC - GPU Optimized Monte Carlo is a parallel molecular simulation code designed for high-performance simulation of large systems
Stars: ✭ 41 (+173.33%)
Bkcrack
Crack legacy zip encryption with Biham and Kocher's known plaintext attack.
Stars: ✭ 178 (+1086.67%)
libquo
Dynamic execution environments for coupled, thread-heterogeneous MPI+X applications
Stars: ✭ 21 (+40%)
Gapbs
GAP Benchmark Suite
Stars: ✭ 165 (+1000%)
rkmh
Classify sequencing reads using MinHash.
Stars: ✭ 42 (+180%)
Babelstream
STREAM, for lots of devices written in many programming models
Stars: ✭ 121 (+706.67%)
claw-compiler
CLAW Compiler for Performance Portability
Stars: ✭ 38 (+153.33%)
Arm Vo
Efficient monocular visual odometry for ground vehicles on ARM processors
Stars: ✭ 115 (+666.67%)
mcxx
Mercurium is a C/C++/Fortran source-to-source compilation infrastructure aimed at fast prototyping, developed by the Programming Models group at the Barcelona Supercomputing Center
Stars: ✭ 59 (+293.33%)
Compactnsearch
A C++ library to compute neighborhood information for point clouds within a fixed radius. Suitable for many applications, e.g. neighborhood search for SPH fluid simulations.
Stars: ✭ 93 (+520%)
wasabi
A Buddhabrot explorer based on wabisabi, but with a more affectionate name.
Stars: ✭ 17 (+13.33%)
playblastOIIO-maya
Implements a new 'playblast' command that uses OpenImageIO (OIIO) to process and write image data.
Stars: ✭ 29 (+93.33%)
Nbody
N-body gravity attraction problem solver
Stars: ✭ 40 (+166.67%)
contech
The Contech analysis framework provides the means for generating and analyzing task graphs that enable computer architects and programmers to gain a deeper understanding of parallel programs.
Stars: ✭ 43 (+186.67%)
Arraymancer
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, CUDA and OpenCL backends
Stars: ✭ 793 (+5186.67%)
FoxNN
Simple neural network
Stars: ✭ 20 (+33.33%)
Ems
Extended Memory Semantics - Persistent shared object memory and parallelism for Node.js and Python
Stars: ✭ 552 (+3580%)
John
John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
Stars: ✭ 5,656 (+37606.67%)
Faasm
High-performance stateful serverless runtime based on WebAssembly
Stars: ✭ 403 (+2586.67%)
Armadillo Code
Armadillo: fast C++ library for linear algebra & scientific computing - http://arma.sourceforge.net
Stars: ✭ 388 (+2486.67%)
NPB-CPP
NAS Parallel Benchmark Kernels in C/C++. The parallel versions are in FastFlow, TBB, and OpenMP.
Stars: ✭ 18 (+20%)
Abyss
🔬 Assemble large genomes using short reads
Stars: ✭ 219 (+1360%)
euler2d kokkos
Simple 2D finite volume solver for Euler equations using the C++ Kokkos library
Stars: ✭ 27 (+80%)
HeCBench
software.intel.com/content/www/us/en/develop/articles/repo-evaluating-performance-productivity-oneapi.html
Stars: ✭ 85 (+466.67%)
ByteSlice
"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (+60%)
Dive Into Ml System
Dive into machine learning systems, starting from reinventing the wheel.
Stars: ✭ 220 (+1366.67%)