Nsimd: Agenium Scale vectorization library for CPUs and GPUs
Stars: ✭ 138 (-14.81%)
Unisimd Assembler: SIMD macro assembler unified for ARM, MIPS, PPC and x86
Stars: ✭ 63 (-61.11%)
Vc: SIMD Vector Classes for C++
Stars: ✭ 985 (+508.02%)
cpuwhat: Nim utilities for advanced CPU operations: CPU identification, ISA extension detection, bindings to assorted intrinsics
Stars: ✭ 25 (-84.57%)
Corrfunc: ⚡️⚡️⚡️ Blazing fast correlation functions on the CPU.
Stars: ✭ 114 (-29.63%)
Umesimd: UME::SIMD, a library for explicit SIMD vectorization.
Stars: ✭ 66 (-59.26%)
Simde: Implementations of SIMD instruction sets for systems which don't natively support them.
Stars: ✭ 1,012 (+524.69%)
Libxsmm: Library for specialized dense and sparse matrix operations, and deep learning primitives.
Stars: ✭ 518 (+219.75%)
Quadray Engine: Realtime raytracer using SIMD on ARM, MIPS, PPC and x86
Stars: ✭ 13 (-91.98%)
ternary-logic: Support for ternary logic in SSE, XOP, AVX2 and x86 programs
Stars: ✭ 21 (-87.04%)
Simd: C++ image processing and machine learning library using SIMD: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, VMX (Altivec) and VSX (Power7), NEON for ARM.
Stars: ✭ 1,263 (+679.63%)
Wheels: Performance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)
Stars: ✭ 891 (+450%)
libmsr: Wrapper library for model-specific registers. APIs cover RAPL, performance counters, clocks and turbo.
Stars: ✭ 47 (-70.99%)
wxparaver: wxParaver is a trace-based visualization and analysis tool designed to study quantitative detailed metrics and obtain qualitative knowledge of the performance of applications, libraries, processors and whole architectures.
Stars: ✭ 23 (-85.8%)
positional-popcount: Fast C functions for computing the positional popcount (pospopcnt).
Stars: ✭ 47 (-70.99%)
Toys: Storage for my snippets, toy programs, etc.
Stars: ✭ 187 (+15.43%)
Onednn: oneAPI Deep Neural Network Library (oneDNN)
Stars: ✭ 2,600 (+1504.94%)
Simple Pt: Simple Intel CPU processor tracing on Linux
Stars: ✭ 232 (+43.21%)
Guided Missile Simulation: Guided Missile, Radar and Infrared EOS Simulation Framework written in Fortran.
Stars: ✭ 33 (-79.63%)
ultra-sort: DSL for SIMD Sorting on AVX2 & AVX512
Stars: ✭ 29 (-82.1%)
Libpopcnt: 🚀 Fast C/C++ bit population count library
Stars: ✭ 219 (+35.19%)
hpc: Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)
Stars: ✭ 39 (-75.93%)
tacc stats: TACC Stats is an automated resource-usage monitoring and analysis package.
Stars: ✭ 36 (-77.78%)
yask: YASK (Yet Another Stencil Kit): a domain-specific language and framework to create high-performance stencil code for implementing finite-difference methods and similar applications.
Stars: ✭ 81 (-50%)
Highway: Performance-portable, length-agnostic SIMD with runtime dispatch
Stars: ✭ 301 (+85.8%)
Sleef: SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
Stars: ✭ 353 (+117.9%)
Asm Dude: Visual Studio extension for assembly syntax highlighting and code completion in assembly files and the disassembly window
Stars: ✭ 3,898 (+2306.17%)
Directxmath: DirectXMath is an all-inline SIMD C++ linear algebra library for use in games and graphics apps
Stars: ✭ 859 (+430.25%)
Sha256 Simd: Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.
Stars: ✭ 657 (+305.56%)
Libsimdpp: Portable header-only C++ low-level SIMD library
Stars: ✭ 914 (+464.2%)
Likwid: Performance monitoring and benchmarking suite
Stars: ✭ 957 (+490.74%)
Caliper: Caliper is an instrumentation and performance profiling library
Stars: ✭ 162 (+0%)
Sse4 Strstr: SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) implementations of a modification of the Karp-Rabin algorithm
Stars: ✭ 115 (-29.01%)
Sse Popcount: SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
Stars: ✭ 226 (+39.51%)
Base64simd: Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)
Stars: ✭ 115 (-29.01%)
Ctranslate2: Fast inference engine for OpenNMT models
Stars: ✭ 140 (-13.58%)
Chromium Clang: Chromium browser compiled with the Clang/LLVM compiler.
Stars: ✭ 77 (-52.47%)
Std Simd: std::experimental::simd for GCC [ISO/IEC TS 19570:2018]
Stars: ✭ 275 (+69.75%)
variorum: Tool for hardware-level feature control
Stars: ✭ 21 (-87.04%)
Xsimd: C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, NEON, AVX512)
Stars: ✭ 964 (+495.06%)
Kfr: Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
Stars: ✭ 985 (+508.02%)
Md5 Simd: Accelerate aggregated MD5 hashing performance up to 8x for AVX512 and 4x for AVX2. Useful for server applications that need to compute many MD5 sums in parallel.
Stars: ✭ 71 (-56.17%)
Zydis: Fast and lightweight x86/x86-64 disassembler and code generation library
Stars: ✭ 2,168 (+1238.27%)
Bootmine: Bootable minesweeper game in a 512-byte boot sector
Stars: ✭ 136 (-16.05%)
Lighthouse Monitor: Investigate performance over your whole company with Lighthouse
Stars: ✭ 136 (-16.05%)
Edb Debugger: edb is a cross-platform AArch32/x86/x86-64 debugger.
Stars: ✭ 2,019 (+1146.3%)
Heapinspector For Ios: Find memory issues & leaks in your iOS app without Instruments
Stars: ✭ 1,819 (+1022.84%)
Steg86: Hiding messages in x86 programs using semantic duals
Stars: ✭ 136 (-16.05%)
Blt: A streamlined CMake build system foundation for developing HPC software
Stars: ✭ 135 (-16.67%)
Dask Jobqueue: Deploy Dask on job schedulers like PBS, SLURM, and SGE
Stars: ✭ 150 (-7.41%)
V86: x86 virtualization in your browser, recompiling x86 to wasm on the fly
Stars: ✭ 12,765 (+7779.63%)
Dash: DASH, the C++ Template Library for Distributed Data Structures with Support for Hierarchical Locality for HPC and Data-Driven Science
Stars: ✭ 134 (-17.28%)
Sysmon: An intuitive, remotely accessible system performance monitoring and task management tool for servers and headless Raspberry Pi setups.
Stars: ✭ 158 (-2.47%)
Hlslpp: Math library using HLSL syntax with SSE/NEON support
Stars: ✭ 153 (-5.56%)
Ginkgo: Numerical linear algebra software package
Stars: ✭ 149 (-8.02%)
Asm Cli Rust: Interactive assembly shell written in Rust
Stars: ✭ 133 (-17.9%)