SleefSIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
Stars: ✭ 353 (+155.8%)
Unisimd AssemblerSIMD macro assembler unified for ARM, MIPS, PPC and x86
Stars: ✭ 63 (-54.35%)
SimdeImplementations of SIMD instruction sets for systems which don't natively support them.
Stars: ✭ 1,012 (+633.33%)
Quadray EngineRealtime raytracer using SIMD on ARM, MIPS, PPC and x86
Stars: ✭ 13 (-90.58%)
VcSIMD Vector Classes for C++
Stars: ✭ 985 (+613.77%)
UmesimdUME::SIMD A library for explicit simd vectorization.
Stars: ✭ 66 (-52.17%)
SimdC++ image processing and machine learning library with using of SIMD: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, VMX(Altivec) and VSX(Power7), NEON for ARM.
Stars: ✭ 1,263 (+815.22%)
SimdjsonParsing gigabytes of JSON per second
Stars: ✭ 15,115 (+10852.9%)
DirectxmathDirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
Stars: ✭ 859 (+522.46%)
Corrfunc⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
Stars: ✭ 114 (-17.39%)
LibsimdppPortable header-only C++ low level SIMD library
Stars: ✭ 914 (+562.32%)
Base64simdBase64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)
Stars: ✭ 115 (-16.67%)
ternary-logicSupport for ternary logic in SSE, XOP, AVX2 and x86 programs
Stars: ✭ 21 (-84.78%)
simdutf8SIMD-accelerated UTF-8 validation for Rust.
Stars: ✭ 426 (+208.7%)
LibxsmmLibrary for specialized dense and sparse matrix operations, and deep learning primitives.
Stars: ✭ 518 (+275.36%)
XsimdC++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, NEON, AVX512)
Stars: ✭ 964 (+598.55%)
HighwayPerformance-portable, length-agnostic SIMD with runtime dispatch
Stars: ✭ 301 (+118.12%)
Std Simdstd::experimental::simd for GCC [ISO/IEC TS 19570:2018]
Stars: ✭ 275 (+99.28%)
OsacaOpen Source Architecture Code Analyzer
Stars: ✭ 162 (+17.39%)
MippMIPP is a portable wrapper for SIMD instructions written in C++11. It supports NEON, SSE, AVX and AVX-512.
Stars: ✭ 253 (+83.33%)
WheelsPerformance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)
Stars: ✭ 891 (+545.65%)
positional-popcountFast C functions for the computing the positional popcount (pospopcnt).
Stars: ✭ 47 (-65.94%)
Sse4 StrstrSIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) of Karp-Rabin algorithm's modification
Stars: ✭ 115 (-16.67%)
Ctranslate2Fast inference engine for OpenNMT models
Stars: ✭ 140 (+1.45%)
simdutfUnicode routines (UTF8, UTF16): billions of characters per second.
Stars: ✭ 108 (-21.74%)
Sse PopcountSIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
Stars: ✭ 226 (+63.77%)
Libpopcnt🚀 Fast C/C++ bit population count library
Stars: ✭ 219 (+58.7%)
utf8Fast UTF-8 validation with range algorithm (NEON+SSE4+AVX2)
Stars: ✭ 60 (-56.52%)
Sse2neonA translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation
Stars: ✭ 316 (+128.99%)
Guided Missile SimulationGuided Missile, Radar and Infrared EOS Simulation Framework written in Fortran.
Stars: ✭ 33 (-76.09%)
Cglm📽 Highly Optimized Graphics Math (glm) for C
Stars: ✭ 887 (+542.75%)
KfrFast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
Stars: ✭ 985 (+613.77%)
ultra-sortDSL for SIMD Sorting on AVX2 & AVX512
Stars: ✭ 29 (-78.99%)
Md5 SimdAccelerate aggregated MD5 hashing performance up to 8x for AVX512 and 4x for AVX2. Useful for server applications that need to compute many MD5 sums in parallel.
Stars: ✭ 71 (-48.55%)
OnednnoneAPI Deep Neural Network Library (oneDNN)
Stars: ✭ 2,600 (+1784.06%)
ComputelibraryThe Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
Stars: ✭ 2,123 (+1438.41%)
hpcLearning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc. )
Stars: ✭ 39 (-71.74%)
oversimpleA library for audio oversampling, which tries to offer a simple api while wrapping HIIR, by Laurent De Soras, for minimum phase antialiasing, and r8brain-free-src, by Aleksey Vaneev, for linear phase antialiasing.
Stars: ✭ 25 (-81.88%)
cpuwhatNim utilities for advanced CPU operations: CPU identification, ISA extension detection, bindings to assorted intrinsics
Stars: ✭ 25 (-81.88%)
peakperfAchieve peak performance on x86 CPUs and NVIDIA GPUs
Stars: ✭ 33 (-76.09%)
allgebraBase container for developing C++ and Fortran HPC applications
Stars: ✭ 14 (-89.86%)
monolishmonolish: MONOlithic LInear equation Solvers for Highly-parallel architecture
Stars: ✭ 166 (+20.29%)
HiSpatialClusterClustering spatial points with algorithm of Fast Search, high performace computing implements of CUDA or parallel in CPU, and runnable implements on python standalone or arcgis.
Stars: ✭ 31 (-77.54%)
gpubootcampThis repository consists for gpu bootcamp material for HPC and AI
Stars: ✭ 227 (+64.49%)
Tensorflow Optimized WheelsTensorFlow wheels built for latest CUDA/CuDNN and enabled performance flags: SSE, AVX, FMA; XLA
Stars: ✭ 118 (-14.49%)
block-alignerSIMD-accelerated library for computing global and X-drop affine gap penalty sequence-to-sequence or sequence-to-profile alignments using an adaptive block-based algorithm.
Stars: ✭ 58 (-57.97%)
dbcsrDBCSR: Distributed Block Compressed Sparse Row matrix library
Stars: ✭ 65 (-52.9%)
cuda memtestFork of CUDA GPU memtest 👓
Stars: ✭ 68 (-50.72%)
MatXAn efficient C++17 GPU numerical computing library with Python-like syntax
Stars: ✭ 418 (+202.9%)
mbsolveAn open-source solver tool for the Maxwell-Bloch equations.
Stars: ✭ 14 (-89.86%)
BitmagicBitMagic Library
Stars: ✭ 263 (+90.58%)
FastorA lightweight high performance tensor algebra framework for modern C++
Stars: ✭ 280 (+102.9%)
Fastbase64SIMD-accelerated base64 codecs
Stars: ✭ 309 (+123.91%)
ArrayfireArrayFire: a general purpose GPU library.
Stars: ✭ 3,693 (+2576.09%)