Libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
Stars: ✭ 518 (+2366.67%)
Vc
SIMD Vector Classes for C++
Stars: ✭ 985 (+4590.48%)
Unisimd Assembler
SIMD macro assembler unified for ARM, MIPS, PPC and x86
Stars: ✭ 63 (+200%)
Simd
C++ image processing and machine learning library using SIMD: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, VMX (Altivec) and VSX (Power7), NEON for ARM.
Stars: ✭ 1,263 (+5914.29%)
Quadray Engine
Realtime raytracer using SIMD on ARM, MIPS, PPC and x86
Stars: ✭ 13 (-38.1%)
Simde
Implementations of SIMD instruction sets for systems which don't natively support them.
Stars: ✭ 1,012 (+4719.05%)
cpuwhat
Nim utilities for advanced CPU operations: CPU identification, ISA extension detection, bindings to assorted intrinsics
Stars: ✭ 25 (+19.05%)
Std Simd
std::experimental::simd for GCC [ISO/IEC TS 19570:2018]
Stars: ✭ 275 (+1209.52%)
Umesimd
UME::SIMD: a library for explicit SIMD vectorization.
Stars: ✭ 66 (+214.29%)
Base64simd
Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)
Stars: ✭ 115 (+447.62%)
Libsimdpp
Portable header-only C++ low level SIMD library
Stars: ✭ 914 (+4252.38%)
Corrfunc
⚡️⚡️⚡️ Blazing fast correlation functions on the CPU.
Stars: ✭ 114 (+442.86%)
Directxmath
DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
Stars: ✭ 859 (+3990.48%)
Xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, NEON, AVX512)
Stars: ✭ 964 (+4490.48%)
Nsimd
Agenium Scale vectorization library for CPUs and GPUs
Stars: ✭ 138 (+557.14%)
oversimple
A library for audio oversampling, which tries to offer a simple api while wrapping HIIR, by Laurent De Soras, for minimum phase antialiasing, and r8brain-free-src, by Aleksey Vaneev, for linear phase antialiasing.
Stars: ✭ 25 (+19.05%)
Kfr
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
Stars: ✭ 985 (+4590.48%)
penguinV
Simple and fast C++ image processing library with focus on heterogeneous systems
Stars: ✭ 110 (+423.81%)
Sse Popcount
SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
Stars: ✭ 226 (+976.19%)
hpc
Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)
Stars: ✭ 39 (+85.71%)
Guided Missile Simulation
Guided Missile, Radar and Infrared EOS Simulation Framework written in Fortran.
Stars: ✭ 33 (+57.14%)
Despacer
C library to remove white space from strings as fast as possible
Stars: ✭ 90 (+328.57%)
Md5 Simd
Accelerate aggregated MD5 hashing performance up to 8x for AVX512 and 4x for AVX2. Useful for server applications that need to compute many MD5 sums in parallel.
Stars: ✭ 71 (+238.1%)
Toys
Storage for my snippets, toy programs, etc.
Stars: ✭ 187 (+790.48%)
Cglm
📽 Highly Optimized Graphics Math (glm) for C
Stars: ✭ 887 (+4123.81%)
ultra-sort
DSL for SIMD Sorting on AVX2 & AVX512
Stars: ✭ 29 (+38.1%)
Sleef
SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
Stars: ✭ 353 (+1580.95%)
Osaca
Open Source Architecture Code Analyzer
Stars: ✭ 162 (+671.43%)
Turbo-Transpose
Transpose: SIMD Integer+Floating Point Compression Filter
Stars: ✭ 50 (+138.1%)
Sse4 Strstr
SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) implementations of a modified Karp-Rabin algorithm
Stars: ✭ 115 (+447.62%)
Mipp
MIPP is a portable wrapper for SIMD instructions written in C++11. It supports NEON, SSE, AVX and AVX-512.
Stars: ✭ 253 (+1104.76%)
positional-popcount
Fast C functions for computing the positional popcount (pospopcnt).
Stars: ✭ 47 (+123.81%)
Highway
Performance-portable, length-agnostic SIMD with runtime dispatch
Stars: ✭ 301 (+1333.33%)
Chromium Clang
Chromium browser compiled with the Clang/LLVM compiler.
Stars: ✭ 77 (+266.67%)
Packettracer
SIMD-accelerated ray tracing in C#, powered by the Intel hardware intrinsics of .NET Core.
Stars: ✭ 109 (+419.05%)
Base64 Avx512
Code for paper "Base64 encoding and decoding at almost the speed of a memory copy"
Stars: ✭ 158 (+652.38%)
Hlslpp
Math library using hlsl syntax with SSE/NEON support
Stars: ✭ 153 (+628.57%)
Wheels
Performance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)
Stars: ✭ 891 (+4142.86%)
Simdjsonsharp
C# bindings for lemire/simdjson (and full C# port)
Stars: ✭ 506 (+2309.52%)
Sha256 Simd
Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.
Stars: ✭ 657 (+3028.57%)
Ugm
Ubpa Graphics Mathematics
Stars: ✭ 178 (+747.62%)
SoftLight
A shader-based Software Renderer Using The LightSky Framework.
Stars: ✭ 2 (-90.48%)
FFmpegPlayer
Simple FFmpeg video player
Stars: ✭ 72 (+242.86%)
HLML
Auto-generated maths library for C and C++ based on HLSL/Cg
Stars: ✭ 23 (+9.52%)
hlml
vectorized high-level math library
Stars: ✭ 42 (+100%)
Ctranslate2
Fast inference engine for OpenNMT models
Stars: ✭ 140 (+566.67%)
Bitmagic
BitMagic Library
Stars: ✭ 263 (+1152.38%)
Tensorflow Optimized Wheels
TensorFlow wheels built for latest CUDA/CuDNN and enabled performance flags: SSE, AVX, FMA; XLA
Stars: ✭ 118 (+461.9%)
Fastbase64
SIMD-accelerated base64 codecs
Stars: ✭ 309 (+1371.43%)
Simdjson
Parsing gigabytes of JSON per second
Stars: ✭ 15,115 (+71876.19%)