UmesimdUME::SIMD A library for explicit simd vectorization.
Stars: ✭ 66 (+100%)
SimdeImplementations of SIMD instruction sets for systems which don't natively support them.
Stars: ✭ 1,012 (+2966.67%)
Corrfunc⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
Stars: ✭ 114 (+245.45%)
VcSIMD Vector Classes for C++
Stars: ✭ 985 (+2884.85%)
Ctranslate2Fast inference engine for OpenNMT models
Stars: ✭ 140 (+324.24%)
ternary-logicSupport for ternary logic in SSE, XOP, AVX2 and x86 programs
Stars: ✭ 21 (-36.36%)
cpuwhatNim utilities for advanced CPU operations: CPU identification, ISA extension detection, bindings to assorted intrinsics
Stars: ✭ 25 (-24.24%)
SleefSIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
Stars: ✭ 353 (+969.7%)
awesome-simdA curated list of awesome SIMD frameworks, libraries and software
Stars: ✭ 39 (+18.18%)
DirectxmathDirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
Stars: ✭ 859 (+2503.03%)
EdgeExtreme-scale Discontinuous Galerkin Environment (EDGE)
Stars: ✭ 18 (-45.45%)
SimdC++ image processing and machine learning library with using of SIMD: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, VMX(Altivec) and VSX(Power7), NEON for ARM.
Stars: ✭ 1,263 (+3727.27%)
runtimeAnyDSL Runtime Library
Stars: ✭ 17 (-48.48%)
XsimdC++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, NEON, AVX512)
Stars: ✭ 964 (+2821.21%)
LibxsmmLibrary for specialized dense and sparse matrix operations, and deep learning primitives.
Stars: ✭ 518 (+1469.7%)
NsimdAgenium Scale vectorization library for CPUs and GPUs
Stars: ✭ 138 (+318.18%)
LaserThe HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers
Stars: ✭ 191 (+478.79%)
Unisimd AssemblerSIMD macro assembler unified for ARM, MIPS, PPC and x86
Stars: ✭ 63 (+90.91%)
ultra-sortDSL for SIMD Sorting on AVX2 & AVX512
Stars: ✭ 29 (-12.12%)
Quadray EngineRealtime raytracer using SIMD on ARM, MIPS, PPC and x86
Stars: ✭ 13 (-60.61%)
LibsimdppPortable header-only C++ low level SIMD library
Stars: ✭ 914 (+2669.7%)
SimdjsonsharpC# bindings for lemire/simdjson (and full C# port)
Stars: ✭ 506 (+1433.33%)
Fastbase64SIMD-accelerated base64 codecs
Stars: ✭ 309 (+836.36%)
HighwayPerformance-portable, length-agnostic SIMD with runtime dispatch
Stars: ✭ 301 (+812.12%)
NnpackAcceleration package for neural networks on multi-core CPUs
Stars: ✭ 1,538 (+4560.61%)
ThorinThe Higher-Order Intermediate Representation
Stars: ✭ 116 (+251.52%)
Base64simdBase64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)
Stars: ✭ 115 (+248.48%)
simdutfUnicode routines (UTF8, UTF16): billions of characters per second.
Stars: ✭ 108 (+227.27%)
FastapproxApproximate and vectorized versions of common mathematical functions
Stars: ✭ 128 (+287.88%)
Md5 SimdAccelerate aggregated MD5 hashing performance up to 8x for AVX512 and 4x for AVX2. Useful for server applications that need to compute many MD5 sums in parallel.
Stars: ✭ 71 (+115.15%)
ImpalaAn imperative and functional programming language
Stars: ✭ 118 (+257.58%)
SimdjsonParsing gigabytes of JSON per second
Stars: ✭ 15,115 (+45703.03%)
Chromium ClangChromium browser compiled with the Clang/LLVM compiler.
Stars: ✭ 77 (+133.33%)
penguinVSimple and fast C++ image processing library with focus on heterogeneous systems
Stars: ✭ 110 (+233.33%)
hpcLearning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc. )
Stars: ✭ 39 (+18.18%)
oversimpleA library for audio oversampling, which tries to offer a simple api while wrapping HIIR, by Laurent De Soras, for minimum phase antialiasing, and r8brain-free-src, by Aleksey Vaneev, for linear phase antialiasing.
Stars: ✭ 25 (-24.24%)
BitmagicBitMagic Library
Stars: ✭ 263 (+696.97%)
Std Simdstd::experimental::simd for GCC [ISO/IEC TS 19570:2018]
Stars: ✭ 275 (+733.33%)
Cglm📽 Highly Optimized Graphics Math (glm) for C
Stars: ✭ 887 (+2587.88%)
WheelsPerformance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)
Stars: ✭ 891 (+2600%)
KfrFast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
Stars: ✭ 985 (+2884.85%)
DespacerC library to remove white space from strings as fast as possible
Stars: ✭ 90 (+172.73%)
OsacaOpen Source Architecture Code Analyzer
Stars: ✭ 162 (+390.91%)
PackettracerThe SIMD-accelereted ray tracing in C# powered by Intel hardware intrinsic of .NET Core.
Stars: ✭ 109 (+230.3%)
MippMIPP is a portable wrapper for SIMD instructions written in C++11. It supports NEON, SSE, AVX and AVX-512.
Stars: ✭ 253 (+666.67%)
tbslasA parallel, fast solver for the scalar advection-diffusion and the incompressible Navier-Stokes equations based on semi-Lagrangian/Volume-Integral method.
Stars: ✭ 21 (-36.36%)
JohnJohn the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
Stars: ✭ 5,656 (+17039.39%)
Stdgpustdgpu: Efficient STL-like Data Structures on the GPU
Stars: ✭ 531 (+1509.09%)
ArraymancerA fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
Stars: ✭ 793 (+2303.03%)
block-alignerSIMD-accelerated library for computing global and X-drop affine gap penalty sequence-to-sequence or sequence-to-profile alignments using an adaptive block-based algorithm.
Stars: ✭ 58 (+75.76%)
articThe AlteRnaTive Impala Compiler
Stars: ✭ 16 (-51.52%)
t8codeParallel algorithms and data structures for tree-based AMR with arbitrary element shapes.
Stars: ✭ 37 (+12.12%)