Sse Popcount: SIMD (SSE) population count; see http://0x80.pl/articles/sse-popcount.html
Libpopcnt: 🚀 Fast C/C++ bit population count library
Onednn: oneAPI Deep Neural Network Library (oneDNN)
Toys: Storage for my snippets, toy programs, etc.
Simdjson: Parsing gigabytes of JSON per second
Highwayhash: Node.js implementation of HighwayHash, Google's fast and strong hash function
Osaca: Open Source Architecture Code Analyzer
Nsimd: Agenium Scale vectorization library for CPUs and GPUs
Sse4 Strstr: SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) implementations of a modified Karp-Rabin algorithm
Corrfunc: ⚡️⚡️⚡️ Blazing fast correlation functions on the CPU.
Base64simd: Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)
Simd: C++ image processing and machine learning library using SIMD: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, VMX (Altivec) and VSX (Power7), and NEON for ARM.
Md5 Simd: Accelerate aggregated MD5 hashing performance up to 8x for AVX512 and 4x for AVX2. Useful for server applications that need to compute many MD5 sums in parallel.
Umesimd: UME::SIMD, a library for explicit SIMD vectorization.
Op rbf: Optimized Recursive Bilateral Filter
Simde: Implementations of SIMD instruction sets for systems which don't natively support them.
Sixtyfour: How fast can we brute force a 64-bit comparison?
Vc: SIMD Vector Classes for C++
Libsimdpp: Portable header-only C++ low level SIMD library
Ksim: The little simulator that could.
Directxmath: DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
Wheels: Performance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)
Highwayhash: Native Go version of HighwayHash with optimized assembly implementations on Intel and ARM. Able to process over 10 GB/sec on a single core on Intel CPUs. See https://en.wikipedia.org/wiki/HighwayHash
Libxsmm: Library for specialized dense and sparse matrix operations, and deep learning primitives.
Asm Dude: Visual Studio extension for assembly syntax highlighting and code completion in assembly files and the disassembly window
Highway: Performance-portable, length-agnostic SIMD with runtime dispatch
simdutf: Unicode routines (UTF8, UTF16): billions of characters per second.
awesome-simd: A curated list of awesome SIMD frameworks, libraries and software
block-aligner: SIMD-accelerated library for computing global and X-drop affine gap penalty sequence-to-sequence or sequence-to-profile alignments using an adaptive block-based algorithm.
utf8: Fast UTF-8 validation with range algorithm (NEON+SSE4+AVX2)
simdutf8: SIMD-accelerated UTF-8 validation for Rust.
cpuwhat: Nim utilities for advanced CPU operations: CPU identification, ISA extension detection, bindings to assorted intrinsics
sliceslice-rs: A fast implementation of single-pattern substring search using SIMD acceleration.
argon2: Implementation of argon2 (i, d, id) algorithms with CPU dispatching
ternary-logic: Support for ternary logic in SSE, XOP, AVX2 and x86 programs