awesome-simd
A curated list of awesome SIMD frameworks, libraries and software.
This list showcases projects that have achieved 10x performance improvements using SIMD (Single Instruction, Multiple Data) instructions. In general, this should lead to throughput of gigabytes per second on modern CPUs for the task at hand.
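To make the "one instruction, many data elements" idea concrete, here is a minimal sketch. It deliberately does not use real SIMD instructions. It assumes a portable stand-in, SWAR ("SIMD within a register"), that packs eight 8-bit lanes into a single `uint64`. The name `addBytes` is hypothetical and not taken from any project in this list.

```go
package main

import "fmt"

// addBytes treats a and b as eight independent 8-bit lanes and adds them
// lane by lane, wrapping each lane modulo 256. Hypothetical helper for
// illustration only; real SIMD code would use vector registers instead.
func addBytes(a, b uint64) uint64 {
	const high = 0x8080808080808080 // top bit of every lane
	const low = 0x7f7f7f7f7f7f7f7f  // lower seven bits of every lane

	// Add the lower seven bits of each lane; a carry can reach bit 7
	// of its own lane but never crosses into the neighbouring lane.
	sum := (a & low) + (b & low)

	// Fold the top bits back in with XOR, which adds without carrying out.
	return sum ^ ((a ^ b) & high)
}

func main() {
	// Eight additions are done by a handful of 64-bit operations.
	fmt.Printf("%016x\n", addBytes(0x0102030405060708, 0x0101010101010101)) // 0203040506070809

	// 0xff + 0x01 wraps to 0x00 and does not spill into the next lane.
	fmt.Printf("%016x\n", addBytes(0x00ff, 0x0001)) // 0000000000000000
}
```

Hardware SIMD applies the same principle with wider registers (128, 256 or 512 bits) and dedicated lane-wise instructions, which is where the larger speedups in the projects below come from.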
Parsing
- simdjson - C++: Parsing gigabytes of JSON per second
- simdjson-go - Go: Parsing gigabytes of JSON per second
- dictionary - C++: High-performance dictionary coding
- simdcomp - C: A simple library for compressing lists of integers using binary packing
- SIMDCompressionAndIntersection - C++: A library to compress and intersect sorted lists of integers using SIMD instructions
- Hyperscan - C++: High-performance regular expression matching library
- Various string algorithms - C: Repository for string algorithms, snippets, toy programs, etc.
- sse-popcount - SIMD (SSE) population count
Erasure Coding and Hashing
- Reed-Solomon - Go: Erasure Coding in Go
- highwayhash - Go: Optimized HighwayHash implementation for Intel (over 10 GB/sec), ARM and Power9
- sha256-simd - Go: Optimized SHA256 computations for Intel, ARM and Power9
Neural Network
- ncnn - C++: High-performance NN inference framework optimized for mobile
- mkl-dnn - C++: Math Kernel Library for Deep Neural Networks
- nnpack - C/C++: Acceleration package for neural networks on multi-core CPUs
Image processing
- Simd - C++: Image processing library making use of SIMD
- Pillow-SIMD - Python: SIMD version of PIL (Python Imaging Library)
- ComputeLibrary - C++: Library for Computer Vision and Machine Learning (ARM only)
Data Structures
- bitmap - Go: Dense, zero-allocation, SIMD-enabled bitmap/bitset
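As a rough illustration of why dense bitsets suit data-parallel processing, the sketch below stores 64 bits per word, so one AND or one population count handles 64 elements at once. It is not the API of the library listed above. The type `Bitset` and its methods are hypothetical, and only the standard `math/bits` package is assumed.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Bitset is a hypothetical dense bitset: bit i lives in word i/64.
type Bitset []uint64

// NewBitset returns a bitset able to hold n bits, all cleared.
func NewBitset(n int) Bitset { return make(Bitset, (n+63)/64) }

// Set turns bit i on.
func (b Bitset) Set(i uint) { b[i/64] |= 1 << (i % 64) }

// Contains reports whether bit i is on.
func (b Bitset) Contains(i uint) bool { return b[i/64]&(1<<(i%64)) != 0 }

// And intersects b with other in place, 64 bits per operation.
func (b Bitset) And(other Bitset) {
	for i := range b {
		if i < len(other) {
			b[i] &= other[i]
		} else {
			b[i] = 0 // bits missing from other count as unset
		}
	}
}

// Count returns the number of set bits, one word at a time.
func (b Bitset) Count() int {
	n := 0
	for _, w := range b {
		n += bits.OnesCount64(w)
	}
	return n
}

func main() {
	a := NewBitset(128)
	a.Set(1)
	a.Set(64)
	a.Set(100)

	c := NewBitset(128)
	c.Set(64)
	c.Set(100)
	c.Set(127)

	a.And(c)
	fmt.Println(a.Count(), a.Contains(64), a.Contains(1)) // 2 true false
}
```

A SIMD-enabled implementation would typically run the same word-wise loops over several words per instruction. The layout stays the same and only the loop bodies change.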
Cool
- SIMD-Visualiser - JavaScript: Graphically visualize SIMD code
- Visual ARM emulator - VisUAL: a highly visual ARM emulator
- faster - Rust: SIMD for humans
- Vectorized Emulation - Accelerated taint tracking at 2 trillion instructions per second
Blogs
Links
- Agner Fog - Software optimization resources
- uops.info - Latency, throughput, and port usage information
- Felix Cloutier - x86 and amd64 instruction reference
- Compiler Explorer - Run compilers interactively from the browser and interact with the assembly
- awesome-asm - A curated list of awesome Assembler
- awesome-llvm - Curated list of awesome LLVM related docs, tools, and other resources
- awesome-decompilation - Curated list of awesome decompilation resources and projects
- Intel Manual vol 1 (HTML)
- Intel Manual vol 2 (HTML)
- Intel Manual vol 3 (HTML)
- x86 documentation - x86 documentation
- Go assembly reference - Go assembly language complementary reference
Tools
- avo - Go: Generate x86 Assembly with Go
- PeachPy - Python: x86-64 assembler embedded in Python
- c2goasm - Go: C to Go Assembly
- LLVM MCA - LLVM Machine Code Analyzer
- xsimd - C++: Wrappers for SIMD intrinsics and math implementations (SSE, AVX, NEON, AVX512)
- Intel SDE debugging - Debugging with AVX-512
- Asm-Dude - VS extension for assembly syntax highlighting and code completion
- Intrinsics-Dude - VS extension for compiler intrinsics in C/C++
Online tools
- Online (dis-)assembler - Online assembler and disassembler
- ODA - Online disassembler (disassembler.io)
AVX-512
- x86/x64 SIMD Instructions (AVX512) - AVX-512 overview
- Golang's AVX512 - Go 1.11 introduction of AVX-512 support
- Golang AVX512 test data - Golang AVX-512 test instructions
- alexcrichton - AVX-512 overview
- Colfax: Capabilities of Intel AVX-512 - Capabilities of AVX-512
ARM64 NEON
- Golang's ARM64 NEON support - Intro to arm64 assembler for Golang
- Golang ARM64 test data - Golang ARM64 (incl. NEON) test instructions
- simde - Implementations of SIMD instruction sets for systems which don't natively support them.