sse2neon

A C/C++ header file that converts Intel SSE intrinsics to Arm/AArch64 NEON intrinsics.

Introduction

sse2neon is a translator of Intel SSE (Streaming SIMD Extensions) intrinsics to Arm NEON. It shortens the time needed to get a working Arm program, which can then be used to extract profiles and identify hot paths in the code. The header file sse2neon.h contains several of the functions provided by Intel intrinsic headers such as <xmmintrin.h>, implemented with NEON-based counterparts that produce the exact semantics of the intrinsics.

Mapping and Coverage

Header file      Extension
<mmintrin.h>     MMX
<xmmintrin.h>    SSE
<emmintrin.h>    SSE2
<pmmintrin.h>    SSE3
<tmmintrin.h>    SSSE3
<smmintrin.h>    SSE4.1
<nmmintrin.h>    SSE4.2
<wmmintrin.h>    AES

sse2neon aims to support the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AES extensions.

sse2neon aims to provide NEON equivalents for all widely used SSE intrinsics. Some SSE intrinsics map directly to a single NEON intrinsic. Others have no 1-to-1 mapping, so their equivalents are implemented with a sequence of several NEON intrinsics.

For example, SSE intrinsic _mm_loadu_si128 has a direct NEON mapping (vld1q_s32), but SSE intrinsic _mm_maddubs_epi16 has to be implemented with 13+ NEON instructions.
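To see why _mm_maddubs_epi16 needs a multi-instruction sequence, it helps to spell out its documented per-lane semantics: an unsigned-by-signed byte multiply, a pairwise add of adjacent products, and signed 16-bit saturation. The scalar model below is a sketch of those semantics only, not sse2neon's NEON implementation; the function names maddubs_epi16_ref and saturate_s16 are hypothetical.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate sum to the int16_t range. */
static int16_t saturate_s16(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t) v;
}

/* Scalar model of _mm_maddubs_epi16:
 *   a holds 16 UNSIGNED bytes, b holds 16 SIGNED bytes.
 *   Each byte pair is multiplied, adjacent products are added,
 *   and each of the 8 sums is saturated to int16_t. */
static void maddubs_epi16_ref(const uint8_t a[16], const int8_t b[16],
                              int16_t dst[8])
{
    for (int i = 0; i < 8; i++) {
        int32_t lo = (int32_t) a[2 * i] * b[2 * i];
        int32_t hi = (int32_t) a[2 * i + 1] * b[2 * i + 1];
        dst[i] = saturate_s16(lo + hi);
    }
}
```

For example, with every byte of a set to 255 and every byte of b set to 127, each lane computes 255*127 + 255*127 = 64770, which saturates to 32767. A NEON port has to reproduce the mixed signedness, the widening, the pairwise add, and the saturation step, which is why no single NEON intrinsic covers it. A scalar reference like this can also serve as an oracle when testing a port.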

Usage

  • Put the file sse2neon.h into your source code directory.

  • Locate the following SSE header files included in the code:

#include <xmmintrin.h>
#include <emmintrin.h>

{p,t,s,n,w}mmintrin.h can also be replaced, but coverage of these extensions may be limited.

  • Replace them with:
#include "sse2neon.h"
  • Explicitly specify platform-specific options to the gcc/clang compiler.
    • On ARMv8-A targets, specify the following compiler option (remove crypto and/or crc if your target does not support the cryptographic and/or CRC32 extensions):
    -march=armv8-a+fp+simd+crypto+crc

    • On ARMv7-A targets, append the following compiler option:
    -mfpu=neon
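Projects that must still build on x86 often keep both include paths. The sketch below assumes a hypothetical shim (HAVE_SSE_API, add4_i32 and the target_has_* helpers are illustrative names, not part of sse2neon) that selects <emmintrin.h> on x86 and sse2neon.h on Arm, with a plain C fallback when neither is available. It also checks the Arm C Language Extensions (ACLE) predefined macros __ARM_NEON, __ARM_FEATURE_CRC32 and __ARM_FEATURE_CRYPTO, which gcc/clang set according to the -march/-mfpu options above.

```c
#include <stdint.h>

/* Hypothetical portability shim (not part of sse2neon): native SSE2 on x86,
 * sse2neon.h on Arm when it is on the include path, scalar C otherwise. */
#if defined(__aarch64__) || defined(__arm__)
#  if defined(__has_include)
#    if __has_include("sse2neon.h")
#      include "sse2neon.h"
#      define HAVE_SSE_API 1
#    endif
#  endif
#elif defined(__SSE2__)
#  include <emmintrin.h>
#  define HAVE_SSE_API 1
#endif

/* Lane-wise addition of four 32-bit integers through the SSE2 API
 * (native, or translated to NEON by sse2neon), with a scalar fallback. */
static void add4_i32(const int32_t *a, const int32_t *b, int32_t *out)
{
#ifdef HAVE_SSE_API
    __m128i va = _mm_loadu_si128((const __m128i *) a);
    __m128i vb = _mm_loadu_si128((const __m128i *) b);
    _mm_storeu_si128((__m128i *) out, _mm_add_epi32(va, vb));
#else
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}

/* ACLE feature macros reflect the -march / -mfpu flags given to gcc/clang. */
static int target_has_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

static int target_has_crc32(void)
{
#if defined(__ARM_FEATURE_CRC32)
    return 1;
#else
    return 0;
#endif
}

static int target_has_crypto(void)
{
/* Newer ACLE versions also define finer-grained macros such as __ARM_FEATURE_AES. */
#if defined(__ARM_FEATURE_CRYPTO) || defined(__ARM_FEATURE_AES)
    return 1;
#else
    return 0;
#endif
}
```

On an ARMv8-A toolchain, adding +crc to -march should make target_has_crc32() return 1, and dropping it should make it return 0. On ARMv7-A, sse2neon.h still requires -mfpu=neon, so the shim does not remove the need for the flags listed above.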
    

Compile-time Configurations

To balance correctness and performance, sse2neon recognizes the following compile-time configurations:

  • SSE2NEON_PRECISE_MINMAX: Enable precise implementations of _mm_min_ps and _mm_max_ps. Enable it if you need results consistent with SSE, for example in NaN special cases.
  • SSE2NEON_PRECISE_DIV: Enable precise implementations of _mm_rcp_ps and _mm_div_ps, which perform an additional Newton-Raphson iteration for accuracy.
  • SSE2NEON_PRECISE_SQRT: Enable precise implementations of _mm_sqrt_ps and _mm_rsqrt_ps, which perform an additional Newton-Raphson iteration for accuracy.

All of the above are disabled by default. If you need the precise implementations, define the corresponding macro(s) as 1 (for example, #define SSE2NEON_PRECISE_DIV 1) before including sse2neon.h.
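The scalar sketch below illustrates the two behaviours these options address; it models the arithmetic only and is not sse2neon's code. All function names are hypothetical. First, SSE MINPS returns its second operand whenever the comparison is false, so a NaN in either input yields b, whereas NEON vminq_f32 returns a NaN if either input is NaN. Second, one Newton-Raphson step roughly squares the relative error of a reciprocal or reciprocal-square-root estimate. NEON provides the step factors directly: vrecpsq_f32 computes 2 - a*b and vrsqrtsq_f32 computes (3 - a*b)/2. Modelling the hardware estimate by keeping 8 mantissa bits is an assumption for illustration.

```c
#include <stdint.h>
#include <string.h>

/* One lane of SSE _mm_min_ps: (a < b) ? a : b.
 * Comparisons with NaN are false, so a NaN in either input returns b. */
static float sse_min_lane(float a, float b)
{
    return a < b ? a : b;
}

/* Assumed stand-in for a low-precision hardware estimate: keep only the
 * top 8 of 23 mantissa bits of a positive, finite float. */
static float coarse_estimate(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= ~(uint32_t) 0x7FFF; /* clear the low 15 mantissa bits */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* One Newton-Raphson step toward 1/d: x' = x * (2 - d*x). */
static float recip_refine(float d, float x)
{
    return x * (2.0f - d * x);
}

/* One Newton-Raphson step toward 1/sqrt(v): y' = y * (3 - v*y*y) / 2. */
static float rsqrt_refine(float v, float y)
{
    return y * (3.0f - v * y * y) * 0.5f;
}

/* Relative error |approx - exact| / |exact|, computed without libm. */
static float rel_err(float approx, float exact)
{
    float e = (approx - exact) / exact;
    return e < 0.0f ? -e : e;
}
```

With d = 3, the 8-bit estimate of 1/3 is 0.33203125 (relative error about 3.9e-3); a single recip_refine step gives 0.333328247..., reducing the error to about 1.5e-5. This is the kind of accuracy gain SSE2NEON_PRECISE_DIV and SSE2NEON_PRECISE_SQRT trade extra instructions for.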

Run Built-in Test Suite

sse2neon provides a unified interface for developing test cases. The test cases are located in the tests directory, and the input data is specified at runtime. Use the following command to run the test cases:

$ make check

You can also specify a GNU toolchain for cross-compilation. QEMU must be installed in advance.

$ make CROSS_COMPILE=aarch64-linux-gnu- check # ARMv8-A

or

$ make CROSS_COMPILE=arm-linux-gnueabihf- check # ARMv7-A

See Test Suite for SSE2NEON for details.

Coding Convention

Run $ make indent to format the code according to the coding convention.

Adoptions

Here is a partial list of open source projects that have adopted sse2neon for Arm/AArch64 support.

  • Apache Impala is a lightning-fast, distributed SQL query engine for petabytes of data stored in Apache Hadoop clusters.
  • Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data.
  • ART is an OCaml implementation of the Adaptive Radix Tree (ART).
  • Blender is the free and open source 3D creation suite, supporting the entirety of the 3D pipeline.
  • Boo is a cross-platform windowing and event manager similar to SDL or SFML, with additional 3D rendering functionality.
  • CARTA is a new visualization tool designed for viewing radio astronomy images in CASA, FITS, MIRIAD, and HDF5 formats (using the IDIA custom schema for HDF5).
  • Catcoon is a feedforward neural network implementation in C.
  • dab-cmdline provides entry points for handling Digital Audio Broadcasting (DAB/DAB+) through a few simple calls.
  • emp-tool aims to provide a benchmark for secure computation and to allow other researchers to experiment with and extend it.
  • FoundationDB is a distributed database designed to handle large volumes of structured data across clusters of commodity servers.
  • iqtree_arm_neon is the Arm NEON port of IQ-TREE, a fast and effective stochastic algorithm to infer phylogenetic trees by maximum likelihood.
  • kram is a wrapper around several popular encoders for converting to and from PNG/KTX files with LDR/HDR and BC/ASTC/ETC2.
  • libscapi stands for the "Secure Computation API", providing reliable, efficient, and highly flexible cryptographic infrastructure.
  • minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database.
  • MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets.
  • N2 is an approximate nearest neighbors algorithm library written in C++, providing much faster search than other implementations when modeling large datasets.
  • niimath is a general image calculator with superior performance.
  • OBS Studio is software designed for efficiently capturing, compositing, encoding, recording, and streaming video content.
  • OGRE is a scene-oriented, flexible 3D engine written in C++ designed to make it easier and more intuitive for developers to produce games and demos utilising 3D hardware.
  • OpenXRay is an improved version of the X-Ray engine, used in the world-famous S.T.A.L.K.E.R. game series by GSC Game World.
  • parallel-n64 is an optimized/rewritten Nintendo 64 emulator made specifically for Libretro.
  • PFFFT computes 1D Fast Fourier Transforms of single-precision real and complex vectors.
  • PlutoSDR Firmware is the customized firmware for the PlutoSDR, which can be used to introduce the fundamentals of Software Defined Radio (SDR), Radio Frequency (RF) or Communications as advanced topics in electrical engineering in a self-directed or instructor-led setting.
  • Pygame is cross-platform and designed to make it easy to write multimedia software, such as games, in Python.
  • simd_utils is a header-only library implementing common mathematical functions using SIMD intrinsics.
  • Spack is a multi-platform package manager that builds and installs multiple versions and configurations of software.
  • srsLTE is an open source SDR LTE software suite.
  • Surge is an open source digital synthesizer.
  • XMRig is an open source CPU miner for Monero cryptocurrency.

Licensing

sse2neon is freely redistributable under the MIT License.
