
kimwalisch / Libpopcnt

License: BSD-2-Clause
🚀 Fast C/C++ bit population count library

Programming Languages

c
cpp

Projects that are alternatives of or similar to Libpopcnt

Base64simd
Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)
Stars: ✭ 115 (-47.49%)
Mutual labels:  neon, avx2, avx512
Vc
SIMD Vector Classes for C++
Stars: ✭ 985 (+349.77%)
Mutual labels:  neon, avx2, avx512
Libsimdpp
Portable header-only C++ low level SIMD library
Stars: ✭ 914 (+317.35%)
Mutual labels:  neon, avx2, avx512
Sse4 Strstr
SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) of Karp-Rabin algorithm's modification
Stars: ✭ 115 (-47.49%)
Mutual labels:  neon, avx2, avx512
Umesimd
UME::SIMD A library for explicit simd vectorization.
Stars: ✭ 66 (-69.86%)
Mutual labels:  neon, avx2, avx512
Simde
Implementations of SIMD instruction sets for systems which don't natively support them.
Stars: ✭ 1,012 (+362.1%)
Mutual labels:  neon, avx2, avx512
Quadray Engine
Realtime raytracer using SIMD on ARM, MIPS, PPC and x86
Stars: ✭ 13 (-94.06%)
Mutual labels:  neon, avx2, avx512
Nsimd
Agenium Scale vectorization library for CPUs and GPUs
Stars: ✭ 138 (-36.99%)
Mutual labels:  neon, avx2, avx512
Highway
Performance-portable, length-agnostic SIMD with runtime dispatch
Stars: ✭ 301 (+37.44%)
Mutual labels:  neon, avx2, avx512
Boost.simd
Boost SIMD
Stars: ✭ 238 (+8.68%)
Mutual labels:  neon, avx2, avx512
Unisimd Assembler
SIMD macro assembler unified for ARM, MIPS, PPC and x86
Stars: ✭ 63 (-71.23%)
Mutual labels:  neon, avx2, avx512
Simd
C++ image processing and machine learning library with using of SIMD: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, VMX(Altivec) and VSX(Power7), NEON for ARM.
Stars: ✭ 1,263 (+476.71%)
Mutual labels:  neon, avx2, avx512
Toys
Storage for my snippets, toy programs, etc.
Stars: ✭ 187 (-14.61%)
Mutual labels:  avx2, avx512
Osaca
Open Source Architecture Code Analyzer
Stars: ✭ 162 (-26.03%)
Mutual labels:  avx2, avx512
Corrfunc
⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
Stars: ✭ 114 (-47.95%)
Mutual labels:  avx2, avx512
Directxmath
DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
Stars: ✭ 859 (+292.24%)
Mutual labels:  neon, avx2
Xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, NEON, AVX512)
Stars: ✭ 964 (+340.18%)
Mutual labels:  neon, avx512
Simdjson
Parsing gigabytes of JSON per second
Stars: ✭ 15,115 (+6801.83%)
Mutual labels:  neon, avx2
Fastnoisesimd
C++ SIMD Noise Library
Stars: ✭ 542 (+147.49%)
Mutual labels:  neon, avx2
Highwayhash
Native Go version of HighwayHash with optimized assembly implementations on Intel and ARM. Able to process over 10 GB/sec on a single core on Intel CPUs - https://en.wikipedia.org/wiki/HighwayHash
Stars: ✭ 670 (+205.94%)
Mutual labels:  neon, avx2

libpopcnt


libpopcnt.h is a header-only C/C++ library for counting the number of 1 bits (bit population count) in an array as quickly as possible using specialized CPU instructions, e.g. POPCNT, AVX2, AVX512, NEON. libpopcnt.h has been tested successfully with the GCC, Clang and MSVC compilers.

The algorithms used in libpopcnt.h are described in the paper Faster Population Counts using AVX2 Instructions by Daniel Lemire, Nathan Kurz and Wojciech Mula (23 Nov 2016).

How it works

On x86 CPUs libpopcnt.h uses a combination of 4 different bit population count algorithms:

  • For array sizes < 512 bytes a POPCNT algorithm is used.
  • For array sizes ≥ 512 bytes an AVX2 algorithm is used (if the CPU supports AVX2).
  • For array sizes ≥ 1024 bytes an AVX512 algorithm is used (if the CPU supports AVX512).
  • For CPUs without the POPCNT instruction a portable integer algorithm is used; a sketch of such an algorithm follows below.
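
For illustration, here is a minimal sketch of the classic portable bit-parallel popcount for a single 64-bit word. It is not libpopcnt.h's exact implementation, but it shows the kind of integer fallback used when POPCNT is unavailable:

#include <stdint.h>

/* Count the 1 bits in a 64-bit word using shifts, masks and one multiply. */
static inline uint64_t popcount64_portable(uint64_t x)
{
  x -= (x >> 1) & 0x5555555555555555ull;                                  /* 2-bit sums  */
  x = (x & 0x3333333333333333ull) + ((x >> 2) & 0x3333333333333333ull);   /* 4-bit sums  */
  x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0full;                             /* 8-bit sums  */
  return (x * 0x0101010101010101ull) >> 56;                               /* sum of bytes */
}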

Note that libpopcnt.h works on all CPUs: it checks at run-time whether your CPU supports POPCNT, AVX2 and AVX512 before using them, and it is also thread-safe.
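
The dispatch idea itself is easy to picture. The sketch below is not taken from libpopcnt.h (which does its own CPUID-based detection); it only illustrates run-time feature selection using the GCC/Clang builtin __builtin_cpu_supports on x86:

#include <stdio.h>

/* Hypothetical dispatcher: pick an algorithm based on CPU features.
 * GCC/Clang on x86 only; shown for illustration, not libpopcnt.h's code. */
int main(void)
{
  __builtin_cpu_init();

  if (__builtin_cpu_supports("avx512f"))
    puts("would use the AVX512 algorithm for large arrays");
  else if (__builtin_cpu_supports("avx2"))
    puts("would use the AVX2 algorithm for large arrays");
  else if (__builtin_cpu_supports("popcnt"))
    puts("would use the POPCNT algorithm");
  else
    puts("would fall back to the portable integer algorithm");

  return 0;
}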

C/C++ API

#include "libpopcnt.h"

/*
 * Count the number of 1 bits in the data array
 * @data: An array
 * @size: Size of data in bytes
 */
uint64_t popcnt(const void* data, uint64_t size);
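
A minimal, self-contained usage example (the buffer size and contents are arbitrary):

#include "libpopcnt.h"
#include <stdio.h>
#include <stdint.h>

int main(void)
{
  uint8_t data[1024] = { 0xff, 0x0f, 0x01 }; /* remaining bytes are zero */

  /* Count the 1 bits in the whole buffer: 8 + 4 + 1 = 13 */
  uint64_t bits = popcnt(data, sizeof(data));

  printf("popcount = %llu\n", (unsigned long long) bits);
  return 0;
}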

Speedup

This benchmark shows the speedup of the 4 popcount algorithms used on x86 CPUs compared to the basic lookup-8 popcount algorithm for different array sizes (in bytes).

Algorithm            32 B   64 B   128 B  256 B  512 B  1024 B  2048 B  4096 B
lookup-8             1.00   1.00   1.00   1.00   1.00   1.00    1.00    1.00
bit-parallel-mul     1.41   1.54   1.63   1.78   1.60   1.62    1.63    1.64
builtin-popcnt       4.75   6.36   8.58   8.55   6.72   7.60    7.88    7.94
avx2-harley-seal     1.15   1.85   3.22   4.17   8.46   10.74   12.52   13.66
avx512-harley-seal   0.35   1.49   2.54   3.83   5.63   15.12   22.18   25.60

libpopcnt.h automatically picks the fastest algorithm for the given array size. This benchmark was run on an Intel Xeon Platinum 8168 CPU with GCC 5.4.

CPU architectures

libpopcnt.h has hardware accelerated popcount algorithms for the following CPU architectures:

x86      POPCNT, AVX2, AVX512
x86-64   POPCNT, AVX2, AVX512
ARM      NEON
PPC64    POPCNTD

For other CPU architectures a fast integer popcount algorithm is used.

How to compile

libpopcnt.h does not require any special compiler flags such as -mavx2! However, for best performance we recommend compiling with optimizations enabled, e.g. -O2 or -O3.

cc  -O3 program.c
c++ -O3 program.cpp

Development

cmake .
make -j
make test

The above commands also build the benchmark program, which measures the throughput of libpopcnt.h. Below is an example run on an Intel Xeon Platinum 8168 CPU from 2017:

# Usage: ./benchmark [array bytes] [iters]
./benchmark
Iters: 10000000
Array size: 16.00 KB
Algorithm: AVX512
Status: 100%
Seconds: 1.59
103.4 GB/s

Acknowledgments

The vectorized popcount algorithms used in libpopcnt.h were originally written by Wojciech Muła; I merely packaged them into a convenient and portable C/C++ library.
