All Projects → powturbo → Turbo-Histogram

powturbo / Turbo-Histogram

Licence: other
Fastest Histogram Construction

Programming Languages

c
50402 projects - #5 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to Turbo-Histogram

Turbo Run Length Encoding
TurboRLE-Fastest Run Length Encoding
Stars: ✭ 212 (+381.82%)
Mutual labels:  benchmark, sse, simd, avx2
Libsimdpp
Portable header-only C++ low level SIMD library
Stars: ✭ 914 (+1977.27%)
Mutual labels:  sse, simd, avx2
Quadray Engine
Realtime raytracer using SIMD on ARM, MIPS, PPC and x86
Stars: ✭ 13 (-70.45%)
Mutual labels:  sse, simd, avx2
ternary-logic
Support for ternary logic in SSE, XOP, AVX2 and x86 programs
Stars: ✭ 21 (-52.27%)
Mutual labels:  sse, simd, avx2
simd-byte-lookup
SIMDized check which bytes are in a set
Stars: ✭ 23 (-47.73%)
Mutual labels:  sse, simd, avx2
Fastnoisesimd
C++ SIMD Noise Library
Stars: ✭ 542 (+1131.82%)
Mutual labels:  sse, simd, avx2
Simde
Implementations of SIMD instruction sets for systems which don't natively support them.
Stars: ✭ 1,012 (+2200%)
Mutual labels:  sse, simd, avx2
Vc
SIMD Vector Classes for C++
Stars: ✭ 985 (+2138.64%)
Mutual labels:  sse, simd, avx2
Simd
C++ image processing and machine learning library with using of SIMD: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, VMX(Altivec) and VSX(Power7), NEON for ARM.
Stars: ✭ 1,263 (+2770.45%)
Mutual labels:  sse, simd, avx2
Base64simd
Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)
Stars: ✭ 115 (+161.36%)
Mutual labels:  sse, simd, avx2
cpuwhat
Nim utilities for advanced CPU operations: CPU identification, ISA extension detection, bindings to assorted intrinsics
Stars: ✭ 25 (-43.18%)
Mutual labels:  sse, simd, avx2
Turbo-Transpose
Transpose: SIMD Integer+Floating Point Compression Filter
Stars: ✭ 50 (+13.64%)
Mutual labels:  sse, simd, avx2
Libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
Stars: ✭ 518 (+1077.27%)
Mutual labels:  sse, simd, avx2
Directxmath
DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
Stars: ✭ 859 (+1852.27%)
Mutual labels:  sse, simd, avx2
simdutf
Unicode routines (UTF8, UTF16): billions of characters per second.
Stars: ✭ 108 (+145.45%)
Mutual labels:  simd, avx2, sse2
Unisimd Assembler
SIMD macro assembler unified for ARM, MIPS, PPC and x86
Stars: ✭ 63 (+43.18%)
Mutual labels:  sse, simd, avx2
Boost.simd
Boost SIMD
Stars: ✭ 238 (+440.91%)
Mutual labels:  sse, simd, avx2
Umesimd
UME::SIMD A library for explicit simd vectorization.
Stars: ✭ 66 (+50%)
Mutual labels:  benchmark, simd, avx2
HLML
Auto-generated maths library for C and C++ based on HLSL/Cg
Stars: ✭ 23 (-47.73%)
Mutual labels:  sse, simd
hpc
Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc. )
Stars: ✭ 39 (-11.36%)
Mutual labels:  sse, simd

TurboHist: Fastest Histogram Construction

  • ~0.18 - 0.90 cycles per byte
  • 100% C (C++ compatible header) without inline assembly
  • Both 32 and 64 bits supported
  • Portable scalar functions faster than SIMD functions
  • Up to 22 times faster than naive solution
  • 🆕 (2022.01) more faster, beats even other very fast assembler functions

Benchmark:

  • Single thread
  • Realistic and practical benchmark with large files.
  • No PURE cache benchmark

- Uniform/Skewed distribution:

  • Uniform: enwik9
  • Skewed: enwik9 bwt generated w. libdivsufsort
  • 1GB zeros
  • Accurate benchmarking with command "turbohist file -I15"
Benchmark Intel CPU: i7-9700K 3.6GHz gcc 11.2

Uniform distribution - enwik9 Text file, size=1.000.0000.000

Function MB/s Cycle/Byte Language Package
1:hist_1_8 naiv 8 bits 2761.01 1.3423 C TurboHist
2:hist_4_8 4 bins/ 8 bits 2725.92 1.3249 C TurboHist
3:hist_8_8 8 bins/ 8 bits 2850.05 1.2627 C TurboHist
4:hist_4_32 4 bins/32 bits 3691.02 0.9660 C TurboHist
5:hist_8_32 8 bins/32 bits 3867.26 0.9561 C TurboHist
6:hist_4_64 4 bins/64 bits 4040.55 0.9103 C TurboHist
7:hist_8_64 8 bins/64 bits 4053.37 0.9035 C TurboHist
8:histr_4_64 4/64+run 3915.85 0.9668 C TurboHist
9:histr_8_64 8/64+run 3916.51 0.9286 C TurboHist
10:hist_4_128 4 bins/sse4.1 3643.20 1.0081 C TurboHist
11:hist_8_128 8 bins/sse4.1 3607.06 0.9845 C TurboHist
12:hist_4_256 4 bins/avx2 3522.27 1.0195 C TurboHist
13:hist_8_256 8 bins/avx2 3542.25 1.0366 C TurboHist
15:hist_8_64asm inline asm 4161.87 0.8787 inline asm TurboHist
18:count2x64 inline asm 3963.91 0.9172 inline asm Countbench
20:histo_ref 2702.57 1.3567 C Ryg
21:histo_cpp_1x 1876.13 1.8236 C Ryg
22:histo_cpp_2x 2664.78 1.5935 C Ryg
23:histo_cpp_4x 2817.77 1.2944 C Ryg
24:histo_asm_scalar4 3130.08 1.1609 asm Ryg
25:histo_asm_scalar8 3353.08 1.0636 asm Ryg
26:histo_asm_scalar8_var 3704.88 0.9856 asm Ryg
27:histo_asm_scalar8_var2 4085.48 0.8913 asm Ryg
28:histo_asm_scalar8_var3 4132.54 0.8870 asm Ryg
29:histo_asm_scalar8_var4 4083.92 0.8970 asm Ryg
30:histo_asm_scalar8_var5 4002.21 0.9025 asm Ryg
31:histo_asm_sse4 3153.01 1.1445 asm Ryg
32:memcpy 13724.29 0.2698 C

Skewed distribution - enwik9.bwt Text file, size=1.000.0000.000

Function MB/s Cycle/Byte Language
1:hist_1_8 naiv 8 bits 1170.89 3.0642 C
2:hist_4_8 4 bins/ 8 bits 2707.74 1.3321 C
3:hist_8_8 8 bins/ 8 bits 2804.08 1.3208 C
4:hist_4_32 4 bins/32 bits 3118.54 1.1402 C
5:hist_8_32 8 bins/32 bits 3780.16 0.9714 C
6:hist_4_64 4 bins/64 bits 3646.25 0.9980 C
7:hist_8_64 8 bins/64 bits 3941.96 0.9282 C
8:histr_4_64 4/64+run 5061.62 0.7270 C
9:histr_8_64 8/64+run 5135.29 0.7229 C
10:hist_4_128 4 bins/sse4.1 3535.36 1.0365 C
11:hist_8_128 8 bins/sse4.1 3654.41 0.9791 C
12:hist_4_256 4 bins/avx2 3329.87 1.1022 C
13:hist_8_256 8 bins/avx2 3540.36 1.0343 C
15:hist_8_64asm inline asm 4047.74 0.9013 inline asm
18:count2x64 inline asm 3969.92 0.9262 inline asm
20:histo_ref 1182.61 3.0718 C
21:histo_cpp_1x 1213.42 2.9748 C
22:histo_cpp_2x 2115.60 1.7373 C
23:histo_cpp_4x 1801.97 2.0024 C
24:histo_asm_scalar4 3092.87 1.1561 asm
25:histo_asm_scalar8 3203.95 1.1139 asm
26:histo_asm_scalar8_var 3460.45 1.0422 asm
27:histo_asm_scalar8_var2 3659.61 0.9878 asm
28:histo_asm_scalar8_var3 3769.96 0.9569 asm
29:histo_asm_scalar8_var4 3996.75 0.8905 asm
30:histo_asm_scalar8_var5 4642.10 0.7719 asm
31:histo_asm_sse4 3091.36 1.1670 asm
32:memcpy 15594.28 0.2412 C

All zeros: size=1.000.0000.000

Function MB/s Cycle/Byte Language
1:hist_1_8 naiv 8 bits 877.27 4.0805 C
2:hist_4_8 4 bins/ 8 bits 2650.84 1.3485 C
3:hist_8_8 8 bins/ 8 bits 2743.40 1.2994 C
4:hist_4_32 4 bins/32 bits 2978.83 1.2006 C
5:hist_8_32 8 bins/32 bits 3775.45 0.9555 C
6:hist_4_64 4 bins/64 bits 3411.11 1.0530 C
7:hist_8_64 8 bins/64 bits 3928.09 0.9342 C
8:histr_4_64 4/64+run 18998.87 0.1868 C
9:histr_8_64 8/64+run 19629.28 0.1869 C
10:hist_4_128 4 bins/sse4.1 3365.40 1.0717 C
11:hist_8_128 8 bins/sse4.1 3632.61 0.9950 C
12:hist_4_256 4 bins/avx2 3112.15 1.1576 C
13:hist_8_256 8 bins/avx2 3497.08 1.0205 C
15:hist_8_64asm inline asm 4089.97 0.8817 inline asm
18:count2x64 inline asm 3881.98 0.9158 inline asm
20:histo_ref 882.93 4.1072 C
21:histo_cpp_1x 873.20 4.1069 C
22:histo_cpp_2x 1720.19 2.0961 C
23:histo_cpp_4x 1866.99 2.0817 C
24:histo_asm_scalar4 2995.84 1.1942 asm
25:histo_asm_scalar8 3107.30 1.1618 asm
26:histo_asm_scalar8_var 3288.67 1.1143 asm
27:histo_asm_scalar8_var2 3290.92 1.0957 asm
28:histo_asm_scalar8_var3 3707.41 0.9763 asm
29:histo_asm_scalar8_var4 3988.01 0.9019 asm
30:histo_asm_scalar8_var5 14076.09 0.2564 asm
31:histo_asm_sse4 3020.32 1.1975 asm
32:memcpy 14057.53 0.2636 C

(bold = pareto) MB=1.000.000

Compile:

    make
 or
    make AVX2=1

Usage:

    turbohist [-e#] file [-I#] [-z]
    options:
    -e#     # = function numbers separated by ,
    -I#     # = number of iteration
            set to -I15 for accurate timings  
    -z      set read buffer to zeros

Examples:

    ./turbohist file
    ./turbohist -e1,7,9

Environment:

OS/Compiler (32 + 64 bits):
  • Windows: MinGW-w64 makefile
  • Linux amd/intel: GNU GCC (>=4.6)
  • Linux amd/intel: Clang (>=3.2)
  • Linux arm: aarch64 ARMv8: gcc (>=6.3)
  • MaxOS: XCode (>=9) + Apple M1
  • PowerPC ppc64le: gcc (>=8.0)

Last update: 01 JAN 2022

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].