CULT

CPU Ultimate Latency Test.

Online Access

  • AsmGrid is a web application for searching the data provided by the asmdb and cult projects online.

Introduction

CULT (CPU Ultimate Latency Test) is a tool that runs a series of tests to estimate how many cycles an X86 processor (both 32-bit and 64-bit modes are supported) takes to execute the available instructions. The tool should help people who generate code for X86/X64 hardware by allowing them to run tests on their own machines instead of relying on information from CPU vendors or third parties, which may be incomplete or may not cover all targeted hardware.

The purpose of CULT is to benchmark as many CPUs as possible, to index the results, and to make them searchable, comparable, and accessible online. This information can then be used for various purposes, such as statistics about the average latencies of certain instructions (like addition, multiplication, and division) on modern CPUs compared to their predecessors, or as a comparison between CPU generations for people who still write assembly by hand to optimize certain operations. CULT outputs JSON so that the results are easier to process with third-party tools.

Features

  • CpuDetect - Extracts all possible CPUID queries for offline analysis, except for CPU serial code, which is always omitted for privacy reasons (and not available on modern CPUs anyway).
  • Performance - Extracts information about instruction cycles and latencies:
    • Every instruction is benchmarked in sequential mode, which means that all consecutive operations depend on each other. This test is used to calculate instruction latencies.
    • Every instruction is benchmarked in parallel mode, which is used to calculate the theoretical throughput of the instruction when it is used in parallel with instructions of the same kind. CULT reports this as reciprocal throughput (cycles per instruction), so, for example, 0.2 means 5 instructions per clock cycle.
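
The relationship between a reported reciprocal throughput value and instructions per clock cycle can be sketched as a one-line conversion. The helper name `instructionsPerCycle` is hypothetical and is not part of CULT; it only illustrates the arithmetic behind the 0.2 example.

```cpp
#include <cassert>

// Hypothetical helper (not part of CULT): converts a reciprocal throughput
// value, expressed in cycles per instruction, to instructions per cycle.
inline double instructionsPerCycle(double rcp) {
  assert(rcp > 0.0 && "reciprocal throughput must be positive");
  return 1.0 / rcp;
}
```

For instance, passing 0.5 yields 2 instructions per cycle, and a value above 1.0 means the instruction takes more than one cycle to issue even when independent copies run in parallel.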

TODOs

  • Instructions that require consecutive registers (vp4dpwssd[s], v4f[n]madd{ps|ss}, vp2intersect{d|q}) are not checked at the moment.
  • Instructions that have a memory operand are not checked either.

Building

CULT requires only AsmJit as a dependency, which it expects by default at the same directory level as cult itself. A custom AsmJit directory can be specified with -DASMJIT_DIR=... when invoking cmake. The simplest way to compile cult is by using cmake:

# Clone CULT and AsmJit
$ git clone --depth=1 https://github.com/asmjit/asmjit
$ git clone --depth=1 https://github.com/asmjit/cult

# Create Build Directory
$ mkdir cult/build
$ cd cult/build

# Configure and Make
$ cmake .. -DCMAKE_BUILD_TYPE=Release
$ make

# Run CULT!
$ ./cult

Command Line Arguments

$ cult [parameters]

  • --help - Show possible command line parameters
  • --dump - Dump assembly generated and executed (useful for testing)
  • --quiet - Run in quiet mode and output only the resulting JSON
  • --estimate - Run faster (to verify it works) with less precision
  • --no-rounding - Don't round cycles and latencies
  • --instruction=name - Only benchmark a single instruction (useful for testing)
  • --output=file - Output to a file instead of STDOUT

CULT Output

CULT outputs information in two formats:

  • Verbose mode, which prints what it does to STDOUT
  • JSON, which is sent to STDOUT at the end or written to a file

The JSON document has the following structure:

{
  "cult": {
    "version": "X.Y.Z"          // CULT 'major.minor.micro' version.
  },

  // CPU data retrieved by CPUID instruction.
  "cpuData": [
    {
      "level"     : "HEX",      // CPUID:EAX input (main leaf).
      "subleaf"   : "HEX",      // CPUID:ECX input (sub leaf).
      "eax"       : "HEX",      // CPUID:EAX output.
      "ebx"       : "HEX",      // CPUID:EBX output.
      "ecx"       : "HEX",      // CPUID:ECX output.
      "edx"       : "HEX"       // CPUID:EDX output.
    }
    ...
  ],

  // CPU information
  "cpuInfo": {
    "vendorName"  : "String",   // CPU vendor name.
    "vendorString": "String",   // CPU vendor string.
    "brandString" : "String",   // CPU brand string.
    "codename"    : "String",   // CPU code name.
    "modelId"     : "HEX",      // Model ID + Extended Model ID.
    "familyId"    : "HEX",      // Family ID + Extended Family ID.
    "steppingId"  : "HEX"       // Stepping.
  },

  // Array of instructions measured.
  "instructions": [
    {
      "inst"   : "inst x, y",   // Measured instruction and its operands (unique).
      "lat"    : X.YY,          // Latency in CPU cycles, including fractions.
      "rcp"    : X.YY           // Reciprocal throughput, including fractions.
    }
    ...
  ]
}

Implementation Notes

  • The application sets CPU affinity at the beginning to make sure that RDTSC results are read from the same core.
  • AsmJit's instruction database and introspection features are used to query all supported instructions. Each instruction, with all possible operand combinations, is analyzed and benchmarked if the host CPU supports it. System instructions and some rarely used instructions are excluded, though.
  • A single benchmark uses RDTSC and possibly RDTSCP (if available) to estimate the number of cycles consumed by the test. Tests repeat multiple times and only the best time is considered. A single instruction test is executed multiple times, and it finishes only after N best results have been achieved.
  • Some instructions are tricky to test and require a few more instructions for data preparation inside the test (for example division). More special cases are expected in the future.
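
The "repeat and keep the best time" idea from the notes above can be sketched as follows. This is not CULT's actual code: the function name `bestOfRuns`, its parameters, and the stopping rule (stop once the lowest value has been observed `requiredHits` times, or after `maxRuns` attempts) are assumptions made for illustration. The measurement itself is passed in as a callable so the sketch stays portable; on x86 it could wrap a pair of RDTSC reads.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>

// Hypothetical sketch (not CULT's implementation): repeats a measurement and
// returns the lowest cycle count seen. The loop ends once that lowest value
// has been observed `requiredHits` times, or after `maxRuns` measurements.
inline std::uint64_t bestOfRuns(const std::function<std::uint64_t()>& measureOnce,
                                unsigned requiredHits,
                                unsigned maxRuns) {
  assert(requiredHits > 0 && maxRuns > 0);

  std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
  unsigned hits = 0;

  for (unsigned run = 0; run < maxRuns && hits < requiredHits; run++) {
    std::uint64_t cycles = measureOnce();
    if (cycles < best) {
      // A new minimum invalidates earlier hits.
      best = cycles;
      hits = 1;
    } else if (cycles == best) {
      hits++;
    }
  }
  return best;
}
```

Taking the minimum rather than the average is a common choice for this kind of microbenchmark, because interrupts and cache misses can only make a run slower, never faster.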

Authors & Maintainers
