All Projects → xiadz → Cmov

xiadz / Cmov

Licence: mit
Measuring cmov vs branch-mov performance

Programming Languages

c
50402 projects - #5 most used programming language
assembly
5116 projects

Projects that are alternatives of or similar to Cmov

Perfview
PerfView is a CPU and memory performance-analysis tool
Stars: ✭ 2,924 (+4941.38%)
Mutual labels:  performance, performance-analysis
Loopy
A code generator for array-based code on CPUs and GPUs
Stars: ✭ 367 (+532.76%)
Mutual labels:  performance, performance-analysis
Droidtelescope
DroidTelescope(DT),Android端App性能监控框架
Stars: ✭ 231 (+298.28%)
Mutual labels:  performance, performance-analysis
Tracy
C++ frame profiler
Stars: ✭ 3,115 (+5270.69%)
Mutual labels:  performance, performance-analysis
Watchdoginspector
Shows your current framerate (fps) in the status bar of your iOS app
Stars: ✭ 497 (+756.9%)
Mutual labels:  performance, performance-analysis
Torchfunc
PyTorch functions and utilities to make your life easier
Stars: ✭ 177 (+205.17%)
Mutual labels:  performance, performance-analysis
Sparklens
Qubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (+494.83%)
Mutual labels:  performance, performance-analysis
Speedracer
Collect performance metrics for your library/application.
Stars: ✭ 1,868 (+3120.69%)
Mutual labels:  performance, performance-analysis
Pprof
pprof is a tool for visualization and analysis of profiling data
Stars: ✭ 4,990 (+8503.45%)
Mutual labels:  performance, performance-analysis
Why Did You Update
💥 Puts your console on blast when React is making unnecessary updates.
Stars: ✭ 4,089 (+6950%)
Mutual labels:  performance, performance-analysis
Fe Performance Journey
🚵 a Journey of Performance Optimizing in Frontend 🚀
Stars: ✭ 169 (+191.38%)
Mutual labels:  performance, performance-analysis
Import Cost
displays the import size of the package you are importing inside the code editor
Stars: ✭ 1,021 (+1660.34%)
Mutual labels:  performance, performance-analysis
Hotspot
The Linux perf GUI for performance analysis.
Stars: ✭ 2,415 (+4063.79%)
Mutual labels:  performance, performance-analysis
Myperf4j
High performance Java APM. Powered by ASM. Try it. Test it. If you feel its better, use it.
Stars: ✭ 2,281 (+3832.76%)
Mutual labels:  performance, performance-analysis
Caliper
Caliper is an instrumentation and performance profiling library
Stars: ✭ 162 (+179.31%)
Mutual labels:  performance, performance-analysis
Quickperf
QuickPerf is a testing library for Java to quickly evaluate and improve some performance-related properties
Stars: ✭ 231 (+298.28%)
Mutual labels:  performance, performance-analysis
Heapinspector For Ios
Find memory issues & leaks in your iOS app without instruments
Stars: ✭ 1,819 (+3036.21%)
Mutual labels:  performance, performance-analysis
Tinydancer
An android library for displaying fps from the choreographer and percentage of time with two or more frames dropped
Stars: ✭ 1,859 (+3105.17%)
Mutual labels:  performance, performance-analysis
Performance
⏱ PHP performance tool analyser your script on time, memory usage and db query. Support Laravel and Composer for web, web console and command line interfaces.
Stars: ✭ 429 (+639.66%)
Mutual labels:  performance, performance-analysis
Inspectit
inspectIT is the leading Open Source APM (Application Performance Management) tool for analyzing your Java (EE) applications.
Stars: ✭ 513 (+784.48%)
Mutual labels:  performance, performance-analysis

Comparing the performance of Intel's cmov instruction with a conditional branch + mov pair.

Suppose you want to write the following code inside a performance critical loop:

    if(cond) {
        foo = bar;
    }

where foo is a local variable (of some simple type, like int), and bar is either a local variable, or a simple memory location (such as A[i]). Such a code is likely to be found for example in a Kadane's algorithm implementation.

There are two standard ways for an x86-64 compiler to translate this conditional into Intel assembly, namely a cmov instruction, or a conditional branch paired with a regular mov.

To make this example more concrete I will focus on this particular code:

    int j;
    int res = 0;
    for(j = 0; j < n; ++j) {
        if(A[j]) {
            res = j;
        }
    }

GCC version 4.8.2 emits the following code for the if at -O3 (gas' assembly notation):

    movl    (%rdi,%rcx,4), %r9d
    testl   %r9d, %r9d
    cmovne  %ecx, %eax

Clearly the compiler stores the location of A in %rdi, the value of j in %rcx (%ecx), and res in %rax (%eax).

We may however ask it to forget about the cmov instruction by using the following command line parameters in addition to -O3: -fno-tree-loop-if-convert -fno-tree-loop-if-convert-stores -fno-if-conversion -fno-if-conversion2. Then we will obtain the following assembly:

    movl    (%rdi,%rcx,4), %r9d
    testl   %r9d, %r9d
    je  .L4
    movl    %ecx, %eax
.L4:

To make this example more interesting let's feed the compiler with more data. GCC (and others probably too) implements a mechanism allowing the programmer to specify that a condition will likely/unlikely be true during the runtime. In our case it would look like this (assuming that we are expecting A[j] to be generally false):

    int j;
    int res = 0;
    for(j = 0; j < n; ++j) {
        if(__builtin_expect(A[j], 0)) {
            res = j;
        }
    }

The corresponding GCC output:

    movl    (%rdi,%rcx,4), %r9d
    testl   %r9d, %r9d
    jne .L20
.L14:
    /* Other code */
.L20:
    movl    %ecx, %eax
    jmp .L14

We are of course free to put 1 in the place of 0 whenever we expect A[j] to be true most of the time:

    int j;
    int res = 0;
    for(j = 0; j < n; ++j) {
        if(__builtin_expect(A[j], 1)) {
            res = j;
        }
    }

And the corresponding GCC output (note the speculative movl execution!):

    movl    %eax, %r8d
    movl    (%rdi,%rcx,4), %r10d
    movl    %ecx, %eax
    testl   %r10d, %r10d
    je  .L29
.L24:
    /* Other code */
.L29:
    movl    %r8d, %eax
    jmp .L24

This way we obtain four versions of the program which I will call cmov, branch, branch_0 and branch_1. One might ask why there is no cmov_0 and no cmov_1, but it is due to the fact that in cmov's case the compiler ignored the likely/unlikely hints.

The point of this excercise is to measure their execution speed, but this of course depends on the contents of the A array. So let's fill it with random independent identically distributed zeros and ones, manipulating the probability of one (the probability that the if's condition will be true).

Update 2018:

Thanks to AndreasPK for providing this data.

On an i7-6700k (Skylake) cmov is now faster under all circumstances: Results

While the performance at the extremes looks the same cmov is still faster which becomes obvious when looking at the raw numbers:

0.000000;0.983000;0.412000;0.550000;0.411000
...
1.000000;0.747000;0.951000;0.591000;0.398000

branch_0 comes (very) close for probability zero but was still always slightly slower than cmov in 10 tests I ran.

Original Results

These are the results I obtained on my machine. It is the execution time, so the lower the better: Results

Three interesting facts can be read from this plot:

  • The cmov instruction is generally faster. It is also immune to the probability
  • However the branch version is faster for stable, predictable conditions.
  • The branch versions take approximately p*(1-p) time to execute. The nice parabola-alike looks is approximately the variance of the distribution that A was sampled from.

The "visual proof" for the third fact: Results with a parabola

To conclude, GCC emits cmov by default, and it seems like a reasonable choice, at least on my Intel Core i5. However, as noted before, it is not the optimal solution for stable, predictable conditions.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].