Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

Created with love in Canada, visit hostnodejs.com today

Feel like to post an Ad? Learn Details

All Projects → xiadz → Cmov

xiadz / Cmov

Licence: mit

Measuring cmov vs branch-mov performance

Programming Languages

50402 projects - #5 most used programming language

5116 projects

Labels

performance performance-analysis

Projects that are alternatives of or similar to Cmov

PerfView is a CPU and memory performance-analysis tool

Stars: ✭ 2,924 (+4941.38%)

Mutual labels: performance, performance-analysis

A code generator for array-based code on CPUs and GPUs

Stars: ✭ 367 (+532.76%)

Mutual labels: performance, performance-analysis

DroidTelescope（DT），Android端App性能监控框架

Stars: ✭ 231 (+298.28%)

Mutual labels: performance, performance-analysis

C++ frame profiler

Stars: ✭ 3,115 (+5270.69%)

Mutual labels: performance, performance-analysis

Watchdoginspector

Shows your current framerate (fps) in the status bar of your iOS app

Stars: ✭ 497 (+756.9%)

Mutual labels: performance, performance-analysis

PyTorch functions and utilities to make your life easier

Stars: ✭ 177 (+205.17%)

Mutual labels: performance, performance-analysis

Qubole Sparklens tool for performance tuning Apache Spark

Stars: ✭ 345 (+494.83%)

Mutual labels: performance, performance-analysis

Collect performance metrics for your library/application.

Stars: ✭ 1,868 (+3120.69%)

Mutual labels: performance, performance-analysis

pprof is a tool for visualization and analysis of profiling data

Stars: ✭ 4,990 (+8503.45%)

Mutual labels: performance, performance-analysis

Why Did You Update

💥 Puts your console on blast when React is making unnecessary updates.

Stars: ✭ 4,089 (+6950%)

Mutual labels: performance, performance-analysis

Fe Performance Journey

🚵 a Journey of Performance Optimizing in Frontend 🚀

Stars: ✭ 169 (+191.38%)

Mutual labels: performance, performance-analysis

displays the import size of the package you are importing inside the code editor

Stars: ✭ 1,021 (+1660.34%)

Mutual labels: performance, performance-analysis

The Linux perf GUI for performance analysis.

Stars: ✭ 2,415 (+4063.79%)

Mutual labels: performance, performance-analysis

High performance Java APM. Powered by ASM. Try it. Test it. If you feel its better, use it.

Stars: ✭ 2,281 (+3832.76%)

Mutual labels: performance, performance-analysis

Caliper is an instrumentation and performance profiling library

Stars: ✭ 162 (+179.31%)

Mutual labels: performance, performance-analysis

QuickPerf is a testing library for Java to quickly evaluate and improve some performance-related properties

Stars: ✭ 231 (+298.28%)

Mutual labels: performance, performance-analysis

Heapinspector For Ios

Find memory issues & leaks in your iOS app without instruments

Stars: ✭ 1,819 (+3036.21%)

Mutual labels: performance, performance-analysis

An android library for displaying fps from the choreographer and percentage of time with two or more frames dropped

Stars: ✭ 1,859 (+3105.17%)

Mutual labels: performance, performance-analysis

⏱ PHP performance tool analyser your script on time, memory usage and db query. Support Laravel and Composer for web, web console and command line interfaces.

Stars: ✭ 429 (+639.66%)

Mutual labels: performance, performance-analysis

inspectIT is the leading Open Source APM (Application Performance Management) tool for analyzing your Java (EE) applications.

Stars: ✭ 513 (+784.48%)

Mutual labels: performance, performance-analysis

View All Similar Projects ➔

Comparing the performance of Intel's cmov instruction with a conditional branch + mov pair.

Suppose you want to write the following code inside a performance critical loop:

    if(cond) {
        foo = bar;
    }

where foo is a local variable (of some simple type, like int), and bar is either a local variable, or a simple memory location (such as A[i]). Such a code is likely to be found for example in a Kadane's algorithm implementation.

There are two standard ways for an x86-64 compiler to translate this conditional into Intel assembly, namely a cmov instruction, or a conditional branch paired with a regular mov.

To make this example more concrete I will focus on this particular code:

    int j;
    int res = 0;
    for(j = 0; j < n; ++j) {
        if(A[j]) {
            res = j;
        }
    }

GCC version 4.8.2 emits the following code for the if at -O3 (gas' assembly notation):

    movl    (%rdi,%rcx,4), %r9d
    testl   %r9d, %r9d
    cmovne  %ecx, %eax

Clearly the compiler stores the location of A in %rdi, the value of j in %rcx (%ecx), and res in %rax (%eax).

We may however ask it to forget about the cmov instruction by using the following command line parameters in addition to -O3: -fno-tree-loop-if-convert -fno-tree-loop-if-convert-stores -fno-if-conversion -fno-if-conversion2. Then we will obtain the following assembly:

    movl    (%rdi,%rcx,4), %r9d
    testl   %r9d, %r9d
    je  .L4
    movl    %ecx, %eax
.L4:

To make this example more interesting let's feed the compiler with more data. GCC (and others probably too) implements a mechanism allowing the programmer to specify that a condition will likely/unlikely be true during the runtime. In our case it would look like this (assuming that we are expecting A[j] to be generally false):

    int j;
    int res = 0;
    for(j = 0; j < n; ++j) {
        if(__builtin_expect(A[j], 0)) {
            res = j;
        }
    }

The corresponding GCC output:

    movl    (%rdi,%rcx,4), %r9d
    testl   %r9d, %r9d
    jne .L20
.L14:
    /* Other code */
.L20:
    movl    %ecx, %eax
    jmp .L14

We are of course free to put 1 in the place of 0 whenever we expect A[j] to be true most of the time:

    int j;
    int res = 0;
    for(j = 0; j < n; ++j) {
        if(__builtin_expect(A[j], 1)) {
            res = j;
        }
    }

And the corresponding GCC output (note the speculative movl execution!):

    movl    %eax, %r8d
    movl    (%rdi,%rcx,4), %r10d
    movl    %ecx, %eax
    testl   %r10d, %r10d
    je  .L29
.L24:
    /* Other code */
.L29:
    movl    %r8d, %eax
    jmp .L24

This way we obtain four versions of the program which I will call cmov, branch, branch_0 and branch_1. One might ask why there is no cmov_0 and no cmov_1, but it is due to the fact that in cmov's case the compiler ignored the likely/unlikely hints.

The point of this excercise is to measure their execution speed, but this of course depends on the contents of the A array. So let's fill it with random independent identically distributed zeros and ones, manipulating the probability of one (the probability that the if's condition will be true).

Update 2018:

Thanks to AndreasPK for providing this data.

On an i7-6700k (Skylake) cmov is now faster under all circumstances:

While the performance at the extremes looks the same cmov is still faster which becomes obvious when looking at the raw numbers:

0.000000;0.983000;0.412000;0.550000;0.411000
...
1.000000;0.747000;0.951000;0.591000;0.398000

branch_0 comes (very) close for probability zero but was still always slightly slower than cmov in 10 tests I ran.

Original Results

These are the results I obtained on my machine. It is the execution time, so the lower the better:

Three interesting facts can be read from this plot:

The cmov instruction is generally faster. It is also immune to the probability
However the branch version is faster for stable, predictable conditions.
The branch versions take approximately p*(1-p) time to execute. The nice parabola-alike looks is approximately the variance of the distribution that A was sampled from.

The "visual proof" for the third fact:

To conclude, GCC emits cmov by default, and it seems like a reasonable choice, at least on my Intel Core i5. However, as noted before, it is not the optimal solution for stable, predictable conditions.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 58

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (1) 🔗