All Projects → jeremyong → Mc_ruler

jeremyong / Mc_ruler

Licence: mit
Seamless llvm-mca CMake integration

Labels

Projects that are alternatives of or similar to Mc ruler

Cmake Examples
A collection of as simple as possible, modern CMake projects
Stars: ✭ 843 (+3914.29%)
Mutual labels:  cmake
Microtrac
Software for Open Source Ecology's MicroTrac
Stars: ✭ 11 (-47.62%)
Mutual labels:  cmake
Cpp Dependencies
download, compile && prepare cpp-ethereum dependencies for windows
Stars: ✭ 14 (-33.33%)
Mutual labels:  cmake
Ios Cmake
A CMake toolchain file for iOS, macOS, watchOS & tvOS C/C++/Obj-C++ development
Stars: ✭ 844 (+3919.05%)
Mutual labels:  cmake
Nrf Blinky
Stars: ✭ 10 (-52.38%)
Mutual labels:  cmake
Keygen
KeyGen is a generator for keys and passwords.
Stars: ✭ 11 (-47.62%)
Mutual labels:  cmake
Industrial calibration tutorials
Tutorials for industrial calibration package.
Stars: ✭ 7 (-66.67%)
Mutual labels:  cmake
Cht.exe
cht.sh libcurl client for windows XP+ with changed colorization
Stars: ✭ 15 (-28.57%)
Mutual labels:  cmake
Envtool
Utility to check and search along environment variables. Or where Python/Cmake/Man-pages/pkg-modules/VCPKG-packages are located.
Stars: ✭ 10 (-52.38%)
Mutual labels:  cmake
Matrix Creator Pocketsphinx
Examples using MATRIX Devices and PhocketSphinx.
Stars: ✭ 14 (-33.33%)
Mutual labels:  cmake
Latency Test
Software and hardware to test VR latency on any device
Stars: ✭ 9 (-57.14%)
Mutual labels:  cmake
Vst3sdk
VST 3 Plug-In SDK
Stars: ✭ 853 (+3961.9%)
Mutual labels:  cmake
Ros Academy For Beginners
中国大学MOOC《机器人操作系统入门》代码示例 ROS tutorial
Stars: ✭ 861 (+4000%)
Mutual labels:  cmake
Openspatial Windows
Open Spatial Windows SDK
Stars: ✭ 8 (-61.9%)
Mutual labels:  cmake
Aerial lidar
Stars: ✭ 14 (-33.33%)
Mutual labels:  cmake
Bestnote
👊 持续更新,Java Android 近几年最全面的技术点以及面试题 供自己学习使用
Stars: ✭ 841 (+3904.76%)
Mutual labels:  cmake
Simple arm 01
Repository for RoboND's ROS Basics Lesson 02
Stars: ✭ 11 (-47.62%)
Mutual labels:  cmake
Tribits
TriBITS: Tribal Build, Integrate, and Test System,
Stars: ✭ 20 (-4.76%)
Mutual labels:  cmake
Libuv Cmake
Compile libuv using cmake
Stars: ✭ 14 (-33.33%)
Mutual labels:  cmake
Gr Cbmc
Cumulant-Based Modulation Classification with GNU Radio
Stars: ✭ 13 (-38.1%)
Mutual labels:  cmake

MC Ruler

🎶 Drop the cycles, yo 🎶

MC Ruler (Machine Code Ruler) allows you to easily instrument your code and mark segments to be analyzed with llvm-mca. The tool then produces, for each region marked to be analyzed, a separate report containing analysis produced by llvm-mca. Originally, this code was used to analyze the performance of the Klein library, but it has been extracted into this repository to quickly enable analysis in all projects that can be compiled with clang and leverage CMake.

Important Usage Note

It is important to double check the assembly labels emitted by MC_MEASURE_BEGIN and MC_MEASURE_END correctly fence the code being measured. Both Clang and GCC have moved instructions corresponding to inlined assembly (including assembly marked volatile). If the assembly labels need to be corrected, simply adjust the labels as needed and remove the mc_ruler/*.mcr files in your build tree before rebuilding again. CMake will not re-compile the object files, and the report will rerun with the corrected assembly.

Quick Start

In your CMake file, add the following snippet to fetch this project as a dependency into your build tree:

include(FetchContent)
FetchContent_Declare(
    mc_ruler
    GIT_REPOSITORY https://github.com/jeremyong/mc_ruler.git
    GIT_TAG origin/master
)
FetchContent_MakeAvailable(mc_ruler)

include(MCRuler)

Then, for each target with code you want to instrument, do the following:

# mc_ruler is an interface exposing a single header. No code is linked
# and the only thing that will change is the include path
target_link_libraries(my_target PUBLIC mc_ruler::mc_ruler)

Finally, for each source file containing code you wish to instrument with llvm-mca, do the following:

// In your C or C++ file (this example is dot_product.cpp)
// This header is included by the mc_ruler interface target
#include <mc_ruler.h>
#include <pmmintrin.h>

float dot_product(__m128 const& a, __m128 const& b)
{
    // Code you want to instrument should be surrounded with
    // MC_MEASURE_BEGIN and MC_MEAUSRE_END like so. Supply a name
    // for the region.
    MC_MEASURE_BEGIN(dot_product);
    __m128 s    = _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(a, s);
    s           = _mm_movehl_ps(s, sums);
    sums        = _mm_add_ss(sums, s);
    auto out    = _mm_cvtss_f32(sums);
    MC_MEASURE_END();
    return out;
}

Then, in CMake, after your target is defined, do the following:

mc_ruler(
    my_target
    SOURCES
    dot_product.cpp
    # Any number of source files can be listed here
    LLVM_MCA_FLAGS # optional
    # Additional flags you wish to pass to llvm-mca
    # are optionally defined here
)

After building, there will be a new file in your project's binary tree, in this case under mc_ruler/my_target with a file called dot_product.mcr. The .mcr file is just a text file containing the output of the llvm-mca analysis. In this case, the file emitted will have contents like this:

[0] Code Region - dot_product

Iterations:        100
Instructions:      4500
Total Cycles:      3204
Total uOps:        6600

Dispatch Width:    6
uOps Per Cycle:    2.06
IPC:               1.40
Block RThroughput: 15.0


Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      5     0.50    *                   movq	-120(%rbp), %rax
 1      6     0.50    *                   movaps	(%rax), %xmm0
 1      1     1.00                        shufps	$177, %xmm0, %xmm0
 2      1     1.00           *            movaps	%xmm0, -144(%rbp)
 1      5     0.50    *                   movq	-120(%rbp), %rax
 1      6     0.50    *                   movaps	(%rax), %xmm0
 1      6     0.50    *                   movaps	-144(%rbp), %xmm1
 2      1     1.00           *            movaps	%xmm0, -96(%rbp)
 2      1     1.00           *            movaps	%xmm1, -112(%rbp)
 1      6     0.50    *                   movaps	-96(%rbp), %xmm0
 1      6     0.50    *                   movaps	-112(%rbp), %xmm1
 1      4     0.50                        addps	%xmm1, %xmm0
 2      1     1.00           *            movaps	%xmm0, -160(%rbp)
 1      6     0.50    *                   movaps	-144(%rbp), %xmm0
 1      6     0.50    *                   movaps	-160(%rbp), %xmm1
 2      1     1.00           *            movaps	%xmm0, -16(%rbp)
 2      1     1.00           *            movaps	%xmm1, -32(%rbp)
 1      6     0.50    *                   movapd	-16(%rbp), %xmm0
 1      6     0.50    *                   movapd	-32(%rbp), %xmm1
 1      1     1.00                        unpckhpd	%xmm0, %xmm1
 2      1     1.00           *            movapd	%xmm1, -144(%rbp)
 1      6     0.50    *                   movaps	-160(%rbp), %xmm0
 1      6     0.50    *                   movaps	-144(%rbp), %xmm1
 2      1     1.00           *            movaps	%xmm0, -48(%rbp)
 2      1     1.00           *            movaps	%xmm1, -64(%rbp)
 1      5     0.50    *                   movss	-64(%rbp), %xmm0
 1      6     0.50    *                   movaps	-48(%rbp), %xmm1
 1      4     0.50                        addss	%xmm0, %xmm1
 2      1     1.00           *            movaps	%xmm1, -48(%rbp)
 1      6     0.50    *                   movaps	-48(%rbp), %xmm0
 2      1     1.00           *            movaps	%xmm0, -160(%rbp)
 1      6     0.50    *                   movaps	-160(%rbp), %xmm0
 2      1     1.00           *            movaps	%xmm0, -80(%rbp)
 1      5     0.50    *                   movss	-80(%rbp), %xmm0
 1      1     0.25                        addq	$32, %rsp
 2      6     0.50    *                   popq	%rbp
 3      7     1.00                  U     retq
 3      2     1.00           *            pushq	%rbp
 1      1     0.25                        movq	%rsp, %rbp
 1      1     1.00           *            movl	%edi, -4(%rbp)
 1      1     1.00           *            movl	%esi, -8(%rbp)
 1      5     0.50    *                   movl	-4(%rbp), %eax
 2      6     0.50    *                   addl	-8(%rbp), %eax
 2      6     0.50    *                   popq	%rbp
 3      7     1.00                  U     retq


Resources:
[0]   - SKLDivider
[1]   - SKLFPDivider
[2]   - SKLPort0
[3]   - SKLPort1
[4]   - SKLPort2
[5]   - SKLPort3
[6]   - SKLPort4
[7]   - SKLPort5
[8]   - SKLPort6
[9]   - SKLPort7


Resource pressure per iteration:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]
 -      -     3.49   3.49   15.99  16.00  15.00  3.51   3.51   7.01

Resource pressure by instruction:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    Instructions:
 -      -      -      -      -     1.00    -      -      -      -     movq	-120(%rbp), %rax
 -      -      -      -     0.01   0.99    -      -      -      -     movaps	(%rax), %xmm0
 -      -      -      -      -      -      -     1.00    -      -     shufps	$177, %xmm0, %xmm0
 -      -      -      -      -      -     1.00    -      -     1.00   movaps	%xmm0, -144(%rbp)
 -      -      -      -     1.00    -      -      -      -      -     movq	-120(%rbp), %rax
 -      -      -      -     0.99   0.01    -      -      -      -     movaps	(%rax), %xmm0
 -      -      -      -      -     1.00    -      -      -      -     movaps	-144(%rbp), %xmm1
 -      -      -      -      -      -     1.00    -      -     1.00   movaps	%xmm0, -96(%rbp)
 -      -      -      -      -     1.00   1.00    -      -      -     movaps	%xmm1, -112(%rbp)
 -      -      -      -     1.00    -      -      -      -      -     movaps	-96(%rbp), %xmm0
 -      -      -      -      -     1.00    -      -      -      -     movaps	-112(%rbp), %xmm1
 -      -     0.49   0.51    -      -      -      -      -      -     addps	%xmm1, %xmm0
 -      -      -      -     1.00    -     1.00    -      -      -     movaps	%xmm0, -160(%rbp)
 -      -      -      -     1.00    -      -      -      -      -     movaps	-144(%rbp), %xmm0
 -      -      -      -      -     1.00    -      -      -      -     movaps	-160(%rbp), %xmm1
 -      -      -      -      -      -     1.00    -      -     1.00   movaps	%xmm0, -16(%rbp)
 -      -      -      -     0.98   0.02   1.00    -      -      -     movaps	%xmm1, -32(%rbp)
 -      -      -      -     1.00    -      -      -      -      -     movapd	-16(%rbp), %xmm0
 -      -      -      -      -     1.00    -      -      -      -     movapd	-32(%rbp), %xmm1
 -      -      -      -      -      -      -     1.00    -      -     unpckhpd	%xmm0, %xmm1
 -      -      -      -     0.02    -     1.00    -      -     0.98   movapd	%xmm1, -144(%rbp)
 -      -      -      -     1.00    -      -      -      -      -     movaps	-160(%rbp), %xmm0
 -      -      -      -      -     1.00    -      -      -      -     movaps	-144(%rbp), %xmm1
 -      -      -      -      -     0.98   1.00    -      -     0.02   movaps	%xmm0, -48(%rbp)
 -      -      -      -     0.98   0.02   1.00    -      -      -     movaps	%xmm1, -64(%rbp)
 -      -      -      -     1.00    -      -      -      -      -     movss	-64(%rbp), %xmm0
 -      -      -      -      -     1.00    -      -      -      -     movaps	-48(%rbp), %xmm1
 -      -     0.51   0.49    -      -      -      -      -      -     addss	%xmm0, %xmm1
 -      -      -      -     0.01    -     1.00    -      -     0.99   movaps	%xmm1, -48(%rbp)
 -      -      -      -     1.00    -      -      -      -      -     movaps	-48(%rbp), %xmm0
 -      -      -      -      -     0.99   1.00    -      -     0.01   movaps	%xmm0, -160(%rbp)
 -      -      -      -      -     1.00    -      -      -      -     movaps	-160(%rbp), %xmm0
 -      -      -      -     0.99   0.01   1.00    -      -      -     movaps	%xmm0, -80(%rbp)
 -      -      -      -     1.00    -      -      -      -      -     movss	-80(%rbp), %xmm0
 -      -      -     0.49    -      -      -     0.01   0.50    -     addq	$32, %rsp
 -      -     0.01    -      -     1.00    -     0.50   0.49    -     popq	%rbp
 -      -     0.49   0.50   1.00    -      -     0.01   1.00    -     retq
 -      -     0.49   0.49   0.01    -     1.00   0.01   0.01   0.99   pushq	%rbp
 -      -     0.01   0.50    -      -      -      -     0.49    -     movq	%rsp, %rbp
 -      -      -      -     0.98    -     1.00    -      -     0.02   movl	%edi, -4(%rbp)
 -      -      -      -      -      -     1.00    -      -     1.00   movl	%esi, -8(%rbp)
 -      -      -      -      -     1.00    -      -      -      -     movl	-4(%rbp), %eax
 -      -     0.49    -     1.00    -      -     0.49   0.02    -     addl	-8(%rbp), %eax
 -      -     0.50   0.01   0.02   0.98    -     0.49    -      -     popq	%rbp
 -      -     0.50   0.50    -     1.00    -      -     1.00    -     retq

How does it work?

Internally, MC Ruler defines a new target for every source file passed to the mc_ruler CMake function. All the properties from the original target are copied to the new target so that it compiles properly with the correct compile definitions, library linkages, include directories, etc. Then, it is compiled as an individual target with additional flags set to ensure the assembly is saved. Finally, llvm-mca is invoked on the generated assembly to emit the analysis file as shown above.

It is recommended to only enable mc_ruler when the CMake build type is RELEASE as there is not much use in analyzing unoptimized code.

Other details worth noting are that if you have source files in a nested directory, the emitted mcr analysis files will be stored with a similar nesting with the target's folder (which will itself be within ${PROJECT_BINARY_DIR}/mc_ruler).

The test directory in this repo contains an example mini-project which builds when this project is compiled as a standalone.

Footnote

This project has the unfortunate property of possessing the lamest project name I have ever come up with.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].