VTune with MKL Flag

Chihao
Beginner

I am trying to analyze the cache performance (L1, L2, and L3 cache misses) of a matrix multiplication. I was not able to get the data, so I first tested a simple Fibonacci program instead, and I noticed something interesting.

g++ version:
g++ (GCC) 15.1.1 20250729
Copyright (C) 2025 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

VTune version:
Intel(R) VTune(TM) Profiler 2025.0.1 (build 629235) Command Line Tool
Copyright (C) 2009 Intel Corporation. All rights reserved.


This is the fib code:

#include <iostream>

// Naive recursive Fibonacci.
long long fib(int n) {
    if (n <= 1)
        return n;
    return fib(n - 1) + fib(n - 2);
}

// Compute and print fib(35) 40 times.
void dummy_spmm_loop() {
    for (int i = 0; i < 40; i++) {
        std::cout << fib(35) << std::endl;
    }
}

int main(int argc, char* argv[]) {
    dummy_spmm_loop();

    return 0;
}

The VTune configuration:

Chihao_2-1758485351912.png

Chihao_3-1758485386740.png

First, when I compile with the MKL flags:

g++ matmat_sparse.cpp -o matmat_sparse -O0 -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential

The result is abnormal: the elapsed time is abnormal, I cannot view the cache-miss results, and the fib function does not appear in the stack trace.

Chihao_4-1758485645046.png

Chihao_5-1758485685236.png

Second, when I compile without the MKL flags:

g++ matmat_sparse.cpp -o matmat_sparse -O0

The result is normal: the elapsed time is normal, I can view the cache-miss results, and the fib function shows up in the stack trace.

Chihao_1-1758485270568.png

Chihao_0-1758485254676.png

The assembly generated for my code is:

.file "matmat_sparse.cpp"
.text
#APP
.globl _ZSt21ios_base_library_initv
#NO_APP
.globl _Z3fibi
.type _Z3fibi, @function
_Z3fibi:
.LFB1984:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
pushq %rbx
subq $24, %rsp
.cfi_offset 3, -24
movl %edi, -20(%rbp)
cmpl $1, -20(%rbp)
jg .L2
movl -20(%rbp), %eax
cltq
jmp .L3
.L2:
movl -20(%rbp), %eax
subl $1, %eax
movl %eax, %edi
call _Z3fibi
movq %rax, %rbx
movl -20(%rbp), %eax
subl $2, %eax
movl %eax, %edi
call _Z3fibi
addq %rbx, %rax
.L3:
movq -8(%rbp), %rbx
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1984:
.size _Z3fibi, .-_Z3fibi
.globl _Z15dummy_spmm_loopv
.type _Z15dummy_spmm_loopv, @function
_Z15dummy_spmm_loopv:
.LFB1985:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl $0, -4(%rbp)
jmp .L5
.L6:
movl $35, %edi
call _Z3fibi
movq %rax, %rdx
leaq _ZSt4cout(%rip), %rax
movq %rdx, %rsi
movq %rax, %rdi
call _ZNSolsEx@PLT
movq _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_@GOTPCREL(%rip), %rdx
movq %rdx, %rsi
movq %rax, %rdi
call _ZNSolsEPFRSoS_E@PLT
addl $1, -4(%rbp)
.L5:
cmpl $39, -4(%rbp)
jle .L6
nop
nop
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1985:
.size _Z15dummy_spmm_loopv, .-_Z15dummy_spmm_loopv
.globl main
.type main, @function
main:
.LFB1986:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
call _Z15dummy_spmm_loopv
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1986:
.size main, .-main
.ident "GCC: (GNU) 15.1.1 20250729"
.section .note.GNU-stack,"",@progbits

To summarize: for exactly the same program, compiling with the MKL flags seems to cause a problem with VTune. However, to test the matrix multiplication I have to include the MKL flags. Why does this happen, and is there a solution?

3 Replies
optimizergal
New Contributor I

It's possible that when running the executable through VTune, it isn't finding the MKL library. If you're running it from the VTune GUI, then the environment for that session might not contain the path for MKL.

You can try adding it here:

Screenshot 2025-09-21 143935.png

Something like:

LD_LIBRARY_PATH = /opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH
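Before changing any settings, it may help to confirm that a missing library really is the cause. If the dynamic loader cannot find MKL, the program normally fails at startup, before main runs, which would also explain why fib never appears in the profile. The following is a minimal sketch, not a tested VTune procedure: it uses the standard ldd tool to list unresolved dependencies, and the function names are made up for illustration.

```bash
#!/bin/bash

# Keep only the names of libraries that ldd reports as "not found".
filter_unresolved() {
    awk '/not found/ { print $1 }'
}

# List the shared libraries the dynamic loader cannot resolve for a binary.
# Prints nothing when every dependency is found (or when ldd cannot read the file).
unresolved_libs() {
    ldd "$1" 2>/dev/null | filter_unresolved
}
```

For example, calling unresolved_libs ./matmat_sparse (the binary name from the compile command in the question) in the same shell or ssh session that VTune uses should print nothing if MKL is visible there. Any libmkl name in the output means that environment still needs the MKL directory on LD_LIBRARY_PATH.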

Chihao
Beginner

Thanks for the suggestion.

But I cannot find the field shown in your picture.

Chihao_0-1758499192646.png

optimizergal
New Contributor I

Oh, are you running a remote ssh collection? In that case, you could add the MKL path in .bashrc for the user, or wrap the executable in a script that sets the environment. Or you might be able to use a wrapper script in VTune to set it. This is near the bottom of the advanced settings. 
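The second option, wrapping the executable in a script that sets the environment, could look roughly like the sketch below. It is untested against VTune. The MKL directory is the same assumed install location as in the earlier suggestion, and MKL_LIB_DIR and both function names are hypothetical, chosen only for this example.

```bash
#!/bin/bash

# Prepend a directory to LD_LIBRARY_PATH without leaving an empty entry
# when the variable was previously unset.
prepend_lib_path() {
    local dir="$1"
    export LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
}

# Run the given command with the MKL directory on the loader path.
run_with_mkl() {
    # Assumed install location; override with MKL_LIB_DIR if MKL lives elsewhere.
    prepend_lib_path "${MKL_LIB_DIR:-/opt/intel/mkl/lib/intel64}"
    "$@"
}
```

With these definitions at the top of a launcher script, a final line such as run_with_mkl ./matmat_sparse would start the program with MKL on the loader path, and the launcher script itself would be given to VTune as the application to profile.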

For the wrapper script in VTune, you could try:

#!/bin/bash

# Set MKL
echo "Setting MKL"
export LD_LIBRARY_PATH=/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH

# Run VTune collector
"$@"

# Postfix script
ls -la $VTUNE_RESULT_DIR

I haven't tested this use case myself, but it seems like it should work. Sometimes VTune has trouble parsing the script because the last line doesn't end with a line feed, so make sure there's an extra line at the end.

https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2025-4/analysis-target-options.html 

 

NOTE:
  • VTune Profiler preserves the content of the script. The script is preserved within the project and is run for every analysis within that project. To apply any changes to the script, attach it again using the same Wrapper script field.

  • For Linux targets, make sure that the script file is saved with LF line endings.
