Re: Run VTune for a function that runs thousands of times

code_code · ‎12-10-2021

Hi,

I am profiling a piece of code with VTune. The code takes less than a minute to run without VTune, and it still runs in less than a minute under VTune as long as I don't use the ITT pause and resume calls. The problem appears when I call the pause and resume functions before and after a function that is invoked thousands of times: I had to interrupt the run after 30 minutes. In this case, when I run VTune (in command-line mode) on my program, it writes to a log folder that keeps growing into the GB range, and the profiling overhead is significant. Is there any option or mode that reduces this overhead when identifying hotspots?

I am using VTune 2021.4 and running the following command:

vtune -collect hotspots -start-paused -q -data-limit=800 -discard-raw-data <target>

Adding or omitting -data-limit and -discard-raw-data made no difference.

AbhijeetJ_Intel · ‎12-13-2021

Hi,

Could you please try the following steps:

1. Update VTune to the latest version and check whether the issue persists.

Download link: https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html#gs.j9amtm

2. Please share your OS version, a reproducer, and the steps you followed to run it, so that we can try to reproduce the issue on our end.

Regards

Abhijeet

code_code · ‎12-14-2021

Hi,

Thanks for the help.

I have upgraded to the latest version using the apt package manager. It is now version 2021.9.
I'm using Ubuntu 18.04.5.
I have prepared a simple code to reproduce the issue, attached at the end.
The command I use to run: vtune -collect hotspots -mrte-mode=native -q -start-paused ./vtune001

I'm passing "mrte-mode=native", because I get the following warning if I use the default:

vtune: Warning: Pause command is not supported for managed code profiling. Runtime overhead is still possible. Data size limit may be exceeded.

I'm passing "start-paused" since I want to only profile part of the code. The attached example is very basic and is not meaningful. My original code is completely different but with the same situation, which is profiling part of the code that repeats many times.

I ran this example twice. It froze once after printing 7000 and the second time after 72000. The log folder in the second case is almost 1 GB, and I believe it mostly consists of similar commands for pausing and resuming. Is there any way to avoid writing the log folder? Maybe that would fix both the freezing and the overhead, since a large log is being written.

I paste the code here:

#include <vector>
#include <iostream>
#include <ittnotify.h>
using namespace std;

int main(){
    const int sz = 200000;
    vector<int> a(sz, 3);
    for(int i = 0; i < sz; ++i){
        if(i % 1000 == 0)
            cout << i << ' ' << flush;

        __itt_resume();
        for(int j = 0; j < sz; ++j){
            a[i] = a[i] * 2 + 2;
        }
        __itt_pause();

        for(int k = 0; k < sz; ++k){
            a[i] = a[i] * 3 + a[i] * 2;
        }
    }
    cout << endl;
    return 0;
}

And, this is the cmake file:

cmake_minimum_required(VERSION 3.10)
project(vtune001)

set(CMAKE_CXX_STANDARD 14)
set(CMAKE_BUILD_TYPE Release)
set(CMAKE_C_FLAGS   "${CMAKE_C_FLAGS} -g")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g")

set(ITT_INCLUDE_DIRS "/opt/intel/oneapi/vtune/latest/sdk/include")
set(ITT_LIBS "/opt/intel/oneapi/vtune/latest/sdk/lib64/libittnotify.a")

include_directories(${ITT_INCLUDE_DIRS})

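# FindThreads defines CMAKE_THREAD_LIBS_INIT; without this call the variable is empty.
find_package(Threads REQUIRED)

add_executable(${PROJECT_NAME} main.cpp)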

# On Linux* systems, you have to link the dl and pthread libraries to enable
# ITT API functionality. Not linking these libraries will not prevent your
# application from running, but no ITT API data will be collected.
target_link_libraries(${PROJECT_NAME} ${ITT_LIBS} ${CMAKE_DL_LIBS} ${CMAKE_THREAD_LIBS_INIT})

AbhijeetJ_Intel · ‎12-21-2021

Hi,

We are investigating your issue at our end and will soon be back with an update.

Regards

coder_profiler · ‎01-03-2022

Hi,

Thanks! I just wanted to let you know that I upgraded to the 2022 version and still have the issue.

Best

AbhijeetJ_Intel · ‎01-11-2022

Hi,

We were able to reproduce your issue on our end, and we are working on it.

We will get back to you soon.

Regards

Abhijeet

Maria_N_Intel · ‎01-21-2022

Hi,
Unfortunately, the way the Pause/Resume API is used in the provided example is not ideal and should be reworked:

    for(int i = 0; i < sz; ++i){
        if(i % 1000 == 0)
            cout << i << ' ' << flush;

        __itt_resume();
        for(int j = 0; j < sz; ++j){
            a[i] = a[i] * 2 + 2;
        }
        __itt_pause();

The current implementation of the Pause/Resume API is asynchronous, and its call frequency is about 1 Hz. That is why we do not recommend calling this API frequently around small workloads.
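
To put that in numbers, the reproducer issues 2 × 200000 = 400000 pause/resume requests in one run, far more than a rate of about 1 Hz can absorb. Below is a minimal sketch of one coarser-grained alternative, assuming it is acceptable to also collect the code that runs between invocations of the function of interest. Only `__itt_resume()` and `__itt_pause()` come from the ITT API; `USE_ITT`, `collection_resume`, `collection_pause`, `hot_function`, `other_work` and `run_region` are hypothetical names made up for this sketch.

```cpp
#include <cstddef>
#include <vector>

// USE_ITT is a hypothetical build switch for this sketch: define it and link
// libittnotify to call the real API; without it the requests are only counted.
#ifdef USE_ITT
#include <ittnotify.h>
#endif

static int g_toggle_calls = 0;  // number of pause/resume requests issued

static void collection_resume() {
    ++g_toggle_calls;
#ifdef USE_ITT
    __itt_resume();
#endif
}

static void collection_pause() {
    ++g_toggle_calls;
#ifdef USE_ITT
    __itt_pause();
#endif
}

// Stand-in for the function that is invoked thousands of times.
static void hot_function(std::vector<int>& a, std::size_t i) {
    a[i] = a[i] * 2 + 2;
}

// Stand-in for the work that runs between invocations.
static void other_work(std::vector<int>& a, std::size_t i) {
    a[i] = a[i] + 1;
}

// Resume once before the repeated region and pause once after it,
// instead of toggling collection on every iteration.
static void run_region(std::vector<int>& a) {
    collection_resume();
    for (std::size_t i = 0; i < a.size(); ++i) {
        hot_function(a, i);
        other_work(a, i);
    }
    collection_pause();
}
```

Built with -DUSE_ITT and linked against libittnotify, this issues two requests per run instead of two per iteration. The trade-off is that the work between invocations is collected as well, so the function of interest has to be picked out of the result by name.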

Could you please try to use Frame APIs in your workload instead?
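
As a rough illustration of that suggestion, here is a minimal sketch that marks each invocation of the region of interest as a frame instead of pausing and resuming collection. It assumes the run is started without -start-paused, so collection stays on throughout. `__itt_domain_create`, `__itt_frame_begin_v3` and `__itt_frame_end_v3` are ITT API calls; the domain name, the `USE_ITT` switch and the helpers `frame_begin`, `frame_end`, `process_item` and `run_with_frames` are hypothetical names for this sketch. Whether one frame per iteration is cheap enough for a given workload is something to measure, not something this sketch guarantees.

```cpp
#include <cstddef>
#include <vector>

// USE_ITT is a hypothetical build switch for this sketch: define it and link
// libittnotify to emit real frames; without it the frames are only counted.
#ifdef USE_ITT
#include <ittnotify.h>
// The domain name is an arbitrary label chosen for this example.
static __itt_domain* g_domain = __itt_domain_create("vtune001.hot_region");
#endif

static long g_frame_count = 0;  // number of frames closed so far

static void frame_begin() {
#ifdef USE_ITT
    __itt_frame_begin_v3(g_domain, nullptr);  // null id: frame identified by its domain only
#endif
}

static void frame_end() {
    ++g_frame_count;
#ifdef USE_ITT
    __itt_frame_end_v3(g_domain, nullptr);
#endif
}

// Stand-in for the function that is invoked thousands of times.
static void process_item(std::vector<int>& a, std::size_t i) {
    a[i] = a[i] * 2 + 2;
}

// Wrap only the region of interest in a frame; the rest of the loop body
// runs outside any frame. Returns the total number of frames closed.
static long run_with_frames(std::vector<int>& a) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        frame_begin();
        process_item(a, i);
        frame_end();
        a[i] += 1;  // unmarked work between invocations
    }
    return g_frame_count;
}
```

Compared with pause/resume, the markers only annotate the timeline and collection itself is never toggled, which is presumably why frames were suggested for a region that repeats this often.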

AbhijeetJ_Intel · ‎02-15-2022

Hi,

We tried the Pause/Resume API in other code, and it works fine for us.

Could you please try the suggestions above in your workload and let us know whether they work for you?

Regards

Abhijeet

code_code · ‎02-18-2022

Hi,

This task has recently been cancelled for our product. I really appreciate your help anyway!

AbhijeetJ_Intel · ‎02-21-2022

Hi,

Thank you for the clarification.

If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.

Regards

Abhijeet