Re: Suppressing ITT messages

Utku · ‎05-22-2021

Hello,

I am using the VTune profiler with the ittnotify library and its "__itt_resume()" and "__itt_pause()" functions. Every time I call "__itt_resume()" or "__itt_pause()", the following messages are printed:

The sampling collection resumed.

The sampling collection paused.

Printing these messages makes my function much slower. Is it possible to suppress them?

My vtune profiling command line is as follows:

vtune -collect-with runsa -quiet -start-paused -knob event-config=L1D_PEND_MISS.PENDING,L1D_PEND_MISS.PENDING_CYCLES -target-duration-type short -result-dir ./temp -data-limit=0 -- ./benchmarker

Note #1: I use vtune's "-quiet" parameter, but it only suppresses the "vtune: collector paused/resumed" messages, not the messages above. I suppose the messages above are coming from the ittnotify library.

Note #2: My function is composed of two parts. I want to profile only the second part. Hence, I use ittnotify's pause and resume functions to pause the collection right before the first part and resume the collection right after the first part. The first and second parts of the function are executed in a loop millions of times. So, at every iteration of the loop, the messages above are printed...

Thank you very much,

Best regards,

Utku

Gopika_Intel · ‎05-24-2021

Hi,

Thank you for reaching out. We will check internally whether the ITT messages can be suppressed and get back to you as soon as we have an update. In the meantime, you can try a workaround using the frame APIs:

> The collection control APIs (__itt_pause/__itt_resume) have a call frequency of about 1 Hz. Each call pauses or resumes data collection in all processes in the analysis run and sends the corresponding collection state notification to the GUI. Calling them frequently is not recommended.

> For detailed information on adding Frame APIs to your code, please refer to: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/api-support/instrumentation-and-tracing-technology-apis/instrumentation-and-tracing-technology-api-reference/frame-api.html#frame-api

> To learn more about the Collection control APIs, please refer to: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/api-support/instrumentation-and-tracing-technology-apis/instrumentation-and-tracing-technology-api-reference/collection-control-api.html

Regards

Gopika

Utku · ‎05-30-2021

Hello,

Thank you for your answer.

I am trying to integrate the Frame API into my code, and I have a question. How do I make sure the profiler does not collect data *until* the first __itt_frame_begin_v3() call? I understand that the profiler profiles the code sections between __itt_frame_begin_v3() and __itt_frame_end_v3(), but how do I make sure it does not profile the code that runs before the *first* __itt_frame_begin_v3()?

Thank you very much,

Best regards,

Utku

Utku · ‎06-03-2021

Hello,

I haven't heard back from you about my questions. For my latest question, I did the following. I started profiling in -start-paused mode. Then, at the point where I want profiling to begin, I called __itt_resume(), and right after that I started adding __itt_frame_begin_v3(pD, NULL) and __itt_frame_end_v3(pD, NULL) blocks. I think this should solve the problem, but I am not sure. Does it?

Assuming yes, I have the following problem: the runtime of my code increased almost 2x when I added the __itt_frame_begin_v3 and __itt_frame_end_v3 pairs. I am not sure how reliable the numbers are when the runtime of my program changes so much. Is there any solution for that? Why is the overhead of using frames so high, when the regular VTune profiling overhead (when attaching to a process, or when calling __itt_resume() and __itt_pause() only once inside the code) is so small? Can I use the frame API with a small overhead?

I really need to exclude a certain part of my program from profiling, and I have been unable to achieve this for quite some time now. I would really appreciate any help.

Thanks a lot,

Best regards,

Utku

Gopika_Intel · ‎06-04-2021

Hi,

Sorry for the delay in response. In order to address your queries and issues, could you share with us the following details?

1. A minimal sample reproducer code.

2. Steps followed or commands used for profiling

3. Results obtained after profiling

Regards

Gopika

Utku · ‎06-10-2021

Hello,

Here is some sample code. It takes two arguments, which are explained in the code. As you will see, there are two inner for loops inside an outer for loop. I want to profile only the second inner for loop, so I wrapped it in __itt_frame_begin_v3(pD, NULL); and __itt_frame_end_v3(pD, NULL); calls, but this increased the time the second inner loop takes by almost 2x.

I run the following vtune command to profile it:

vtune -collect-with runsa -verbose -start-paused -knob event-config=L1D_PEND_MISS.PENDING,L1D_PEND_MISS.PENDING_CYCLES -target-duration-type short -result-dir $1 -data-limit=0 -- sample_main.o

Thank you very much,

Best regards,

Utku

Gopika_Intel · ‎06-15-2021

Hi,

Thank you for the information. We will get back to you.

Regards

Gopika

Gopika_Intel · ‎06-21-2021

Hi,

Thank you for your patience.

Frame APIs are different from the Pause/Resume APIs. Using Frame APIs does not mean that collection is stopped or paused outside the frame. When using frame APIs, VTune continues to gather performance data, even outside the frame, at the specified sampling rate. Each time the sampling interval elapses, VTune associates the gathered data with the frame that is active at that instant, and you can later view this data by grouping it as shown in the snapshot below.

You will see the frames you have defined in your code along with a string “[No frame domain - Outside any frame]”. Any samples collected by VTune that cannot be associated with a user-defined frame will show up under this section.
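As a rough illustration of the difference, the sketch below resumes collection once (to pair with -start-paused) and then marks each chunk of work as a frame. The frame calls only label samples; they do not pause anything. `USE_ITT`, `Profiler`, and `run_region` are hypothetical names made up for this example. Without `USE_ITT` the ITT calls are compiled out and only counted, so the structure can be tried without VTune installed.

```cpp
#include <cstddef>
#include <vector>

#ifdef USE_ITT  // hypothetical build flag: define it when building against ittnotify
#include <ittnotify.h>
#endif

// Hypothetical wrapper: forwards to ITT when available and counts calls either way.
struct Profiler {
    int resume_calls = 0;
    int frames = 0;
#ifdef USE_ITT
    __itt_domain* domain = __itt_domain_create("example.domain");
#endif

    void resume() {
#ifdef USE_ITT
        __itt_resume();  // collection control: meant to be called rarely
#endif
        ++resume_calls;
    }
    void frame_begin() {
#ifdef USE_ITT
        __itt_frame_begin_v3(domain, nullptr);
#endif
    }
    void frame_end() {
#ifdef USE_ITT
        __itt_frame_end_v3(domain, nullptr);
#endif
        ++frames;
    }
};

// Sums the input in chunks; each chunk is one frame.
// Collection is resumed exactly once, before the first frame.
inline long long run_region(Profiler& prof, const std::vector<int>& data, std::size_t chunk) {
    if (chunk == 0) chunk = 1;  // avoid an endless loop on a zero chunk size
    long long sum = 0;
    prof.resume();
    for (std::size_t start = 0; start < data.size(); start += chunk) {
        prof.frame_begin();
        const std::size_t end = start + chunk < data.size() ? start + chunk : data.size();
        for (std::size_t j = start; j < end; ++j) sum += data[j];
        prof.frame_end();
    }
    return sum;
}
```

Anything executed after the single resume but between frames would still be sampled and reported under the "outside any frame" group.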

However, users should pay attention to the sampling interval set for data collection. The sampling interval should be well below the frame time (time taken to execute the code between frame-begin and frame-end API calls) for gathering meaningful performance data.

From your sample code, we could see that the frame APIs have been used to frame a very small piece of code. The recommended maximum rate for calling the frame API is 1000 frames per second (see our product documentation for more information). You mentioned that the frame APIs created an overhead of 2x in your sample. This simply means that the framed code executes in a very short time, comparable to the overhead of the frame API calls themselves.

__itt_frame_begin_v3(pD, NULL);
for (unsigned j = i*step_size; j < (i+1)*step_size; j++) {
    auto it = map.find(p[j]);
    if (it != map.end()) {
        s += j;
    }
}
__itt_frame_end_v3(pD, NULL);
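To make the rate limit concrete, here is a small back-of-the-envelope helper. It assumes you have some rough estimate of how long one frame takes; `frames_per_second` and `within_recommended_rate` are hypothetical names, and the 1000 frames per second figure is the one quoted above.

```cpp
// Recommended maximum frame rate quoted in the reply above.
constexpr double kMaxFramesPerSecond = 1000.0;

// Frames per second implied by back-to-back frames of the given duration (microseconds).
constexpr double frames_per_second(double frame_time_us) {
    return 1.0e6 / frame_time_us;
}

// True if back-to-back frames of this duration stay within the recommended rate.
constexpr bool within_recommended_rate(double frame_time_us) {
    return frames_per_second(frame_time_us) <= kMaxFramesPerSecond;
}
```

For example, a frame that lasts 5 microseconds implies 200,000 frames per second, far above the recommended rate, while a 2 ms frame implies 500.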

In short, if you are profiling a piece of code that executes within a few microseconds, neither the Pause/Resume nor the Frame APIs are suitable. One way to work around this problem is to increase the number of loop iterations, and so the workload, between the API calls. You can later do the math and calculate how long one iteration of the loop took. However, this approach is not suitable if the execution time varies heavily between iterations of the loop.
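One way to picture this workaround is to wrap a whole batch of iterations in a single measured region and divide afterwards. The sketch below uses `std::chrono` as a stand-in for the measurement so that it runs without VTune; in real use the batch would sit between one pair of frame calls. `BatchTiming` and `time_batch` are hypothetical names for this example.

```cpp
#include <chrono>
#include <cstddef>

struct BatchTiming {
    double total_us;          // wall time for the whole batch, in microseconds
    double per_iteration_us;  // average time per iteration
};

// Runs work(i) for i in [0, iterations) inside one timed region and
// reports the average cost per iteration.
template <typename Work>
BatchTiming time_batch(std::size_t iterations, Work work) {
    const auto t0 = std::chrono::steady_clock::now();
    // In real use, __itt_frame_begin_v3(domain, nullptr) would go here ...
    for (std::size_t i = 0; i < iterations; ++i) work(i);
    // ... and __itt_frame_end_v3(domain, nullptr) here.
    const auto t1 = std::chrono::steady_clock::now();
    const double total_us = std::chrono::duration<double, std::micro>(t1 - t0).count();
    return {total_us, iterations ? total_us / static_cast<double>(iterations) : 0.0};
}
```

The per-iteration figure is only an average, which is why this approach breaks down when iteration times vary a lot.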

Under such circumstances, VTune Anomaly Detection may be a viable option. You can use Anomaly Detection to identify performance anomalies in frequently recurring intervals of code, such as loop iterations. It can perform fine-grained analysis at the microsecond and nanosecond level. However, keep in mind that the performance data collected will be associated with each iteration of the loop, which differs from the way it is presented in a hotspot or microarchitecture analysis. You can check our product documentation for more information on Anomaly Detection: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance/algorithm-group/anomaly-detection-analysis.html .

NOTE: The ITT messages (“vtune: Collection paused.” and “vtune: Collection resumed.”) do not add overhead to the API calls. These messages do not come from the APIs per se. Instead, the APIs signal the VTune collector, and the collector itself displays the messages.

Hope this helps.

Regards

Gopika

Gopika_Intel · ‎06-27-2021

Hi,

Has the solution provided helped? Is your query resolved? If yes, can we discontinue monitoring this thread?

Regards

Gopika

Utku · ‎06-28-2021

Hi Gopika,

Thank you for your detailed answer. I didn't have time to check the proposed solution. I haven't used VTune Anomaly Detection before, either. But, for now, we can close this ticket. I will get back to you with a new ticket if I have a question.

Thank you.

Utku

Gopika_Intel · ‎06-28-2021

Hi,

Thank you for the update. Please feel free to raise a new thread if you have any issues or queries as this thread will no longer be monitored.

Regards

Gopika