Re: Advisor report meaning

JEROME_B_Intel1 · 05-17-2023

I am trying to understand the meaning of some Advisor output for a program I ran on a PVC. I have a report which includes Active, Stalled and Idle percentages by kernel.

I am guessing that a given EU is active if it is executing instructions, idle if it is not assigned to any task, and stalled if it is waiting for data from memory. Is that true? Is there documentation for this?

Assuming that's right, what does it mean to say that the kernel was active only 8.5%, stalled 46.6%, and idle 44.8%? Is that an average over all EUs? Does that mean that, during the period that kernel was running tasks, that was the status of the whole GPU?

The report also lists a kernel called [Outside any task]. Does that refer to the time when the GPU was reserved for the process, but no kernels were running? That would make sense, because it says there were 0 GFLOPS and 0 GINTOPS for that "kernel". But then, what does it mean to say that kernel was active 8.4%, stalled 33.5%, and idle 58.1%? I would think that when it is not running kernels, it is 100% idle.

AlekhyaV_Intel · 05-22-2023

Hi,

Thank you for posting in Intel Communities. We have addressed your questions below:

Q. A given EU is active if it is executing instructions, idle if it is not assigned to any task, and stalled if it is waiting for data from memory.

a. Is that true?

b. Is there documentation for this?

EU stands for Execution Unit, which is a processing unit within a GPU. EU arrays refer to multiple sets of these execution units.

1. Active: When the EU arrays are active, it means that they are actively executing tasks or processing data. This typically indicates that the GPU is performing computations or rendering graphics.

2. Stalled: Stalling refers to a situation where the execution of tasks or data processing is temporarily halted or delayed. When the EU arrays are stalled, it suggests that they are not currently able to proceed with their operations due to dependencies, resource limitations, or other factors. Stalling can occur when there are dependencies between tasks, such as when one task relies on the results of another task that has not completed yet.

3. Idle: When the EU arrays are idle, it means that they are not actively processing any tasks or data. They are in a state of inactivity and are not engaged in computations or rendering. This can occur when there is a lack of work or when the GPU is not being utilized to its full potential.

All the metrics (percentages) shown for the EU array are a breakdown of cycles by activity in the GPU core arrays.
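To illustrate what a breakdown of cycles by activity could look like, here is a minimal sketch. It assumes that each EU reports active, stalled, and idle cycle counts for the time a kernel runs, and that the reported percentages are the shares of the summed cycles. The `EUCycles` type, the `eu_array_breakdown` helper, and the sample values are hypothetical; this is not Advisor's actual data model or formula.

```python
from dataclasses import dataclass


@dataclass
class EUCycles:
    """Hypothetical per-EU cycle counters for one kernel's run time."""
    active: int
    stalled: int
    idle: int


def eu_array_breakdown(eus):
    """Return (active %, stalled %, idle %) over the cycles of all EUs."""
    active = sum(e.active for e in eus)
    stalled = sum(e.stalled for e in eus)
    idle = sum(e.idle for e in eus)
    total = active + stalled + idle
    if total == 0:
        raise ValueError("no cycles recorded")
    return tuple(100.0 * c / total for c in (active, stalled, idle))


# Made-up sample: four EUs observed for 1000 cycles each.
sample = [
    EUCycles(active=100, stalled=500, idle=400),
    EUCycles(active=50, stalled=450, idle=500),
    EUCycles(active=150, stalled=400, idle=450),
    EUCycles(active=40, stalled=510, idle=450),
]

active_pct, stalled_pct, idle_pct = eu_array_breakdown(sample)
print(f"Active {active_pct:.1f}%  Stalled {stalled_pct:.1f}%  Idle {idle_pct:.1f}%")
```

Under this assumption the three percentages always sum to 100, and a single busy EU can be hidden by many idle ones, because the figure describes the whole array rather than any one EU.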

Here's the documentation explaining all the accelerator metrics: https://www.intel.com/content/www/us/en/docs/advisor/user-guide/2023-0/accelerator-metrics.html

You could also export the report to an HTML file or run the analysis in the GUI, where you can hover over each term to see its meaning.

For viewing HTML reports, please refer: https://www.intel.com/content/www/us/en/docs/advisor/user-guide/2023-0/work-with-standalone-html-reports.html

You could export the GPU Roofline report from the CLI using the command below:

advisor --report=all --project-dir=./advi_results --report-output=./gpu_roofline_report.html

Q. What does it mean to say that the kernel was active only 8.5%, stalled 46.6%, and idle 44.8%? Is that an average over all EUs?

Does that mean that, during the period that kernel was running tasks, that was the status of the whole GPU?

Your understanding is correct. It shows how effectively the particular kernel uses the GPU resources.

Q. The report also lists a kernel called [Outside any task]. Does that refer to the time when the GPU was reserved for the process, but no kernels were running? That would make sense, because it says there were 0 GFLOPS and 0 GINTOPS for that "kernel". But then, what does it mean to say that kernel was active 8.4%, stalled 33.5%, and idle 58.1%? I would think that when it is not running kernels, it is 100% idle.

Regarding the [Outside any task] entry, we are working on it internally and will get back to you soon with an update.

Regards,

Alekhya

JEROME_B_Intel1 · 05-24-2023

Thanks, this is helpful. I have another question. One of the columns in the Advisor report is "Threads Started". I am trying to understand what that means. I had assumed that a thread would be started for every work item in every execution of a kernel. That is, that Threads Started would equal Calls * WG_size. That does not appear to be the case.

I have a kernel for which I got these figures:

Top GPU Hotspots:
Kernel                     Time     Calls    Active   Stalled   Idle    EU Occupancy   Threads Started
______________________________________________________________________________________________________
advec_mom_vol_ocl_kernel   3.024s   11,820   8.5%     46.7%     44.9%   50.9%          1,298,427,058

I watched a (rather dated) video in which it was suggested that a kernel that was stalled a lot was "thread-launch limited", i.e., was stalled because the dispatcher could not launch threads fast enough. So I altered my kernel so that it handles two work items. That is, it used to process a single (x,y) pair. Now it processes (x,y) and (2x,y), and I cut the x range in half. I thought this would cut the number of threads launched in half. But here is what I got:

Kernel                     Time     Calls    Active   Stalled   Idle    EU Occupancy   Threads Started
______________________________________________________________________________________________________
advec_mom_vol_ocl_kernel   3.768s   11,820   7.6%     49.5%     42.9%   53.1%          1,189,177,329

What am I missing here?
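To make the comparison concrete, here is a short sketch of the arithmetic on the two report rows above. The numbers are copied from the reports; the dictionary layout and the `threads_per_call` helper are just illustrative names.

```python
# Figures copied from the two "Top GPU Hotspots" rows above.
before = {"calls": 11_820, "threads_started": 1_298_427_058}  # one (x,y) pair per work item
after = {"calls": 11_820, "threads_started": 1_189_177_329}   # two pairs per work item


def threads_per_call(run):
    """Average number of threads started per kernel call."""
    return run["threads_started"] / run["calls"]


for label, run in (("before", before), ("after", after)):
    print(f"{label}: {threads_per_call(run):,.1f} threads started per call")

# Ratio of the counters; 0.5 would be expected if the change had
# really halved the number of threads launched.
ratio = after["threads_started"] / before["threads_started"]
print(f"after / before = {ratio:.3f}")
```

The counter dropped by only about 8 percent rather than the 50 percent I expected, while the number of calls stayed the same.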

AlekhyaV_Intel · 05-31-2023

Hi,

We would like a sample reproducer for the GPU Roofline analysis you are running. Could you please provide all the details, i.e., the code, compilation steps, and application, so that we can triage your issue further?

Regards,

Alekhya

JEROME_B_Intel1 · 05-31-2023

I am happy to help, but this is a fairly large code base, and you will need a PVC machine to run it on. You can get a copy using

git clone https://github.com/intel-innersource/applications.hpc.workloads.cloverleaf.cloverleaf-opencl.git

and I can provide build instructions, but they may not work on your machine. It is set up to build and run on the ORTCE machines, which use modules to configure the environment.

I will also be happy to run the app on ORTCE and provide Advisor output, or other profiler output. I am currently trying to improve the performance, so I am happy to run any diagnostics you think would be helpful.

AlekhyaV_Intel · 06-14-2023

Hi,

Thanks for sharing the sample reproducer. We have forwarded it to the respective team. Meanwhile, we have an update on the "Threads Started" question. Advisor only reads the "Threads Started" metric; the number itself is calculated by the compiler (IGC) or the L0 driver, so Advisor does not compute it.

Regards,

Alekhya

AlekhyaV_Intel · 07-24-2023

Hi,

As we are discussing this issue internally, we are closing this thread. If you need any further information, please post a new question as this thread is no longer monitored by Intel.

Regards,

Alekhya