Solved: function-level profile analysis questions

martinyyyy · 05-23-2022

Hi,

I would like to do function-level profile analysis (Microarchitecture Exploration).

After reading the docs, I learned that I can enable function-level analysis by setting pmu-collection-mode=detailed. Then I can use VTune to generate the hotspots report and see the metrics per function.

My question is: Are the metrics for a single function self-independent?

For example, I want to get the metrics for fun_A:

int fun_B() {
    …
}

int fun_A() {
    …
    fun_B();
}

// I will get the metrics from the report
// fun_A:  <metrics of fun_A>
// fun_B:  <metrics of fun_B>

Can I use the metrics of fun_A directly, or should I do some computation based on the metrics of fun_A and the metrics of fun_B?

It is also strange that pmu-collection-mode=detailed and pmu-collection-mode=summary generate different results.

Retiring(%)  …  Back-End Bound(%)  mode
84.1         …  1.8                detailed
50.3         …  23.2               summary

The difference between the two is obvious, so which one is accurate?

It would be great to know how the two modes combine the metrics and generate the overviews. Am I missing something?

Thank you for your help.

Regards

martinyyyy · 05-25-2022

Update:

I don't know why I cannot edit my original post, so I have to comment here.

I found something strange in the per-function results.

Function         Retiring(%)  Front-End Bound(%)  Bad Speculation(%)  Back-End Bound(%)
my_main          91.3         10.1                0.0                 15.5
entry_SYSCALL_64 67.0         40.2                40.2                0.0
memset_erms      100.0        52.5                52.5                0.0

As far as I know, Retiring + Front-End Bound + Bad Speculation + Back-End Bound should be equal to 100%.

I have no idea why this happens occasionally.

JyothisV_Intel · 05-25-2022

Hi,

Good day to you.

Thanks for posting in Intel Communities.

> Are the metrics for a single function self-independent?

> Can I use the metrics of fun_A directly or should I do some computations based on the metrics of fun_A and metrics of fun_B?

> How do the two modes combine the metrics and generate the overviews?

> Retiring + Front-End Bound + Bad Speculation + Back-End Bound should be equal to 100%

We are checking internally to provide you with a clear reply to the above questions.

Regarding the results collected in the "detailed" and "summary" PMU collection modes, we are unable to replicate the issue on our side. We ran Microarchitecture Exploration in both collection modes (Detailed: r023ue and Summary: r024ue, see screenshot: collection_mode.PNG) using the matrix multiplication VTune sample, and the results are within the error tolerances of the analysis (see comparison screenshot: comparison.PNG). The analysis was done on a Windows 10 system running VTune 2022.2.0, the latest version.

To help us replicate your issue, could you share a sample reproducer? It would also be helpful if you could provide the exact command or steps that you followed to perform the analysis.

Regards,

Jyothis V James

martinyyyy · 05-26-2022

Hi,

Good day to you. Thank you very much for replying to me.

I followed the instructions to install VTune on my Ubuntu 20.04.1 Server (2022.01). I also installed the dbgsym packages for Ubuntu. Both /proc/sys/kernel/perf_event_paranoid and /proc/sys/kernel/kptr_restrict are set to 0.

The output shows that VTune is using the sampling drivers for collecting events.
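As a quick way to confirm the two kernel settings mentioned above, the following is a minimal Python sketch. The function name `read_kernel_settings` and its `base` parameter are illustrative assumptions, not part of VTune; the sketch only reads the two files named in this post and reports their integer values (or `None` if a file is absent).

```python
from pathlib import Path
from typing import Dict, Optional

# The two settings named in the post above.
SETTINGS = ("perf_event_paranoid", "kptr_restrict")


def read_kernel_settings(base: str = "/proc/sys/kernel") -> Dict[str, Optional[int]]:
    """Return the integer value of each setting, or None if the file is missing."""
    values: Dict[str, Optional[int]] = {}
    for name in SETTINGS:
        path = Path(base) / name
        values[name] = int(path.read_text().strip()) if path.exists() else None
    return values


if __name__ == "__main__":
    for name, value in read_kernel_settings().items():
        print(f"{name} = {value}")
```

The `base` parameter exists only so the function can be pointed at a different directory when testing.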

Steps to reproduce

1. Download the minimal.tar.gz

2. Run the script file I provided and check the generated report. To switch between modes, change the corresponding line in the script file.

I picked out a few lines to show the problem.
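The script inside minimal.tar.gz is not reproduced in this thread. As a rough sketch of what such a collection could look like, the Python snippet below only builds (and does not run) VTune command lines for the two modes. The application path `./app` and the result directory names are hypothetical; the helper names `collect_cmd` and `report_cmd` are illustrative.

```python
import shlex
from typing import List


def collect_cmd(mode: str, result_dir: str, app: List[str]) -> List[str]:
    """Build a Microarchitecture Exploration collection command for one mode."""
    if mode not in ("detailed", "summary"):
        raise ValueError(f"unexpected pmu-collection-mode: {mode}")
    return [
        "vtune", "-collect", "uarch-exploration",
        "-knob", f"pmu-collection-mode={mode}",
        "-result-dir", result_dir,
        "--", *app,
    ]


def report_cmd(result_dir: str) -> List[str]:
    """Build a per-function hotspots report command in CSV form."""
    return [
        "vtune", "-report", "hotspots",
        "-result-dir", result_dir,
        "-group-by", "function",
        "-format", "csv", "-csv-delimiter", "comma",
    ]


if __name__ == "__main__":
    # "./app" and the result directory names are placeholders.
    for mode in ("detailed", "summary"):
        print(shlex.join(collect_cmd(mode, f"r_{mode}", ["./app"])))
    print(shlex.join(report_cmd("r_detailed")))
```

Switching modes then amounts to changing the single `pmu-collection-mode` knob value, which matches the "change the specific line" step described above.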

Function                      Retiring(%)  Front-End Bound(%)  Bad Speculation(%)  Back-End Bound(%)
update                        40.2         15.4                13.0                31.4
pushup                        51.5         21.3                0.0                 39.7
func@0x63a30                  15.7         10.5                44.6                29.2
std::min<int>                 93.7         44.1                66.1                0.0
std::swap<node>
cmp                           100.0        0.0                 0.0                 30.2

The result of the detailed mode

Retiring%       Front-End Bound%        Bad Speculation%        Back-End Bound%
57.4            10.4                    4.1                     28.1

Change the script to use pmu-collection-mode=summary and run it again

The result of the summary mode

Retiring%       Front-End Bound%        Bad Speculation%        Back-End Bound%
17.0            5.1                     1.3                     76.6
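To put the gap between the two runs in numbers, here is a small Python sketch that uses only the values posted above. The helper name `mode_delta` is illustrative. It shows that both rows sum to 100% while the individual categories shift by tens of percentage points.

```python
CATEGORIES = ("Retiring", "Front-End Bound", "Bad Speculation", "Back-End Bound")

# Values copied from the two result tables above (percent).
detailed = dict(zip(CATEGORIES, (57.4, 10.4, 4.1, 28.1)))
summary = dict(zip(CATEGORIES, (17.0, 5.1, 1.3, 76.6)))


def mode_delta(a: dict, b: dict) -> dict:
    """Per-category difference b - a, in percentage points."""
    return {name: round(b[name] - a[name], 1) for name in a}


if __name__ == "__main__":
    print("detailed total:", round(sum(detailed.values()), 1))
    print("summary total: ", round(sum(summary.values()), 1))
    for name, delta in mode_delta(detailed, summary).items():
        print(f"{name:<16} {delta:+.1f}")
```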

Regards,

Martin

JyothisV_Intel · 06-19-2022

Hi,

Good day to you.

Sorry for the delay and inconvenience caused.

Thanks for providing us with the detailed steps and logs to replicate the issue from our side. We were able to replicate the issue and have informed the internal team regarding your queries.

We will get back to you with an update soon.

Regards,

Jyothis V James

JyothisV_Intel · 11-01-2022

Hi,

Good day to you.

Sorry for the long delay in responding back to you.

>> Are the metrics for a single function self-independent? Can I use the metrics of fun_A directly or should I do some computations based on the metrics of fun_A and metrics of fun_B?

Whether the metrics for a single function are self-independent depends on the pane used to view the results. For example, for the CPU time displayed in the Top-down view, fun_A's metrics include fun_B's metrics, whereas in the Bottom-up view fun_A's metrics do not include fun_B's metrics. Viewing the Microarchitecture Exploration results in the GUI makes this distinction easier to see.

>> PMU collection mode detailed and summary generate different results. Which one is accurate? How do the two modes combine the metrics and generate the overviews?

The data collected differs between the PMU collection modes. The detailed collection mode uses driverless Perf system-wide sampling, whereas the summary collection mode uses driverless Perf per-process counting. PMU Detailed collection is the more accurate method because it is system-wide sampling rather than per-process counting. VTune also displays the collector type differently for the two modes; you can observe this in the reports generated via both the CLI and the GUI (screenshots attached).
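The difference between the two views can be illustrated with a minimal Python sketch. It assumes a hypothetical call tree in which each function carries a raw "self" count (for example, samples attributed to its own instructions); `Node`, `self_count`, and `inclusive` are illustrative names and not VTune data structures, and the numbers are made up. This additivity applies to raw counts such as CPU time. Percentage metrics such as Retiring(%) are ratios, so they cannot simply be added across functions and would have to be recomputed from the underlying counts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One function in a hypothetical call tree (not a VTune structure)."""
    name: str
    self_count: int  # count attributed to the function's own code
    callees: List["Node"] = field(default_factory=list)


def inclusive(node: Node) -> int:
    """Self count plus the inclusive counts of everything the function calls."""
    return node.self_count + sum(inclusive(c) for c in node.callees)


# Illustrative numbers only: fun_A calls fun_B, as in the question.
fun_B = Node("fun_B", self_count=30)
fun_A = Node("fun_A", self_count=70, callees=[fun_B])

if __name__ == "__main__":
    print("fun_A self:     ", fun_A.self_count)   # excludes fun_B
    print("fun_A inclusive:", inclusive(fun_A))   # includes fun_B
```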

>> Retiring + Front-End Bound + Bad Speculation + Back-End Bound should be equal to 100%

Retiring + Front-End Bound + Bad Speculation + Back-End Bound may not add up to 100%. This follows from the sampling methodology VTune uses: in general, sampling cannot provide 100% accurate data, and implementation complexities can cause underestimates or overestimates. Using the multiple runs option could help approximate more accurate data. Even so, the relative proportions of the pipeline categories remain meaningful for categorizing performance bottlenecks.
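To see how far individual functions can drift from 100%, the following Python sketch sums the four top-level categories for the rows posted earlier in this thread. The 5-point tolerance and the helper names are arbitrary assumptions chosen for illustration, not a VTune threshold.

```python
from typing import Dict, List, Tuple

# Column order: Retiring, Front-End Bound, Bad Speculation, Back-End Bound.
# Values copied from the per-function table posted on 05-25-2022 (percent).
ROWS: Dict[str, Tuple[float, float, float, float]] = {
    "my_main": (91.3, 10.1, 0.0, 15.5),
    "entry_SYSCALL_64": (67.0, 40.2, 40.2, 0.0),
    "memset_erms": (100.0, 52.5, 52.5, 0.0),
}


def category_sum(values: Tuple[float, ...]) -> float:
    """Sum of the four top-level categories, rounded to one decimal."""
    return round(sum(values), 1)


def rows_off_target(rows: Dict[str, Tuple[float, ...]], tolerance: float = 5.0) -> List[str]:
    """Names of functions whose category sum is further than `tolerance` from 100."""
    return [name for name, values in rows.items()
            if abs(category_sum(values) - 100.0) > tolerance]


if __name__ == "__main__":
    for name, values in ROWS.items():
        print(f"{name:<18} {category_sum(values):6.1f}")
    print("off target:", rows_off_target(ROWS))
```

Functions that collect few samples are the ones most likely to be flagged, so such rows are best read as rough indications rather than exact breakdowns.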

Hope this helps. If this clears your query, kindly click the “Accept as Solution” button to indicate that your issue is resolved. This will also help others with a similar issue.

Thanks and Regards,

Jyothis V James

martinyyyy · 11-01-2022

Hi,

Good day to you : )

Thanks for your detailed explanation, which makes things clearer.

Thanks and Regards,

Martin

JyothisV_Intel · 11-01-2022

Hi,

Thanks for accepting the solution. We will no longer monitor this thread.

If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

Regards,

Jyothis V James