
VTune: Relation between "Cycles of 1 port utilized" and "L1 Bound"

HarshVardhanKumar
New Contributor I

The VTune user guide (https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/cpu-metr...) defines L1 Bound as "how often machine was stalled without missing the L1 data cache."

I assumed this meant the overhead of address translation (TLB). However, the next paragraph of the guide says, "this metric value may be highlighted due to DTLB Overhead or Cycles of 1 Port Utilized issues."

It is understandable why DTLB Overhead would contribute to this. But I fail to understand why Cycles of 1 Port Utilized (dependency issues) would affect L1 Bound.

If there are data dependencies, the HW prefetcher would most likely have brought that data into the cache already (again, no misses at L1).

If it is only due to computation dependencies (the data is still not ready), then it makes sense. I just wanted to make sure that this is indeed the case.

Thanks.

Dmitry_R_Intel1
Employee

Here is the description of the Cycles of 1 Port Utilized metric, which offers some hints on this question:

"This metric represents cycles fraction where the CPU executed total of 1 uop per cycle on all execution ports. This can be due to heavy data-dependency among software instructions, or oversubscribing a particular hardware resource. In some other cases with high Cycles of 1 Port Utilized and L1 Bound, this metric can point to L1 data-cache latency bottleneck that may not necessarily manifest with complete execution starvation (due to the short L1 latency e.g. walking a linked list) - looking at the assembly can be helpful."
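To make the linked-list hint concrete, here is a minimal C sketch. It is not from the VTune documentation, and the names `node`, `build_list`, `sum_list`, and `sum_array` are hypothetical. In `sum_list`, the address of each load comes from the previous load, so even when every access hits L1 the loop cannot run faster than one L1 load-to-use latency (a few cycles on typical cores) per node, and the out-of-order engine has little independent work to issue in between. `sum_array` reads the same data with addresses computed from the index, so its loads are independent and can overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node type, used only for this illustration. */
struct node {
    struct node *next;
    uint64_t value;
};

/* Build an n-node list inside one contiguous allocation, so a small list
 * is likely to stay resident in L1. The caller frees the returned pointer. */
struct node *build_list(size_t n)
{
    assert(n > 0);
    struct node *nodes = malloc(n * sizeof *nodes);
    if (nodes == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        nodes[i].value = i;
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
    return nodes;
}

/* Pointer chasing: the address of the next load is the result of the
 * previous load, so the loads form one serial dependency chain. */
uint64_t sum_list(const struct node *head)
{
    uint64_t sum = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}

/* Same data, but each address is computed from the loop index, so the
 * loads do not depend on one another and can be issued in parallel. */
uint64_t sum_array(const struct node *nodes, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += nodes[i].value;
    return sum;
}
```

Profiling the two loops over a list small enough to stay in L1 would be one way to check whether the first shows the combination of high Cycles of 1 Port Utilized and L1 Bound described above. Actual numbers depend on the CPU and compiler flags, and an optimizing compiler may vectorize `sum_array`.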

RaeesaM_Intel
Moderator

Hi,


Thank you for accepting the solution provided by Dmitry.

If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Regards,

Raeesa



