VTune: Relation between "Cycles of 1 port utilized" and "L1 Bound"

HarshVardhanKumar
New Contributor I

The VTune user guide (https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/cpu-metrics-reference/l1-bound.html) defines L1 Bound as "how often machine was stalled without missing the L1 data cache."

I assumed this meant the overhead of address translation (TLB). However, the next paragraph of the guide says, "this metric value may be highlighted due to DTLB Overhead or Cycles of 1 Port Utilized issues."

It is understandable why DTLB Overhead may contribute to this, but I fail to understand why Cycles of 1 Port Utilized (dependency issues) would affect L1 Bound.

If there are data dependencies, the HW prefetcher would most likely have brought that data into the caches already (again, no misses at L1).

If it is only due to computation dependencies (the data is still not ready), then it makes sense. I just wanted to make sure that this is indeed the case.

Thanks.

1 Solution
Dmitry_R_Intel1
Employee

Let me post the description of the Cycles of 1 Port Utilized metric, which has some hints on this question:

"This metric represents cycles fraction where the CPU executed total of 1 uop per cycle on all execution ports. This can be due to heavy data-dependency among software instructions, or oversubscribing a particular hardware resource. In some other cases with high Cycles of 1 Port Utilized and L1 Bound, this metric can point to L1 data-cache latency bottleneck that may not necessarily manifest with complete execution starvation (due to the short L1 latency e.g. walking a linked list) - looking at the assembly can be helpful."
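To make the linked-list hint in that description concrete, here is a minimal C++ sketch. It is not taken from the VTune documentation: the `Node` type and the function names are hypothetical, and the comments assume a typical L1 data cache of about 32 KiB with a load-to-use latency of a few cycles. Both functions touch data that stays L1-resident, so neither misses the cache. The difference is that each load in `sum_list` needs the address produced by the previous load, so the core can do little more than wait out the L1 latency between steps, while the loads in `sum_array` are independent and can overlap.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical node type for illustration. At 16 bytes per node, a few
// hundred nodes fit comfortably in an assumed 32 KiB L1 data cache.
struct Node {
    std::uint64_t value;
    const Node* next;
};

// Link the nodes of `pool` in index order and return the head.
// The nodes are contiguous, so after a warm-up pass every load hits L1.
const Node* build_list(std::vector<Node>& pool) {
    for (std::size_t i = 0; i < pool.size(); ++i) {
        pool[i].value = i + 1;
        pool[i].next = (i + 1 < pool.size()) ? &pool[i + 1] : nullptr;
    }
    return pool.empty() ? nullptr : &pool.front();
}

// Dependent loads: the address of the next node is only known once the
// previous load of `n->next` has completed, so the loop is bounded by
// L1 latency even though nothing misses the cache.
std::uint64_t sum_list(const Node* head) {
    std::uint64_t sum = 0;
    for (const Node* n = head; n != nullptr; n = n->next) {
        sum += n->value;
    }
    return sum;
}

// Independent loads: every address is known up front, so the core can
// keep several loads in flight and hide the same L1 latency.
std::uint64_t sum_array(const std::vector<std::uint64_t>& values) {
    return std::accumulate(values.begin(), values.end(), std::uint64_t{0});
}
```

Calling `sum_list` and `sum_array` repeatedly on the same small data set and profiling each loop separately is one way to see whether L1 Bound and Cycles of 1 Port Utilized rise together for the pointer-chasing version on a given machine. The exact numbers depend on the CPU and on what the compiler does with the array loop, so treat this as an experiment rather than a guaranteed result.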

RaeesaM_Intel
Moderator

Hi,


Thank you for accepting the solution provided by Dmitry.

If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Regards,

Raeesa