HarshVardhanKumar
New Contributor I
215 Views

VTune: Relation between "Cycles of 1 port utilized" and "L1 Bound"

The VTune user guide (https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/cpu-metr...) defines L1 Bound as "how often machine was stalled without missing the L1 data cache."

I assumed this meant overhead due to the TLB (address translation). However, in the next paragraph the guide says, "this metric value may be highlighted due to DTLB Overhead or Cycles of 1 Port Utilized issues."

Now, it is understandable why DTLB Overhead may contribute to this. But I fail to understand why Cycles of 1 Port Utilized (dependency issues) would affect L1 Bound.

If there are data dependencies, the HW prefetcher would most likely have brought that data into the caches already (again, no L1 misses).

If it is only due to computation dependencies (the data is still not ready), then it makes sense. I just wanted to make sure that this is indeed the case.


Thanks.

Dmitry_R_Intel1
Employee
198 Views

Let me post the description of the Cycles of 1 Port Utilized metric, which gives some hints about this question:

"This metric represents cycles fraction where the CPU executed total of 1 uop per cycle on all execution ports. This can be due to heavy data-dependency among software instructions, or oversubscribing a particular hardware resource. In some other cases with high Cycles of 1 Port Utilized and L1 Bound, this metric can point to L1 data-cache latency bottleneck that may not necessarily manifest with complete execution starvation (due to the short L1 latency e.g. walking a linked list) - looking at the assembly can be helpful."

HarshVardhanKumar
New Contributor I
173 Views

Hey, Thanks! I missed that line...

RaeesaM_Intel
Moderator
145 Views

Hi,


Thank you for accepting the solution provided by Dmitry.

If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Regards,

Raeesa