Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Number of walk cycles more than number of execution cycles

AkshayBaviskar

Hi All,

I am measuring the number of page-walk cycles of an application on an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz machine. However, the walk-cycle counts I obtain are larger than the total number of cycles.

54354529590       dtlb_store_misses.walk_pending:u
427005133679    dtlb_load_misses.walk_pending:u
51905087642      dtlb_store_misses.walk_active:u
249519683387    dtlb_load_misses.walk_active:u
283877210858    cycles:u

Am I using the wrong counters to measure walk cycles?

Or do these walk cycles also include walks triggered by the prefetcher? In that case, how do I measure only the demand walk cycles?

Any hint would be highly appreciated.

Thanks in advance!

Regards,

Akshay

4 Replies
McCalpinJohn
Honored Contributor III

Starting in the SKL processor, there are two Page Table Walkers per core (Intel Optimization Reference Manual section 2.3.3, document 248966-043), and it looks like you are seeing both of them in use most cycles -- averaging 1.5 load miss walks pending plus 0.2 store miss walks pending over the full execution time.
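As a rough check of those averages, the sketch below divides each WALK_PENDING count from the original post by the cycle count. The helper name `mean_walks_in_flight` is made up for illustration, and the interpretation as "walks outstanding per cycle" assumes the event increments once per active walker per cycle.

```python
# Raw user-mode counts copied from the original post.
counts = {
    "dtlb_load_misses.walk_pending": 427005133679,
    "dtlb_store_misses.walk_pending": 54354529590,
    "cycles": 283877210858,
}


def mean_walks_in_flight(pending: int, cycles: int) -> float:
    """Average number of page walks outstanding per cycle (hypothetical helper)."""
    return pending / cycles


load_occupancy = mean_walks_in_flight(
    counts["dtlb_load_misses.walk_pending"], counts["cycles"]
)
store_occupancy = mean_walks_in_flight(
    counts["dtlb_store_misses.walk_pending"], counts["cycles"]
)

print(f"load walks pending per cycle:  {load_occupancy:.2f}")
print(f"store walks pending per cycle: {store_occupancy:.2f}")
```

With two walkers per core, the combined value can be at most 2, so a load figure above 1 is not by itself a sign of a broken counter.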

I don't think I have tested this on SKX, but in the past these performance counter events only counted activity due to demand references -- not activity due to the next-page-prefetcher.  

Based on the definitions of these events in Table 19-6 of the Intel SWDM Volume 3 (document 325384-073), the DTLB_LOAD_MISSES.WALK_ACTIVE event counts cycles in which at least one Page Miss Handler (PMH) is active, while DTLB_LOAD_MISSES.WALK_PENDING increments by the number of PMHs that are active in each cycle, so with two PMHs it can exceed the cycle count. Your results show:

  • In cycles with at least one PMH handling a load miss, there were an average of 1.71 PMHs active handling loads. (load.walk_pending/load.walk_active)
  • In cycles with at least one PMH handling a store miss, there were an average of 1.05 PMHs active handling stores.  (store.walk_pending/store.walk_active)
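The two ratios above can be reproduced with a few lines of Python. This is only a sketch: the variable names are my own, and the counts are the ones posted in the question.

```python
# Counts copied from the original post (user-mode only).
load_walk_pending = 427005133679
load_walk_active = 249519683387
store_walk_pending = 54354529590
store_walk_active = 51905087642

# WALK_PENDING / WALK_ACTIVE = average number of PMHs busy,
# taken only over the cycles in which at least one PMH was busy.
load_pmhs_when_active = load_walk_pending / load_walk_active
store_pmhs_when_active = store_walk_pending / store_walk_active

print(f"PMHs busy with loads, when any is:  {load_pmhs_when_active:.2f}")
print(f"PMHs busy with stores, when any is: {store_pmhs_when_active:.2f}")
```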

Also

  • 88% of cycles had at least one PMH handling a load (load.walk_active/cycles)
  • 18% of cycles had at least one PMH handling a store (store.walk_active/cycles)
  • These two fractions add up to 106%, which implies that at least about 6% of the cycles had one PMH busy handling a load and the other PMH busy handling a store
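The overlap estimate is an inclusion-exclusion argument: two fractions of the same set of cycles that sum to more than 100% must overlap by at least the excess. A small sketch with the posted counts (variable names are illustrative):

```python
# Counts copied from the original post (user-mode only).
load_walk_active = 249519683387
store_walk_active = 51905087642
cycles = 283877210858

load_busy = load_walk_active / cycles    # fraction of cycles with a load walk active
store_busy = store_walk_active / cycles  # fraction of cycles with a store walk active

# Two subsets of the same cycles cannot cover more than 100% in total,
# so any excess over 1.0 must be cycles where both kinds of walk were active.
min_overlap = max(0.0, load_busy + store_busy - 1.0)

print(f"load walk active:  {load_busy:.1%}")
print(f"store walk active: {store_busy:.1%}")
print(f"minimum overlap:   {min_overlap:.1%}")
```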

 

McCalpinJohn
Honored Contributor III

With a little more magic middle-school algebra, I think I derived bounds on the breakdown of activity by cycle.   There are six possible categories of activity and only five data items, so bounds are the most one can hope for....

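| PMH0 activity | PMH1 activity | % of time with minimum overlap of LD and ST TLB misses | % of time with maximum overlap of LD and ST TLB misses |
|---------------|---------------|--------------------------------------------------------|--------------------------------------------------------|
| LD            | LD            | 62.5%                                                  | 62.5%                                                  |
| ST            | ST            | 0.9%                                                   | 0.9%                                                   |
| LD            | ST            | 6.2%                                                   | 17.4%                                                  |
| LD            | (idle)        | 19.2%                                                  | 8.0%                                                   |
| (idle)        | ST            | 11.2%                                                  | 0.0%                                                   |
| (idle)        | (idle)        | 0.0%                                                   | 11.2%                                                  |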
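One way to arrive at these bounds is sketched below. It assumes exactly two PMHs, that WALK_PENDING adds one per busy PMH per cycle, and that WALK_ACTIVE adds one per cycle with any busy PMH of that type. The function and key names are my own, and this is a reconstruction of the algebra rather than the exact steps used for the table.

```python
# Counts copied from the original post (user-mode only).
LOAD_PENDING, LOAD_ACTIVE = 427005133679, 249519683387
STORE_PENDING, STORE_ACTIVE = 54354529590, 51905087642
CYCLES = 283877210858

# pending - active counts the cycles in which BOTH PMHs served the same type.
ld_ld = (LOAD_PENDING - LOAD_ACTIVE) / CYCLES
st_st = (STORE_PENDING - STORE_ACTIVE) / CYCLES

# Cycles with exactly one PMH on a load (resp. store), whatever the other PMH does.
one_load = LOAD_ACTIVE / CYCLES - ld_ld
one_store = STORE_ACTIVE / CYCLES - st_st

# Everything that is not LD+LD or ST+ST must be LD+ST, LD+idle, ST+idle or idle+idle.
remaining = 1.0 - ld_ld - st_st


def breakdown(ld_st: float) -> dict:
    """Fractions of cycles per state, given the LD+ST share (the one free variable)."""
    ld_idle = one_load - ld_st
    st_idle = one_store - ld_st
    idle_idle = remaining - ld_st - ld_idle - st_idle
    return {"LD+LD": ld_ld, "ST+ST": st_st, "LD+ST": ld_st,
            "LD+idle": ld_idle, "ST+idle": st_idle, "idle+idle": idle_idle}


# The LD+ST share is bounded by requiring every fraction to be non-negative.
min_case = breakdown(max(0.0, one_load + one_store - remaining))
max_case = breakdown(min(one_load, one_store))

for state in min_case:
    print(f"{state:10s} {min_case[state]:6.1%} {max_case[state]:6.1%}")
```

The five counters give five equations for six unknown fractions, which is why the result is a one-parameter family bounded at both ends rather than a single breakdown.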
RaeesaM_Intel
Moderator

Hi,


Is your issue resolved? Can you share an update?


Raeesa


RaeesaM_Intel
Moderator

Hi,


We haven't heard back from you. We are assuming that the solution provided helped, and we will no longer be monitoring this issue. Please start a new thread if you have further issues.


Regards,

Raeesa

