Number of walk cycles more than number of execution cycles

AkshayBaviskar · 02-12-2021

Hi All,

I am measuring the number of page-walk cycles of an application on an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz machine. However, the number of walk cycles I obtain is greater than the number of cycles.

54354529590 dtlb_store_misses.walk_pending:u
427005133679 dtlb_load_misses.walk_pending:u
51905087642 dtlb_store_misses.walk_active:u
249519683387 dtlb_load_misses.walk_active:u
283877210858 cycles:u

Am I using the wrong counters to measure walk cycles?

Or do these walk cycles also include walks caused by the prefetcher? If so, how do I measure only the demand walk cycles?

Any hint would be highly appreciated.

Thanks in advance!

Regards,

Akshay

McCalpinJohn · 02-12-2021

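Starting with the Skylake (SKL) processor, there are two Page Table Walkers per core (Intel Optimization Reference Manual section 2.3.3, document 248966-043), and it looks like you are seeing both of them in use in most cycles: averaged over the full execution time, about 1.5 load-miss walks plus 0.2 store-miss walks were pending per cycle. Because WALK_PENDING counts every walk in progress rather than just the cycles that have a walk, its total can legitimately exceed the cycle count.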
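As a quick check of those averages, a minimal sketch that assumes the five counts in the question are the whole dataset: dividing each WALK_PENDING total by cycles gives the mean number of walks in flight per cycle. With two walkers, the combined figure can range from 0 to 2, so a total above 1 does not by itself indicate a wrong counter.

```python
# Counts copied from the perf output in the question (user mode only, ":u").
counts = {
    "dtlb_store_misses.walk_pending": 54_354_529_590,
    "dtlb_load_misses.walk_pending": 427_005_133_679,
    "dtlb_store_misses.walk_active": 51_905_087_642,
    "dtlb_load_misses.walk_active": 249_519_683_387,
    "cycles": 283_877_210_858,
}


def avg_walks_per_cycle(counts: dict[str, int]) -> tuple[float, float]:
    """Mean number of load and store walks in flight per cycle over the whole run."""
    cycles = counts["cycles"]
    load = counts["dtlb_load_misses.walk_pending"] / cycles
    store = counts["dtlb_store_misses.walk_pending"] / cycles
    return load, store


load, store = avg_walks_per_cycle(counts)
print(f"load walks/cycle:  {load:.2f}")          # 1.50
print(f"store walks/cycle: {store:.2f}")         # 0.19
print(f"total:             {load + store:.2f}")  # 1.70, i.e. below the 2.0 ceiling for two walkers
```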

I don't think I have tested this on SKX, but in the past these performance counter events counted only activity due to demand references, not activity due to the next-page prefetcher.

Based on the definitions of these events in Table 19-6 of the Intel SWDM Volume 3 (document 325384-073), the DTLB_LOAD_MISSES.WALK_ACTIVE event counts cycles in which at least one Page Miss Handler (PMH) is active, while DTLB_LOAD_MISSES.WALK_PENDING increments by the number of PMHs that are active in each cycle. Your results show:

In cycles with at least one PMH handling a load miss, an average of 1.71 PMHs were active handling loads (load.walk_pending/load.walk_active).
In cycles with at least one PMH handling a store miss, an average of 1.05 PMHs were active handling stores (store.walk_pending/store.walk_active).
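To make the two definitions concrete, here is a toy model (an illustration only, not Intel's hardware logic) in which each trace entry is the number of PMHs busy with load-miss walks in one cycle. The hypothetical trace and the function names are assumptions for this sketch. The same pending/active ratio is then applied to the counts from the question.

```python
# Toy model of the two event definitions (illustration, not Intel's hardware logic).
# Each trace entry = number of PMHs busy with load-miss walks in that cycle (0, 1 or 2).
def walk_counters(busy_per_cycle: list[int]) -> tuple[int, int]:
    """Return (walk_active, walk_pending) for a per-cycle trace of busy PMHs."""
    active = sum(1 for n in busy_per_cycle if n >= 1)  # +1 per cycle with any walk
    pending = sum(busy_per_cycle)                      # +n per cycle with n walks
    return active, pending


def busy_pmhs_when_active(pending: int, active: int) -> float:
    """Mean number of busy PMHs over the cycles in which at least one was busy."""
    return pending / active if active else 0.0


# Hypothetical 8-cycle trace: pending (12) exceeds the cycle count (8); active (7) cannot.
trace = [2, 2, 1, 0, 2, 1, 2, 2]
toy_active, toy_pending = walk_counters(trace)
print(toy_active, toy_pending, len(trace))  # 7 12 8

# The same ratio applied to the counts from the question.
load_ratio = busy_pmhs_when_active(427_005_133_679, 249_519_683_387)
store_ratio = busy_pmhs_when_active(54_354_529_590, 51_905_087_642)
print(f"loads: {load_ratio:.2f}  stores: {store_ratio:.2f}")  # loads: 1.71  stores: 1.05
```

With two walkers, each ratio should fall between 1 and 2. The store ratio sitting just above 1 indicates that store-miss walks rarely occupy both PMHs at once.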

Also

88% of cycles had at least one PMH handling a load (load.walk_active/cycles)
18% of cycles had at least one PMH handling a store (store.walk_active/cycles)
Since these two fractions add up to about 106%, at least about 6% of the cycles must have had one PMH busy handling a load and the other busy handling a store.
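The 6% figure follows from inclusion-exclusion: the load-walk cycles and the store-walk cycles together cover more than 100% of the run, so the two sets must overlap by at least the excess. A minimal sketch using the question's counts, assuming (as above) only two walkers, so that an overlapping cycle has exactly one load walk and one store walk:

```python
# Counts from the question's perf output.
CYCLES = 283_877_210_858
LOAD_ACTIVE = 249_519_683_387
STORE_ACTIVE = 51_905_087_642

f_load = LOAD_ACTIVE / CYCLES    # fraction of cycles with >= 1 load walk
f_store = STORE_ACTIVE / CYCLES  # fraction of cycles with >= 1 store walk

# Inclusion-exclusion: |L and S| >= |L| + |S| - |all cycles|.
# With two PMHs, a cycle in both sets has one load walk and one store walk.
min_overlap = max(0.0, f_load + f_store - 1.0)

print(f"{f_load:.1%} {f_store:.1%} {min_overlap:.1%}")  # 87.9% 18.3% 6.2%
```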

McCalpinJohn · 02-13-2021

With a little more magic middle-school algebra, I think I have derived bounds on the breakdown of activity by cycle. There are six possible categories of activity and only five data items, so bounds are the most one can hope for...

PMH0 activity  PMH1 activity  % of cycles, min LD/ST overlap  % of cycles, max LD/ST overlap
LD             LD             62.5%                           62.5%
ST             ST              0.9%                            0.9%
LD             ST              6.2%                           17.4%
LD             (idle)         19.2%                            8.0%
ST             (idle)         11.2%                            0.0%
(idle)         (idle)          0.0%                           11.2%
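These bounds can be reproduced mechanically. The sketch below assumes at most two walks in flight per cycle, so a cycle with two load walks adds 2 to load WALK_PENDING but only 1 to load WALK_ACTIVE. That assumption pins down the LD+LD and ST+ST shares exactly. The LD+ST share is the one free parameter, and requiring the remaining categories (LD only, ST only, both idle) to be non-negative gives its range. The category names are labels chosen for this sketch, and exact `Fraction` arithmetic keeps the zero entries exactly zero.

```python
from fractions import Fraction

# Raw counts from the question's perf output.
CYCLES = 283_877_210_858
LD_ACTIVE, LD_PENDING = 249_519_683_387, 427_005_133_679
ST_ACTIVE, ST_PENDING = 51_905_087_642, 54_354_529_590

# Exact fractions of total cycles (Fraction avoids rounding noise such as "-0.0%").
LA, LP = Fraction(LD_ACTIVE, CYCLES), Fraction(LD_PENDING, CYCLES)
SA, SP = Fraction(ST_ACTIVE, CYCLES), Fraction(ST_PENDING, CYCLES)

# Assumes at most two walks per cycle: a 2-walk cycle adds 2 to PENDING, 1 to ACTIVE.
LD_LD = LP - LA
ST_ST = SP - SA


def breakdown(ld_st: Fraction) -> dict[str, Fraction]:
    """Split cycles into the six PMH-activity categories for a given LD+ST overlap."""
    ld_only = LA - LD_LD - ld_st
    st_only = SA - ST_ST - ld_st
    idle = 1 - (LD_LD + ST_ST + ld_st + ld_only + st_only)
    return {"LD+LD": LD_LD, "ST+ST": ST_ST, "LD+ST": ld_st,
            "LD only": ld_only, "ST only": st_only, "idle": idle}


# The overlap is the one unknown; every category must stay >= 0.
overlap_min = max(Fraction(0), LA + SA - 1)   # forced by idle >= 0
overlap_max = min(LA - LD_LD, SA - ST_ST)     # forced by LD only, ST only >= 0

lo, hi = breakdown(overlap_min), breakdown(overlap_max)
for name in lo:
    print(f"{name:8s} {float(lo[name]):6.1%} {float(hi[name]):6.1%}")
```

The LD+ST bounds come out at about 6.2% and 17.4%. The lower bound is the same inclusion-exclusion figure as the 6% estimate from the earlier post.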

RaeesaM_Intel · 02-24-2021

Hi,

Is your issue resolved? Can you share an update on this issue?

Raeesa

RaeesaM_Intel · 03-05-2021

Hi,

We haven't heard back from you. We are assuming that the solution provided helped and would no longer be monitoring this issue. Please raise a new thread if you have further issues.

Regards,

Raeesa