Hi All,
I am measuring the number of page walk cycles of an application on an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz machine. However, the number of walk cycles I obtain is greater than the total number of cycles.
54354529590 dtlb_store_misses.walk_pending:u
427005133679 dtlb_load_misses.walk_pending:u
51905087642 dtlb_store_misses.walk_active:u
249519683387 dtlb_load_misses.walk_active:u
283877210858 cycles:u
Am I using the wrong counters to measure walk cycles?
Or do these walk cycles also include walks caused by the prefetcher? In that case, how do I measure only the demand walk cycles?
Any hint would be highly appreciated.
Thanks in advance!
Regards,
Akshay
Starting in the SKL processor, there are two Page Table Walkers per core (Intel Optimization Reference Manual section 2.3.3, document 248966-043), and it looks like you are seeing both of them in use most cycles -- averaging 1.5 load miss walks pending plus 0.2 store miss walks pending over the full execution time.
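The averages quoted above can be checked by dividing each count by the cycle count. The following is only a rough sketch in Python, using the numbers copied from the perf output in the question; the variable names are made up for illustration, and the interpretation of WALK_PENDING as "walks in flight per cycle" is an assumption based on the event descriptions.

```python
# Counts copied from the perf output in the question (user mode only, ":u").
cycles = 283_877_210_858
load_walk_pending = 427_005_133_679
store_walk_pending = 54_354_529_590
load_walk_active = 249_519_683_387
store_walk_active = 51_905_087_642

# Assumption: WALK_PENDING adds the number of walks in flight each cycle,
# so dividing by cycles gives the average number of walks in flight.
avg_load_walks = load_walk_pending / cycles
avg_store_walks = store_walk_pending / cycles

# Assumption: WALK_ACTIVE adds 1 in each cycle with at least one walk in flight,
# so dividing by cycles gives the fraction of cycles with a walk of that type.
load_active_frac = load_walk_active / cycles
store_active_frac = store_walk_active / cycles

print(f"average load walks in flight per cycle:  {avg_load_walks:.2f}")
print(f"average store walks in flight per cycle: {avg_store_walks:.2f}")
print(f"cycles with a load walk in flight:  {100 * load_active_frac:.1f}%")
print(f"cycles with a store walk in flight: {100 * store_active_frac:.1f}%")
```

With two walkers per core, a WALK_PENDING count of up to twice the cycle count is plausible, which would explain why the load value exceeds cycles:u.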
I don't think I have tested this on SKX, but in the past these performance counter events only counted activity due to demand references -- not activity due to the next-page-prefetcher.
Based on the definitions of these events in Table 19-6 of the Intel SWDM Volume 3 (document 325384-073), the DTLB_LOAD_MISSES.WALK_ACTIVE event counts cycles in which at least one Page Miss Handler (PMH) is active, while DTLB_LOAD_MISSES.WALK_PENDING increments by the number of PMHs that are active in each cycle. Your results show:
Also
With a little more magic middle-school algebra, I think I derived bounds on the breakdown of activity by cycle. There are six possible categories of activity and only five data items, so bounds are the most one can hope for....
PMH0 activity | PMH1 activity | % of time with minimum overlap of LD and ST TLB misses | % of time with maximum overlap of LD and ST TLB misses |
LD | LD | 62.5% | 62.5% |
ST | ST | 0.9% | 0.9% |
LD | ST | 6.2% | 17.4% |
LD | (idle) | 19.2% | 8.0% |
(idle) | ST | 11.2% | 0.0% |
(idle) | (idle) | 0.0% | 11.2% |
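For anyone who wants to redo the algebra, here is a minimal Python sketch of one way to arrive at bounds like these. It is not necessarily the exact derivation used for the table. It assumes exactly two walkers per core, that WALK_PENDING adds one per busy walker per cycle, that WALK_ACTIVE adds one per cycle with at least one busy walker of that type, and that the six categories are: two loads, two stores, one load plus one store, one load alone, one store alone, and both idle. All names in the code are illustrative.

```python
# Counts copied from the perf output in the question.
cycles = 283_877_210_858
ld_pending, ld_active = 427_005_133_679, 249_519_683_387
st_pending, st_active = 54_354_529_590, 51_905_087_642

# Assumption: with two walkers, pending - active = cycles with both walkers
# busy on the same type (each such cycle adds 2 to pending but 1 to active).
two_ld = ld_pending - ld_active
two_st = st_pending - st_active
# Cycles with exactly one walk of each type in flight.
one_ld = ld_active - two_ld
one_st = st_active - two_st

# A single load walk and a single store walk may share a cycle.
# The overlap is at most the smaller of the two, and at least whatever
# is needed to keep the total from exceeding the cycle count.
max_overlap = min(one_ld, one_st)
min_overlap = max(0, two_ld + two_st + one_ld + one_st - cycles)

def breakdown(overlap):
    """Percent of cycles in each category for a given LD+ST overlap."""
    busy = two_ld + two_st + one_ld + one_st - overlap
    counts = {
        "LD + LD": two_ld,
        "ST + ST": two_st,
        "LD + ST": overlap,
        "LD only": one_ld - overlap,
        "ST only": one_st - overlap,
        "idle": cycles - busy,
    }
    return {name: round(100 * n / cycles, 1) for name, n in counts.items()}

at_min = breakdown(min_overlap)
at_max = breakdown(max_overlap)
for name in at_min:
    print(f"{name:8s} min overlap: {at_min[name]:5.1f}%   max overlap: {at_max[name]:5.1f}%")
```

The two-loads and two-stores rows do not depend on the overlap, so only the remaining four categories move between the bounds.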
Hi,
Is your issue resolved? Can you share an update on this issue?
Raeesa