I want to measure the following for an application:
- TLB miss rate
- Number of cycles spent in Page Walks
- Runtime in number of cycles
I have an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz system.
To calculate these, I am using the following perf counters:
- Total number of memory references (X) = mem_inst_retired.all_loads:u + mem_inst_retired.all_stores:u
- Total number of memory references that missed in the TLB (Y) = mem_inst_retired.stlb_miss_loads:u + mem_inst_retired.stlb_miss_stores:u
- TLB miss rate = Y/X
- Number of cycles spent in page walks = dtlb_store_misses.walk_pending:u + dtlb_load_misses.walk_pending:u
- Runtime in number of cycles = cycles
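As a minimal sketch of the arithmetic above, the following Python function combines raw event counts into the three metrics. The function name `tlb_metrics`, the dictionary layout, and the sample numbers are assumptions made for illustration; they are not measured values and not output parsed from perf.

```python
def tlb_metrics(counts):
    """Combine raw event counts into the metrics defined in the list above."""
    # X: all retired loads and stores
    refs = (counts["mem_inst_retired.all_loads"]
            + counts["mem_inst_retired.all_stores"])
    # Y: retired loads and stores that missed the STLB
    misses = (counts["mem_inst_retired.stlb_miss_loads"]
              + counts["mem_inst_retired.stlb_miss_stores"])
    # Sum of the two walk_pending events, as defined in the post
    walk_cycles = (counts["dtlb_load_misses.walk_pending"]
                   + counts["dtlb_store_misses.walk_pending"])
    return {
        "tlb_miss_rate": misses / refs if refs else 0.0,
        "walk_cycles": walk_cycles,
        "runtime_cycles": counts["cycles"],
    }


# Hypothetical counts, for illustration only (not measured values).
example_counts = {
    "mem_inst_retired.all_loads": 1_000_000,
    "mem_inst_retired.all_stores": 500_000,
    "mem_inst_retired.stlb_miss_loads": 1_000,
    "mem_inst_retired.stlb_miss_stores": 2_000,
    "dtlb_load_misses.walk_pending": 40_000,
    "dtlb_store_misses.walk_pending": 60_000,
    "cycles": 5_000_000,
}

metrics = tlb_metrics(example_counts)
print(metrics)
```

Note that the :u suffix in the post restricts the first four events to user mode, so for a consistent comparison the cycles count would presumably need the same restriction.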
I am unsure which of these three options to use for counting the total number of references that missed the TLB:
- dtlb_load_misses.miss_causes_a_walk + dtlb_store_misses.miss_causes_a_walk
- dtlb_load_misses.walk_completed + dtlb_store_misses.walk_completed
- mem_inst_retired.stlb_miss_loads + mem_inst_retired.stlb_miss_stores
However, when I ran a sequential access over a 64 MB array ({ arr[i] = i; }) with THP disabled, I got the following values for these counters:
dtlb_store_misses.miss_causes_a_walk = 154771
dtlb_store_misses.walk_completed = 116499
mem_inst_retired.stlb_miss_stores = 15566
When I double the array size to 128 MB and then to 256 MB, these counters approximately double as well. Since a 64 MB array spans 16K pages (at 4 KB per page), mem_inst_retired.stlb_miss_stores gives the closest value to the expected number of misses.
Also, I didn't see any effect of the next-page prefetcher (NPP) as mentioned in this post ( https://community.intel.com/t5/Software-Tuning-Performance/Inconsistency-in-TLB-miss-counters/td-p/1... ). So I suppose that my machine, which has a Skylake architecture, doesn't have an NPP.
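The page-count comparison can be checked with a few lines of Python. This is only a sketch: it assumes 4 KiB base pages (consistent with THP being disabled) and reuses the three counter values quoted above.

```python
PAGE_SIZE = 4 * 1024            # assumed 4 KiB base pages (THP disabled)
ARRAY_BYTES = 64 * 1024 * 1024  # 64 MB array from the experiment

# With a sequential sweep, one TLB miss per page would be the naive expectation.
expected_pages = ARRAY_BYTES // PAGE_SIZE

# Counter values reported in the post for the 64 MB run.
observed = {
    "dtlb_store_misses.miss_causes_a_walk": 154771,
    "dtlb_store_misses.walk_completed": 116499,
    "mem_inst_retired.stlb_miss_stores": 15566,
}

for name, count in observed.items():
    print(f"{name}: {count} ({count / expected_pages:.2f} per page)")

# Pick the counter whose value is nearest to one miss per page.
closest = min(observed, key=lambda name: abs(observed[name] - expected_pages))
print("expected pages:", expected_pages)
print("closest to one miss per page:", closest)
```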
Could you please let me know if I have chosen the right counters for my measurements?
Thanks in advance!
Best Regards,
Akshay
In your Y/X ratio, the count in the denominator includes only load and store requests from retired instructions (the events are described to be counted at retirement). So it makes more sense to me to use the sum of mem_inst_retired.stlb_miss_loads + mem_inst_retired.stlb_miss_stores to count what you've described as "Total number of memory references that missed in TLB."
These events are counted together. For example, if a load retires and it missed in the STLB, the event counts of mem_inst_retired.all_loads and mem_inst_retired.stlb_miss_loads are incremented and, on SKL/SKX in particular, they are incremented by the same amount, which is 1.
The STLB is the last-level TLB on SKL/SKX. A miss in the STLB doesn't trigger a page walk if there is already an outstanding speculative walk initiated by the NPP. A miss in the STLB may also fail to trigger a walk if the walk that is about to start is cancelled by the time the miss determination completes. Otherwise, a miss in the STLB triggers a walk.
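The counting rules described in this reply can be illustrated with a toy model. It is not a description of the hardware: the `RetiredLoad` class, its flags, and `count_events` are hypothetical names invented for this sketch, and the model ignores accesses that never retire.

```python
from dataclasses import dataclass


@dataclass
class RetiredLoad:
    """One retired load in the toy model (all fields are hypothetical flags)."""
    stlb_miss: bool = False
    npp_walk_outstanding: bool = False  # a speculative NPP walk already covers it
    walk_cancelled: bool = False        # the walk was cancelled before it started


def count_events(loads):
    """Apply the rules from the reply to a sequence of retired loads."""
    counts = {"all_loads": 0, "stlb_miss_loads": 0, "walks_triggered": 0}
    for load in loads:
        # Every retired load increments all_loads by 1.
        counts["all_loads"] += 1
        if load.stlb_miss:
            # A retired load that missed the STLB also increments stlb_miss_loads by 1.
            counts["stlb_miss_loads"] += 1
            # The miss triggers a walk unless one of the two exceptions applies.
            if not (load.npp_walk_outstanding or load.walk_cancelled):
                counts["walks_triggered"] += 1
    return counts


loads = [
    RetiredLoad(),                                           # STLB hit
    RetiredLoad(stlb_miss=True),                             # miss, walk starts
    RetiredLoad(stlb_miss=True, npp_walk_outstanding=True),  # miss, no new walk
    RetiredLoad(stlb_miss=True, walk_cancelled=True),        # miss, walk cancelled
]
counts = count_events(loads)
print(counts)
```

In this model stlb_miss_loads can never exceed all_loads, which is why the Y/X ratio built from the retirement events stays between 0 and 1.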
