I want to measure the following things for an application:
I have an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz system.
To calculate these, I am using the following perf counters:
mem_inst_retired.stlb_miss_loads:u + mem_inst_retired.stlb_miss_stores:u
I am unsure which of three counters gives the total number of references that missed the TLB:
However, when I run a sequential store over a 64 MB array ({ arr[i] = i; }) with THP disabled, I get the following values for the above counters:
dtlb_store_misses.miss_causes_a_walk = 154771
dtlb_store_misses.walk_completed = 116499
mem_inst_retired.stlb_miss_stores = 15566
When I double the array size to 128 MB and then to 256 MB, these counters also approximately double. Since a 64 MB array spans 16K pages of 4 KB each, mem_inst_retired.stlb_miss_stores gives the value closest to the expected one miss per page.
Also, I didn't see any effect of the next-page prefetcher (NPP) mentioned in this post ( https://community.intel.com/t5/Software-Tuning-Performance/Inconsistency-in-TLB-miss-counters/td-p/1... ). So I suppose that my machine, which has the Skylake architecture, doesn't have an NPP.
Could you please let me know if I have chosen the right counters for my measurements?
Thanks in advance!
Best Regards,
Akshay