Perf counters for measuring TLB miss rate

I want to measure the following for an application:

  1. TLB miss rate
  2. Number of cycles spent in Page Walks
  3. Runtime in number of cycles

I have an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz system. 

To calculate these, I am using the following perf counters:

  1. Total number of memory references ( X ) = mem_inst_retired.all_loads:u + mem_inst_retired.all_stores:u
  2. Total number of memory references that missed in TLB ( Y ) = mem_inst_retired.stlb_miss_loads:u + mem_inst_retired.stlb_miss_stores:u

  1. TLB miss rate = Y/X
  2. Number of cycles spent in Page Walks = dtlb_store_misses.walk_pending:u + dtlb_load_misses.walk_pending:u
  3. Runtime in number of cycles = cycles
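
To make the arithmetic above concrete, here is a minimal Python sketch of how the three metrics could be derived once the raw counts have been read from perf. The function name `tlb_metrics`, the dictionary layout, and the extra "fraction of cycles" value are my own illustrative choices and not part of perf; the event names are written without the `:u` suffix. It simply follows the formulas above and does not settle whether these counters are the right ones.

```python
def tlb_metrics(counts):
    """Derive the metrics listed above from raw perf counts.

    `counts` is a hypothetical dict mapping event name -> integer count.
    """
    # X: total retired memory references (loads + stores)
    refs = (counts["mem_inst_retired.all_loads"]
            + counts["mem_inst_retired.all_stores"])
    # Y: retired memory references that missed the STLB
    misses = (counts["mem_inst_retired.stlb_miss_loads"]
              + counts["mem_inst_retired.stlb_miss_stores"])
    # Page-walk cycles, taken as the sum of the two walk_pending events
    walk_cycles = (counts["dtlb_load_misses.walk_pending"]
                   + counts["dtlb_store_misses.walk_pending"])
    cycles = counts["cycles"]
    return {
        "tlb_miss_rate": misses / refs if refs else 0.0,
        "walk_cycles": walk_cycles,
        # Extra convenience value, not one of the three metrics above
        "walk_cycle_fraction": walk_cycles / cycles if cycles else 0.0,
        "cycles": cycles,
    }
```

Calling it with the counts from one perf run gives Y/X as `tlb_miss_rate`, plus the page-walk cycles and the runtime in cycles.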

I am unsure which of these three counter combinations to use for the total number of references that missed the TLB:

  1. dtlb_load_misses.miss_causes_a_walk + dtlb_store_misses.miss_causes_a_walk
  2. dtlb_load_misses.walk_completed + dtlb_store_misses.walk_completed
  3. mem_inst_retired.stlb_miss_loads + mem_inst_retired.stlb_miss_stores

When I ran a sequential store loop over a 64 MB array ( arr[i] = i; ) with THP disabled, I got the following values for these counters:

dtlb_store_misses.miss_causes_a_walk = 154771

dtlb_store_misses.walk_completed = 116499

mem_inst_retired.stlb_miss_stores = 15566

When I double the array size to 128 MB and then to 256 MB, these counters also approximately double. Since a 64 MB array spans 16K pages (with 4 KB pages), mem_inst_retired.stlb_miss_stores gives the closest value.
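
The comparison I am making can be sketched in Python as follows. It assumes 4 KiB pages (THP is disabled) and a single sequential pass that touches every page once; the helper name `misses_per_page` is just for illustration.

```python
PAGE_SIZE = 4 * 1024  # assumed base page size in bytes (THP disabled)


def misses_per_page(array_bytes, counter_value, page_size=PAGE_SIZE):
    """Return (pages touched, counter events per page) for one sequential pass."""
    pages = array_bytes // page_size
    return pages, counter_value / pages


# Store-side counter values measured for the 64 MB array
measured = {
    "dtlb_store_misses.miss_causes_a_walk": 154771,
    "dtlb_store_misses.walk_completed": 116499,
    "mem_inst_retired.stlb_miss_stores": 15566,
}

for event, value in measured.items():
    pages, per_page = misses_per_page(64 * 1024 * 1024, value)
    print(f"{event}: {value} events over {pages} pages = {per_page:.2f} per page")
```

A value near 1.0 per page is what I would expect if each page causes exactly one miss. Only mem_inst_retired.stlb_miss_stores comes close to that; the two dtlb_store_misses events come out several times higher.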

Also, I didn’t see any effect of the next-page prefetcher (NPP) here, as mentioned in this post ( ). So I suppose that my machine, which has the Skylake architecture, doesn’t have an NPP.

Could you please let me know if I have chosen the right counters for my measurements?

Thanks in advance!

Best Regards,


