Software Tuning, Performance Optimization & Platform Monitoring

VTune: Compare row-buffer locality of two programs

HarshVardhanKumar

I want to show that a certain optimization makes a program stall less because of improved DRAM row-buffer locality. I looked through all the VTune metrics. Several of them deal with memory-bandwidth stalls, Memory Bound, Store Bound, and so on.

I'm not sure which metric to use for my purpose. Any comments will be helpful.

Thanks

McCalpinJohn

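For the server processors, the performance counters in the integrated memory controller (IMC) support events that can be used to measure open-page and closed-page accesses. The nomenclature differs a bit from what I have seen elsewhere in the industry, but it is not hard to translate: Intel's "page" is a DRAM row, so a page hit is a row-buffer hit, a page miss is a row conflict, and a page-empty access finds the bank with no open row.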

In the Xeon systems at the Texas Advanced Computing Center, we program the four IMC performance counters on each DDR4 channel to measure:

  1. CAS_COUNT.RD  -- all DRAM read accesses
  2. CAS_COUNT.WR -- all DRAM write accesses
  3. ACT_COUNT.ALL -- all DRAM ACTIVATE commands
  4. PRE_COUNT.PAGE_MISS -- all DRAM pages closed due to row conflict

The formulas for converting these four values to the page hit and miss rates are included in the uncore performance monitoring reference manual for each server processor.

  • Page Conflict ratio = PRE_COUNT.PAGE_MISS / (CAS_COUNT.RD + CAS_COUNT.WR)
  • Page Empty ratio = (ACT_COUNT.ALL - PRE_COUNT.PAGE_MISS) / (CAS_COUNT.RD + CAS_COUNT.WR)
  • Page Hit ratio = 1 - Page Conflict ratio - Page Empty ratio
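As a minimal sketch of how the three formulas above could be applied to compare two runs, the Python below takes the four raw counter totals and returns the ratios. The `ImcCounts` and `page_ratios` names are hypothetical, the counts are assumed to be already summed over all channels for the run, and the numbers are made up purely for illustration; collecting the counters is not shown.

```python
from dataclasses import dataclass


@dataclass
class ImcCounts:
    """Raw IMC counter totals for one run (hypothetical container)."""
    cas_rd: int         # CAS_COUNT.RD
    cas_wr: int         # CAS_COUNT.WR
    act_all: int        # ACT_COUNT.ALL
    pre_page_miss: int  # PRE_COUNT.PAGE_MISS


def page_ratios(c: ImcCounts) -> dict:
    """Apply the page conflict / empty / hit formulas to one set of counts."""
    cas_total = c.cas_rd + c.cas_wr
    if cas_total == 0:
        raise ValueError("no CAS commands recorded")
    conflict = c.pre_page_miss / cas_total
    empty = (c.act_all - c.pre_page_miss) / cas_total
    hit = 1.0 - conflict - empty
    return {"hit": hit, "empty": empty, "conflict": conflict}


# Made-up counts for illustration only, not measurements.
baseline = ImcCounts(cas_rd=800, cas_wr=200, act_all=500, pre_page_miss=300)
optimized = ImcCounts(cas_rd=800, cas_wr=200, act_all=250, pre_page_miss=100)

for name, counts in (("baseline", baseline), ("optimized", optimized)):
    r = page_ratios(counts)
    print(f"{name:10s} hit={r['hit']:.1%} "
          f"empty={r['empty']:.1%} conflict={r['conflict']:.1%}")
```

A higher page hit ratio together with a lower page conflict ratio in the optimized run would be the evidence of improved row-buffer locality, provided both runs are measured over the same region of the program.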

These counters appear to be reliable on all the systems I have tested, but finding useful information in the values is challenging.
