Low MEM_LOAD_RETIRED.* counts on CXL memory

ms88

Hello,

I am running experiments comparing memory monitoring behavior between DRAM and CXL. The underlying hardware is an Intel Xeon Platinum 8468 (Sapphire Rapids, family 6, model 143, stepping 8). The memory layout is:

RANGE                                  SIZE  STATE NODE
0x0000000000000000-0x000000007fffffff    2G online    0
0x0000000100000000-0x000000407fffffff  254G online    0
0x0000004080000000-0x000000807fffffff  256G online    1
0x0000008080000000-0x000000c07fffffff  256G online    2

I use a program that performs a fixed number of operations. The CXL memory is exposed as a CPU-less NUMA node (node 2 in the commands below).

# DRAM
Performance counter stats for 'numactl --membind 0 --cpunodebind 0 taskset -c 0-7 gupstoy 64G 0 0':
   187,328,768,167      instructions
       869,840,754      MEM_LOAD_RETIRED.L1_MISS
       844,497,107      MEM_LOAD_RETIRED.L2_MISS
       768,242,564      MEM_LOAD_RETIRED.L3_MISS

# CXL
 Performance counter stats for 'numactl --membind 2 --cpunodebind 0 taskset -c 0-7 gupstoy 64G 0 0':
   188,135,204,015      instructions
       147,051,380      MEM_LOAD_RETIRED.L1_MISS
       107,168,112      MEM_LOAD_RETIRED.L2_MISS
        49,201,455      MEM_LOAD_RETIRED.L3_MISS
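
To make the gap between the two runs easier to read, here is a minimal sketch that divides each CXL count by the matching DRAM count. It uses only the numbers quoted above; the dictionary and variable names are illustrative and not part of any tool.

```python
# Counts copied from the two perf stat runs above (DRAM = node 0, CXL = node 2).
dram = {
    "instructions": 187_328_768_167,
    "MEM_LOAD_RETIRED.L1_MISS": 869_840_754,
    "MEM_LOAD_RETIRED.L2_MISS": 844_497_107,
    "MEM_LOAD_RETIRED.L3_MISS": 768_242_564,
}
cxl = {
    "instructions": 188_135_204_015,
    "MEM_LOAD_RETIRED.L1_MISS": 147_051_380,
    "MEM_LOAD_RETIRED.L2_MISS": 107_168_112,
    "MEM_LOAD_RETIRED.L3_MISS": 49_201_455,
}

# Ratio of the CXL count to the DRAM count for each event.
cxl_over_dram = {event: cxl[event] / dram[event] for event in dram}

for event, ratio in cxl_over_dram.items():
    print(f"{event:28s} CXL/DRAM = {ratio:.3f}")
```

The instruction counts differ by less than 1%, while MEM_LOAD_RETIRED.L3_MISS on CXL is only about 6% of the DRAM value.
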

I have also verified that these counters work as expected on persistent memory and on remote DRAM (they gave similar numbers in both cases).
 
Also, when I check another counter, `OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD`, it increases as I expected. However, the `retired` ones are still very low on CXL.
Performance counter stats for 'numactl --membind 0 --cpunodebind 0 taskset -c 0-7 gupstoy 64G 0 0':
   187,331,953,610      instructions
       768,895,926      mem_load_retired.l3_miss
     1,427,617,196      OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD

Performance counter stats for 'numactl --membind 2 --cpunodebind 0 taskset -c 0-7 gupstoy 64G 0 0':
   188,146,385,927      instructions
        49,121,066      mem_load_retired.l3_miss
     1,437,029,250      OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD
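
As a rough cross-check, the sketch below computes how many retired L3-miss loads are counted per offcore L3-miss demand data read in each run. It uses only the counts quoted above; the key names are illustrative, and the ratio is just a convenient way to compare the two runs, not a documented metric.

```python
# Counts copied from the second pair of perf stat runs above.
runs = {
    "DRAM (node 0)": {
        "retired_l3_miss": 768_895_926,       # mem_load_retired.l3_miss
        "offcore_l3_miss_rd": 1_427_617_196,  # OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD
    },
    "CXL (node 2)": {
        "retired_l3_miss": 49_121_066,
        "offcore_l3_miss_rd": 1_437_029_250,
    },
}

# Retired L3-miss loads counted per offcore L3-miss demand data read.
retired_per_offcore = {
    name: counts["retired_l3_miss"] / counts["offcore_l3_miss_rd"]
    for name, counts in runs.items()
}

for name, ratio in retired_per_offcore.items():
    print(f"{name}: {ratio:.3f} retired L3 misses per offcore request")
```

The offcore counts are within 1% of each other, but the ratio drops from about 0.54 on DRAM to about 0.03 on CXL.
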

My question: are there any known limitations or restrictions for the retired memory-load events (MEM_LOAD_RETIRED.*) on Sapphire Rapids when the memory is CXL-attached?

Thanks
