<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Low MEM_LOAD_RETIRED.* counts on CXL memory in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Low-MEM-LOAD-RETIRED-counts-on-CXL-memory/m-p/1714271#M8564</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am running experiments comparing memory monitoring behavior between DRAM and CXL memory. The underlying hardware is an Intel Xeon Platinum 8468 (Sapphire Rapids, family 6, model 143, stepping 8). Memory layout:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;RANGE                                  SIZE  STATE NODE
0x0000000000000000-0x000000007fffffff    2G online    0
0x0000000100000000-0x000000407fffffff  254G online    0
0x0000004080000000-0x000000807fffffff  256G online    1
0x0000008080000000-0x000000c07fffffff  256G online    2&lt;/LI-CODE&gt;&lt;P&gt;I use a program that performs a fixed number of operations. The CXL memory is exposed as a CPU-less NUMA node (node 2).&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# DRAM
Performance counter stats for 'numactl --membind 0 --cpunodebind 0 taskset -c 0-7 gupstoy 64G 0 0':
   187,328,768,167      instructions
       869,840,754      MEM_LOAD_RETIRED.L1_MISS
       844,497,107      MEM_LOAD_RETIRED.L2_MISS
       768,242,564      MEM_LOAD_RETIRED.L3_MISS

# CXL
 Performance counter stats for 'numactl --membind 2 --cpunodebind 0 taskset -c 0-7 gupstoy 64G 0 0':
   188,135,204,015      instructions
       147,051,380      MEM_LOAD_RETIRED.L1_MISS
       107,168,112      MEM_LOAD_RETIRED.L2_MISS
        49,201,455      MEM_LOAD_RETIRED.L3_MISS&lt;/LI-CODE&gt;&lt;P&gt;I have also verified that these counters work as expected on persistent memory and remote DRAM (they gave similar numbers in both cases).&lt;/P&gt;&lt;P&gt;Also, when I check another counter, `OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD`, it increases as I expected. However, the `MEM_LOAD_RETIRED.*` counts are still very low on CXL.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Performance counter stats for 'numactl --membind 0 --cpunodebind 0 taskset -c 0-7 gupstoy 64G 0 0':
   187,331,953,610      instructions
       768,895,926      mem_load_retired.l3_miss
     1,427,617,196      OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD

Performance counter stats for 'numactl --membind 2 --cpunodebind 0 taskset -c 0-7 gupstoy 64G 0 0':
   188,146,385,927      instructions
        49,121,066      mem_load_retired.l3_miss
     1,437,029,250      OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD&lt;/LI-CODE&gt;&lt;P&gt;My question: are there any known limitations or restrictions on the retired memory load events (MEM_LOAD_RETIRED.*) on Sapphire Rapids when the memory is CXL-attached?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Sun, 31 Aug 2025 10:29:29 GMT</pubDate>
    <dc:creator>ms88</dc:creator>
    <dc:date>2025-08-31T10:29:29Z</dc:date>
    <item>
      <title>Low MEM_LOAD_RETIRED.* counts on CXL memory</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Low-MEM-LOAD-RETIRED-counts-on-CXL-memory/m-p/1714271#M8564</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am running experiments comparing memory monitoring behavior between DRAM and CXL memory. The underlying hardware is an Intel Xeon Platinum 8468 (Sapphire Rapids, family 6, model 143, stepping 8). Memory layout:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;RANGE                                  SIZE  STATE NODE
0x0000000000000000-0x000000007fffffff    2G online    0
0x0000000100000000-0x000000407fffffff  254G online    0
0x0000004080000000-0x000000807fffffff  256G online    1
0x0000008080000000-0x000000c07fffffff  256G online    2&lt;/LI-CODE&gt;&lt;P&gt;I use a program that performs a fixed number of operations. The CXL memory is exposed as a CPU-less NUMA node (node 2).&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# DRAM
Performance counter stats for 'numactl --membind 0 --cpunodebind 0 taskset -c 0-7 gupstoy 64G 0 0':
   187,328,768,167      instructions
       869,840,754      MEM_LOAD_RETIRED.L1_MISS
       844,497,107      MEM_LOAD_RETIRED.L2_MISS
       768,242,564      MEM_LOAD_RETIRED.L3_MISS

# CXL
 Performance counter stats for 'numactl --membind 2 --cpunodebind 0 taskset -c 0-7 gupstoy 64G 0 0':
   188,135,204,015      instructions
       147,051,380      MEM_LOAD_RETIRED.L1_MISS
       107,168,112      MEM_LOAD_RETIRED.L2_MISS
        49,201,455      MEM_LOAD_RETIRED.L3_MISS&lt;/LI-CODE&gt;&lt;P&gt;I have also verified that these counters work as expected on persistent memory and remote DRAM (they gave similar numbers in both cases).&lt;/P&gt;&lt;P&gt;Also, when I check another counter, `OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD`, it increases as I expected. However, the `MEM_LOAD_RETIRED.*` counts are still very low on CXL.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Performance counter stats for 'numactl --membind 0 --cpunodebind 0 taskset -c 0-7 gupstoy 64G 0 0':
   187,331,953,610      instructions
       768,895,926      mem_load_retired.l3_miss
     1,427,617,196      OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD

Performance counter stats for 'numactl --membind 2 --cpunodebind 0 taskset -c 0-7 gupstoy 64G 0 0':
   188,146,385,927      instructions
        49,121,066      mem_load_retired.l3_miss
     1,437,029,250      OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD&lt;/LI-CODE&gt;&lt;P&gt;My question: are there any known limitations or restrictions on the retired memory load events (MEM_LOAD_RETIRED.*) on Sapphire Rapids when the memory is CXL-attached?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Sun, 31 Aug 2025 10:29:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Low-MEM-LOAD-RETIRED-counts-on-CXL-memory/m-p/1714271#M8564</guid>
      <dc:creator>ms88</dc:creator>
      <dc:date>2025-08-31T10:29:29Z</dc:date>
    </item>
  </channel>
</rss>

