<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic verifying first-touch memory allocation in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/verifying-first-touch-memory-allocation/m-p/921514#M1266</link>
    <description>&lt;P&gt;Is anyone aware of a basic tool for verifying first-touch memory allocation on a NUMA platform such as Xeon EP?&lt;/P&gt;
&lt;P&gt;According to usual expectation, pinning of MPI processes to a single CPU should result in this happening automatically (barring running out of memory, etc.), unless a non-NUMA BIOS option has been selected.&lt;/P&gt;
&lt;P&gt;Likewise, OpenMP where data are initialized by a parallel data access scheme consistent with the way they will be used should result in allocation local to the CPU, rather than on remote memory.&lt;/P&gt;
&lt;P&gt;For this to work, apparently, MPI or OpenMP libraries have to coordinate with the BIOS.&lt;/P&gt;
&lt;P&gt;It seems there might be a way to determine the address ranges which are local to each CPU on a shared memory platform and perform tests to see where each thread is placing its first-touch allocation.&lt;/P&gt;
&lt;P&gt;As you might guess, I'm looking for verification of suspected performance problems which seem to indicate threads within MPI ranks pinned to certain CPUs consistently using remote memory.&lt;/P&gt;</description>
    <pubDate>Fri, 14 Jun 2013 17:26:34 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2013-06-14T17:26:34Z</dc:date>
    <item>
      <title>verifying first-touch memory allocation</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/verifying-first-touch-memory-allocation/m-p/921514#M1266</link>
      <description>&lt;P&gt;Is anyone aware of a basic tool for verifying first-touch memory allocation on a NUMA platform such as Xeon EP?&lt;/P&gt;
&lt;P&gt;According to usual expectation, pinning of MPI processes to a single CPU should result in this happening automatically (barring running out of memory, etc.), unless a non-NUMA BIOS option has been selected.&lt;/P&gt;
&lt;P&gt;Likewise, OpenMP where data are initialized by a parallel data access scheme consistent with the way they will be used should result in allocation local to the CPU, rather than on remote memory.&lt;/P&gt;
&lt;P&gt;For this to work, apparently, MPI or OpenMP libraries have to coordinate with the BIOS.&lt;/P&gt;
&lt;P&gt;It seems there might be a way to determine the address ranges which are local to each CPU on a shared memory platform and perform tests to see where each thread is placing its first-touch allocation.&lt;/P&gt;
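&lt;P&gt;On Linux, rather than working out which physical address ranges belong to each socket, one can ask the kernel which node holds each resident page: /proc/PID/numa_maps lists every mapping of a process with per-node page counts (tokens such as N0=200 N1=100) and its page size (kernelpagesize_kB). The following Python sketch, assuming a NUMA-enabled kernel, lists the ten mappings with the most resident memory and how their pages are split across nodes. Because threads share one address space, the report is per process; attributing pages to a thread relies on knowing which thread first touched which array.&lt;/P&gt;
&lt;PRE&gt;
```python
import sys


def parse_numa_maps(text):
    """Yield (address, policy, {node: resident_bytes}) for each numa_maps line."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) in (0, 1):  # every valid line has an address and a policy
            continue
        page_kb = 4  # assumed default if kernelpagesize_kB is missing
        pages = {}
        for field in fields[2:]:
            key, _, value = field.partition("=")
            if key == "kernelpagesize_kB":
                page_kb = int(value)
            elif key[:1] == "N" and key[1:].isdigit() and value.isdigit():
                pages[int(key[1:])] = int(value)  # N1=100 means 100 pages on node 1
        yield fields[0], fields[1], {n: p * page_kb * 1024 for n, p in pages.items()}


def largest_mappings(text, count=10):
    """Return the `count` mappings with the most resident memory, largest first."""
    rows = list(parse_numa_maps(text))
    rows.sort(key=lambda row: sum(row[2].values()), reverse=True)
    return rows[:count]


def main():
    pid = sys.argv[1] if len(sys.argv) == 2 else "self"
    try:
        with open(f"/proc/{pid}/numa_maps") as f:
            text = f.read()
    except OSError as exc:
        print(f"cannot read numa_maps for {pid}: {exc}")
        return
    for address, policy, per_node in largest_mappings(text):
        split = ", ".join(f"node{n}={b / 2**20:.1f} MiB" for n, b in sorted(per_node.items()))
        print(f"{address} {policy:16} {split}")


if __name__ == "__main__":
    main()
```
&lt;/PRE&gt;
&lt;P&gt;For example, giving it the PID of an MPI rank during its main loop shows whether that rank's large arrays sit on the node it is pinned to. With no argument it reports on its own process.&lt;/P&gt;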
&lt;P&gt;As you might guess, I'm looking for verification of suspected performance problems which seem to indicate threads within MPI ranks pinned to certain CPUs consistently using remote memory.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Jun 2013 17:26:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/verifying-first-touch-memory-allocation/m-p/921514#M1266</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2013-06-14T17:26:34Z</dc:date>
    </item>
    <item>
      <title>Hey Tim,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/verifying-first-touch-memory-allocation/m-p/921515#M1267</link>
      <description>&lt;P&gt;Hey Tim,&lt;/P&gt;
&lt;P&gt;Have you looked at NumaTop (&lt;A href="https://01.org/numatop"&gt;https://01.org/numatop&lt;/A&gt;)? That's assuming you are using Linux.&lt;/P&gt;
&lt;P&gt;If each thread mallocs a big array and repeatedly runs through it, something like NumaTop should be able to show the local vs. remote statistics pretty easily. I've never actually used NumaTop; I'm just pretty sure the guys who created it know what they are doing.&lt;/P&gt;
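&lt;P&gt;A minimal Python sketch of such a test, using one process instead of threads and assuming a NUMA-enabled Linux kernel. It checks page placement rather than access traffic: it pins itself to the CPUs of one node (read from /sys/devices/system/node/nodeN/cpulist), maps an anonymous 64 MiB buffer, touches one byte per page, and compares the per-node totals from /proc/self/numa_maps before and after. The node-number argument and the buffer size are arbitrary choices. With the kernel's default local allocation policy, nearly all of the newly resident memory should appear on the chosen node.&lt;/P&gt;
&lt;PRE&gt;
```python
import mmap
import os
import sys

PAGE = mmap.PAGESIZE  # base page size, typically 4096 bytes
SIZE = 64 * 2**20     # arbitrary 64 MiB test buffer


def parse_cpulist(text):
    """Expand a sysfs CPU list such as "0-3,8,10-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if part:
            first, _, last = part.partition("-")
            cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def resident_bytes_per_node(numa_maps_text):
    """Total resident bytes per node over all mappings (Nx=pages tokens)."""
    totals = {}
    for line in numa_maps_text.splitlines():
        page_kb = 4  # assumed default if kernelpagesize_kB is absent
        pages = {}
        for field in line.split():
            key, _, value = field.partition("=")
            if key == "kernelpagesize_kB":
                page_kb = int(value)
            elif key[:1] == "N" and key[1:].isdigit() and value.isdigit():
                pages[int(key[1:])] = int(value)
        for node, count in pages.items():
            totals[node] = totals.get(node, 0) + count * page_kb * 1024
    return totals


def read_file(path):
    with open(path) as f:
        return f.read()


def main():
    try:
        node = int(sys.argv[1]) if len(sys.argv) == 2 else 0
        cpus = parse_cpulist(read_file(f"/sys/devices/system/node/node{node}/cpulist"))
        os.sched_setaffinity(0, cpus)  # pin this process to the node's CPUs (Linux)
        before = resident_bytes_per_node(read_file("/proc/self/numa_maps"))
        # Private anonymous mapping: no physical pages are allocated yet.
        buf = mmap.mmap(-1, SIZE, flags=mmap.MAP_PRIVATE)
        for offset in range(0, SIZE, PAGE):
            buf[offset] = 1  # first touch: the kernel allocates this page now
        after = resident_bytes_per_node(read_file("/proc/self/numa_maps"))
    except (OSError, ValueError, AttributeError) as exc:
        print(f"cannot run the test here: {exc}")
        return
    print(f"pinned to node {node}, CPUs {sorted(cpus)}")
    for n in sorted(set(before) | set(after)):
        change = after.get(n, 0) - before.get(n, 0)
        print(f"  node {n}: {change / 2**20:+.1f} MiB newly resident")
    buf.close()


if __name__ == "__main__":
    main()
```
&lt;/PRE&gt;
&lt;P&gt;Running it once per node (for example with arguments 0 and 1 on a two-socket Xeon EP) and seeing the new memory land on a different node than the one pinned to would point at a placement problem.&lt;/P&gt;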
&lt;P&gt;Pat&lt;/P&gt;</description>
      <pubDate>Fri, 14 Jun 2013 19:12:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/verifying-first-touch-memory-allocation/m-p/921515#M1267</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2013-06-14T19:12:18Z</dc:date>
    </item>
    <item>
      <title>Thanks, that looks like an</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/verifying-first-touch-memory-allocation/m-p/921516#M1268</link>
      <description>&lt;P&gt;Thanks, that looks like an interesting option.&amp;nbsp; It requires building a custom kernel with PEBS latency counters, with the step "build kernel as usual" ( it says that verbatim in the man page) looking a bit daunting.&lt;/P&gt;
&lt;P&gt;I was able to build a working kernel (good enough to reach this forum page) by following the numatop instructions as best I understood them.&amp;nbsp; However, numatop reports "CPU is not supported." Not surprisingly, at a minimum the Intel(R) Xeon Phi(TM) software would also have to be rebuilt to run under this kernel.&lt;/P&gt;
&lt;P&gt;GUI tools such as the Red Hat system monitor are still present, but they show even fewer cores than under the stock Red Hat kernel (where they already didn't see all the cores).&lt;/P&gt;
&lt;P&gt;/proc/cpuinfo still looks OK.&lt;/P&gt;</description>
      <pubDate>Sat, 15 Jun 2013 13:50:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/verifying-first-touch-memory-allocation/m-p/921516#M1268</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2013-06-15T13:50:00Z</dc:date>
    </item>
    <item>
      <title>The developers confirmed that</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/verifying-first-touch-memory-allocation/m-p/921517#M1269</link>
      <description>&lt;P&gt;The developers confirmed that it's sufficient to add the CPU model number to the list in order to make numatop accept it.&lt;/P&gt;
&lt;P&gt;Firefox has particularly bad memory locality, probably no surprise there.&lt;/P&gt;
&lt;P&gt;My application showed around 50% remote memory accesses when running just 1 MPI process (OpenMP-threaded across all cores), but good locality when running an even number of processes.&amp;nbsp; I must look elsewhere for the problems.&lt;/P&gt;</description>
      <pubDate>Mon, 17 Jun 2013 14:05:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/verifying-first-touch-memory-allocation/m-p/921517#M1269</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2013-06-17T14:05:30Z</dc:date>
    </item>
    <item>
      <title>Standard Linux systems track</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/verifying-first-touch-memory-allocation/m-p/921518#M1270</link>
      <description>&lt;P&gt;Standard Linux systems track whether they were able to provide pages according to the NUMA policy requested.&lt;/P&gt;
&lt;P&gt;You can dump the stats before and after your run using &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cat /sys/devices/system/node/node0/numastat&lt;/P&gt;
&lt;P&gt;The output looks like:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; numa_hit 672421856&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; numa_miss 632409&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; numa_foreign 185449&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; interleave_hit 269407187&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; local_node 672420899&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; other_node 633366&lt;/P&gt;
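&lt;P&gt;A minimal Python sketch of the before/after comparison, assuming the sysfs layout above (one numastat file per node under /sys/devices/system/node/): it snapshots every node's counters, runs a command, and prints the per-node differences. The comments summarize the counter meanings as given in the Linux kernel's numastat documentation. The counters are system-wide, so other activity on the machine also shows up in the differences.&lt;/P&gt;
&lt;PRE&gt;
```python
import glob
import subprocess
import sys

# Counter meanings, per the Linux kernel's numastat documentation:
#   numa_hit       memory was intended for this node and was allocated here
#   numa_miss      memory was intended for another node but was allocated here
#   numa_foreign   memory was intended for this node but was allocated elsewhere
#   interleave_hit interleave policy wanted this node and succeeded
#   local_node     allocated here while the process was running on this node
#   other_node     allocated here while the process was running on another node


def parse_numastat(text):
    """Parse the "name value" lines of one node's numastat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def snapshot():
    """Read the counters of every node: {"node0": {"numa_hit": ..., ...}, ...}."""
    result = {}
    for path in sorted(glob.glob("/sys/devices/system/node/node*/numastat")):
        with open(path) as f:
            result[path.split("/")[-2]] = parse_numastat(f.read())
    return result


def delta(before, after):
    """Per-node, per-counter change between two snapshots (after minus before)."""
    return {
        node: {k: v - before.get(node, {}).get(k, 0) for k, v in counters.items()}
        for node, counters in after.items()
    }


def main():
    if len(sys.argv) == 1:
        print(f"usage: {sys.argv[0]} command [args...]")
        return
    before = snapshot()
    try:
        subprocess.run(sys.argv[1:], check=False)
    except OSError as exc:
        print(f"cannot run {sys.argv[1]}: {exc}")
        return
    after = snapshot()
    for node, changes in delta(before, after).items():
        print(node, " ".join(f"{k}={v}" for k, v in changes.items()))


if __name__ == "__main__":
    main()
```
&lt;/PRE&gt;
&lt;P&gt;For example, running the script with numactl --cpunodebind=0 --membind=1 ./a.out as its arguments (a.out standing in for any test program) should make other_node grow on node1, since the memory comes from node1 while the process runs on node0's CPUs.&lt;/P&gt;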
&lt;P&gt;I find the naming a bit confusing, and typically have to run test cases using numactl with various processor and memory binding options to remind myself what they mean.&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jun 2013 19:34:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/verifying-first-touch-memory-allocation/m-p/921518#M1270</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2013-06-24T19:34:36Z</dc:date>
    </item>
  </channel>
</rss>

