<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Jim in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125208#M77613</link>
    <description>&lt;P&gt;Hi Jim&lt;/P&gt;

&lt;BLOCKQUOTE&gt;jimdempseyatthecove wrote:&lt;BR /&gt;

&lt;P&gt;Run numactl -H to confirm your assumption about which node (0 or 1) has the MCDRAM (use size to disambiguate).&lt;/P&gt;

&lt;/BLOCKQUOTE&gt;

&lt;P&gt;I do run "numactl -H" in parallel terminal to see which memory is being consumed and all I see is that MCDRAM getting consumed as below:&lt;/P&gt;

&lt;DIV&gt;
	&lt;PRE&gt;
node 0 size: 98178 MB
node 0 free: 94201 MB
node 1 cpus:
node 1 size: 16384 MB
node 1 free: 12 MB
&lt;/PRE&gt;

	&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;jimdempseyatthecove wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

	&lt;P style="font-size: 13.008px;"&gt;Also, I think there is a BIOS setting relating to where the OS is placed. This may affect C/C++ malloc/new as well.&lt;BR /&gt;
		&lt;SPAN style="font-size: 13.008px;"&gt;Have you tried using&lt;/SPAN&gt;&lt;BR /&gt;
		&lt;SPAN style="font-size: 13.008px;"&gt;void* numa_alloc_onnode(size_t size, int node);&lt;/SPAN&gt;&lt;BR /&gt;
		&lt;SPAN style="font-size: 13.008px;"&gt;This will remove any questions as to what malloc/new (ALLOCATE) is doing.&lt;/SPAN&gt;&lt;/P&gt;

	&lt;P style="font-size: 13.008px;"&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

	&lt;P style="font-size: 13.008px;"&gt;For Flat and Memory&amp;nbsp;&lt;SPAN style="font-size: 12px;"&gt;"Hot-Pluggable" is set and the OS is supposed to use MCDRAM of user programs and DDR4 for Kernel.&lt;/SPAN&gt;&lt;/P&gt;

	&lt;P style="font-size: 13.008px;"&gt;&lt;SPAN style="font-size: 12px;"&gt;Thanks.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/DIV&gt;</description>
    <pubDate>Fri, 15 Sep 2017 12:39:55 GMT</pubDate>
    <dc:creator>CPati2</dc:creator>
    <dc:date>2017-09-15T12:39:55Z</dc:date>
    <item>
      <title>Flat Mode - Memory Allocation</title>
      <link>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125205#M77610</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;

&lt;P&gt;The Xeon Phi 7210 I am using has the following setup:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;Cluster mode: Quadrant or All-2-All&lt;BR /&gt;
		Memory mode: Flat&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;With this configuration, I have two nodes: node 0 (CPU + DDR4) and node 1 (MCDRAM). I want the application to allocate memory on DDR4, not on MCDRAM. For this, I use the numactl command as follows:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;numactl -m 0 ./sbench&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Judging by the memory being consumed, the allocation is happening on MCDRAM (node 1) instead of DDR4. Can anyone suggest why this is the case?&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;P.S.: In chapter 23 of "&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition" book I see authors have also performed similar experiment with very small data footprint of 24 MiB (I am using benchmark that needs 8GB) and they show speed up based on whether allocation is occurring at DDR4 vs MCDRAM. However, based on my analysis the allocation always occurs at MCDRAM when in flat mode irrespective of the binding.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Sep 2017 01:52:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125205#M77610</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2017-09-15T01:52:43Z</dc:date>
    </item>
    <item>
      <title>Chetan,</title>
      <link>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125206#M77611</link>
      <description>&lt;P&gt;Chetan,&lt;/P&gt;

&lt;P&gt;Run numactl -H to confirm your assumption about which node (0 or 1) has the MCDRAM (use size to disambiguate).&lt;/P&gt;
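&lt;P&gt;A minimal sketch of doing that check in code, assuming the usual numactl -H line layout ("node N cpus: ...", "node N size: X MB", "node N free: X MB"). NodeInfo, parse_numactl_hardware and guess_mcdram_node are illustrative names, not part of numactl, and picking the smallest node with an empty CPU list as the MCDRAM node is a heuristic for KNL flat mode rather than a guarantee.&lt;/P&gt;
<![CDATA[
```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// What numactl -H reports for one node.
struct NodeInfo {
    long size_mb = -1;     // "node N size: X MB"
    long free_mb = -1;     // "node N free: X MB"
    bool has_cpus = false; // true if "node N cpus:" lists at least one CPU
};

// Collects the cpus/size/free lines of `numactl -H` output, keyed by node id.
// Other lines (header, distance table) are skipped.
std::map<int, NodeInfo> parse_numactl_hardware(const std::string& text) {
    std::map<int, NodeInfo> nodes;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string word, key;
        int node = -1;
        if (!(fields >> word >> node >> key) || word != "node") continue;
        if (key == "cpus:") {
            int cpu;
            nodes[node].has_cpus = static_cast<bool>(fields >> cpu);
        } else if (key == "size:") {
            fields >> nodes[node].size_mb;
        } else if (key == "free:") {
            fields >> nodes[node].free_mb;
        }
    }
    return nodes;
}

// Returns the smallest (by size) node that lists no CPUs, which is assumed
// to be MCDRAM in flat mode, or -1 if there is no such node.
int guess_mcdram_node(const std::map<int, NodeInfo>& nodes) {
    int best = -1;
    for (const auto& entry : nodes) {
        const NodeInfo& info = entry.second;
        if (info.has_cpus || info.size_mb < 0) continue;
        if (best == -1 || info.size_mb < nodes.at(best).size_mb) best = entry.first;
    }
    return best;
}
```
]]>
&lt;P&gt;For example, the output of numactl -H can be saved to a file, read into a std::string and passed to parse_numactl_hardware; with the node 0 / node 1 figures quoted in this thread, guess_mcdram_node returns 1 (16384 MB, empty CPU list).&lt;/P&gt;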

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 15 Sep 2017 12:17:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125206#M77611</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2017-09-15T12:17:09Z</dc:date>
    </item>
    <item>
      <title>Also, I think there is a BIOS</title>
      <link>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125207#M77612</link>
      <description>&lt;P&gt;Also, I think there is a BIOS setting relating to where the OS is placed. This may affect C/C++ malloc/new as well.&lt;/P&gt;

&lt;P&gt;Have you tried using&lt;/P&gt;

&lt;P&gt;void* numa_alloc_onnode(size_t size, int node);&lt;/P&gt;

&lt;P&gt;This will remove any questions as to what malloc/new (ALLOCATE) is doing.&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 15 Sep 2017 12:28:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125207#M77612</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2017-09-15T12:28:44Z</dc:date>
    </item>
    <item>
      <title>Hi Jim</title>
      <link>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125208#M77613</link>
      <description>&lt;P&gt;Hi Jim&lt;/P&gt;

&lt;BLOCKQUOTE&gt;jimdempseyatthecove wrote:&lt;BR /&gt;

&lt;P&gt;Run numactl -H to confirm your assumption about which node (0 or 1) has the MCDRAM (use size to disambiguate).&lt;/P&gt;

&lt;/BLOCKQUOTE&gt;

&lt;P&gt;I do run "numactl -H" in parallel terminal to see which memory is being consumed and all I see is that MCDRAM getting consumed as below:&lt;/P&gt;

&lt;DIV&gt;
	&lt;PRE&gt;
node 0 size: 98178 MB
node 0 free: 94201 MB
node 1 cpus:
node 1 size: 16384 MB
node 1 free: 12 MB
&lt;/PRE&gt;

	&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;jimdempseyatthecove wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

	&lt;P style="font-size: 13.008px;"&gt;Also, I think there is a BIOS setting relating to where the OS is placed. This may affect C/C++ malloc/new as well.&lt;BR /&gt;
		&lt;SPAN style="font-size: 13.008px;"&gt;Have you tried using&lt;/SPAN&gt;&lt;BR /&gt;
		&lt;SPAN style="font-size: 13.008px;"&gt;void* numa_alloc_onnode(size_t size, int node);&lt;/SPAN&gt;&lt;BR /&gt;
		&lt;SPAN style="font-size: 13.008px;"&gt;This will remove any questions as to what malloc/new (ALLOCATE) is doing.&lt;/SPAN&gt;&lt;/P&gt;

	&lt;P style="font-size: 13.008px;"&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

	&lt;P style="font-size: 13.008px;"&gt;For Flat and Memory&amp;nbsp;&lt;SPAN style="font-size: 12px;"&gt;"Hot-Pluggable" is set and the OS is supposed to use MCDRAM of user programs and DDR4 for Kernel.&lt;/SPAN&gt;&lt;/P&gt;

	&lt;P style="font-size: 13.008px;"&gt;&lt;SPAN style="font-size: 12px;"&gt;Thanks.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/DIV&gt;</description>
      <pubDate>Fri, 15 Sep 2017 12:39:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125208#M77613</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2017-09-15T12:39:55Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;"Hot-Pluggable" is set and</title>
      <link>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125209#M77614</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;"Hot-Pluggable" is set, and the OS is &lt;EM&gt;&lt;STRONG&gt;supposed &lt;/STRONG&gt;&lt;/EM&gt;to use MCDRAM for user programs and DDR4 for the kernel&lt;/P&gt;

&lt;P&gt;Apparently your testing shows it did not. Have you modified your test program to use numa_alloc_onnode?&lt;/P&gt;

&lt;P&gt;Note, you can overload C++ new for specific types to use a different memory allocator than the default allocator.&lt;/P&gt;
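&lt;P&gt;Jim's code below replaces the global operator new. For the per-type variant, here is a minimal sketch under stated assumptions: alloc_for_node and free_for_node are hypothetical hooks (malloc/free stand in for numa_alloc_onnode and numa_free so it builds without libnuma), BigBuffer is an illustrative type, and node 0 being DDR4 is taken from this thread's configuration. A class-scope operator delete that takes the size is handy here because numa_free needs the size of the block.&lt;/P&gt;
<![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical hooks; the counter only exists to make the routing observable.
int g_live_node_blocks = 0;

void* alloc_for_node(std::size_t n, int /*node*/) {
    // Stand-in for numa_alloc_onnode(n, node).
    void* p = std::malloc(n);
    if (p) ++g_live_node_blocks;
    return p;
}

void free_for_node(void* p, std::size_t /*n*/) {
    // Stand-in for numa_free(p, n).
    std::free(p);
    --g_live_node_blocks;
}

// Only objects of this type are routed to the node allocator; every other
// new-expression keeps using the default global operator new.
struct BigBuffer {
    static constexpr int kNode = 0;  // assumption: node 0 is DDR4 here
    double data[1024];

    static void* operator new(std::size_t n) {
        if (void* p = alloc_for_node(n, kNode)) return p;
        throw std::bad_alloc();
    }
    // Sized class-specific delete receives the same size operator new got.
    static void operator delete(void* p, std::size_t n) noexcept {
        if (p) free_for_node(p, n);
    }
    // new BigBuffer[k] would still use the global operator new[] unless
    // operator new[] / operator delete[] are declared here as well.
};
```
]]>
&lt;P&gt;The design choice is scope: only the types you mark pay for NUMA placement, so library code and small objects elsewhere keep the default allocator.&lt;/P&gt;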

&lt;PRE class="brush:cpp;"&gt;
#include &amp;lt;cstdlib&amp;gt; // std::malloc, std::free
#include &amp;lt;new&amp;gt;     // std::bad_alloc

// RAII guard: switches the current thread to qt_malloc and restores the
// saved setting when it goes out of scope
struct Enable_Use_qt_malloc
{
    bool b; // setting restored by the destructor
    Enable_Use_qt_malloc()
    {
        if(qt::tlsThreadContext)
            b = qt::tlsThreadContext-&amp;gt;set_Use_qt_malloc(true);
        else
            b = false;
    }
    ~Enable_Use_qt_malloc()
    {
        if(qt::tlsThreadContext)
            qt::tlsThreadContext-&amp;gt;set_Use_qt_malloc(b);
    }
};

// main thread's state (lives for the duration of the app)
Enable_Use_qt_malloc Enable_Use_qt_malloc_now;

void * operator new( std::size_t cb )
{
    if(cb == 0) cb = 1; // allocation of at least 1 byte
    void * p;
    if(qt::tlsThreadContext &amp;amp;&amp;amp; qt::tlsThreadContext-&amp;gt;Use_qt_malloc)
        p = qt::qt_malloc(cb);
    else
        p = std::malloc(cb);
    if(!p) throw std::bad_alloc(); // operator new must never return a null pointer
    return p;
}
void * operator new[]( std::size_t cb )
{
    return ::operator new(cb); // same policy as scalar new
}

// Note: a block must be freed under the same Use_qt_malloc setting it was
// allocated with, otherwise it is handed to the wrong deallocator.
void operator delete(void *p) noexcept
{
    if(!p) return;
    if(qt::tlsThreadContext &amp;amp;&amp;amp; qt::tlsThreadContext-&amp;gt;Use_qt_malloc)
        qt::qt_free(p);
    else
        std::free(p);
}

void operator delete[](void *p) noexcept
{
    ::operator delete(p);
}
&lt;/PRE&gt;

&lt;P&gt;The above is from my QuickThread parallel programming library. Feel free to rename and repurpose the code.&lt;/P&gt;

&lt;P&gt;*** You would modify this to replace qt_malloc/qt_free with numa_alloc_onnode, etc...&lt;BR /&gt;
	*** and extend Enable_Use_qt_malloc (renamed to Enable_Use_numa_malloc)&lt;BR /&gt;
	*** to not only enable/disable the NUMA allocator, but also to specify (per thread) a preferred/required node&lt;/P&gt;

&lt;P&gt;The purpose of the Enable_Use_qt_malloc (you rename this) is to switch between the default C/C++ allocator and your NUMA allocator&lt;/P&gt;
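&lt;P&gt;A minimal sketch of that extension, under these assumptions: the guard (Enable_Use_numa_malloc, as named above) keeps a per-thread node in a thread_local, where -1 means "use malloc"; each block carries a small header recording its size and node, because numa_free(start, size) needs the size and because a block should go back to the allocator that produced it even if the thread's setting changed in between. node_alloc, node_free, numa_aware_malloc and numa_aware_free are hypothetical names, and malloc/free stand in for numa_alloc_onnode/numa_free so the sketch builds without libnuma.&lt;/P&gt;
<![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Per-thread node selection; -1 means "use the default malloc".
thread_local int tls_numa_node = -1;

// RAII guard: selects a node for the current thread and restores the
// previous selection when it goes out of scope.
struct Enable_Use_numa_malloc {
    int saved;
    explicit Enable_Use_numa_malloc(int node) : saved(tls_numa_node) { tls_numa_node = node; }
    ~Enable_Use_numa_malloc() { tls_numa_node = saved; }
    Enable_Use_numa_malloc(const Enable_Use_numa_malloc&) = delete;
    Enable_Use_numa_malloc& operator=(const Enable_Use_numa_malloc&) = delete;
};

// Backend hooks (hypothetical). With libnuma they could point at wrappers
// around numa_alloc_onnode(size, node) and numa_free(start, size).
void* (*node_alloc)(std::size_t, int) = [](std::size_t n, int) { return std::malloc(n); };
void (*node_free)(void*, std::size_t) = [](void* p, std::size_t) { std::free(p); };

// Header placed in front of every block; alignas keeps the user pointer
// suitably aligned for any fundamental type.
struct alignas(std::max_align_t) Header {
    std::size_t total;  // bytes obtained from the backend, header included
    int node;           // node used, or -1 for malloc
};

void* numa_aware_malloc(std::size_t n) {
    if (n > SIZE_MAX - sizeof(Header)) return nullptr;  // size overflow
    const int node = tls_numa_node;
    const std::size_t total = sizeof(Header) + n;
    void* raw = node >= 0 ? node_alloc(total, node) : std::malloc(total);
    if (!raw) return nullptr;
    Header* h = new (raw) Header{total, node};
    return h + 1;
}

void numa_aware_free(void* p) {
    if (!p) return;
    Header* h = static_cast<Header*>(p) - 1;
    // Use the allocator recorded at allocation time, not the thread's
    // current setting, so switching nodes in between is safe.
    if (h->node >= 0) node_free(h, h->total);
    else std::free(h);
}
```
]]>
&lt;P&gt;Usage: inside a thread, Enable_Use_numa_malloc guard(0); makes later numa_aware_malloc calls on that thread request node 0 until guard goes out of scope. Global operator new/delete overrides like the ones above could then forward to numa_aware_malloc/numa_aware_free (throwing std::bad_alloc on a null result).&lt;/P&gt;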

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 15 Sep 2017 15:30:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125209#M77614</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2017-09-15T15:30:10Z</dc:date>
    </item>
    <item>
      <title>Quote:jimdempseyatthecove</title>
      <link>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125210#M77615</link>
      <description>&lt;P&gt;Hi Jim,&lt;/P&gt;

&lt;BLOCKQUOTE&gt;jimdempseyatthecove wrote:&lt;BR /&gt;

&lt;P&gt;&amp;gt;&amp;gt;"Hot-Pluggable" is set, and the OS is &lt;EM&gt;&lt;STRONG&gt;supposed &lt;/STRONG&gt;&lt;/EM&gt;to use MCDRAM for user programs and DDR4 for the kernel&lt;/P&gt;

&lt;P&gt;*** You would modify this to replace qt_malloc/qt_free with numa_alloc_onnode, etc...&lt;BR /&gt;
	*** and extend Enable_Use_qt_malloc (renamed to Enable_Use_numa_malloc)&lt;BR /&gt;
	*** to not only enable/disable the NUMA allocator, but also to specify (per thread) a preferred/required node&lt;/P&gt;

&lt;P&gt;The purpose of the Enable_Use_qt_malloc (you rename this) is to switch between the default C/C++ allocator and your NUMA allocator&lt;/P&gt;

&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Sorry, I am a bit lost here.&lt;/P&gt;

&lt;P&gt;When "Hot-Pluggable" is on, the application (user program) goes to MCDRAM, which is correct per the setting. I am not using a test program of my own; I am using numactl to place the application's memory on either DDR4 or MCDRAM.&lt;/P&gt;

&lt;P&gt;Also, for now my goal is not to write (or change) code that queries the device and then allocates on either DDR4 or MCDRAM. I simply want the MKL benchmarks to allocate on whichever node I give numactl. If Intel and other resources put so much emphasis on using numactl directly to allocate or pin a benchmark to specific memory, then I should be able to validate this approach.&lt;/P&gt;

&lt;P&gt;What you suggested above is useful when code is written from scratch, and even then only when it uses the memkind library. Please correct me if I am wrong.&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Sep 2017 15:52:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125210#M77615</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2017-09-15T15:52:00Z</dc:date>
    </item>
    <item>
      <title>Hi All,</title>
      <link>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125211#M77616</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;

&lt;P&gt;The following worked for me:&lt;/P&gt;

&lt;PRE&gt;
export MEMKIND_HBW_NODES=0
numactl -m 0 ./sbench
&lt;/PRE&gt;

&lt;P&gt;After a lot of digging, I found that I needed to set the MEMKIND_HBW_NODES environment variable to override which node memkind treats as high-bandwidth memory.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;I didn't knew that benchmarks are using memkind. I am still a bit confused on this.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Since,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;I am using DeepBench and &lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;Intel® Optimized LINPACK Benchmark for Linux, even if I don't use numactl,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;MEMKIND_HBW_NODES will be able to divert the application to either MCDRAM or DDR4.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Thanks.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Sep 2017 17:38:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Flat-Mode-Memory-Allocation/m-p/1125211#M77616</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2017-09-15T17:38:00Z</dc:date>
    </item>
  </channel>
</rss>

