<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic did you allocate memory on in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931878#M4970</link>
    <description>&lt;P&gt;Did you allocate memory on the fast node or the slow one?&lt;/P&gt;

&lt;P&gt;--Vladimir&lt;/P&gt;</description>
    <pubDate>Tue, 04 Mar 2014 06:02:51 GMT</pubDate>
    <dc:creator>Vladimir_P_1234567890</dc:creator>
    <dc:date>2014-03-04T06:02:51Z</dc:date>
    <item>
      <title>performance loss</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931873#M4965</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I observed an unexpected performance loss in my measurements.&lt;/P&gt;

&lt;P&gt;I have a two-socket system; each socket holds an E5-2680 processor with 8 cores and Hyper-Threading. Hyper-threading was not used.&lt;/P&gt;

&lt;P&gt;On this system, I started 16 instances of a program at the same time and pinned each instance to a different core. At first, I set all cores to 2.7 GHz and saw:&lt;/P&gt;

&lt;P&gt;Program 0 Runtime 7.7s&lt;/P&gt;

&lt;P&gt;Program 8 Runtime 7.63s&lt;/P&gt;

&lt;P&gt;Then I set the cores on the second socket to 1.2 GHz and saw:&lt;/P&gt;

&lt;P&gt;Program 0 Runtime 12.18s&lt;/P&gt;

&lt;P&gt;Program 8 Runtime 15.73s&lt;/P&gt;

&lt;P&gt;Program 8 ran slower, which is expected because core 8 had a lower frequency. But why was program 0 also slower? Its frequency wasn't touched.&lt;/P&gt;
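
&lt;P&gt;A quick way to put numbers on this is to compare each program's slowdown with the clock ratio. The Python sketch below uses only the runtimes and frequencies quoted above; the variable and function names are illustrative. It assumes that a purely compute-bound program would slow down by at most the clock ratio (2.7 / 1.2 = 2.25x).&lt;/P&gt;

```python
# Runtimes reported in this post, in seconds.
baseline = {"program 0": 7.70, "program 8": 7.63}     # all cores at 2.7 GHz
throttled = {"program 0": 12.18, "program 8": 15.73}  # second socket at 1.2 GHz


def slowdown(before, after):
    """Return how many times longer the run took after the change."""
    return after / before


# Assumed upper bound for a purely compute-bound program on a throttled core.
clock_ratio = 2.7 / 1.2

if __name__ == "__main__":
    print(f"clock ratio: {clock_ratio:.2f}x")
    for name in baseline:
        factor = slowdown(baseline[name], throttled[name])
        print(f"{name}: {factor:.2f}x slower")
```

&lt;P&gt;Program 8 slowed down by about 2.06x, somewhat less than the 2.25x clock ratio, while program 0 slowed down by about 1.58x even though its core stayed at 2.7 GHz.&lt;/P&gt;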

&lt;P&gt;Regards,&lt;/P&gt;

&lt;P&gt;Bo&lt;/P&gt;</description>
      <pubDate>Fri, 28 Feb 2014 14:18:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931873#M4965</guid>
      <dc:creator>Bo_W_3</dc:creator>
      <dc:date>2014-02-28T14:18:33Z</dc:date>
    </item>
    <item>
      <title>Did you verify that you</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931874#M4966</link>
      <description>&lt;P&gt;Did you verify that you can actually set different clock rates per socket? (Measure the rates too.)&lt;/P&gt;
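
&lt;P&gt;One way to check the reported rates is to parse the "cpu MHz" lines of /proc/cpuinfo. This is a minimal Python sketch, assuming Linux on x86 and that the entries appear in logical-CPU order; SAMPLE and reported_mhz are hypothetical names, and the sample text is made up. It shows only what the kernel reports, not a measured rate, so timing a fixed compute-bound loop on each core is still worthwhile.&lt;/P&gt;

```python
from collections import Counter

# Hypothetical two-CPU excerpt in the format of /proc/cpuinfo.
SAMPLE = (
    "processor\t: 0\n"
    "cpu MHz\t\t: 2700.000\n"
    "processor\t: 1\n"
    "cpu MHz\t\t: 1200.000\n"
)


def reported_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of each logical CPU, in file order."""
    rates = []
    for line in cpuinfo_text.splitlines():
        if line.strip().startswith("cpu MHz"):
            # The value follows the first colon; float() ignores the padding.
            rates.append(float(line.split(":", 1)[1]))
    return rates


if __name__ == "__main__":
    # On a Linux machine, pass open("/proc/cpuinfo").read() instead of SAMPLE.
    rates = reported_mhz(SAMPLE)
    for cpu, mhz in enumerate(rates):
        print(f"cpu {cpu}: {mhz:.0f} MHz")
    print(Counter(rates))  # how many logical CPUs report each rate
```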

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 03 Mar 2014 16:07:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931874#M4966</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2014-03-03T16:07:52Z</dc:date>
    </item>
    <item>
      <title>Yes. I get following output</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931875#M4967</link>
      <description>&lt;P&gt;Yes. I get the following output with "cat /proc/cpuinfo | grep MHz":&lt;/P&gt;

&lt;P&gt;cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;/P&gt;

&lt;P&gt;With "numactl --hardware", I get:&lt;/P&gt;

&lt;P&gt;available: 2 nodes (0-1)&lt;BR /&gt;
	node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23&lt;BR /&gt;
	node 0 size: 32735 MB&lt;BR /&gt;
	node 0 free: 30458 MB&lt;BR /&gt;
	node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31&lt;/P&gt;</description>
      <pubDate>Mon, 03 Mar 2014 16:19:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931875#M4967</guid>
      <dc:creator>Bo_W_3</dc:creator>
      <dc:date>2014-03-03T16:19:09Z</dc:date>
    </item>
    <item>
      <title>BTW, you can recognize the</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931876#M4968</link>
      <description>&lt;P&gt;BTW, you can also see the effect of the new frequency in the different runtimes of these two measurements.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Mar 2014 16:23:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931876#M4968</guid>
      <dc:creator>Bo_W_3</dc:creator>
      <dc:date>2014-03-03T16:23:25Z</dc:date>
    </item>
    <item>
      <title>To check whether a new</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931877#M4969</link>
      <description>&lt;P&gt;To check whether the new frequency has been set, I ran "cat /proc/cpuinfo | grep MHz" and got:&lt;/P&gt;

&lt;P&gt;cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 2700.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;BR /&gt;
	cpu MHz&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;: 1200.000&lt;/P&gt;

&lt;P&gt;With "numactl --hardware", I get:&lt;/P&gt;

&lt;P&gt;available: 2 nodes (0-1)&lt;BR /&gt;
	node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23&lt;BR /&gt;
	node 0 size: 32735 MB&lt;BR /&gt;
	node 0 free: 30458 MB&lt;BR /&gt;
	node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31&lt;/P&gt;</description>
      <pubDate>Mon, 03 Mar 2014 16:25:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931877#M4969</guid>
      <dc:creator>Bo_W_3</dc:creator>
      <dc:date>2014-03-03T16:25:56Z</dc:date>
    </item>
    <item>
      <title>did you allocate memory on</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931878#M4970</link>
      <description>&lt;P&gt;Did you allocate memory on the fast node or the slow one?&lt;/P&gt;

&lt;P&gt;--Vladimir&lt;/P&gt;</description>
      <pubDate>Tue, 04 Mar 2014 06:02:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931878#M4970</guid>
      <dc:creator>Vladimir_P_1234567890</dc:creator>
      <dc:date>2014-03-04T06:02:51Z</dc:date>
    </item>
    <item>
      <title>Each program has local memory</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931879#M4971</link>
      <description>&lt;P&gt;Each program uses local memory, i.e. the memory is distributed across the two sockets.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Mar 2014 11:04:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931879#M4971</guid>
      <dc:creator>Bo_W_3</dc:creator>
      <dc:date>2014-03-04T11:04:57Z</dc:date>
    </item>
    <item>
      <title>In your motherboard BIOS you</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931880#M4972</link>
      <description>&lt;P&gt;In your motherboard BIOS you can configure the memory in two different ways:&lt;/P&gt;

&lt;P&gt;UMA: All banks attached to both sockets are interleaved (sequential addresses are distributed in turn across all banks) such that &lt;STRONG&gt;on average &lt;/STRONG&gt;memory access is uniform everywhere. Depending on who wrote the BIOS user guide, the meaning of "interleaved" is occasionally mistranslated; some guides list it backwards.&lt;/P&gt;

&lt;P&gt;NUMA: The memory attached to each socket occupies a contiguous address block, meaning CPU0 can access its locally attached block faster than the remotely attached block.&lt;/P&gt;

&lt;P&gt;Now then, in your sample program, should your memory system be configured UMA, slowing down one CPU will slow down both CPUs' access to memory. Should your memory system be configured NUMA, then&lt;STRONG&gt; provided that &lt;/STRONG&gt;memory is allocated from the addresses local to the CPU, each CPU would show the results you expected.&lt;/P&gt;
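
&lt;P&gt;The difference between the two layouts can be illustrated with a toy address-to-node model. This is only a sketch: the function names, the 64-byte granule, and the 1024-byte node size are hypothetical values chosen for readability, and real interleave granularity and address decoding depend on the hardware and BIOS.&lt;/P&gt;

```python
def node_interleaved(address, granule, nodes):
    """UMA-style interleave: consecutive granules rotate across the nodes."""
    return (address // granule) % nodes


def node_contiguous(address, node_size):
    """NUMA-style layout: each node owns one contiguous address block."""
    return address // node_size


if __name__ == "__main__":
    granule = 64       # hypothetical interleave granularity in bytes
    node_size = 1024   # hypothetical amount of memory per node in bytes
    print("address interleaved contiguous")
    for address in range(0, 256, 64):
        print(address,
              node_interleaved(address, granule, 2),
              node_contiguous(address, node_size))
```

&lt;P&gt;In this model a small buffer alternates between both nodes under interleaving, whereas under the contiguous layout it stays entirely on one node.&lt;/P&gt;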

&lt;P&gt;You will have to read up on how to configure your memory system (UMA or NUMA), as well as on the rules to follow to ensure that your memory allocations, and their use, stay on the socket you expect.&lt;/P&gt;
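
&lt;P&gt;On Linux, one way to check where a process's pages actually reside is /proc/PID/numa_maps, where fields such as N0=6 give the number of pages on node 0. This is a minimal Python sketch, assuming a kernel with NUMA support; pages_per_node and SAMPLE are hypothetical names, and the sample text is made up for illustration.&lt;/P&gt;

```python
from collections import Counter

# Made-up excerpt in the format of /proc/PID/numa_maps.
SAMPLE = (
    "00400000 default file=/bin/app mapped=6 N0=6 kernelpagesize_kB=4\n"
    "7f0000000000 default anon=100 dirty=100 N0=30 N1=70 kernelpagesize_kB=4\n"
)


def pages_per_node(numa_maps_text):
    """Sum the N{node}={pages} fields over all mappings of a process."""
    totals = Counter()
    for line in numa_maps_text.splitlines():
        for field in line.split():
            key, sep, value = field.partition("=")
            # Keep only fields of the form N0=..., N1=..., and so on.
            if sep and key.startswith("N") and key[1:].isdigit():
                totals[int(key[1:])] += int(value)
    return dict(totals)


if __name__ == "__main__":
    # For a live process, read /proc/PID/numa_maps instead of SAMPLE.
    for node, pages in sorted(pages_per_node(SAMPLE).items()):
        print(f"node {node}: {pages} pages")
```

&lt;P&gt;To request local placement explicitly, numactl offers the --cpunodebind and --membind options.&lt;/P&gt;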

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Tue, 04 Mar 2014 13:51:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931880#M4972</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2014-03-04T13:51:14Z</dc:date>
    </item>
    <item>
      <title>right, the simplest way to</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931881#M4973</link>
      <description>&lt;P&gt;Right, the simplest way to check what's going on is to use VTune Amplifier and compare the hotspots between the two runs.&lt;/P&gt;

&lt;P&gt;--Vladimir&lt;/P&gt;</description>
      <pubDate>Tue, 04 Mar 2014 14:00:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/performance-loss/m-p/931881#M4973</guid>
      <dc:creator>Vladimir_P_1234567890</dc:creator>
      <dc:date>2014-03-04T14:00:58Z</dc:date>
    </item>
  </channel>
</rss>

