<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Shared memory on Xeon in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Shared-memory-on-Xeon/m-p/963472#M5367</link>
    <description>&lt;P&gt;Hi,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Here is an observation I have. Can you help me explain it?&lt;/P&gt;

&lt;P&gt;Setup 1: A process on core (3) of package (0) constantly writes to shared memory allocated on the local node (0). Another process constantly reads it from core (1) on the same package (0) and attached node (0). The read latency I measure is around 70 clock cycles.&lt;/P&gt;

&lt;P&gt;Setup 2: A process on core (2) of package (1) constantly writes to shared memory allocated on the remote node (0). Another process reads it from core (1) on package (0), which is local to the shared memory node (0). In this case, the read takes about 3 cycles (within statistical error).&lt;/P&gt;

&lt;P&gt;Why does the reader incur a smaller penalty reading this shared memory location when the process updating it runs on the remote node than when the process updating it runs on another core of the local package?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;BR /&gt;
	Madhav.&lt;/P&gt;</description>
    <pubDate>Mon, 20 Jan 2014 16:14:44 GMT</pubDate>
    <dc:creator>Madhav_A_</dc:creator>
    <dc:date>2014-01-20T16:14:44Z</dc:date>
    <item>
      <title>Shared memory on Xeon</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Shared-memory-on-Xeon/m-p/963472#M5367</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Here is an observation I have. Can you help me explain it?&lt;/P&gt;

&lt;P&gt;Setup 1: A process on core (3) of package (0) constantly writes to shared memory allocated on the local node (0). Another process constantly reads it from core (1) on the same package (0) and attached node (0). The read latency I measure is around 70 clock cycles.&lt;/P&gt;

&lt;P&gt;Setup 2: A process on core (2) of package (1) constantly writes to shared memory allocated on the remote node (0). Another process reads it from core (1) on package (0), which is local to the shared memory node (0). In this case, the read takes about 3 cycles (within statistical error).&lt;/P&gt;

&lt;P&gt;Why does the reader incur a smaller penalty reading this shared memory location when the process updating it runs on the remote node than when the process updating it runs on another core of the local package?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;BR /&gt;
	Madhav.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jan 2014 16:14:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Shared-memory-on-Xeon/m-p/963472#M5367</guid>
      <dc:creator>Madhav_A_</dc:creator>
      <dc:date>2014-01-20T16:14:44Z</dc:date>
    </item>
    <item>
      <title>Can we see your test code?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Shared-memory-on-Xeon/m-p/963473#M5368</link>
      <description>&lt;P&gt;Can we see your test code?&lt;/P&gt;

&lt;P&gt;If I were to guess, Setup 1 is reading from RAM, whereas Setup 2 is reading from L1.&lt;/P&gt;

&lt;P&gt;This is the reverse of what you would expect.&lt;/P&gt;

&lt;P&gt;Are you timing reads regardless of whether the value in memory has changed?&lt;/P&gt;

&lt;P&gt;If so, Setup 2 would have longer intervals between writes, causing fewer cache line evictions on the other socket (so the reader would read the same value multiple times).&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jan 2014 22:45:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Shared-memory-on-Xeon/m-p/963473#M5368</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2014-01-21T22:45:23Z</dc:date>
    </item>
  </channel>
</rss>

