<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Shared memory on Nehalem  in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Shared-memory-on-Nehalem/m-p/963421#M2597</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;Here is an observation I have. Can you help me explain it?&lt;/P&gt;

&lt;P&gt;Setup 1: A process running on core 3 of package 0 constantly writes to shared memory allocated on the local node (0). Another process, on core 1 of the same package (0) and attached node (0), constantly reads it. The read latency I measure is around 70 clock cycles.&lt;/P&gt;

&lt;P&gt;Setup 2: A process running on core 2 of package 1 constantly writes to shared memory allocated on the remote node (0). Another process reads it from core 1 on package 0, which is local to the shared memory node (0). In this case the read takes about 3 cycles (within statistical error).&lt;/P&gt;

&lt;P&gt;Why does the reader incur a smaller penalty reading this shared memory location when the writer runs on the remote node than when the writer runs on another core of the local package?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;BR /&gt;
	Madhav.&lt;/P&gt;</description>
    <pubDate>Mon, 20 Jan 2014 16:13:08 GMT</pubDate>
    <dc:creator>Madhav_A_</dc:creator>
    <dc:date>2014-01-20T16:13:08Z</dc:date>
    <item>
      <title>Shared memory on Nehalem</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Shared-memory-on-Nehalem/m-p/963421#M2597</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;Here is an observation I have. Can you help me explain it?&lt;/P&gt;

&lt;P&gt;Setup 1: A process running on core 3 of package 0 constantly writes to shared memory allocated on the local node (0). Another process, on core 1 of the same package (0) and attached node (0), constantly reads it. The read latency I measure is around 70 clock cycles.&lt;/P&gt;

&lt;P&gt;Setup 2: A process running on core 2 of package 1 constantly writes to shared memory allocated on the remote node (0). Another process reads it from core 1 on package 0, which is local to the shared memory node (0). In this case the read takes about 3 cycles (within statistical error).&lt;/P&gt;

&lt;P&gt;Why does the reader incur a smaller penalty reading this shared memory location when the writer runs on the remote node than when the writer runs on another core of the local package?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;BR /&gt;
	Madhav.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jan 2014 16:13:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Shared-memory-on-Nehalem/m-p/963421#M2597</guid>
      <dc:creator>Madhav_A_</dc:creator>
      <dc:date>2014-01-20T16:13:08Z</dc:date>
    </item>
    <item>
      <title>Hello Madhav,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Shared-memory-on-Nehalem/m-p/963422#M2598</link>
      <description>&lt;P&gt;Hello Madhav,&lt;/P&gt;

&lt;P&gt;My first guess is that you have messed up the test. From the stats, it looks like case 1 is actually remote memory access and case 2 is actually local memory access (maybe even same core access).&lt;/P&gt;

&lt;P&gt;Are you using Linux or Windows? How are you identifying nodes/cores/CPUs? How are you pinning to a node/core/CPU? When you write the memory, are you writing a different value each time? Are you testing that the reader is getting the value being written by the writer? If there are no locks, is the reader in case 2 just getting a subset of the written values? Do you know how frequently the update thread and the reader thread are running? Why are you running the tests?&lt;/P&gt;
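
&lt;P&gt;To make the pinning and value-checking questions concrete, here is a minimal sketch in Python, assuming Linux. It is not the code from the original test. The helper names pin_to_cpu and summarize_reads are hypothetical; os.sched_setaffinity and os.sched_getaffinity are the standard-library calls used for pinning. The idea is that the writer stores an increasing sequence number and the reader records what it sees, so you can tell how many reads actually observed a new value.&lt;/P&gt;

```python
import os


def pin_to_cpu(cpu):
    """Pin the calling process to one logical CPU and return the new affinity set.

    Returns None when pinning is not possible: the platform has no
    sched_setaffinity (assumption: non-Linux), or the CPU is not in the
    set this process is allowed to run on.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    if cpu not in allowed:
        return None
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


def summarize_reads(samples):
    """Count how many reader samples saw a value different from the previous one.

    samples is the list of sequence numbers the reader observed, in order.
    A read that repeats the previous value is counted as stale.
    """
    fresh = 0
    previous = None
    for value in samples:
        if value != previous:
            fresh += 1
        previous = value
    return {"reads": len(samples), "fresh": fresh, "stale": len(samples) - fresh}


if __name__ == "__main__":
    # Hypothetical reader-side log: the writer bumped the counter only twice
    # after the first value while the reader polled six times.
    print(pin_to_cpu(0))
    print(summarize_reads([1, 1, 1, 2, 2, 5]))
```

&lt;P&gt;If almost every read in a run is stale, a latency of a few cycles most likely means the reader is hitting its own cached copy and is not timing a transfer from the writer. Checking the affinity set returned by pin_to_cpu in both processes is one way to confirm that each really runs on the intended core.&lt;/P&gt;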

&lt;P&gt;Just the first 10 minutes of questions...&lt;/P&gt;

&lt;P&gt;pat&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jan 2014 14:31:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Shared-memory-on-Nehalem/m-p/963422#M2598</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2014-01-24T14:31:16Z</dc:date>
    </item>
  </channel>
</rss>

