<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic As soon as a processor in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Concurrent-stores-seen-in-a-consistent-order/m-p/987189#M5864</link>
    <description>As soon as a processor updates a line in its cache, any copy of that cache line in another processor's L1 or L2 becomes invalid and cannot be accessed (an access presents a cache miss) until all updates are completed.  If a thread is reading a cache line which another thread is modifying, the line is repeatedly invalidated; when the two threads are actually using different data within that line, this is false sharing, and it is likely to be a serious performance issue.  Snoops have to deal only with updates to the last-level cache on other packages.</description>
    <pubDate>Wed, 09 Jan 2013 15:50:09 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2013-01-09T15:50:09Z</dc:date>
    <item>
      <title>Concurrent stores seen in a consistent order</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Concurrent-stores-seen-in-a-consistent-order/m-p/987188#M5863</link>
      <description>&lt;P&gt;The &lt;A href="http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html"&gt;&lt;EM&gt;Intel Architectures Software Developer's Manual,&lt;/EM&gt;&lt;/A&gt;&amp;nbsp;Aug. 2012, vol. 3A, sect. 8.2.2:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Any two stores are seen in a consistent order by processors other than&amp;nbsp;those performing the stores.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;But can this be so?&lt;/P&gt;
&lt;P&gt;The reason I ask is this: Consider a dual-core Intel i7 processor with HyperThreading. According to the &lt;EM&gt;Manual's&lt;/EM&gt;&amp;nbsp;vol. 1, Fig. 2-8, the i7's logical processors 0 and 1 share an L1/L2 cache, but its logical processors 2 and 3 share a different L1/L2 cache -- whereas all the logical processors share a single L3 cache. Suppose that logical processors 0 and 2 -- which do not share an L1/L2 cache -- write to the same memory location at about the same time, and that the writes go no deeper than L2 for the moment. Could not logical processors 1 and 3 (which are "processors other than those performing the stores") then see the "two stores in an inconsistent order"?&lt;/P&gt;
&lt;P&gt;To achieve consistency, must not logical processors 0 and 2 issue SFENCE instructions, and logical processors 1 and 3 issue LFENCE instructions? Notwithstanding, the &lt;EM&gt;Manual&lt;/EM&gt; seems to think otherwise, and indeed supports its opinion with a seemingly clear example in sect. 8.2.3.7.&lt;/P&gt;
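&lt;P&gt;To make the scenario concrete, here is a minimal C++ sketch of the four-processor test, modelled on the two-location example in sect. 8.2.3.7. Everything in it is an assumption for illustration: the names x, y and count_disagreements are hypothetical, std::thread merely stands in for the logical processors (nothing pins a thread to a particular core), and the default sequentially consistent std::atomic operations are used, for which the C++ standard itself already forbids the disagreeing outcome.&lt;/P&gt;
<![CDATA[<PRE>
```cpp
#include <atomic>
#include <thread>

// Hypothetical names throughout; this is not code from the Manual.
// x and y stand for the two locations written by two different processors.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs the four-thread test `trials` times and returns how often the two
// readers disagreed about the order in which the two stores happened.
int count_disagreements(int trials) {
    int disagreements = 0;
    for (int i = 0; i < trials; ++i) {
        x.store(0);
        y.store(0);
        int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

        std::thread writer_a([] { x.store(1); });  // first storing processor
        std::thread writer_b([] { y.store(1); });  // second storing processor
        std::thread reader_a([&] { r1 = x.load(); r2 = y.load(); });
        std::thread reader_b([&] { r3 = y.load(); r4 = x.load(); });

        writer_a.join();
        writer_b.join();
        reader_a.join();
        reader_b.join();

        // reader_a saw x written before y, reader_b saw y written before x:
        // this is the "inconsistent order" the question asks about.
        if (r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0) {
            ++disagreements;
        }
    }
    return disagreements;
}
```
</PRE>]]>
&lt;P&gt;With these orderings, a call such as count_disagreements(100000) should return 0. A probe of the hardware guarantee alone would need core pinning and weaker memory orders, which this sketch does not attempt.&lt;/P&gt;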
&lt;P&gt;Does this mean that every cache at every level snoops all writes to every other cache, even across cores, even across packages? &amp;nbsp;If so, would this not imply that every store to a valid local line of cache must lock the &lt;EM&gt;global&lt;/EM&gt; address bus? &amp;nbsp;This does not sound right. &amp;nbsp;Is it right? &amp;nbsp;After all, what is the point of multithreading, what is the point of caching, when ordinary stores must always lock global resources? &amp;nbsp;I am confused.&lt;/P&gt;
&lt;P&gt;(Incidentally, I have indeed checked the &lt;EM&gt;Manual's&lt;/EM&gt; latest errata, which do not seem to address the matter.)&lt;/P&gt;
</description>
      <pubDate>Wed, 09 Jan 2013 14:02:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Concurrent-stores-seen-in-a-consistent-order/m-p/987188#M5863</guid>
      <dc:creator>T__H__Black</dc:creator>
      <dc:date>2013-01-09T14:02:26Z</dc:date>
    </item>
    <item>
      <title>As soon as a processor</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Concurrent-stores-seen-in-a-consistent-order/m-p/987189#M5864</link>
      <description>As soon as a processor updates a line in its cache, any copy of that cache line in another processor's L1 or L2 becomes invalid and cannot be accessed (an access presents a cache miss) until all updates are completed.  If a thread is reading a cache line which another thread is modifying, the line is repeatedly invalidated; when the two threads are actually using different data within that line, this is false sharing, and it is likely to be a serious performance issue.  Snoops have to deal only with updates to the last-level cache on other packages.</description>
      <pubDate>Wed, 09 Jan 2013 15:50:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Concurrent-stores-seen-in-a-consistent-order/m-p/987189#M5864</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2013-01-09T15:50:09Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...Does this mean that</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Concurrent-stores-seen-in-a-consistent-order/m-p/987190#M5865</link>
      <description>&amp;gt;&amp;gt;...&lt;STRONG&gt;Does this mean that every cache at every level&lt;/STRONG&gt; snoops all writes to every other cache, even across
&amp;gt;&amp;gt;cores, even across packages?...

The documentation clearly states, regarding the &lt;STRONG&gt;Level 2&lt;/STRONG&gt; cache, that '...Level 2 cache enables efficient data sharing between &lt;STRONG&gt;two cores&lt;/STRONG&gt; to reduce memory traffic to the system bus....'.

According to the manual, Logical Processors 3 and 4 cannot access the L1 and L2 cache lines of Logical Processors 1 and 2 (I index CPUs from 1 to 8).

Note: Please take a look at page 43 of the &lt;STRONG&gt;Intel® 64 and IA-32 Architectures Software Developer’s Manual&lt;/STRONG&gt;, Volume 1: Basic Architecture (Order Number: 253665-044US), August 2012.</description>
      <pubDate>Sun, 13 Jan 2013 23:15:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Concurrent-stores-seen-in-a-consistent-order/m-p/987190#M5865</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-01-13T23:15:11Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;In documentation it is</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Concurrent-stores-seen-in-a-consistent-order/m-p/987191#M5866</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;The documentation clearly states, regarding the &lt;STRONG&gt;Level 2&lt;/STRONG&gt; cache, that '...Level 2 cache enables efficient data sharing between &lt;STRONG&gt;two cores&lt;/STRONG&gt; to reduce memory traffic to the system bus....'.&lt;/P&gt;
&lt;P&gt;According to the manual, Logical Processors 3 and 4 cannot access the L1 and L2 cache lines of Logical Processors 1 and 2 (I index CPUs from 1 to 8)&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;I think that on desktop CPUs cache coherency is maintained at the physical core level and does not cross physical core boundaries. On Xeon CPUs cache coherency may be maintained differently; one would need to consult the software manual.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Jan 2013 05:43:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Concurrent-stores-seen-in-a-consistent-order/m-p/987191#M5866</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-01-14T05:43:00Z</dc:date>
    </item>
  </channel>
</rss>

