<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: mfence and/or lock in multi-core systems in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/mfence-and-or-lock-in-multi-core-systems/m-p/891902#M3810</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/415783"&gt;shiningram&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;2. If I use mfence then do I need to use the lock?&lt;BR /&gt;3. What should be the correct use of mfence and/or lock on multi-core CPU systems in the above example?&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;A LOCKed instruction acts as a full memory fence both before and after it, so you can simply remove all the MFENCEs from the code.&lt;BR /&gt;&lt;BR /&gt;</description>
    <pubDate>Sat, 28 Feb 2009 08:57:34 GMT</pubDate>
    <dc:creator>Dmitry_Vyukov</dc:creator>
    <dc:date>2009-02-28T08:57:34Z</dc:date>
    <item>
      <title>mfence and/or lock in multi-core systems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/mfence-and-or-lock-in-multi-core-systems/m-p/891900#M3808</link>
      <description>Hi,&lt;BR /&gt;I went through the documentation for mfence:&lt;BR /&gt;&lt;BR /&gt;"Performs a serializing operation on all load and store instructions that were issued prior the MFENCE instruction. This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction is globally visible before any load or store instruction that follows the MFENCE instruction. The MFENCE instruction is ordered with respect to all load and store instructions, other MFENCE instructions, any SFENCE and LFENCE instructions, and any serializing instructions (such as the CPUID instruction)."&lt;BR /&gt;&lt;BR /&gt;But a few things are not clear to me. I am adding sample code to understand the practical use of mfence.&lt;BR /&gt;&lt;BR /&gt;1. How many serializing operations on all load and store instructions does it perform prior to the MFENCE instruction?&lt;BR /&gt;2. If I use mfence then do I need to use the lock?&lt;BR /&gt;&lt;BR /&gt;Here is an example that increments a 64-bit counter on a 32-bit system.&lt;BR /&gt;&lt;BR /&gt;mov ecx, edx&lt;BR /&gt;mov ebx, eax&lt;BR /&gt;add ebx, 1&lt;BR /&gt;adc ecx, 0&lt;BR /&gt;mfence&lt;BR /&gt;lock cmpxchg8b [edi]&lt;BR /&gt;mfence&lt;BR /&gt;jnz again&lt;BR /&gt;mov eax, dummy&lt;BR /&gt;mov [eax], ebx&lt;BR /&gt;mov [eax+4], ecx&lt;BR /&gt;pop ebx&lt;BR /&gt;pop edi&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;I also went through this post:&lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/forums/showthread.php?t=56040" target="_blank"&gt;http://software.intel.com/en-us/forums/showthread.php?t=56040&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;3. What should be the correct use of mfence and/or lock on multi-core CPU systems in the above example?&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Ram Regar</description>
      <pubDate>Sat, 28 Feb 2009 00:12:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/mfence-and-or-lock-in-multi-core-systems/m-p/891900#M3808</guid>
      <dc:creator>shiningram</dc:creator>
      <dc:date>2009-02-28T00:12:45Z</dc:date>
    </item>
    <item>
      <title>Re: mfence and/or lock in multi-core systems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/mfence-and-or-lock-in-multi-core-systems/m-p/891901#M3809</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/415783"&gt;shiningram&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;1. How many serializing operations on all load and store instructions does it perform prior to the MFENCE instruction?&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
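&lt;BR /&gt;As a rough illustration of the ordering that the question quoted above asks about, here is a minimal store-buffering sketch in Rust (not the poster's code). The names X, Y and run_once are hypothetical, and it assumes that the compiler lowers fence(Ordering::SeqCst) to MFENCE or an equivalent LOCKed instruction on x86, which is typical but not guaranteed. With the fence in place the two threads can never both read 0; without it, each store may be delayed past the load that follows it.&lt;BR /&gt;

```rust
use std::sync::atomic::{fence, AtomicU32, Ordering};
use std::thread;

// Hypothetical shared flags, one written by each thread.
static X: AtomicU32 = AtomicU32::new(0);
static Y: AtomicU32 = AtomicU32::new(0);

// Runs one round of the store-buffering test and returns what each thread read.
fn run_once() -> (u32, u32) {
    // Reset before the threads start; thread::spawn orders these stores first.
    X.store(0, Ordering::Relaxed);
    Y.store(0, Ordering::Relaxed);

    let t1 = thread::spawn(|| {
        X.store(1, Ordering::Relaxed);
        // Full fence: the store above is ordered before the load below.
        fence(Ordering::SeqCst);
        Y.load(Ordering::Relaxed)
    });
    let t2 = thread::spawn(|| {
        Y.store(1, Ordering::Relaxed);
        fence(Ordering::SeqCst);
        X.load(Ordering::Relaxed)
    });

    (t1.join().unwrap(), t2.join().unwrap())
}
```

Calling run_once() repeatedly should never return (0, 0); at least one thread always observes the other thread's store.&lt;BR /&gt;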
&lt;BR /&gt;MFENCE serializes ALL previous memory accesses with ALL subsequent memory accesses.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sat, 28 Feb 2009 08:26:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/mfence-and-or-lock-in-multi-core-systems/m-p/891901#M3809</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2009-02-28T08:26:02Z</dc:date>
    </item>
    <item>
      <title>Re: mfence and/or lock in multi-core systems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/mfence-and-or-lock-in-multi-core-systems/m-p/891902#M3810</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/415783"&gt;shiningram&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;2. If I use mfence then do I need to use the lock?&lt;BR /&gt;3. What should be the correct use of mfence and/or lock on multi-core CPU systems in the above example?&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;A LOCKed instruction acts as a full memory fence both before and after it, so you can simply remove all the MFENCEs from the code.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sat, 28 Feb 2009 08:57:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/mfence-and-or-lock-in-multi-core-systems/m-p/891902#M3810</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2009-02-28T08:57:34Z</dc:date>
    </item>
    <item>
      <title>Re: mfence and/or lock in multi-core systems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/mfence-and-or-lock-in-multi-core-systems/m-p/891903#M3811</link>
      <description>You've probably already seen this on c.p.t, but I will post it here too for completeness.&lt;BR /&gt;LOCK may not synchronize non-temporal stores and WC memory; this is architecture-dependent:&lt;BR /&gt;&lt;BR /&gt;&amp;gt; ------------------------------&lt;BR /&gt;&amp;gt; For the P6 family processors, locked operations serialize all&lt;BR /&gt;&amp;gt; outstanding load and store operations (that is, wait for them to&lt;BR /&gt;&amp;gt; complete). This rule is also true for the Pentium 4 and Intel Xeon&lt;BR /&gt;&amp;gt; processors, with one exception. Load operations that reference weakly&lt;BR /&gt;&amp;gt; ordered memory types (such as the WC memory type) may not be&lt;BR /&gt;&amp;gt; serialized.&lt;BR /&gt;&amp;gt; ------------------------------&lt;BR /&gt;&lt;BR /&gt;In order to synchronize non-temporal stores and WC memory, you have to issue an SFENCE (not MFENCE) before the LOCKed instruction.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sun, 01 Mar 2009 08:06:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/mfence-and-or-lock-in-multi-core-systems/m-p/891903#M3811</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2009-03-01T08:06:26Z</dc:date>
    </item>
    <item>
      <title>Re: mfence and/or lock in multi-core systems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/mfence-and-or-lock-in-multi-core-systems/m-p/891904#M3812</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/347331"&gt;Dmitriy Vyukov&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;MFENCE serializes ALL previous memory accesses with ALL subsequent memory accesses.&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;1. What is the scope of serialization? &lt;BR /&gt;{&lt;BR /&gt;load1&lt;BR /&gt;load2&lt;BR /&gt;store1&lt;BR /&gt;load3&lt;BR /&gt;store2&lt;BR /&gt;&lt;BR /&gt;mfence&lt;BR /&gt;&lt;BR /&gt;store3&lt;BR /&gt;load4&lt;BR /&gt;store4&lt;BR /&gt;load5&lt;BR /&gt;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;In the above case, will the load and store serialization be scoped by the braces? Can you please explain the meaning of ALL here? &lt;BR /&gt;My understanding is that processor1, on executing mfence, signals all other processors to finish all their store/load operations and wait before processor1 can start executing cmpxchg8b atomically. It's like acquiring and releasing a lock. I am trying to understand how mfence works.&lt;BR /&gt;&lt;BR /&gt;I learnt that cmpxchg8b implicitly has "lock" in it. Does that mean lock and mfence are not required at all when using cmpxchg8b?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Ram Regar</description>
      <pubDate>Tue, 03 Mar 2009 02:08:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/mfence-and-or-lock-in-multi-core-systems/m-p/891904#M3812</guid>
      <dc:creator>shiningram</dc:creator>
      <dc:date>2009-03-03T02:08:54Z</dc:date>
    </item>
    <item>
      <title>Re: mfence and/or lock in multi-core systems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/mfence-and-or-lock-in-multi-core-systems/m-p/891905#M3813</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/415783"&gt;shiningram&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
In the above case, will the load and store serialization be scoped by the braces? Can you please explain the meaning of ALL here? &lt;BR /&gt;My understanding is that processor1, on executing mfence, signals all other processors to finish all their store/load operations and wait before processor1 can start executing cmpxchg8b atomically. It's like acquiring and releasing a lock. I am trying to understand how mfence works.&lt;BR /&gt;&lt;BR /&gt;I learnt that cmpxchg8b implicitly has "lock" in it. Does that mean lock and mfence are not required at all when using cmpxchg8b?&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
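&lt;BR /&gt;For the question quoted above, here is a minimal sketch in Rust (not the original assembly) of a 64-bit counter incremented with a compare-and-swap retry loop and no separate fence. COUNTER, increment and hammer are hypothetical names. The assumption, which is typical but depends on the compiler and target, is that on 32-bit x86 compare_exchange_weak on an AtomicU64 compiles to LOCK CMPXCHG8B.&lt;BR /&gt;

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Hypothetical shared counter, standing in for the 64-bit value at [edi].
static COUNTER: AtomicU64 = AtomicU64::new(0);

// Increments the counter with a compare-and-swap retry loop.
// Returns the value the counter held just before this increment.
fn increment() -> u64 {
    let mut seen = COUNTER.load(Ordering::Relaxed);
    loop {
        match COUNTER.compare_exchange_weak(
            seen,
            seen.wrapping_add(1),
            Ordering::SeqCst,  // ordering when the swap succeeds
            Ordering::Relaxed, // ordering when the swap fails
        ) {
            Ok(prev) => return prev,
            // Another thread changed the counter first: retry with its value.
            Err(actual) => seen = actual,
        }
    }
}

// Hypothetical driver: spawns `threads` threads that each call increment()
// `per_thread` times, and returns how far the counter advanced.
fn hammer(threads: u64, per_thread: u64) -> u64 {
    let start = COUNTER.load(Ordering::SeqCst);
    let mut handles = Vec::new();
    for _ in 0..threads {
        handles.push(thread::spawn(move || {
            for _ in 0..per_thread {
                increment();
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    COUNTER.load(Ordering::SeqCst) - start
}
```

If no increments are lost, hammer(4, 10000) returns 40000, even though the loop contains no explicit fence.&lt;BR /&gt;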
&lt;BR /&gt;&lt;BR /&gt;All memory accesses that precede the fence in program order are serialized with all memory accesses that follow it in program order.&lt;BR /&gt;Modern processors lock only the target cache line, so if it is already cached in the core in the M (Modified) state, then NO global inter-core/processor interaction occurs. And I believe MFENCE is always local, i.e. NO global inter-core/processor interaction occurs. Global inter-core/processor ordering is handled by the cache-coherence protocol.&lt;BR /&gt;XCHG with a memory operand has an implicit LOCK; CMPXCHG and CMPXCHG8B do not, so they need an explicit LOCK prefix to be atomic across cores.&lt;BR /&gt;When you are using LOCK CMPXCHG, MFENCE is not required (if someone uses non-temporal stores, then it's better to assume that it is THEIR responsibility to serialize them).&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 03 Mar 2009 06:30:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/mfence-and-or-lock-in-multi-core-systems/m-p/891905#M3813</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2009-03-03T06:30:15Z</dc:date>
    </item>
  </channel>
</rss>

