<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic &amp;gt;&amp;gt;But the real reading (and in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011855#M6512</link>
    <description>&lt;P&gt;&amp;gt;&amp;gt;But the real reading (and of course writing)&amp;nbsp;is always done with interlocked operations.&lt;/P&gt;

&lt;P&gt;With Single Producer, Single Consumer queues, not all variables need to be interlocked. For example, a ring buffer with a fill index, an empty index, and no count need not use interlocked instructions. You may find sfence handy if (when) you want to lower the latency between the fill and the observation of the fill.&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
    <pubDate>Wed, 17 Dec 2014 01:35:30 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2014-12-17T01:35:30Z</dc:date>
    <item>
      <title>interlocked or not interlocked?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011850#M6507</link>
      <description>&lt;P&gt;I'm using an InterlockedCompareExchange to set a variable to my id (something like "while(0 != InterlockedCompareExchange(&amp;amp;var, myId,&amp;nbsp;0))&amp;nbsp;::Sleep(100);"&amp;nbsp;)&lt;/P&gt;

&lt;P&gt;Now, no other thread will change this variable until it becomes 0 again. After using it, I could either call "InterlockedExchange(&amp;amp;var, 0);" or simply write "var = 0;". I don't think it changes much, but which one is the better solution? Which one is faster? Or is one of them even wrong? I thought the second could be faster when I don't expect many threads to try to "take" this variable at the same time. Is that correct?&lt;/P&gt;</description>
      <pubDate>Thu, 04 Dec 2014 19:16:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011850#M6507</guid>
      <dc:creator>Rudolf_M_</dc:creator>
      <dc:date>2014-12-04T19:16:03Z</dc:date>
    </item>
    <item>
      <title>The var=0; is safe excepting</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011851#M6508</link>
      <description>&lt;P&gt;The var=0; is safe excepting when you subsequently re-reference var. Compiler optimizations may remember you set it to 0 and assume it is still zero. To correct for this behavior either attribute the variable with volatile or make it one of the atomic class variables.&lt;/P&gt;

&lt;P&gt;also consider&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; var = 0; _mm_sfence();&lt;/P&gt;

&lt;P&gt;If you reference var outside of your locked region, consider making it volatile, e.g. when checking whether it is locked before attempting to lock.&lt;/P&gt;
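&lt;P&gt;A minimal sketch of that lazy check in C++11 terms, assuming std::atomic is available. The names owner, try_take and give_back are hypothetical and stand in for var and the code around it. The release is an ordinary store with release ordering, not an interlocked read-modify-write; on x86, mainstream compilers typically emit a plain MOV for it.&lt;/P&gt;

<![CDATA[
```cpp
#include <atomic>

// Hypothetical names: `owner` plays the role of `var`; 0 means "free".
std::atomic<long> owner{0};

// Lazy check first (a cheap atomic load), then the interlocked step.
bool try_take(long myId) {
    if (owner.load(std::memory_order_relaxed) != 0)
        return false;  // looks taken, so skip the more expensive CAS
    long expected = 0;
    return owner.compare_exchange_strong(expected, myId,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed);
}

// Release: a store with release ordering, no read-modify-write needed
// because only the current owner ever writes 0.
void give_back() {
    owner.store(0, std::memory_order_release);
}
```
]]>

&lt;P&gt;try_take(myId) returns false right away when the variable looks taken, so the caller can Sleep or move on to another resource before retrying.&lt;/P&gt;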

&lt;P&gt;There are atomic class variables that can be used as well.&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 05 Dec 2014 13:32:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011851#M6508</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2014-12-05T13:32:14Z</dc:date>
    </item>
    <item>
      <title>thanks a lot, this confirms</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011852#M6509</link>
      <description>&lt;P&gt;thank's a lot, this confirms what I was expecting... but sometimes I'm&amp;nbsp;getting unsure when comparing the docs plus what others&amp;nbsp;write in forums...&lt;/P&gt;

&lt;P&gt;I'm only reading the variable for a "lazy check" to find out which resource could be free... and in these cases I marked the variable as volatile. But the real reading (and of course writing)&amp;nbsp;is always done with interlocked operations.&lt;/P&gt;</description>
      <pubDate>Sat, 06 Dec 2014 14:39:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011852#M6509</guid>
      <dc:creator>Rudolf_M_</dc:creator>
      <dc:date>2014-12-06T14:39:13Z</dc:date>
    </item>
    <item>
      <title>Just wanted to add that the</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011853#M6510</link>
      <description>&lt;P&gt;Just wanted to add that the sfence should come before the assignment:&lt;/P&gt;

&lt;P&gt;_mm_sfence(); var = 0;&lt;/P&gt;

&lt;P&gt;Unless you are using streaming write functions, I'd say a compiler barrier would suffice:&lt;/P&gt;

&lt;P&gt;asm("":::"memory");&lt;/P&gt;

&lt;P&gt;var = 0;&lt;/P&gt;

&lt;P&gt;The volatile won't prevent reordering, whereas the compiler barrier there will ensure that {var=0} is the last thing that gets executed.&lt;/P&gt;

&lt;P&gt;Assuming var is properly aligned, writes will be atomic.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Dec 2014 12:53:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011853#M6510</guid>
      <dc:creator>Fabio_F_1</dc:creator>
      <dc:date>2014-12-12T12:53:26Z</dc:date>
    </item>
    <item>
      <title>You want _mm_sfence();</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011854#M6511</link>
      <description>&lt;P&gt;You want _mm_sfence(); &lt;EM&gt;following &lt;/EM&gt;var=0;&lt;/P&gt;

&lt;P&gt;From msdn.microsoft.com:&lt;/P&gt;

&lt;P&gt;&lt;EM&gt;Microsoft Specific&lt;/EM&gt;&lt;/P&gt;

&lt;P&gt;&lt;EM&gt;Guarantees that every preceding store is globally visible before any subsequent store.&lt;/EM&gt;&lt;/P&gt;

&lt;P&gt;&lt;EM&gt;void _mm_sfence(void);&lt;/EM&gt;&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Wed, 17 Dec 2014 01:31:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011854#M6511</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2014-12-17T01:31:17Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;But the real reading (and</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011855#M6512</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;But the real reading (and of course writing)&amp;nbsp;is always done with interlocked operations.&lt;/P&gt;

&lt;P&gt;With Single Producer, Single Consumer queues, not all variables need to be interlocked. For example, a ring buffer with a fill index, an empty index, and no count need not use interlocked instructions. You may find sfence handy if (when) you want to lower the latency between the fill and the observation of the fill.&lt;/P&gt;
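&lt;P&gt;A rough sketch of such a ring buffer, assuming C++11 std::atomic for the two indices. SpscRing, push and pop are hypothetical names, and the acquire/release orderings are an assumption about how to express the idea portably. On x86 they typically compile to ordinary loads and stores with no LOCK prefix. It is only valid with exactly one producer thread and one consumer thread.&lt;/P&gt;

<![CDATA[
```cpp
#include <atomic>
#include <cstddef>

// Hypothetical SPSC ring: one thread calls push(), one other thread calls pop().
template <typename T, std::size_t N>
class SpscRing {
    T slots_[N];
    std::atomic<std::size_t> fill_{0};   // written only by the producer
    std::atomic<std::size_t> empty_{0};  // written only by the consumer
public:
    bool push(const T& v) {
        std::size_t f = fill_.load(std::memory_order_relaxed);
        std::size_t next = (f + 1) % N;
        if (next == empty_.load(std::memory_order_acquire))
            return false;                // full (one slot is kept unused)
        slots_[f] = v;
        fill_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }
    bool pop(T& out) {
        std::size_t e = empty_.load(std::memory_order_relaxed);
        if (e == fill_.load(std::memory_order_acquire))
            return false;                // empty
        out = slots_[e];
        empty_.store((e + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }
};
```
]]>

&lt;P&gt;There is no shared count: each index has a single writer, and keeping one slot unused is what lets "full" be told apart from "empty".&lt;/P&gt;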

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Wed, 17 Dec 2014 01:35:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011855#M6512</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2014-12-17T01:35:30Z</dc:date>
    </item>
    <item>
      <title>I'd say the opposite is</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011856#M6513</link>
      <description>&lt;P&gt;I'd say the opposite is correct / thread-safe. Let me try to explain.&lt;/P&gt;

&lt;P&gt;In the following example we update an object and, when complete, we set "obj.var=0;" in order to signal another thread that we are done and the object can be consumed:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;obj.price = 77;
obj.quantity = 32;
obj.var = 0;&lt;/PRE&gt;

&lt;P&gt;With the above code, the compiler is free to reorder the assignments because it doesn't see a dependency between the statements. For example it could easily generate the assembly code that assigns obj.var before the other data.&lt;/P&gt;

&lt;P&gt;If another thread is just waiting for obj.var to become 0, it might start consuming obj before the other fields have been set (e.g. it could read a price that was already updated and a quantity that was not). So the producer thread here is assigning obj.var = 0 too early (because the compiler generated the assembly code out of order).&lt;/P&gt;

&lt;P&gt;We need a barrier to ensure the obj.var = 0 write is only carried out &lt;EM&gt;after&lt;/EM&gt; the other fields, as the last thing. We would (conservatively) need:&lt;/P&gt;

&lt;PRE class="brush:cpp;" style="font-size: 12.7272720336914px; line-height: 17.7381820678711px;"&gt;obj.price = 77;
obj.quantity = 32;
_mm_sfence();
obj.var = 0;&lt;/PRE&gt;

&lt;P&gt;Here we are forbidding the compiler to reorder statements before/after the fence: the statements are not "allowed" to cross the fence.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;For correctness the above should suffice.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;I'd like to add though that for optimal performance it is possible to do even better.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;_mm_sfence() generates an SFENCE assembly instruction, which takes roughly 5 clock cycles. According to the Intel manuals, and assuming I interpreted them correctly, this instruction is only needed if you are using streaming writes (_mm_stream_ps) or REP-string assembly instructions (e.g. memcpy(), REP MOVSD, etc.) to update the state/object. If you are updating your object as in the example above, you'd only be generating simple MOV instructions that involve neither streaming nor REP. In this case SFENCE would be overkill, although the code would still be correct.&lt;/P&gt;

&lt;P&gt;So the most efficient way to do it for this particular example would be to just use a compiler barrier (_ReadWriteBarrier if you are on MSVC and not using C++11 yet). Compiler barriers do not generate assembly instructions (so they are free); they only prohibit the compiler from reordering the statements when generating the code:&lt;/P&gt;

&lt;PRE class="brush:cpp;" style="font-size: 12.7272720336914px; line-height: 17.7381820678711px;"&gt;obj.price = 77;
obj.quantity = 32;
_ReadWriteBarrier(); // gcc: asm("":::"memory");
obj.var = 0;&lt;/PRE&gt;
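&lt;P&gt;For readers who do have C++11, a sketch of the same hand-off using std::atomic with release/acquire ordering instead of a compiler-specific barrier. Obj, publish and try_consume are hypothetical names, and making var an atomic int is an assumption, not something the code above does. The release store keeps both the compiler and the CPU from moving the earlier writes past it; on x86 it typically still compiles to a plain MOV.&lt;/P&gt;

<![CDATA[
```cpp
#include <atomic>

// Hypothetical layout mirroring the example: var != 0 means "still being filled".
struct Obj {
    int price = 0;
    int quantity = 0;
    std::atomic<int> var{1};
};

// Producer: plain stores to the payload, then a release store to the flag.
void publish(Obj& obj) {
    obj.price = 77;
    obj.quantity = 32;
    obj.var.store(0, std::memory_order_release);
}

// Consumer: an acquire load that observes 0 also observes price and quantity.
bool try_consume(const Obj& obj, int& price, int& quantity) {
    if (obj.var.load(std::memory_order_acquire) != 0)
        return false;  // not published yet
    price = obj.price;
    quantity = obj.quantity;
    return true;
}
```
]]>

&lt;P&gt;The consumer side matters as well: the waiting thread has to read var with acquire ordering (or an equivalent barrier) before it touches the other fields.&lt;/P&gt;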

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 19 Dec 2014 14:21:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011856#M6513</guid>
      <dc:creator>Fabio_F_1</dc:creator>
      <dc:date>2014-12-19T14:21:00Z</dc:date>
    </item>
    <item>
      <title>_mm_sfence() is an intrinsic</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011857#M6514</link>
      <description>&lt;P&gt;_mm_sfence() is an intrinsic that performs two functions:&lt;/P&gt;

&lt;P&gt;a) the hardware function of the store fence&lt;BR /&gt;
	b) the compiler function of a memory barrier&lt;/P&gt;

&lt;P&gt;What your test program is seeing is the side effect of b).&lt;/P&gt;

&lt;P&gt;Function b) does not directly execute any instructions (e.g. MFENCE, SFENCE, LFENCE); rather, it tells the compiler to ensure that any writes it has reordered are written (code emitted) before the pseudo-function completes.&lt;/P&gt;

&lt;P&gt;When you want the observing thread to have the least amount of latency in seeing the memory change, consider:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;obj.price = 77;
obj.quantity = 32;
_ReadWriteBarrier(); // gcc: asm("":::"memory");
obj.var = 0;
_mm_sfence();
&lt;/PRE&gt;

&lt;P&gt;The above ensures that the instructions writing price and quantity to memory are issued before var is written. The _mm_sfence then performs both the a) and b) functions: the compiler emits the write of var to memory, followed by the SFENCE instruction.&lt;/P&gt;


&lt;P&gt;When inter-thread communication latency is not so critical, you might be able to omit the _mm_sfence.&lt;/P&gt;

&lt;P&gt;*** However&lt;/P&gt;

&lt;P&gt;Without the _mm_sfence, and assuming obj.var is not volatile, then the compiler is permitted to reorder obj.var= with other instructions and this in turn may cause other issues.&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 26 Dec 2014 21:36:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/interlocked-or-not-interlocked/m-p/1011857#M6514</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2014-12-26T21:36:16Z</dc:date>
    </item>
  </channel>
</rss>

