<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to solve bugs of simultaneously misaligned memory accesses in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799470#M562</link>
    <description>Jim,&lt;BR /&gt;&lt;BR /&gt;We can all write microbenchmarks, but ultimately that won't help the author of this post one bit. What matters is the actual code being run. If the ratio of synchronized memory accesses to non-synchronized memory accesses is low enough, then locked instructions won't affect scalability that much. If the ratio is high, then tricks may need to be played to avoid the locked instructions to get scalability.&lt;BR /&gt;&lt;BR /&gt;Each programmer has to make the portability/usability vs. performance tradeoff themselves given their particular program. That is the bottom line for the author who initiated this post. Without the results of the author's experiments on their own code, there is nothing we can tell the author that will be guaranteed to be the right decision because we cannot evaluate the tradeoff without much more data.&lt;BR /&gt;&lt;BR /&gt;- Grant</description>
    <pubDate>Thu, 17 Jun 2010 15:38:40 GMT</pubDate>
    <dc:creator>Grant_H_Intel</dc:creator>
    <dc:date>2010-06-17T15:38:40Z</dc:date>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799449#M541</link>
      <description>&lt;UL&gt;&lt;LI&gt;First Question&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Recently, I encountered a bug that occurs only rarely.&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;&lt;STRONG&gt;**Environment:&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;1. The address of a pointer (called pMemory) is misaligned.&lt;/P&gt;&lt;P&gt;2. Two threads simultaneously access pMemory&lt;/P&gt;&lt;P&gt;3. Our program runs on a server with 8 CPUs &lt;/P&gt;&lt;P&gt;4. Original value of pMemory is 0xFFFF FFFF&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN style="text-decoration: underline;"&gt;**Operation Sequence:&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;1. One thread reads the value of pMemory while the other thread modifies pMemory.&lt;/P&gt;&lt;P&gt; Both the read and the modify instruction are MOV&lt;BR /&gt;&lt;BR /&gt;2. The first thread first reads the lower part of pMemory, which is 0xFFFF&lt;/P&gt;&lt;P&gt;3. The second thread modifies pMemory from 0xFFFF FFFF to 0x02de 2c68&lt;/P&gt;&lt;P&gt;4. 
The first thread then reads the higher part of pMemory, which is 0x02de,&lt;/P&gt;&lt;P&gt;so the first thread finally reads pMemory as 0x02de ffff, which is an invalid pointer.&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;We are currently discussing how to solve the problem.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;Do you have any suggestions?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;I don't have much time, so would you please reply as soon as possible.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;BTW, our program is a network program, so memory is designed to be aligned on one-byte boundaries with compiler options such as /Zp1.&lt;BR /&gt;It's impossible for us to change /Zp1 to natural alignment because of the risk and workload involved.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Second question&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;EM&gt;Intel 64 and IA-32 Architectures&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Software Developer's Manual&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Volume 3A:&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;System Programming Guide, Part 1&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;8.1.1 Guaranteed Atomic Operations&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, &lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;and P6 family processors &lt;STRONG&gt;provide bus control signals that permit external memory &lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;subsystems to make split accesses atomic; &lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;however, nonaligned data accesses will seriously impact the performance of the &lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;processor and should be avoided.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;SPAN style="text-decoration: underline;"&gt;Would you please detail the 
way:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;"provide bus control signals that permit external memory subsystems to make split accesses atomic"&lt;/SPAN&gt;&lt;/P&gt;</description>
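<![CDATA[
For readers following along, the operation sequence in the question can be replayed deterministically. This is only a model, not the poster's code: a Python bytearray stands in for memory, the pointer is assumed to be stored little-endian, and the helper name torn_read is made up for illustration.

```python
import struct

def torn_read(memory, offset, new_value):
    """Model a 32-bit read split into two 16-bit halves with a write in between."""
    # Step 2: the reader fetches the low 16 bits of the old value.
    (low,) = struct.unpack_from("<H", memory, offset)
    # Step 3: the writer replaces all 32 bits.
    struct.pack_into("<I", memory, offset, new_value)
    # Step 4: the reader fetches the high 16 bits, which now come from the new value.
    (high,) = struct.unpack_from("<H", memory, offset + 2)
    return (high << 16) | low

memory = bytearray(8)
struct.pack_into("<I", memory, 2, 0xFFFFFFFF)  # pointer at a 2-byte-aligned offset
print(hex(torn_read(memory, 2, 0x02DE2C68)))   # 0x2deffff
```

The result mixes the old low half with the new high half, which is the invalid pointer reported in step 4.
]]>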
      <pubDate>Tue, 15 Jun 2010 02:40:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799449#M541</guid>
      <dc:creator>gangti</dc:creator>
      <dc:date>2010-06-15T02:40:53Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799450#M542</link>
      <description>There are 2 ways to solve the problem. The first is to make the data aligned. The second is to use LOCKed instructions (that's what "provide bus control signals that permit external memory subsystems to make split accesses atomic" is about). I think that it's enough just to modify the data with a LOCK XCHG instruction, and on the reader side you can leave a plain MOV. But keep in mind that on QPI-based systems (Core i7), LOCKed instructions on unaligned data can be *very* slow (on the order of 5000 cycles).&lt;BR /&gt;</description>
      <pubDate>Tue, 15 Jun 2010 15:31:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799450#M542</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2010-06-15T15:31:09Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799451#M543</link>
      <description>In your pointer fetch code, I will guess that you have a test for pointer == 0xFFFFFFFF as an indication that the pointer is not yet set.&lt;BR /&gt;&lt;BR /&gt;You can change this test to test for either the high word or low word being equal to 0xFFFF.&lt;BR /&gt;Any valid pointer you insert into this DWORD will likely not have 0xFFFF in either word. Allocations are generally aligned to 8 bytes (or more) so the low word of the pointer should never be 0xFFFF. Also, 0xFFFF in the high word points to the last few pages of virtual memory (OS in Windows or potentially -stack addressing in *ux). You should look at what you place into this pointer to verify my assumption.&lt;BR /&gt;&lt;BR /&gt;Aligning the pointer to a DWORD address (32-bit system) or QWORD (64-bit system) as Dmitriy suggests would assure that writes occur in one operation (except possibly when the address of the pointer resides in I/O space).&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
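<![CDATA[
The half-word test suggested above can be written down in a few lines. A minimal sketch, assuming a 32-bit pointer that is split at the 16-bit boundary (the 2-byte-aligned case); the function name looks_unset_or_torn is hypothetical.

```python
def looks_unset_or_torn(value):
    """Return True if either 16-bit half still holds the 0xFFFF sentinel."""
    low = value & 0xFFFF
    high = (value >> 16) & 0xFFFF
    return low == 0xFFFF or high == 0xFFFF

for v in (0xFFFFFFFF, 0x02DEFFFF, 0xFFFF2C68, 0x02DE2C68):
    print(hex(v), looks_unset_or_torn(v))
```

Only the last value, the fully written pointer, passes. The check depends on the assumption stated in the post that no valid pointer contains 0xFFFF in either half.
]]>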
      <pubDate>Wed, 16 Jun 2010 13:12:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799451#M543</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2010-06-16T13:12:35Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799452#M544</link>
      <description>&lt;P&gt;I really appreciate your suggestions.&lt;/P&gt;&lt;P&gt;Following your advice, I have made a solution to handle the problem. &lt;STRONG&gt;Is it right or not?&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;;---------------------------------------&lt;/P&gt;&lt;P&gt;**On Writer's side&lt;/P&gt;&lt;P&gt; &lt;SPAN style="text-decoration: underline;"&gt;push ax&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt; &lt;SPAN style="text-decoration: underline;"&gt;mov ax, XXXX&lt;/SPAN&gt; ; XXXX is the value that I want to write to the unaligned memory&lt;/P&gt;&lt;P&gt; &lt;SPAN style="text-decoration: underline;"&gt;lock xchg YYYY, ax&lt;/SPAN&gt; ; write the value in ax to memory, YYYY is the unaligned address of the memory&lt;/P&gt;&lt;P&gt; &lt;SPAN style="text-decoration: underline;"&gt;pop ax&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;**On Reader's side&lt;/P&gt;&lt;P&gt; &lt;SPAN style="text-decoration: underline;"&gt;mov ax, YYYY&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;;---------------------------------------&lt;BR /&gt;&lt;BR /&gt;I still have some questions:&lt;BR /&gt;1. "on reader side you can leave plain MOV...". In my case:&lt;BR /&gt;;--------------------------------------- &lt;BR /&gt;a Writer's side | Reader's side&lt;BR /&gt;b   | read lower part&lt;BR /&gt;c lock the bus  |&lt;BR /&gt;d write lower part|&lt;BR /&gt;e write higher part |&lt;BR /&gt;f automatically unlock the bus  |&lt;BR /&gt;g   | read higher part&lt;BR /&gt;;---------------------------------------&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What will happen in step c? &lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Will step c wait until step g on the reader's side finishes?&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Or, if step c succeeds immediately, will the reader's side hit the bug? I think so.&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;2. 
Though the "LOCK" prefix has no effect on the "MOV" instruction, I think that &lt;STRONG&gt;the "MOV" instruction also needs to lock the bus.&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;In my view, &lt;STRONG&gt;all the instructions that read/write the same unaligned memory should lock the bus, in order to make the instructions atomic.&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Is that right or not?&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jun 2010 13:23:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799452#M544</guid>
      <dc:creator>gangti</dc:creator>
      <dc:date>2010-06-16T13:23:19Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799453#M545</link>
      <description>&lt;DIV id="tiny_quote"&gt;
                &lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=434199" class="basic" href="https://community.intel.com/en-us/profile/434199/"&gt;gangti&lt;/A&gt;&lt;/DIV&gt;
                &lt;DIV style="background-color: #e5e5e5; padding: 5px; border: 1px inset; margin-left: 2px; margin-right: 2px;"&gt;&lt;BR /&gt;2. Though the "LOCK" prefix has no effect on the "MOV" instruction, I think that &lt;B&gt;the "MOV" instruction also needs to lock the bus.&lt;BR /&gt;&lt;/B&gt;&lt;I&gt;&lt;BR /&gt;&lt;P&gt;In my view, &lt;B&gt;all the instructions that read/write the same unaligned memory should lock the bus, in order to make the instructions atomic.&lt;BR /&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;Is that right or not?&lt;/P&gt;&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;BR /&gt;I think you are right, it was premature optimization on my side.&lt;BR /&gt;If correctness is the only concern, then use LOCK XCHG for both reader and writer. It should 100% work.&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jun 2010 13:38:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799453#M545</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2010-06-16T13:38:12Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799454#M546</link>
      <description>&lt;P&gt;Dear Jim Dempsey&lt;BR /&gt;Thank you very much, that sounds like a good idea. I will take it as a possible solution and try to find the best one.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jun 2010 15:11:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799454#M546</guid>
      <dc:creator>gangti</dc:creator>
      <dc:date>2010-06-16T15:11:31Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799455#M547</link>
      <description>LOCK will affect MOV when stored/retrieved data spans a cache line and/or natural boundary.&lt;BR /&gt;&lt;BR /&gt;consider:&lt;BR /&gt;&lt;BR /&gt;&lt;PRE&gt;[bash]; coded for 32-bit pointers where shared pointer may be unaligned
; shared pointer has 0xFFFFFFFF prior to store of new pointer
; address 0xFFxxxxxx is invalid (IOW 
; indicating invalid pointer
; only one producer and one consumer of pointer

mov edx, addressOfSharedPointer
mov eax, newPointer
mov [edx], ax  ; store lsw
ror eax, 16     ; rotate the high word of the pointer into ax
mov [edx+2], al ; store 3rd byte
mov [edx+3], ah ; store 4th byte (overwrite 0xFF of 0xFFxxxxxx)

-----------------------------------------------------------

; read
mov edx, addressOfSharedPointer
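retry:          ; "loop" is an instruction mnemonic, so it is avoided as a label
mov eax, [edx]	; collect all 4 bytes
cmp eax, 0xFF000000
jae retry       ; top byte still 0xFF: pointer not fully written yet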


===============================
or


; read
mov edx, addressOfSharedPointer
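retry:          ; "loop" is an instruction mnemonic, so it is avoided as a label
mov eax, [edx]	; collect all 4 bytes
cmp eax, 0xFF000000
jb toReturn     ; top byte no longer 0xFF: pointer is complete
pause           ; spin-wait hint while the writer finishes
jmp retry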
toReturn:
ret


The LOCK, although functionally correct, is an expensive operation.&lt;BR /&gt;If this pointer manipulation is infrequent, then use the LOCK.&lt;BR /&gt;If the pointer manipulation is heavily used, then experiment with code similar to above.&lt;BR /&gt;You can also modify the write of pointer to test to see if the pointer is DWORD or WORD aligned&lt;BR /&gt;If on odd byte address, use the code first listed above,&lt;BR /&gt;If on DWORD, simply write the data as DWORD,&lt;BR /&gt;If on WORD (only thing left) write as WORDs (or as low WORD followed by DWORD)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;[/bash]&lt;/PRE&gt;</description>
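<![CDATA[
The store ordering in the sketch above can be checked with a small model. This is an illustration under stated assumptions, not a proof: a bytearray stands in for memory, the names staged_write and reader_accepts are hypothetical, and the reader is assumed to observe memory at a single point between two stores. It says nothing about a reader load that is itself split by a concurrent store.

```python
import struct

def staged_write(memory, offset, new_pointer):
    """Yield the reader-visible 32-bit value after each store of the sketch above."""
    struct.pack_into("<H", memory, offset, new_pointer & 0xFFFF)            # low word
    yield struct.unpack_from("<I", memory, offset)[0]
    struct.pack_into("<B", memory, offset + 2, (new_pointer >> 16) & 0xFF)  # 3rd byte
    yield struct.unpack_from("<I", memory, offset)[0]
    struct.pack_into("<B", memory, offset + 3, (new_pointer >> 24) & 0xFF)  # 4th byte, last
    yield struct.unpack_from("<I", memory, offset)[0]

def reader_accepts(value):
    """Mirror of the cmp/jae test: reject while the top byte is still 0xFF."""
    return value < 0xFF000000

memory = bytearray(8)
struct.pack_into("<I", memory, 1, 0xFFFFFFFF)   # odd offset, initial sentinel
for seen in staged_write(memory, 1, 0x02DE2C68):
    print(hex(seen), reader_accepts(seen))
```

In this model the reader rejects both intermediate states and accepts only the value seen after the final byte store, because the 0xFF top byte is overwritten last.
]]>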
      <pubDate>Wed, 16 Jun 2010 15:14:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799455#M547</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2010-06-16T15:14:45Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799456#M548</link>
      <description>&lt;EM&gt;I think you are right, it was premature optimization on my side.&lt;BR /&gt;If correctness is the only concern, then use LOCK XCHG for both reader and writer. It should 100% work.&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;STRONG&gt;Well, that is just what I want to ask you, HOW TO USE LOCK XCHG FOR READER?&lt;BR /&gt;&lt;/STRONG&gt;&lt;EM&gt;thanks for your help&lt;/EM&gt;</description>
      <pubDate>Wed, 16 Jun 2010 15:17:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799456#M548</guid>
      <dc:creator>gangti</dc:creator>
      <dc:date>2010-06-16T15:17:25Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799457#M549</link>
      <description>Dear Jim Dempsey&lt;BR /&gt;Your code will definitely solve the problem, thank you very much. It really helps.</description>
      <pubDate>Wed, 16 Jun 2010 15:29:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799457#M549</guid>
      <dc:creator>gangti</dc:creator>
      <dc:date>2010-06-16T15:29:32Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799458#M550</link>
      <description>** error in code sample&lt;BR /&gt;&lt;BR /&gt; rcr ax, 16&lt;BR /&gt;&lt;BR /&gt;should read&lt;BR /&gt;&lt;BR /&gt; ror eax, 16&lt;BR /&gt;&lt;BR /&gt;I assume you caught this typing error.&lt;BR /&gt;&lt;BR /&gt;Jim&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 16 Jun 2010 15:36:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799458#M550</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2010-06-16T15:36:57Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799459#M551</link>
      <description>Gangti,&lt;BR /&gt;&lt;BR /&gt;For others following this thread, would you be so kind as to run a performance test of your application using the LOCK method and the method outlined in my sketch? Readers may find your report useful in determining whether they should go to a little extra effort to produce faster code.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;</description>
      <pubDate>Wed, 16 Jun 2010 15:40:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799459#M551</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2010-06-16T15:40:53Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799460#M552</link>
      <description>If possible, I will post the performance test result on this thread.&lt;BR /&gt; :-( it's midnight now in China...</description>
      <pubDate>Wed, 16 Jun 2010 15:52:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799460#M552</guid>
      <dc:creator>gangti</dc:creator>
      <dc:date>2010-06-16T15:52:22Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799461#M553</link>
      <description>&lt;DIV id="tiny_quote"&gt;
                &lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=434199" class="basic" href="https://community.intel.com/en-us/profile/434199/"&gt;gangti&lt;/A&gt;&lt;/DIV&gt;
                &lt;DIV style="background-color: #e5e5e5; padding: 5px; border: 1px inset; margin-left: 2px; margin-right: 2px;"&gt;&lt;I&gt;&lt;I&gt;&lt;/I&gt;&lt;B&gt;Well, that is just what I want to ask you, HOW TO USE LOCK XCHG FOR READER?&lt;BR /&gt;&lt;/B&gt;&lt;I&gt;thanks for your help&lt;/I&gt;&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Ah, sorry. For the reader you must use LOCK CMPXCHG. Try to change the variable from 0 to 0. In either case the variable is left physically unchanged, and you get the current value.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
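<![CDATA[
The "change 0 to 0" idea above can be illustrated with a toy model of the semantics. Everything here is an assumption made for illustration: the Cell class and its method names are hypothetical, and a threading.Lock merely stands in for the LOCK prefix. It models what the instructions return, not what they cost.

```python
import threading

class Cell:
    """Toy stand-in for a memory word whose accesses are serialized by a lock."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()   # plays the role of the LOCK prefix

    def exchange(self, new_value):
        """Writer side, modelled on LOCK XCHG: store new_value, return the old one."""
        with self._lock:
            old, self._value = self._value, new_value
            return old

    def compare_exchange(self, expected, new_value):
        """Modelled on LOCK CMPXCHG: swap only if equal, always return what was there."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new_value
            return old

    def load(self):
        """Reader side: compare with 0 and store 0, so the value never changes."""
        return self.compare_exchange(0, 0)

cell = Cell(0xFFFFFFFF)
cell.exchange(0x02DE2C68)
print(hex(cell.load()))   # 0x2de2c68
```

If the cell holds 0 the swap writes 0 over 0, and otherwise the comparison fails, so in both cases the reader gets the current value and the stored value is unchanged.
]]>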
      <pubDate>Wed, 16 Jun 2010 15:56:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799461#M553</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2010-06-16T15:56:13Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799462#M554</link>
      <description>Btw, can't it be that the variable is 1- or 3-byte aligned (or perhaps mis-aligned), and not just 2-byte aligned? If so, and if you decide to use plain MOVs (as Jim suggested), you must handle those cases too.&lt;BR /&gt;</description>
      <pubDate>Wed, 16 Jun 2010 15:58:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799462#M554</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2010-06-16T15:58:53Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799463#M555</link>
      <description>&lt;BR /&gt;Dmitriy Vyukov&lt;BR /&gt;thanks, that really works</description>
      <pubDate>Wed, 16 Jun 2010 16:23:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799463#M555</guid>
      <dc:creator>gangti</dc:creator>
      <dc:date>2010-06-16T16:23:23Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799464#M556</link>
      <description>Dmitriy Vyukov&lt;BR /&gt;In my case, the pointer is a member of a big one-byte-aligned struct.&lt;BR /&gt;Currently, the address of the pointer is 2-byte aligned.&lt;BR /&gt;Someday, when we add other members before the pointer in the struct, &lt;BR /&gt;the address of the pointer may become 1-byte aligned.&lt;BR /&gt;&lt;BR /&gt;Well, the current condition may not be the worst.</description>
      <pubDate>Wed, 16 Jun 2010 16:27:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799464#M556</guid>
      <dc:creator>gangti</dc:creator>
      <dc:date>2010-06-16T16:27:49Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799465#M557</link>
      <description>&lt;P&gt;Then to be truly portable and not have nightmares maintaining your code, I strongly recommend always using "lock cmpxchg" (for read) with "lock xchg" (for write) instead of using the 0xFFFF or 0xFF tricks. (BTW, I'm not sure the 0xFFFF trick will work with all memory allocation schemes anyway. Especially since members of structures are not naturally aligned in your application.)&lt;BR /&gt;&lt;BR /&gt;- Grant&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jun 2010 19:28:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799465#M557</guid>
      <dc:creator>Grant_H_Intel</dc:creator>
      <dc:date>2010-06-16T19:28:10Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799466#M558</link>
      <description>&lt;DIV id="tiny_quote"&gt;
                &lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=334689" class="basic" href="https://community.intel.com/en-us/profile/334689/"&gt;Grant Haab (Intel)&lt;/A&gt;&lt;/DIV&gt;
                &lt;DIV style="background-color: #e5e5e5; padding: 5px; border: 1px inset; margin-left: 2px; margin-right: 2px;"&gt;&lt;I&gt;&lt;P&gt;Then to be truly portable and not have nightmares maintaining your code, I strongly recommend always using "lock cmpxchg" (for read) with "lock xchg" (for write) instead of using the 0xFFFF or 0xFF tricks.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would recommend either staying single-threaded and not bearing all the complexities of concurrent software, or at least getting some performance benefit from concurrent software. Plunging into concurrent software and then finding yourself slower than the single-threaded version looks quite strange. A load via LOCK CMPXCHG can be 10000 times slower than MOV.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jun 2010 19:53:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799466#M558</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2010-06-16T19:53:29Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799467#M559</link>
      <description>Grant,&lt;BR /&gt;&lt;BR /&gt;The problem arises when a cache line boundary splits the DWORD. This may occur at the comma in&lt;BR /&gt;&lt;BR /&gt;0xFFFFFF,FF (least significant byte in lower cache aligned address)&lt;BR /&gt;0xFFFF,FFFF (least significant 2 bytes in lower cache aligned address)&lt;BR /&gt;0xFF,FFFFFF (least significant 3 bytes in lower cache aligned address)&lt;BR /&gt;0xFFFFFFFF (DWORD not split between cache lines)&lt;BR /&gt;&lt;BR /&gt;The LOCK prefix, depending on processor model, costs you 100x to 500x the overhead of a write for naturally aligned variables (also contained within 1 cache line).&lt;BR /&gt;&lt;BR /&gt;If performance matters then consider safe alternatives that bypass LOCK.&lt;BR /&gt;&lt;BR /&gt;Note, the triple write:&lt;BR /&gt;&lt;BR /&gt; write word containing 2 lowest bytes&lt;BR /&gt; write byte containing byte 3 of DWORD&lt;BR /&gt; write byte containing byte 4 of DWORD&lt;BR /&gt;&lt;BR /&gt;may (depending on processor model; I said &lt;EM&gt;may&lt;/EM&gt;) occur, due to processor write combining, as a single write to memory when the DWORD is fully contained within the same cache line, and as 2 writes when split across cache lines. In either of the two circumstances (split or not split) the high byte will be written in the last write (which may be the only write). Without testing the code, my estimate is 50x to 500x the performance of the LOCK prefixed code.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;&lt;BR /&gt;</description>
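<![CDATA[
The four split cases listed above can be computed from the address of the DWORD. A minimal sketch, assuming a 64-byte cache line (the thread does not state the line size) and a hypothetical helper name split_point.

```python
CACHE_LINE = 64  # assumed line size in bytes

def split_point(address, size=4, line=CACHE_LINE):
    """Return how many low-order bytes sit in the lower line, or 0 if not split."""
    first_line = address // line
    last_line = (address + size - 1) // line
    if first_line == last_line:
        return 0
    return line - (address % line)

for addr in (0x1000, 0x103D, 0x103E, 0x103F):
    print(hex(addr), split_point(addr))
```

A result of 1, 2 or 3 corresponds to the three comma positions in the post, and 0 corresponds to the unsplit case.
]]>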
      <pubDate>Wed, 16 Jun 2010 22:38:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799467#M559</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2010-06-16T22:38:34Z</dc:date>
    </item>
    <item>
      <title>How to solve bugs of simultaneously misaligned memory accesses</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799468#M560</link>
      <description>&lt;DIV id="tiny_quote"&gt;
                &lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=99850" class="basic" href="https://community.intel.com/en-us/profile/99850/"&gt;jimdempseyatthecove&lt;/A&gt;&lt;/DIV&gt;
                &lt;DIV style="background-color: #e5e5e5; padding: 5px; border: 1px inset; margin-left: 2px; margin-right: 2px;"&gt;&lt;I&gt;&lt;BR /&gt;The LOCK prefix, depending on processor model, costs you 100x to 500x the overhead of a write for naturally aligned variables (also contained within 1 cache line).&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Jim, the problem is not with writes; they are not scalable in either case.&lt;/P&gt;&lt;P&gt;The problem is with reads. A program can perform 1 read per 0.5 cycles per *thread* if implemented with MOV, or 1 read per 100-1000 cycles per *system* if implemented with CMPXCHG. The worst thing one may do in a concurrent program is to turn a perfectly scalable read operation into a completely non-scalable write operation (rebuke rw mutexes).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 17 Jun 2010 11:18:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-solve-bugs-of-simultaneously-misaligned-memory-accesses/m-p/799468#M560</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2010-06-17T11:18:00Z</dc:date>
    </item>
  </channel>
</rss>

