<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Memory reactivity on Intel multicore processors in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848146#M1763</link>
    <description>&lt;P&gt;Dmitriy,&lt;/P&gt;
&lt;P&gt;If your intention is to reliably store data into a shared buffer without loss of data, then use InterlockedCompareExchange or an interlocked increment with a static index. In your original code you could potentially lose more data than is stored (depending on the nature and phase of the programs).&lt;/P&gt;
&lt;P&gt;However, it appears from your latest comments that your interest is in flushing cache.&lt;/P&gt;
&lt;P&gt;If flushing or invalidating cache is your interest, then I suggest you determine the size of the cache line, then walk the zone of interest in the cache in increments of the cache line. If you use SSE3 instructions (on today's processors) you can write 8 or 16 bytes at a time. If your system has a 16 byte cache line, then writing 16 bytes per iteration to aligned data performs the operation using a write versus a read-modify-write as with your long buffer technique.&lt;/P&gt;
&lt;P&gt;There are additional instructions available to invalidate and/or flush cache lines, with or without writes to memory, depending on your interests.&lt;/P&gt;
&lt;P&gt;It may help if you provide a detailed description of what you want to do, as opposed to asking for comments on code the purpose of which is not disclosed.&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 05 Jul 2007 18:48:20 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2007-07-05T18:48:20Z</dc:date>
    <item>
      <title>Memory reactivity on Intel multicore processors</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848141#M1758</link>
      <description>Here is the code:&lt;BR /&gt;&lt;BR /&gt;int buffer[1000];&lt;BR /&gt;void write_value(int value)&lt;BR /&gt;{&lt;BR /&gt; int pos = 0;&lt;BR /&gt; while (buffer[pos]) ++pos;&lt;BR /&gt; if (pos == 999) return;&lt;BR /&gt; buffer[pos] = value;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;Assume the function write_value() is executed by multiple threads on an Intel multicore processor.&lt;BR /&gt;&lt;BR /&gt;Question: What would be the rate of value loss depending on (1) thread count, (2) core count, and (3) whether the L2 cache is shared between cores or not?&lt;BR /&gt;&lt;BR /&gt;The question is about hardware, not about a possible context switch between the buffer cell check and the store.&lt;BR /&gt;&lt;BR /&gt;Thank you&lt;BR /&gt;Dmitriy V'jukov&lt;BR /&gt;</description>
      <pubDate>Wed, 04 Jul 2007 10:19:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848141#M1758</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-04T10:19:10Z</dc:date>
    </item>
    <item>
      <title>Re: Memory reactivity on Intel multicore processors</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848142#M1759</link>
      <description>Question: What would be the rate of value loss depending on (1) thread count, (2) core count, and (3) whether the L2 cache is shared between cores or not?&lt;BR /&gt;&lt;BR /&gt;And depending on (4) how frequently threads call write_value()? That is, every thread periodically calls write_value() and then does some other work for X cycles. How does the value loss rate depend on X?&lt;BR /&gt;&lt;BR /&gt;It would be great to see tables like this:&lt;BR /&gt;&lt;BR /&gt;core count = 2, shared L2 cache&lt;BR /&gt;
X = 0 (threads only call write_value() in a tight loop), value loss rate = 10%&lt;BR /&gt;
X = 100, value loss rate = 5%&lt;BR /&gt;
X = 1000, value loss rate = 1%&lt;BR /&gt;
&lt;BR /&gt;core count = 4, two L2 caches, each L2 cache shared between 2 cores&lt;BR /&gt;
X = 0, value loss rate = 30%&lt;BR /&gt;
X = 100, value loss rate = 10%&lt;BR /&gt;
X = 1000, value loss rate = 3%&lt;BR /&gt;
&lt;BR /&gt;&lt;BR /&gt;I don't have access to multicore Intel processors, so I can't test this. Any further information you can provide on this, for a wider range of processor architectures and with any other relevant details, would be welcome.&lt;BR /&gt;&lt;BR /&gt;Thank you&lt;BR /&gt;
Dmitriy V'jukov&lt;BR /&gt;</description>
      <pubDate>Wed, 04 Jul 2007 14:34:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848142#M1759</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-04T14:34:57Z</dc:date>
    </item>
    <item>
      <title>Re: Memory reactivity on Intel multicore processors</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848143#M1760</link>
      <description>&lt;P&gt;Dmitriy,&lt;/P&gt;
&lt;P&gt;You should not write a shared write routine in the manner you described. Try something along the lines of&lt;/P&gt;&lt;PRE&gt;#include &amp;lt;windows.h&amp;gt;  // InterlockedCompareExchange&lt;BR /&gt;&lt;BR /&gt;volatile long buffer[1000];  // 0 marks an empty cell&lt;/PRE&gt;&lt;PRE&gt;void init_buffer()&lt;BR /&gt;{&lt;BR /&gt;    for (int i = 0; i &amp;lt; 1000; ++i) buffer[i] = 0;&lt;BR /&gt;}&lt;/PRE&gt;&lt;PRE&gt;bool write_value(int value)&lt;BR /&gt;{&lt;BR /&gt;    if (value == 0) return false;  // invalid input: 0 is reserved for "empty"&lt;BR /&gt;    for (int pos = 0; pos &amp;lt; 1000; ++pos)&lt;BR /&gt;    {&lt;BR /&gt;        if (!buffer[pos])  // cheap read first, CAS only on a candidate cell&lt;BR /&gt;        {&lt;BR /&gt;            if (InterlockedCompareExchange(&amp;amp;buffer[pos], value, 0) == 0)&lt;BR /&gt;                return true;  // insert successful&lt;BR /&gt;        }&lt;BR /&gt;    }&lt;BR /&gt;    return false;  // insert failed: no empty cell found&lt;BR /&gt;}&lt;/PRE&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Thu, 05 Jul 2007 16:42:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848143#M1760</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-07-05T16:42:14Z</dc:date>
    </item>
    <item>
      <title>Re: Memory reactivity on Intel multicore processors</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848144#M1761</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;DIV&gt;&lt;IMG src="https://community.intel.com/file/6745" /&gt; &lt;STRONG&gt;JimDempseyAtTheCove:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;Dmitriy,&lt;/P&gt;
&lt;P&gt;You should not write a shared write routine in the manner you described.&lt;/P&gt;&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;BR /&gt;Could you please explain in more detail why I should not write a shared write routine in this manner?&lt;BR /&gt;I don't see any problems here (except that some values can be lost). Stores are atomic. Other threads will eventually see stores made by other threads.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BLOCKQUOTE&gt;&lt;DIV&gt;&lt;IMG src="https://community.intel.com/file/6745" /&gt; &lt;STRONG&gt;JimDempseyAtTheCove:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;Try something along the lines of&lt;/P&gt;
&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;BR /&gt;Why?&lt;BR /&gt;You propose code that is, let me guess, about 100 times slower than mine. It also loads the cache coherency protocol more, so it is less scalable. Is there a serious reason to use such code?&lt;BR /&gt;&lt;BR /&gt;Dmitriy V'jukov&lt;BR /&gt;</description>
      <pubDate>Thu, 05 Jul 2007 17:35:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848144#M1761</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-05T17:35:28Z</dc:date>
    </item>
    <item>
      <title>Re: Memory reactivity on Intel multicore processors</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848145#M1762</link>
      <description>I am trying to understand how many cycles it takes a core to flush its store buffer.&lt;BR /&gt;&lt;BR /&gt;I think (hope) that with 4 cores and write_value() called about once per 1000 cycles, the value loss rate will be low enough, below about 5%.&lt;BR /&gt;&lt;BR /&gt;Dmitriy V'jukov&lt;BR /&gt;</description>
      <pubDate>Thu, 05 Jul 2007 17:41:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848145#M1762</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-05T17:41:44Z</dc:date>
    </item>
    <item>
      <title>Re: Memory reactivity on Intel multicore processors</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848146#M1763</link>
      <description>&lt;P&gt;Dmitriy,&lt;/P&gt;
&lt;P&gt;If your intention is to reliably store data into a shared buffer without loss of data, then use InterlockedCompareExchange or an interlocked increment with a static index. In your original code you could potentially lose more data than is stored (depending on the nature and phase of the programs).&lt;/P&gt;
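&lt;P&gt;As a hedged illustration of the "interlocked increment with static index" option (a minimal sketch, not code from this thread): the C++ below uses std::atomic's fetch_add in place of the Win32 InterlockedIncrement, and the names kCapacity and next_index are assumptions made for the example. Each caller claims a distinct slot with a single atomic increment, so two threads can never store into the same cell.&lt;/P&gt;
<![CDATA[
```cpp
#include <atomic>
#include <cstddef>

constexpr std::size_t kCapacity = 1000;    // same size as the buffer in the thread

std::atomic<int> buffer[kCapacity];        // static storage, so every cell starts at 0
std::atomic<std::size_t> next_index{0};    // the shared "static index" (assumed name)

// Claims a unique slot with one atomic increment, then stores into it.
// Returns false once every slot has been claimed.
bool write_value(int value) {
    std::size_t pos = next_index.fetch_add(1, std::memory_order_relaxed);
    if (pos >= kCapacity) return false;    // buffer is full
    buffer[pos].store(value, std::memory_order_release);
    return true;
}
```
]]>
&lt;P&gt;The trade-off is one atomic read-modify-write per call; in exchange no stored value is overwritten, and the only failure mode is a full buffer.&lt;/P&gt;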
&lt;P&gt;However, it appears from your latest comments that your interest is in flushing cache.&lt;/P&gt;
&lt;P&gt;If flushing or invalidating cache is your interest, then I suggest you determine the size of the cache line, then walk the zone of interest in the cache in increments of the cache line. If you use SSE3 instructions (on today's processors) you can write 8 or 16 bytes at a time. If your system has a 16 byte cache line, then writing 16 bytes per iteration to aligned data performs the operation using a write versus a read-modify-write as with your long buffer technique.&lt;/P&gt;
&lt;P&gt;There are additional instructions available to invalidate and/or flush cache lines, with or without writes to memory, depending on your interests.&lt;/P&gt;
&lt;P&gt;It may help if you provide a detailed description of what you want to do, as opposed to asking for comments on code the purpose of which is not disclosed.&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;
</description>
      <pubDate>Thu, 05 Jul 2007 18:48:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848146#M1763</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-07-05T18:48:20Z</dc:date>
    </item>
    <item>
      <title>Re: Memory reactivity on Intel multicore processors</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848147#M1764</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;DIV&gt;&lt;IMG src="https://community.intel.com/file/6745" /&gt; &lt;STRONG&gt;JimDempseyAtTheCove:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;It may help if you provide a detailed description of what you want to do, as opposed to asking for comments on code the purpose of which is not disclosed.&lt;/P&gt;&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;BR /&gt;Yes, of course.&lt;BR /&gt;Please see the full code and my intention here:&lt;BR /&gt;&lt;A href="http://groups.google.com/group/comp.programming.threads/browse_frm/thread/844d08c4eeb5c5d5" target="_blank"&gt;http://groups.google.com/group/comp.programming.threads/browse_frm/thread/844d08c4eeb5c5d5&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Dmitriy V'jukov&lt;BR /&gt;</description>
      <pubDate>Thu, 05 Jul 2007 19:09:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848147#M1764</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-05T19:09:14Z</dc:date>
    </item>
    <item>
      <title>Re: Memory reactivity on Intel multicore processors</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848148#M1765</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;DIV&gt;&lt;IMG src="https://community.intel.com/file/6745" /&gt; &lt;STRONG&gt;randomizer:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;Yes, of course.&lt;BR /&gt;Please, see full code and my intention here:&lt;BR /&gt;&lt;A href="http://groups.google.com/group/comp.programming.threads/browse_frm/thread/844d08c4eeb5c5d5" target="_blank"&gt;http://groups.google.com/group/comp.programming.threads/browse_frm/thread/844d08c4eeb5c5d5&lt;/A&gt;&lt;BR /&gt;&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;BR /&gt;Short summary:&lt;BR /&gt;I propose an algorithm and implementation for a multiple-producer/single-consumer queue. The enqueue and dequeue operations issue no memory barriers and no atomic read-modify-write operations at all. The algorithm is based on the "hazard" store that I showed in the first post, but it also includes some additional tricky logic to cope with lost nodes, so no nodes are lost in the end.&lt;BR /&gt;&lt;BR /&gt;The main question is:&lt;BR /&gt;What will be the frequency of node loss in the shared buffer under different workloads and on different platforms?&lt;BR /&gt;&lt;BR /&gt;The good news is:&lt;BR /&gt;A loss frequency of up to 50% or even 66% is OK, because the queue algorithm can successfully restore all the other nodes. A producer only has to put a node into the shared buffer successfully every now and then.&lt;BR /&gt;&lt;BR /&gt;Dmitriy V'jukov&lt;BR /&gt;</description>
      <pubDate>Thu, 05 Jul 2007 19:20:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848148#M1765</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-05T19:20:03Z</dc:date>
    </item>
    <item>
      <title>Re: Memory reactivity on Intel multicore processors</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848149#M1766</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;DIV&gt;&lt;IMG src="https://community.intel.com/file/6745" /&gt; &lt;STRONG&gt;JimDempseyAtTheCove:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;If your intention is to reliably store data into a shared buffer without loss of data...&lt;/P&gt;&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;No. It would be enough to store data into the shared buffer successfully only every now and then.&lt;BR /&gt;&lt;BR /&gt;
&lt;BLOCKQUOTE&gt;&lt;DIV&gt;&lt;IMG src="https://community.intel.com/file/6745" /&gt; &lt;STRONG&gt;JimDempseyAtTheCove:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;then use the InterlockedCompareExchange or an interlocked increment with static index.&lt;/P&gt;&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;This is too... boring :)&lt;BR /&gt;&lt;BR /&gt;
&lt;BLOCKQUOTE&gt;&lt;DIV&gt;&lt;IMG src="https://community.intel.com/file/6745" /&gt; &lt;STRONG&gt;JimDempseyAtTheCove:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;In your original code you could potentially lose more data than is stored (depending on the nature and phase of the programs).&lt;/P&gt;&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;If I lose twice as much as I store, that is OK.&lt;BR /&gt;If more... the question is exactly about this: can I lose more?&lt;BR /&gt;&lt;BR /&gt;
&lt;BLOCKQUOTE&gt;&lt;DIV&gt;&lt;IMG src="https://community.intel.com/file/6745" /&gt; &lt;STRONG&gt;JimDempseyAtTheCove:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;However, it appears from your latest comments that your interest is in flushing cache.&lt;/P&gt;&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;No. But as I understand it, the data loss frequency is related to the store buffer flushing speed.&lt;BR /&gt;&lt;BR /&gt;
&lt;P&gt;Thank you for your interest and for the answers.&lt;/P&gt;Dmitriy V'jukov&lt;BR /&gt;</description>
      <pubDate>Thu, 05 Jul 2007 19:27:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848149#M1766</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-05T19:27:32Z</dc:date>
    </item>
    <item>
      <title>Re: Memory reactivity on Intel multicore processors</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848150#M1767</link>
      <description>&lt;P&gt;Dmitriy,&lt;/P&gt;
&lt;P&gt;So your intention is to provide a queue with error recovery for potential adverse interaction (competition) for an empty cell in the queue.&lt;/P&gt;
&lt;P&gt;Observe your code with state numbers&lt;/P&gt;&lt;PRE&gt;1) while (buffer[pos]) ++pos;&lt;BR /&gt;2) if (pos == 999) return;&lt;BR /&gt;3) buffer[pos] = value;&lt;BR /&gt;&lt;/PRE&gt;
&lt;P&gt;I have not downloaded your code (not permitted to) so I cannot comment on your implementation specifically. However, I can comment in general based on your implementation of the enqueue routine.&lt;/P&gt;
&lt;P&gt;Presumably you will insert something into the queue and then test immediately or shortly thereafter to see if that item remains in the queue. If the item does not remain in the queue, you would assume that contention for the cell caused the same cell to be used by multiple threads. That is to say, multiple threads execute statement 1) above and observe the same empty cell, then those threads execute statements 2) and 3) to update the same cell in the buffer, with the last thread to "simultaneously" update being the "winner". Then presumably all threads examine the queue for their value and, if it is not found (only the winner finds its entry in the buffer), re-queue the value.&lt;/P&gt;
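&lt;P&gt;The overwrite described in this paragraph can be replayed deterministically without any threads. The sketch below is an illustration only, not code from the thread: it splits the original write_value() into its scan step and its store step and then runs one unlucky schedule by hand. The helper names find_empty, store_at and replay_lost_update are hypothetical, and the scan is bounded at the last cell, which the original loop does not do.&lt;/P&gt;
<![CDATA[
```cpp
#include <cstddef>

constexpr std::size_t kCapacity = 1000;
int buffer[kCapacity];                     // 0 means "empty", as in the thread

// Statement 1 of the original routine: find the first empty cell.
// Assumption: the scan stops at the last cell instead of running past the array.
std::size_t find_empty() {
    std::size_t pos = 0;
    while (pos < kCapacity - 1 && buffer[pos]) ++pos;
    return pos;
}

// Statements 2 and 3: give up on the last cell, otherwise store.
bool store_at(std::size_t pos, int value) {
    if (pos == kCapacity - 1) return false;
    buffer[pos] = value;
    return true;
}

// Replays one unlucky schedule: both "threads" scan before either stores.
// Both values must be nonzero and distinct.
// Returns how many of the two values are still in the buffer afterwards.
int replay_lost_update(int value_a, int value_b) {
    std::size_t pos_a = find_empty();      // thread A sees an empty cell
    std::size_t pos_b = find_empty();      // thread B sees the same empty cell
    store_at(pos_a, value_a);              // A stores its value
    store_at(pos_b, value_b);              // B overwrites A's value
    int survivors = 0;
    for (std::size_t i = 0; i < kCapacity; ++i)
        if (buffer[i] == value_a || buffer[i] == value_b) ++survivors;
    return survivors;
}
```
]]>
&lt;P&gt;On real hardware the same interleaving arises whenever the second thread's load of the cell happens before the first thread's store becomes visible to it; how often that occurs is exactly the open question of this thread.&lt;/P&gt;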
&lt;P&gt;The problem with this is that one of the competing threads that identify [pos] in buffer as being available could stall for a considerable time between states 1) and 3). This could be due to the O/S running something else in lieu of the competing thread. This stall time could be on the order of several milliseconds or even much longer (several seconds). Therefore the stall time will at times exceed the dwell time that you assume is safe for reexamining the buffer for successful insertion.&lt;/P&gt;
&lt;P&gt;Also, in the code I suggested there is a similar problem. Assume a thread scans for an available cell (the equivalent of statement 1). Then, before it inserts its data, the thread stalls for a long time (O/S context switch). During the stall, the cell is used by a different thread, and then, still during the same stall, the dequeue process removes that entry and all other entries and resets the queue to an empty state (all cells now 0). The stalled thread then successfully inserts its value into the middle of the empty queue.&lt;/P&gt;
&lt;P&gt;The problem now is the errant entry has empty cells before it. The entry is not lost, but the dequeue process will not fetch the entry for processing until the prior cells get reused (the errant cell will be skipped during the next fill pass).&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jul 2007 16:39:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848150#M1767</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-07-06T16:39:26Z</dc:date>
    </item>
    <item>
      <title>Re: Memory reactivity on Intel multicore processors</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848151#M1768</link>
      <description>&lt;P&gt;Dmitriy,&lt;/P&gt;
&lt;P&gt;If you wish to avoid Interlocked instructions, then might I suggest an alternative technique: each thread has a private queue for enqueuing to a resource, and the server thread for the resource dequeues from each of the private queues. The enqueue process never has contention (except for the queue-full condition), and the dequeue is only delayed marginally by finding that queues are empty. This eliminates lost entries and the time spent verifying lost entries, at the expense of peeking into potentially empty queues.&lt;/P&gt;
&lt;P&gt;On a 4 core system (4 processing threads) each shared resource would have 4 ring buffers (queues).&lt;/P&gt;
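&lt;P&gt;A minimal sketch of this layout, under stated assumptions rather than as a definitive implementation: SpscRing, enqueue, drain_once and kRingSize are hypothetical names, and std::atomic loads and stores with acquire/release ordering stand in for the index accesses (on x86 these compile to ordinary moves, with no locked instruction). Each producer writes only to its own ring, and the single consumer polls every ring in turn.&lt;/P&gt;
<![CDATA[
```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Single-producer/single-consumer ring: exactly one thread calls push()
// and exactly one thread calls pop(). Holds at most N - 1 values.
template <std::size_t N>
class SpscRing {
public:
    bool push(int value) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        slots_[head] = value;
        head_.store(next, std::memory_order_release);   // publish the slot
        return true;
    }
    bool pop(int& out) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }
private:
    std::array<int, N> slots_{};
    std::atomic<std::size_t> head_{0};   // written only by the producer
    std::atomic<std::size_t> tail_{0};   // written only by the consumer
};

constexpr std::size_t kProducers = 4;    // the "4 core system" from the post
constexpr std::size_t kRingSize = 256;   // hypothetical capacity

SpscRing<kRingSize> queues[kProducers];  // one private ring per producer thread

// Producer side: thread `id` only ever touches its own ring.
bool enqueue(std::size_t id, int value) { return queues[id].push(value); }

// Consumer side: poll each ring once, taking at most one value from each.
// Returns how many values were written to `out`.
std::size_t drain_once(int* out, std::size_t max_out) {
    std::size_t taken = 0;
    for (std::size_t id = 0; id < kProducers && taken < max_out; ++id) {
        int v;
        if (queues[id].pop(v)) out[taken++] = v;
    }
    return taken;
}
```
]]>
&lt;P&gt;The cost shows up on the consumer side: every poll touches all kProducers rings, including the empty ones.&lt;/P&gt;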
&lt;P&gt;Jim Dempsey&lt;/P&gt;
</description>
      <pubDate>Fri, 06 Jul 2007 17:01:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848151#M1768</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-07-06T17:01:27Z</dc:date>
    </item>
    <item>
      <title>Re: Memory reactivity on Intel multicore processors</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848152#M1769</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;DIV&gt;&lt;IMG src="https://community.intel.com/file/6745" /&gt; &lt;STRONG&gt;JimDempseyAtTheCove:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;Dmitriy,&lt;/P&gt;I have not downloaded your code (not permitted to) so I cannot comment on your implementation specifically. However, I can comment in general based on your implementation of the enqueue routine.&lt;BR /&gt;&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;BR /&gt;I have posted the code to this forum. Please see the topic named "MPSC FIFO Queue w/o atomic_rmw/membars".&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;
&lt;BLOCKQUOTE&gt;&lt;DIV&gt;&lt;IMG src="https://community.intel.com/file/6745" /&gt; &lt;STRONG&gt;JimDempseyAtTheCove:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;Presumably you will insert something into the queue and then test immediately or shortly thereafter to see if that item remains in the queue...&lt;/P&gt;&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;BR /&gt;No. None of this applies to my algorithm.&lt;BR /&gt;&lt;BR /&gt;Dmitriy V'jukov&lt;BR /&gt;</description>
      <pubDate>Sun, 08 Jul 2007 18:48:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848152#M1769</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-08T18:48:25Z</dc:date>
    </item>
    <item>
      <title>Re: Memory reactivity on Intel multicore processors</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848153#M1770</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;DIV&gt;&lt;IMG src="https://community.intel.com/file/6745" /&gt; &lt;STRONG&gt;JimDempseyAtTheCove:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;If you wish to avoid Interlocked instructions, then might I suggest an alternative technique: each thread has a private queue for enqueuing to a resource, and the server thread for the resource dequeues from each of the private queues. The enqueue process never has contention (except for the queue-full condition), and the dequeue is only delayed marginally by finding that queues are empty. This eliminates lost entries and the time spent verifying lost entries, at the expense of peeking into potentially empty queues.&lt;/P&gt;
&lt;P&gt;On a 4 core system (4 processing threads) each shared resource would have 4 ring buffers (queues).&lt;/P&gt;&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;BR /&gt;Yes, this is a reasonable suggestion.&lt;BR /&gt;If there are only a few threads, then yes, it will be a very good solution.&lt;BR /&gt;But I am targeting manycore machines with, for example, 100 cores (Intel promises 80 cores in 5 years). And you can have, for example, 2 threads per core, so 200 threads. Every consumer must poll 200 SPSC queues... And in order to block when all queues are empty, the consumer must check all 200 queues twice...&lt;BR /&gt;And to determine the total node count in N SPSC queues, the consumer has to check all N queues too. The node count is needed for load balancing, statistics, feedback, etc.&lt;BR /&gt;I am thinking about a solution where the consumer has, for example, N/10 MPSC queues instead of N SPSC queues (N is the number of threads), so that both the number of queues and the contention would be moderate.&lt;BR /&gt;&lt;BR /&gt;Dmitriy V'jukov&lt;BR /&gt;</description>
      <pubDate>Sun, 08 Jul 2007 18:59:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Memory-reactivity-on-Intel-multicore-processors/m-p/848153#M1770</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-08T18:59:34Z</dc:date>
    </item>
  </channel>
</rss>

