<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Remarks in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPSC-FIFO-Queue-w-o-atomic-rmw-membars/m-p/916263#M4830</link>
    <description>I guess the first question: how to make this queue unbounded?&lt;BR /&gt;&lt;BR /&gt;To make this queue unbounded you can use nested list as underlying data structure. Nests must be large enough. And to manage nests lifetime you can use any of PDR techniques: RCU/SMR/ROP/vZOOM/ref-counting. Something like this:&lt;BR /&gt;&lt;BR /&gt;struct nest_t&lt;BR /&gt;{&lt;BR /&gt; node_t* buffer[1023];&lt;BR /&gt; nest_t* next;&lt;BR /&gt;};&lt;BR /&gt;&lt;BR /&gt;And of course you can use this queue as bounded with cyclic bounded buffer. Since consumer is only one, he can freely zeroize cells after himself.&lt;BR /&gt;&lt;BR /&gt;Dmitriy V'jukov&lt;BR /&gt;</description>
    <pubDate>Sun, 08 Jul 2007 18:28:11 GMT</pubDate>
    <dc:creator>Dmitry_Vyukov</dc:creator>
    <dc:date>2007-07-08T18:28:11Z</dc:date>
    <item>
      <title>MPSC FIFO Queue w/o atomic_rmw/membars</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPSC-FIFO-Queue-w-o-atomic-rmw-membars/m-p/916262#M4829</link>
      <description>I propose algorithm for multiple-producer/single-consumer FIFO queue w/o atomic read-modify-write operations and memory barriers.&lt;BR /&gt;&lt;BR /&gt;Features: &lt;BR /&gt; - multiple-producer/single-consumer &lt;BR /&gt; - strong fifo (wrt one producer)&lt;BR /&gt; - unbounded (not in presented demo implementation)&lt;BR /&gt; - no atomic_rmw or memory barriers (at least on x86, on other &lt;BR /&gt; architectures #StoreStore may be needed while enqueuing) while &lt;BR /&gt; enqueuing/dequeuing *and* while node management/reclamation.&lt;BR /&gt;&lt;BR /&gt;The idea. &lt;BR /&gt; Before putting node to shared buffer producer puts node to his own &lt;BR /&gt; local list, and then put node to shared buffer. In shared buffer node &lt;BR /&gt; can get... lost. You don't mishear. Node is stored to shared buffer &lt;BR /&gt; with plain store, no CAS, so node can get lost. &lt;BR /&gt; Consumer remember last consumed node for every producer. And before &lt;BR /&gt; consuming node consumer check whether this the next node after last &lt;BR /&gt; consumed node. If this is the next node after last consumed node, then &lt;BR /&gt; all is ok. And if this is not the next node after last consumed node, &lt;BR /&gt; then some nodes from this consumer get lost, and consumer restore lost &lt;BR /&gt; nodes using producer local list of nodes. &lt;BR /&gt;&lt;BR /&gt;&lt;P&gt;Here is the code: &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;---------------------------------------------------------------------------------------- &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;struct producer_local_t; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;struct node_t &lt;BR /&gt; { &lt;BR /&gt;   void* user_data; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;  // used to organize producer local &lt;BR /&gt;   // list of enqueued nodes &lt;BR /&gt;   node_t* local_next; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;  // incremented with every produced node &lt;BR /&gt;   unsigned tag; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;  producer_local_t* producer_local; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;  // whether node is already consumed or not &lt;BR /&gt;   bool is_consumed; &lt;BR /&gt; &lt;/P&gt;&lt;DIV id="qhide_234686" style="display: block;" class="qt"&gt;}; &lt;BR /&gt; &lt;BR /&gt;&lt;/DIV&gt;struct producer_local_t &lt;BR /&gt; { &lt;BR /&gt;   // last consumed node &lt;BR /&gt;   node_t* last_node; &lt;BR /&gt; &lt;P&gt;  // last consumed node tag &lt;BR /&gt;   unsigned last_tag; &lt;BR /&gt; &lt;/P&gt;&lt;DIV id="qhide_234687" style="display: block;" class="qt"&gt;}; &lt;BR /&gt; &lt;BR /&gt;&lt;/DIV&gt;struct producer_t &lt;BR /&gt; { &lt;BR /&gt;   producer_local_t local; &lt;BR /&gt; &lt;P&gt;  unsigned tag_counter; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;  // newest allocated node &lt;BR /&gt;   node_t* head; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;  // oldest allocated node &lt;BR /&gt;   node_t* tail; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;  producer_t() &lt;BR /&gt;   { &lt;BR /&gt;     local.last_node = 0; &lt;BR /&gt;     local.last_tag = 0; &lt;BR /&gt;     tag_counter = 0; &lt;BR /&gt;     head = 0; &lt;BR /&gt;     tail = 0; &lt;BR /&gt;   } &lt;BR /&gt; &lt;/P&gt;&lt;DIV id="qhide_234688" style="display: block;" class="qt"&gt;}; &lt;BR /&gt; &lt;BR /&gt;&lt;/DIV&gt;struct mpsc_queue_t &lt;BR /&gt; { &lt;BR /&gt;   // for simplicity assuming that the buffer is infinite &lt;BR /&gt;   static const unsigned infinity = 100; &lt;BR /&gt;   node_t* buffer[infinity]; &lt;BR /&gt; &lt;P&gt;  node_t** consume_pos; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;  node_t** produce_pos_hint; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;  mpsc_queue_t() &lt;BR /&gt;   { &lt;BR /&gt;     memset(buffer, 0, sizeof(buffer)); &lt;BR /&gt;     consume_pos = buffer; &lt;BR /&gt;     produce_pos_hint = buffer; &lt;BR /&gt;   } &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;
  void enqueue(producer_t* producer, void* data) &lt;BR /&gt;   { &lt;BR /&gt;     // multiple producers here &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    if (0 == producer-&amp;gt;local.last_node) &lt;BR /&gt;       first_time_setup(producer); &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    // allocate node for enqueue &lt;BR /&gt;     node_t* node = 0; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    // try to reuse oldest node &lt;BR /&gt;     if (producer-&amp;gt;tail-&amp;gt;is_consumed) &lt;BR /&gt;     { &lt;BR /&gt;       // reuse oldest node &lt;BR /&gt;       node = producer-&amp;gt;tail; &lt;BR /&gt;       producer-&amp;gt;tail = producer-&amp;gt;tail-&amp;gt;local_next; &lt;BR /&gt;     } &lt;BR /&gt;     else &lt;BR /&gt;     { &lt;BR /&gt;       // allocate new node &lt;BR /&gt;       node = new node_t; &lt;BR /&gt;       node-&amp;gt;producer_local = &amp;amp;producer-&amp;gt;local; &lt;BR /&gt;     } &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    // fill node &lt;BR /&gt;     node-&amp;gt;is_consumed = false; &lt;BR /&gt;     node-&amp;gt;local_next = 0; &lt;BR /&gt;     node-&amp;gt;tag = producer-&amp;gt;tag_counter++; &lt;BR /&gt;     node-&amp;gt;user_data = data; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    producer-&amp;gt;head-&amp;gt;local_next = node; &lt;BR /&gt;     producer-&amp;gt;head = node; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    //#StoreStore &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    // get next empty position &lt;BR /&gt;     node_t** pos = oracle(); &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    // hazard store &lt;BR /&gt;     // this store potentially can be &lt;BR /&gt;     // overwritten by other producer &lt;BR /&gt;     *pos = node; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    // one more hazard store &lt;BR /&gt;     produce_pos_hint = pos + 1; &lt;BR /&gt;   } &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;  void first_time_setup(producer_t* producer) &lt;BR /&gt;   { &lt;BR /&gt;     // allocate new node &lt;BR /&gt;     node_t* node = new node_t; &lt;BR /&gt;     node-&amp;gt;producer_local = &amp;amp;producer-&amp;gt;local; &lt;BR /&gt;     node-&amp;gt;is_consumed = false; &lt;BR /&gt;     node-&amp;gt;local_next = 0; &lt;BR /&gt;     node-&amp;gt;tag = producer-&amp;gt;tag_counter++; &lt;BR /&gt;     node-&amp;gt;user_data = 0; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    producer-&amp;gt;head = node; &lt;BR /&gt;     producer-&amp;gt;tail = node; &lt;BR /&gt;     producer-&amp;gt;local.last_node = node; &lt;BR /&gt;   } &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;  node_t** oracle() &lt;BR /&gt;   { &lt;BR /&gt;     // for now just find first empty cell in buffer &lt;BR /&gt;     node_t** node = produce_pos_hint; &lt;BR /&gt;     while (*node) ++node; &lt;BR /&gt;     return node; &lt;BR /&gt;   } &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;  void* dequeue() &lt;BR /&gt;   { &lt;BR /&gt;     // only single consumer here &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    node_t* node = *consume_pos; 
&lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    if (0 == node) &lt;BR /&gt;       return 0; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    // Only rely on data-dependency here &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    // chech whether we have lost some nodes &lt;BR /&gt;     // since previous consumed node from the producer or not &lt;BR /&gt;     producer_local_t* local = node-&amp;gt;producer_local; &lt;BR /&gt;     if (local-&amp;gt;last_tag + 1 == node-&amp;gt;tag) &lt;BR /&gt;       // producer tags are consistent &lt;BR /&gt;       // i.e. we don't lost any nodes, &lt;BR /&gt;       // so just increment pos &lt;BR /&gt;       ++consume_pos; &lt;BR /&gt;     else &lt;BR /&gt;       // producer tags are inconsistent &lt;BR /&gt;       // i.e. we have lost some nodes, &lt;BR /&gt;       // so get next node from previous &lt;BR /&gt;       // consumed node from the producer &lt;BR /&gt;       // and _don't_ increment pos &lt;BR /&gt;       node = local-&amp;gt;last_node-&amp;gt;local_next; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    // we don't need last consumed node, &lt;BR /&gt;     // so mark the node accordingly &lt;BR /&gt;     local-&amp;gt;last_node-&amp;gt;is_consumed = true; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    // set last consumed node &lt;BR /&gt;     local-&amp;gt;last_node = node; &lt;BR /&gt;     local-&amp;gt;last_tag = node-&amp;gt;tag; &lt;BR /&gt; &lt;/P&gt;&lt;P&gt;    return node-&amp;gt;user_data; &lt;BR /&gt;   } &lt;BR /&gt; &lt;/P&gt;}; &lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Comments/suggestions/remarks/criticism are welcome&lt;BR /&gt;&lt;BR /&gt;Dmitriy V'jukov &lt;BR /&gt;</description>
      <pubDate>Sun, 08 Jul 2007 18:14:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPSC-FIFO-Queue-w-o-atomic-rmw-membars/m-p/916262#M4829</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-08T18:14:13Z</dc:date>
    </item>
    <item>
      <title>Remarks</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPSC-FIFO-Queue-w-o-atomic-rmw-membars/m-p/916263#M4830</link>
      <description>I guess the first question: how to make this queue unbounded?&lt;BR /&gt;&lt;BR /&gt;To make this queue unbounded you can use nested list as underlying data structure. Nests must be large enough. And to manage nests lifetime you can use any of PDR techniques: RCU/SMR/ROP/vZOOM/ref-counting. Something like this:&lt;BR /&gt;&lt;BR /&gt;struct nest_t&lt;BR /&gt;{&lt;BR /&gt; node_t* buffer[1023];&lt;BR /&gt; nest_t* next;&lt;BR /&gt;};&lt;BR /&gt;&lt;BR /&gt;And of course you can use this queue as bounded with cyclic bounded buffer. Since consumer is only one, he can freely zeroize cells after himself.&lt;BR /&gt;&lt;BR /&gt;Dmitriy V'jukov&lt;BR /&gt;</description>
      <pubDate>Sun, 08 Jul 2007 18:28:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPSC-FIFO-Queue-w-o-atomic-rmw-membars/m-p/916263#M4830</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-08T18:28:11Z</dc:date>
    </item>
    <item>
      <title>Re: MPSC FIFO Queue w/o atomic_rmw/membars</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPSC-FIFO-Queue-w-o-atomic-rmw-membars/m-p/916264#M4831</link>
      <description>Function oracle() definitely must be more effective in real-life.&lt;BR /&gt;&lt;BR /&gt;oracle() function must use produce_pos_hint variable and some smart algorithm to determine enqueue position. Something like this:&lt;BR /&gt;&lt;BR /&gt;1. Check current enqueue position hint &lt;BR /&gt;2. If fail then check several next positions &lt;BR /&gt;3. If fail then use binary search in [enqueue_position_hint, end_of_beffer] to find right position &lt;BR /&gt;Note: hint can't point to position after actual enqueue position, only before. &lt;BR /&gt;&lt;BR /&gt;Dmitriy V'jukov</description>
      <pubDate>Sun, 08 Jul 2007 18:36:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPSC-FIFO-Queue-w-o-atomic-rmw-membars/m-p/916264#M4831</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-08T18:36:45Z</dc:date>
    </item>
    <item>
      <title>Re: MPSC FIFO Queue w/o atomic_rmw/membars</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPSC-FIFO-Queue-w-o-atomic-rmw-membars/m-p/916265#M4832</link>
      <description>Here is one problem with this algorithm (as Gil Hamilton&lt;SPAN class="fontsize2 author"&gt;&lt;SPAN style="color: rgb(121, 6, 25);"&gt;&lt;/SPAN&gt;
  point&lt;/SPAN&gt;).&lt;BR /&gt;&lt;BR /&gt;If/when a producer's node is lost, that will not even be &lt;BR /&gt; discovered until the next time that the producer manages to produce a &lt;BR /&gt; node that *isn't* lost, if ever. So if there are many producers and/or &lt;BR /&gt; some producers produce a node only occasionally, lost nodes can sit &lt;BR /&gt; unconsumed for an indefinite time. &lt;BR /&gt;&lt;BR /&gt;In some envirionments this is not a problem at all (I suppose that node loss frequency will be low enough).&lt;BR /&gt;&lt;BR /&gt;And if this is a problem than I propose several solutions.&lt;BR /&gt;1. Consumer can (periodically/episodically) check local queues of all producers for lost nodes (global list of all producers must be maintained).&lt;BR /&gt;&lt;BR /&gt;2. Consumers can periodically enqueue fake nodes to queue to push frozen nodes in the queue.&lt;BR /&gt;&lt;BR /&gt;Dmitriy V'jukov&lt;BR /&gt;</description>
      <pubDate>Sun, 08 Jul 2007 18:44:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPSC-FIFO-Queue-w-o-atomic-rmw-membars/m-p/916265#M4832</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-08T18:44:02Z</dc:date>
    </item>
    <item>
      <title>Re: MPSC FIFO Queue w/o atomic_rmw/membars</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPSC-FIFO-Queue-w-o-atomic-rmw-membars/m-p/916266#M4833</link>
      <description>You may say "If you want raw performance, why not use N (N - number of producers) single-producer/single-consumer queues instead of 1 multiple-producer/single-consumer queue?"&lt;BR /&gt;&lt;BR /&gt;Yes, this is rational suggestion.&lt;BR /&gt;If there are few threads, then - yes. It will be very good solution. &lt;BR /&gt; But I am targeted and thinking about manycore machines with, for &lt;BR /&gt; example, 100 cores (Intel promise 80 cores in 5 years). And you can &lt;BR /&gt; have, for example, 2 threads per core. So 200 threads. Every consumer &lt;BR /&gt; must pull 200 spsc queues... And to block when all queues are empty, &lt;BR /&gt; consumer must check all 200 queues 2 times... &lt;BR /&gt;And to determine total node count in N spsc queues, consumer have to &lt;BR /&gt; check all N queues too. Node count needed for load-balancing, &lt;BR /&gt; statistics, feedback etc... &lt;BR /&gt; I am thinking about solution when consumer have, for example, N/10 &lt;BR /&gt; mpsc queues instead of N spsc queues (N - number of threads). So and &lt;BR /&gt; number of queues would be moderate and contention would be moderate &lt;BR /&gt; too. &lt;BR /&gt;&lt;BR /&gt;Dmitriy V'jukov</description>
      <pubDate>Sun, 08 Jul 2007 19:02:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPSC-FIFO-Queue-w-o-atomic-rmw-membars/m-p/916266#M4833</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2007-07-08T19:02:46Z</dc:date>
    </item>
  </channel>
</rss>

