<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: unbounded multi-producer/consumer queue... in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/unbounded-multi-producer-consumer-queue/m-p/869984#M2849</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/6002"&gt;Chris M. Thomasson&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;/EM&gt;&lt;PRE&gt;&lt;EM&gt;[cpp]void&lt;BR /&gt;mpmcq_push(&lt;BR /&gt; struct mpmcq* const self,&lt;BR /&gt; struct node* node&lt;BR /&gt;) {&lt;BR /&gt;  struct node* prev;&lt;BR /&gt;  node-&amp;gt;m_next = NULL;&lt;BR /&gt;  prev = ATOMIC_SWAP_REL(&amp;amp;self-&amp;gt;head, node); // &lt;STRONG&gt;(A)&lt;/STRONG&gt;&lt;BR /&gt;  ATOMIC_STORE(&amp;amp;prev-&amp;gt;next, node); // &lt;STRONG&gt;(B)&lt;/STRONG&gt;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;struct node*&lt;BR /&gt;mpmcq_pop(&lt;BR /&gt; struct mpmcq* const self&lt;BR /&gt;) {&lt;BR /&gt;  void* state;&lt;BR /&gt;  struct dwcas_anchor cmp, xchg;&lt;BR /&gt;  cmp = self-&amp;gt;tail;&lt;BR /&gt;  do {&lt;BR /&gt;    struct node* next = ATOMIC_LOAD(&amp;amp;cmp.node-&amp;gt;next); // &lt;STRONG&gt;(C)&lt;/STRONG&gt;&lt;BR /&gt;    if (! next) return NULL;&lt;BR /&gt;    state = next-&amp;gt;state;&lt;BR /&gt;    xchg.node = next;&lt;BR /&gt;    xchg.aba = cmp.aba + 1;&lt;BR /&gt;  } while (! ATOMIC_DWCAS_ACQ(&amp;amp;self-&amp;gt;tail, &amp;amp;cmp, &amp;amp;xchg)); // &lt;STRONG&gt;(D)&lt;/STRONG&gt;&lt;BR /&gt;  cmp.node-&amp;gt;state = state;&lt;BR /&gt;  return cmp.node;&lt;BR /&gt;}&lt;BR /&gt;[/cpp]&lt;/EM&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;I think the fences here must be different:&lt;BR /&gt;A - relaxed&lt;BR /&gt;B - release&lt;BR /&gt;C - acquire&lt;BR /&gt;D - acq_rel (but not seq_cst, i.e. no #StoreLoad)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
    <pubDate>Fri, 14 Aug 2009 08:24:13 GMT</pubDate>
    <dc:creator>Dmitry_Vyukov</dc:creator>
    <dc:date>2009-08-14T08:24:13Z</dc:date>
    <item>
      <title>unbounded multi-producer/consumer queue...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/unbounded-multi-producer-consumer-queue/m-p/869982#M2847</link>
      <description>I am presenting pseudo-code for the most efficient unbounded multi-producer/consumer queues I have seen. It has wait-free pushes, and lock-free pops. One atomic RMW operation per-operation, and one release barrier for push and one acquire barrier for pop. Here is some crude pseudo-code:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;
&lt;PRE&gt;[cpp]struct node {
  node* next;
  void* state;
};


struct dwcas_anchor {
  int32 aba;
  struct node* node;
};


struct mpmcq {
  struct dwcas_anchor tail;
  struct node* head;
};


void
mpmcq_init(
 struct mpmcq* const self,
 struct node* dummy
) {
  dummy-&amp;gt;next = NULL;
  self-&amp;gt;head = dummy;
  self-&amp;gt;tail.node = dummy;
}


void
mpmcq_push(
 struct mpmcq* const self,
 struct node* node
) {
  struct node* prev;
  node-&amp;gt;m_next = NULL;
  prev = ATOMIC_SWAP_REL(&amp;amp;self-&amp;gt;head, node);
  ATOMIC_STORE(&amp;amp;prev-&amp;gt;next, node);
}


struct node*
mpmcq_pop(
 struct mpmcq* const self
) {
  void* state;
  struct dwcas_anchor cmp, xchg;
  cmp = self-&amp;gt;tail;
  do {
    struct node* next = ATOMIC_LOAD(&amp;amp;cmp.node-&amp;gt;next);
    if (! next) return NULL;
    state = next-&amp;gt;state;
    xchg.node = next;
    xchg.aba = cmp.aba + 1;
  } while (! ATOMIC_DWCAS_ACQ(&amp;amp;self-&amp;gt;tail, &amp;amp;cmp, &amp;amp;xchg));
  cmp.node-&amp;gt;state = state;
  return cmp.node;
}

[/cpp]&lt;/PRE&gt;
&lt;BR /&gt;
&lt;DIV&gt;Please note that the function `ATOMIC_DWCAS_ACQ()' is assumed to automatically update the comperand on failure.&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;This has the same memory management properties as a lock-free stack. It beats the heck out of all the other unbounded multi-producer consumer queues I have seen to date.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Enjoy!&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;;^)&lt;BR /&gt;&lt;/DIV&gt;</description>
      <pubDate>Sat, 27 Jun 2009 15:32:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/unbounded-multi-producer-consumer-queue/m-p/869982#M2847</guid>
      <dc:creator>Chris_M__Thomasson</dc:creator>
      <dc:date>2009-06-27T15:32:51Z</dc:date>
    </item>
    <item>
      <title>Re: unbounded multi-producer/consumer queue...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/unbounded-multi-producer-consumer-queue/m-p/869983#M2848</link>
      <description>Please note that there is a race-condition in the non-intrusive version of the algorithm I posted IF, and ONLY IF, you read the tail pointer BEFORE the tail aba counter. Here is explanation and Relacy source-code which proves it:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://groups.google.com/group/comp.programming.threads/msg/62527c65a440516a" target="_blank"&gt;http://groups.google.com/group/comp.programming.threads/msg/62527c65a440516a&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://relacy.pastebin.com/f4b57bda2" target="_blank"&gt;http://relacy.pastebin.com/f4b57bda2&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;If you load the tail pointer AFTER the tail aba counter then there is NO race-condition in any way shape or form.&lt;BR /&gt;
&lt;DIV&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;Here is example code which does exactly that:&lt;/DIV&gt;
&lt;DIV&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;PRE&gt;[cpp]struct node {
  node* next;
  void* state;
};


struct dwcas_anchor {
  int32 aba;
  struct node* node;
};


struct mpmcq {
  struct dwcas_anchor tail;
  struct node* head;
};


void
mpmcq_init(
 struct mpmcq* const self,
 struct node* dummy
) {
  dummy-&amp;gt;next = NULL;
  self-&amp;gt;head = dummy;
  self-&amp;gt;tail.node = dummy;
}


void
mpmcq_push(
 struct mpmcq* const self,
 struct node* node
) {
  struct node* prev;
  node-&amp;gt;m_next = NULL;
  prev = ATOMIC_SWAP(&amp;amp;self-&amp;gt;head, node);
  ATOMIC_STORE(&amp;amp;prev-&amp;gt;next, node);
}


struct node*
mpmcq_pop(
 struct mpmcq* const self
) {
  void* state;
  struct dwcas_anchor cmp, xchg;
  cmp.aba = self-&amp;gt;aba; // BEFORE!
  cmp.node = self-&amp;gt;tail; // AFTER!
  do {
    struct node* next = ATOMIC_LOAD(&amp;amp;cmp.node-&amp;gt;next);
    if (! next) return NULL;
    state = next-&amp;gt;state;
    xchg.node = next;
    xchg.aba = cmp.aba + 1;
  } while (! ATOMIC_DWCAS(&amp;amp;self-&amp;gt;tail, &amp;amp;cmp, &amp;amp;xchg));
  cmp.node-&amp;gt;state = state;
  return cmp.node;
}[/cpp]&lt;/PRE&gt;
&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;Hope that helps!&lt;/DIV&gt;</description>
      <pubDate>Thu, 13 Aug 2009 01:28:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/unbounded-multi-producer-consumer-queue/m-p/869983#M2848</guid>
      <dc:creator>Chris_M__Thomasson</dc:creator>
      <dc:date>2009-08-13T01:28:13Z</dc:date>
    </item>
    <item>
      <title>Re: unbounded multi-producer/consumer queue...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/unbounded-multi-producer-consumer-queue/m-p/869984#M2849</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/6002"&gt;Chris M. Thomasson&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;/EM&gt;&lt;PRE&gt;&lt;EM&gt;[cpp]void&lt;BR /&gt;mpmcq_push(&lt;BR /&gt; struct mpmcq* const self,&lt;BR /&gt; struct node* node&lt;BR /&gt;) {&lt;BR /&gt;  struct node* prev;&lt;BR /&gt;  node-&amp;gt;m_next = NULL;&lt;BR /&gt;  prev = ATOMIC_SWAP_REL(&amp;amp;self-&amp;gt;head, node); // &lt;STRONG&gt;(A)&lt;/STRONG&gt;&lt;BR /&gt;  ATOMIC_STORE(&amp;amp;prev-&amp;gt;next, node); // &lt;STRONG&gt;(B)&lt;/STRONG&gt;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;struct node*&lt;BR /&gt;mpmcq_pop(&lt;BR /&gt; struct mpmcq* const self&lt;BR /&gt;) {&lt;BR /&gt;  void* state;&lt;BR /&gt;  struct dwcas_anchor cmp, xchg;&lt;BR /&gt;  cmp = self-&amp;gt;tail;&lt;BR /&gt;  do {&lt;BR /&gt;    struct node* next = ATOMIC_LOAD(&amp;amp;cmp.node-&amp;gt;next); // &lt;STRONG&gt;(C)&lt;/STRONG&gt;&lt;BR /&gt;    if (! next) return NULL;&lt;BR /&gt;    state = next-&amp;gt;state;&lt;BR /&gt;    xchg.node = next;&lt;BR /&gt;    xchg.aba = cmp.aba + 1;&lt;BR /&gt;  } while (! ATOMIC_DWCAS_ACQ(&amp;amp;self-&amp;gt;tail, &amp;amp;cmp, &amp;amp;xchg)); // &lt;STRONG&gt;(D)&lt;/STRONG&gt;&lt;BR /&gt;  cmp.node-&amp;gt;state = state;&lt;BR /&gt;  return cmp.node;&lt;BR /&gt;}&lt;BR /&gt;[/cpp]&lt;/EM&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;I think the fences here must be different:&lt;BR /&gt;A - relaxed&lt;BR /&gt;B - release&lt;BR /&gt;C - acquire&lt;BR /&gt;D - acq_rel (but not seq_cst, i.e. no #StoreLoad)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 14 Aug 2009 08:24:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/unbounded-multi-producer-consumer-queue/m-p/869984#M2849</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2009-08-14T08:24:13Z</dc:date>
    </item>
    <item>
      <title>Re: unbounded multi-producer/consumer queue...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/unbounded-multi-producer-consumer-queue/m-p/869985#M2850</link>
      <description>&lt;DIV style="margin: 0px; height: auto;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Chris, Dmitriy,&lt;BR /&gt;&lt;BR /&gt;Call me old school if you wish.&lt;BR /&gt;&lt;BR /&gt;When you have&lt;BR /&gt;&lt;BR /&gt;
&lt;PRE&gt;[cpp]p1-&amp;gt;Node-&amp;gt;Node-&amp;gt;Node-&amp;gt;NULL
p2---------------^

From my perspective p1 is Head, p2 is Tail&lt;BR /&gt;Your code samples has the naming convention turned around, why?&lt;BR /&gt;Jim&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;[/cpp]&lt;/PRE&gt;</description>
      <pubDate>Fri, 14 Aug 2009 12:51:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/unbounded-multi-producer-consumer-queue/m-p/869985#M2850</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2009-08-14T12:51:44Z</dc:date>
    </item>
    <item>
      <title>Re: unbounded multi-producer/consumer queue...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/unbounded-multi-producer-consumer-queue/m-p/869986#M2851</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/347331"&gt;Dmitriy Vyukov&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; &lt;BR /&gt;I think the fences here must be different:&lt;BR /&gt;A - relaxed&lt;BR /&gt;B - release&lt;BR /&gt;C - acquire&lt;BR /&gt;D - acq_rel (but not seq_cst, i.e. no #StoreLoad)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;Humm... Relacy seems to disagree. Here is example code which uses the memory barriers you laid out:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://relacy.pastebin.com/f21025123"&gt;http://relacy.pastebin.com/f21025123&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;This bites the dust with a memory leak, which means the queue lost nodes.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Here is a version that uses sequential consistency and all is good:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://relacy.pastebin.com/f2ce9aab8"&gt;http://relacy.pastebin.com/f2ce9aab8&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Could  you please help me create a version of the example code which does not use sequential consistency? Please note that if I change one of those seq_cst membars to any other membar, I get lost nodes and a memory leak...&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;Interesting...&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;BTW, I am using Relacy version 1.5.0&lt;BR /&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 21 Aug 2009 13:04:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/unbounded-multi-producer-consumer-queue/m-p/869986#M2851</guid>
      <dc:creator>Chris_M__Thomasson</dc:creator>
      <dc:date>2009-08-21T13:04:39Z</dc:date>
    </item>
  </channel>
</rss>

