topic Re: unbounded multi-producer/consumer queue... in Intel® Moderncode for Parallel Architectures

unbounded multi-producer/consumer queue...

Chris_M__Thomasson — Sat, 27 Jun 2009 15:32:51 GMT

I am presenting pseudo-code for the most efficient unbounded multi-producer/consumer queues I have seen. It has wait-free pushes, and lock-free pops. One atomic RMW operation per-operation, and one release barrier for push and one acquire barrier for pop. Here is some crude pseudo-code:

[cpp]struct node {
  node* next;
  void* state;
};


struct dwcas_anchor {
  int32 aba;
  struct node* node;
};


struct mpmcq {
  struct dwcas_anchor tail;
  struct node* head;
};


void
mpmcq_init(
 struct mpmcq* const self,
 struct node* dummy
) {
  dummy->next = NULL;
  self->head = dummy;
  self->tail.node = dummy;
}


void
mpmcq_push(
 struct mpmcq* const self,
 struct node* node
) {
  struct node* prev;
  node->m_next = NULL;
  prev = ATOMIC_SWAP_REL(&self->head, node);
  ATOMIC_STORE(&prev->next, node);
}


struct node*
mpmcq_pop(
 struct mpmcq* const self
) {
  void* state;
  struct dwcas_anchor cmp, xchg;
  cmp = self->tail;
  do {
    struct node* next = ATOMIC_LOAD(&cmp.node->next);
    if (! next) return NULL;
    state = next->state;
    xchg.node = next;
    xchg.aba = cmp.aba + 1;
  } while (! ATOMIC_DWCAS_ACQ(&self->tail, &cmp, &xchg));
  cmp.node->state = state;
  return cmp.node;
}

[/cpp]

Please note that the function `ATOMIC_DWCAS_ACQ()' is assumed to automatically update the comperand on failure.

This has the same memory management properties as a lock-free stack. It beats the heck out of all the other unbounded multi-producer consumer queues I have seen to date.

Enjoy!

;^)

Re: unbounded multi-producer/consumer queue...

Chris_M__Thomasson — Thu, 13 Aug 2009 01:28:13 GMT

Please note that there is a race-condition in the non-intrusive version of the algorithm I posted IF, and ONLY IF, you read the tail pointer BEFORE the tail aba counter. Here is explanation and Relacy source-code which proves it:

http://groups.google.com/group/comp.programming.threads/msg/62527c65a440516a

http://relacy.pastebin.com/f4b57bda2

If you load the tail pointer AFTER the tail aba counter then there is NO race-condition in any way shape or form.

Here is example code which does exactly that:

[cpp]struct node {
  node* next;
  void* state;
};


struct dwcas_anchor {
  int32 aba;
  struct node* node;
};


struct mpmcq {
  struct dwcas_anchor tail;
  struct node* head;
};


void
mpmcq_init(
 struct mpmcq* const self,
 struct node* dummy
) {
  dummy->next = NULL;
  self->head = dummy;
  self->tail.node = dummy;
}


void
mpmcq_push(
 struct mpmcq* const self,
 struct node* node
) {
  struct node* prev;
  node->m_next = NULL;
  prev = ATOMIC_SWAP(&self->head, node);
  ATOMIC_STORE(&prev->next, node);
}


struct node*
mpmcq_pop(
 struct mpmcq* const self
) {
  void* state;
  struct dwcas_anchor cmp, xchg;
  cmp.aba = self->aba; // BEFORE!
  cmp.node = self->tail; // AFTER!
  do {
    struct node* next = ATOMIC_LOAD(&cmp.node->next);
    if (! next) return NULL;
    state = next->state;
    xchg.node = next;
    xchg.aba = cmp.aba + 1;
  } while (! ATOMIC_DWCAS(&self->tail, &cmp, &xchg));
  cmp.node->state = state;
  return cmp.node;
}[/cpp]

Hope that helps!

Re: unbounded multi-producer/consumer queue...

Dmitry_Vyukov — Fri, 14 Aug 2009 08:24:13 GMT

Quoting - Chris M. Thomasson

[cpp]void
mpmcq_push(
 struct mpmcq* const self,
 struct node* node
) {
  struct node* prev;
  node->m_next = NULL;
  prev = ATOMIC_SWAP_REL(&self->head, node); // (A)
  ATOMIC_STORE(&prev->next, node); // (B)
}


struct node*
mpmcq_pop(
 struct mpmcq* const self
) {
  void* state;
  struct dwcas_anchor cmp, xchg;
  cmp = self->tail;
  do {
    struct node* next = ATOMIC_LOAD(&cmp.node->next); // (C)
    if (! next) return NULL;
    state = next->state;
    xchg.node = next;
    xchg.aba = cmp.aba + 1;
  } while (! ATOMIC_DWCAS_ACQ(&self->tail, &cmp, &xchg)); // (D)
  cmp.node->state = state;
  return cmp.node;
}
[/cpp]

I think the fences here must be different:
A - relaxed
B - release
C - acquire
D - acq_rel (but not seq_cst, i.e. no #StoreLoad)

Re: unbounded multi-producer/consumer queue...

jimdempseyatthecove — Fri, 14 Aug 2009 12:51:44 GMT

Chris, Dmitriy,

Call me old school if you wish.

When you have

[cpp]p1->Node->Node->Node->NULL
p2---------------^

From my perspective p1 is Head, p2 is Tail
Your code samples has the naming convention turned around, why?
Jim


[/cpp]

Re: unbounded multi-producer/consumer queue...

Chris_M__Thomasson — Fri, 21 Aug 2009 13:04:39 GMT

Quoting - Dmitriy Vyukov

I think the fences here must be different:
A - relaxed
B - release
C - acquire
D - acq_rel (but not seq_cst, i.e. no #StoreLoad)

Humm... Relacy seems to disagree. Here is example code which uses the memory barriers you laid out:

http://relacy.pastebin.com/f21025123

This bites the dust with a memory leak, which means the queue lost nodes.

Here is a version that uses sequential consistency and all is good:

http://relacy.pastebin.com/f2ce9aab8

Could you please help me create a version of the example code which does not use sequential consistency? Please note that if I change one of those seq_cst membars to any other membar, I get lost nodes and a memory leak...

Interesting...

BTW, I am using Relacy version 1.5.0