Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

unbounded multi-producer/consumer queue...

Chris_M__Thomasson
New Contributor I
1,007 Views
I am presenting pseudo-code for the most efficient unbounded multi-producer/consumer queues I have seen. It has wait-free pushes, and lock-free pops. One atomic RMW operation per-operation, and one release barrier for push and one acquire barrier for pop. Here is some crude pseudo-code:


[cpp]struct node {
  node* next;
  void* state;
};


struct dwcas_anchor {
  int32 aba;
  struct node* node;
};


struct mpmcq {
  struct dwcas_anchor tail;
  struct node* head;
};


void
mpmcq_init(
 struct mpmcq* const self,
 struct node* dummy
) {
  dummy->next = NULL;
  self->head = dummy;
  self->tail.node = dummy;
}


void
mpmcq_push(
 struct mpmcq* const self,
 struct node* node
) {
  struct node* prev;
  node->m_next = NULL;
  prev = ATOMIC_SWAP_REL(&self->head, node);
  ATOMIC_STORE(&prev->next, node);
}


struct node*
mpmcq_pop(
 struct mpmcq* const self
) {
  void* state;
  struct dwcas_anchor cmp, xchg;
  cmp = self->tail;
  do {
    struct node* next = ATOMIC_LOAD(&cmp.node->next);
    if (! next) return NULL;
    state = next->state;
    xchg.node = next;
    xchg.aba = cmp.aba + 1;
  } while (! ATOMIC_DWCAS_ACQ(&self->tail, &cmp, &xchg));
  cmp.node->state = state;
  return cmp.node;
}

[/cpp]

Please note that the function `ATOMIC_DWCAS_ACQ()' is assumed to automatically update the comperand on failure.

This has the same memory management properties as a lock-free stack. It beats the heck out of all the other unbounded multi-producer consumer queues I have seen to date.


Enjoy!


;^)
0 Kudos
4 Replies
Chris_M__Thomasson
New Contributor I
1,007 Views
Please note that there is a race-condition in the non-intrusive version of the algorithm I posted IF, and ONLY IF, you read the tail pointer BEFORE the tail aba counter. Here is explanation and Relacy source-code which proves it:

http://groups.google.com/group/comp.programming.threads/msg/62527c65a440516a


http://relacy.pastebin.com/f4b57bda2



If you load the tail pointer AFTER the tail aba counter then there is NO race-condition in any way shape or form.


Here is example code which does exactly that:

[cpp]struct node {
  node* next;
  void* state;
};


struct dwcas_anchor {
  int32 aba;
  struct node* node;
};


struct mpmcq {
  struct dwcas_anchor tail;
  struct node* head;
};


void
mpmcq_init(
 struct mpmcq* const self,
 struct node* dummy
) {
  dummy->next = NULL;
  self->head = dummy;
  self->tail.node = dummy;
}


void
mpmcq_push(
 struct mpmcq* const self,
 struct node* node
) {
  struct node* prev;
  node->m_next = NULL;
  prev = ATOMIC_SWAP(&self->head, node);
  ATOMIC_STORE(&prev->next, node);
}


struct node*
mpmcq_pop(
 struct mpmcq* const self
) {
  void* state;
  struct dwcas_anchor cmp, xchg;
  cmp.aba = self->aba; // BEFORE!
  cmp.node = self->tail; // AFTER!
  do {
    struct node* next = ATOMIC_LOAD(&cmp.node->next);
    if (! next) return NULL;
    state = next->state;
    xchg.node = next;
    xchg.aba = cmp.aba + 1;
  } while (! ATOMIC_DWCAS(&self->tail, &cmp, &xchg));
  cmp.node->state = state;
  return cmp.node;
}[/cpp]


Hope that helps!
0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,007 Views
[cpp]void
mpmcq_push(
struct mpmcq* const self,
struct node* node
) {
struct node* prev;
node->m_next = NULL;
prev = ATOMIC_SWAP_REL(&self->head, node); // (A)
ATOMIC_STORE(&prev->next, node); // (B)
}


struct node*
mpmcq_pop(
struct mpmcq* const self
) {
void* state;
struct dwcas_anchor cmp, xchg;
cmp = self->tail;
do {
struct node* next = ATOMIC_LOAD(&cmp.node->next); // (C)
if (! next) return NULL;
state = next->state;
xchg.node = next;
xchg.aba = cmp.aba + 1;
} while (! ATOMIC_DWCAS_ACQ(&self->tail, &cmp, &xchg)); // (D)
cmp.node->state = state;
return cmp.node;
}
[/cpp]

I think the fences here must be different:
A - relaxed
B - release
C - acquire
D - acq_rel (but not seq_cst, i.e. no #StoreLoad)


0 Kudos
jimdempseyatthecove
Honored Contributor III
1,007 Views

Chris, Dmitriy,

Call me old school if you wish.

When you have

[cpp]p1->Node->Node->Node->NULL
p2---------------^

From my perspective p1 is Head, p2 is Tail
Your code samples has the naming convention turned around, why?
Jim


[/cpp]
0 Kudos
Chris_M__Thomasson
New Contributor I
1,007 Views
Quoting - Dmitriy Vyukov

I think the fences here must be different:
A - relaxed
B - release
C - acquire
D - acq_rel (but not seq_cst, i.e. no #StoreLoad)



Humm... Relacy seems to disagree. Here is example code which uses the memory barriers you laid out:

http://relacy.pastebin.com/f21025123


This bites the dust with a memory leak, which means the queue lost nodes.



Here is a version that uses sequential consistency and all is good:


http://relacy.pastebin.com/f2ce9aab8



Could you please help me create a version of the example code which does not use sequential consistency? Please note that if I change one of those seq_cst membars to any other membar, I get lost nodes and a memory leak...


Interesting...


BTW, I am using Relacy version 1.5.0
0 Kudos
Reply