Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Summary (survey) of available Lock Free and Wait Free queues

gallus2
Beginner
I visit the forum quite often and I've seen a few posts about lock-free and wait-free queues. Can someone enumerate the best existing solutions for lock-free (wait-free) queues in the various scenarios (single producer/single consumer, multiple producers/single consumer, etc.)? A few forum members are active in this field, Dmitriy Vyukov for example. The topic is surprisingly wide.

From my side, I can share the following:
FastForward for Efficient Pipeline Parallelism

It contains code for a single-producer/single-consumer wait-free queue which, from the authors' point of view, should be the most efficient in its usage scenario.
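In this spirit, here is a minimal single-producer/single-consumer ring buffer sketch. This is not the FastForward algorithm itself; the class and member names are my own, and the bounded capacity and return-false-when-full/empty policy are assumptions chosen for illustration:

```cpp
#include <atomic>
#include <cstddef>

// Minimal SPSC ring buffer sketch. Each side writes only its own index
// (relaxed) and reads the other side's index with acquire ordering, so
// no CAS and no locks are needed. One slot is sacrificed to tell
// "full" apart from "empty".
template <typename T, size_t Capacity>
class SpscQueue {
public:
    // Producer side only. Returns false when the queue is full.
    bool try_push(const T& item) {
        size_t tail = tail_.load(std::memory_order_relaxed);
        size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire))
            return false;                           // full
        buf_[tail] = item;
        tail_.store(next, std::memory_order_release);
        return true;
    }
    // Consumer side only. Returns false when the queue is empty.
    bool try_pop(T& out) {
        size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return false;                           // empty
        out = buf_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return true;
    }
private:
    T buf_[Capacity];
    std::atomic<size_t> head_{0};
    std::atomic<size_t> tail_{0};
};
```

Both operations complete in a bounded number of steps regardless of what the other thread does, which is what makes this kind of SPSC queue wait-free.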

PS. Is the term "wait-free" correct and widely adopted?

10 Replies
Dmitry_Vyukov
Valued Contributor I
Quoting - gallus2

Yup, I'm here :)

Well, it's difficult to enumerate them all, because there are hundreds of them.
There are:
1.1. single-producer
1.2. multi-producer

2.1. single-consumer
2.2. multi-consumer

3.1. array based
3.2. node based
3.3. hybrid

4.1. intrusive
4.2. non-intrusive

5.1. bounded
5.2. unbounded

6.1. blocks consumers when queue is empty
6.2. returns 'false' when queue is empty

7.1. blocks producer when queue is full
7.2. returns 'false' when queue is full
7.3. overwrites old items when queue is full

8.1. requires GC
8.2. does not require GC

9.1. supports priorities
9.2. does not support priorities

10.1. provides causal-FIFO
10.2. provides per-producer FIFO
10.3. provides best-effort FIFO
10.4. provides LIFO
10.5. no order guarantees

11.1. wait-free for producers
11.2. lock-free producers
11.3. blocking producers

12.1. wait-free consumers
12.2. lock-free consumers
12.3. blocking consumers

13.1. queue usually contains very few or zero elements
13.2. queue usually contains substantial amount of elements

Basically, if you choose one item from each category, you get a specialized queue for that exact set of requirements.
Some combinations of requirements are not sensible, and for some sets one will probably end up with the same queue implementation. But still, that is more than 100,000 permutations!..
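One well-known point in this design space is an unbounded, intrusive, node-based multi-producer/single-consumer queue built on an atomic exchange, similar in spirit to queues Dmitry has published. The sketch below is illustrative only (names are mine): push is wait-free for producers, while pop must be called from a single consumer thread.

```cpp
#include <atomic>

// Intrusive node: the queue links user-supplied nodes, no allocation.
struct Node {
    std::atomic<Node*> next{nullptr};
    int value{};
};

// Unbounded intrusive MPSC queue sketch. Producers race only on the
// single exchange in push(); the consumer owns tail_ exclusively.
class MpscQueue {
public:
    MpscQueue() : head_(&stub_), tail_(&stub_) {}
    // Safe to call from many threads concurrently.
    void push(Node* n) {
        n->next.store(nullptr, std::memory_order_relaxed);
        Node* prev = head_.exchange(n, std::memory_order_acq_rel);
        prev->next.store(n, std::memory_order_release);  // link in
    }
    // Single consumer only. Returns nullptr when empty (or when a
    // producer is mid-push and the item is not yet linked).
    Node* pop() {
        Node* tail = tail_;
        Node* next = tail->next.load(std::memory_order_acquire);
        if (tail == &stub_) {               // skip the stub node
            if (next == nullptr) return nullptr;
            tail_ = next;
            tail = next;
            next = next->next.load(std::memory_order_acquire);
        }
        if (next != nullptr) { tail_ = next; return tail; }
        Node* head = head_.load(std::memory_order_acquire);
        if (tail != head) return nullptr;   // producer in progress
        push(&stub_);                       // re-insert stub behind tail
        next = tail->next.load(std::memory_order_acquire);
        if (next != nullptr) { tail_ = next; return tail; }
        return nullptr;
    }
private:
    Node stub_;                 // dummy node so the list is never empty
    std::atomic<Node*> head_;   // producers push here
    Node* tail_;                // consumer pops here
};
```

Note how this single implementation answers several of the categories above at once: multi-producer, single-consumer, node-based, intrusive, unbounded, returns empty rather than blocking, per-producer FIFO, no GC required.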

Dmitry_Vyukov
Valued Contributor I
My queues live here

gaston-hillar
Valued Contributor I
Quoting - gallus2

Hi Gallus2,

Which languages are you interested in?
In managed code, the Task Parallel Library in .NET 4 Beta 2 offers concurrent collections.
However, so far, C++ has more flexible implementations than C#. Dmitriy Vyukov posted a long list; you won't see all of them implemented in C# with .NET 4 Beta 2. However, I'm sure new solutions on top of the TPL will appear once it reaches its final version.

This comment is just to give you some information about managed code. Dmitriy has posts and sites with more info related to C++, as he already posted in this thread.

Cheers,

Gaston Hillar
jimdempseyatthecove
Honored Contributor III

Lock-free means the code does not contain a software locking structure such as a mutex, spin-lock, critical section, etc.
It does NOT mean the code is free of the hardware LOCK prefix on instructions such as CMPXCHG and its variants (e.g. cmpxchg8b and cmpxchg16b).
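Jim's point can be seen directly in modern C++: `std::atomic`'s compare-exchange operations compile to LOCK CMPXCHG on x86 even though the source contains no mutex. Below is a minimal Treiber-style lock-free stack (names are mine; this is an illustrative sketch that sidesteps the memory-reclamation/ABA issues a production multi-consumer container must solve):

```cpp
#include <atomic>

struct SNode {
    SNode* next;
    int value;
};

// Treiber-style lock-free stack sketch. No software lock anywhere,
// yet every compare_exchange_weak is a hardware-locked instruction.
class LockFreeStack {
public:
    void push(SNode* n) {
        n->next = head_.load(std::memory_order_relaxed);
        // Retry loop: lock-free (some thread always makes progress),
        // but not wait-free (this thread may retry indefinitely).
        while (!head_.compare_exchange_weak(
                   n->next, n,
                   std::memory_order_release,
                   std::memory_order_relaxed)) {}
    }
    // Caveat: safe only while no concurrent pop can free nodes
    // out from under us (the classic ABA/reclamation problem).
    SNode* pop() {
        SNode* n = head_.load(std::memory_order_acquire);
        while (n && !head_.compare_exchange_weak(
                        n, n->next,
                        std::memory_order_acquire,
                        std::memory_order_relaxed)) {}
        return n;   // nullptr when empty
    }
private:
    std::atomic<SNode*> head_{nullptr};
};
```

This is also a concrete example of Dmitry's category 10.4 (LIFO order): a stack is often the cheapest "no FIFO guarantee needed" container.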

Wait-free does not have as clear a definition.

Some claim it means that at least one thread other than the first thread competing for the resource can make progress.

Others claim it means that all threads competing for the resource are capable of making progress.

I've implemented a fully lock-free/wait-free MP-MC FIFO where all threads are capable of making progress. Let me tell you, this is not easy code to write, nor easy code to debug, nor easy code to write a verification test suite for.
BTW, this code was complicated by the fact that it had to test for and take one of two paths depending on the supported instruction set (for 32-bit or 64-bit systems). If the processor supported cmpxchg8b (32-bit DWCAS) or cmpxchg16b (64-bit DWCAS), it would take a faster route using those instructions. If the system did not support those instructions, the code would use a simulated DWCAS built from CAS on a specially crafted three-word object (patent pending on that).

The importance of a wait-free technique is for systems where you cannot tolerate long-latency stalls. You do pay for this in longer enqueue/dequeue times, but no inopportune preemption can block advancement (except for the case where ALL competing threads are preempted simultaneously).

Jim Dempsey
anthony_williams
Beginner
As you said, the field is surprisingly wide. Dmitriy listed many directions in which the implementation of a FIFO can vary, though of course not all combinations make sense.

For a data structure to be truly "wait-free", any thread accessing it must be able to make progress, even if other threads are suspended at any point. True wait-free data structures have important use cases, not least of which is that a high-priority thread never has to wait for a low-priority thread to finish accessing the data structure. However, providing this guarantee comes at a cost: each access to the data structure is slightly slower, as it has to take care of the additional book-keeping required.

Some data structures provide "nearly wait-free" accesses: the code is wait free unless a thread is suspended at a particularly inopportune point.

If the data structure is not "wait free" then it actually contains the equivalent of a spin-lock mutex: if one thread is performing an operation then others will have to wait/spin until the first thread has completed its operation.
Dmitry_Vyukov
Valued Contributor I
For a data structure to be truly "wait-free", any thread accessing it must be able to make progress, even if other threads are suspended at any point. True wait-free data structures have important use cases, not least of which is that a high-priority thread never has to wait for a low-priority thread to finish accessing the data structure. However, providing this guarantee comes at a cost: each access to the data structure is slightly slower, as it has to take care of the additional book-keeping required.

I would say that in this field QoS (progress guarantees) and performance are actually orthogonal.
One may be trying to create high-performance queue, and it just turns out so that it also wait-free. Or vice versa. Think of a simple single-producer/single-consumer queue:
http://software.intel.com/en-us/articles/single-producer-single-consumer-queue/

Or one may indeed lose performance while trying to provide forward progress guarantees. Think of the classical Michael-Scott queue:
http://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html

Or a smart blocking algorithm may be considerably faster than a lock-free solution:
http://groups.google.com/group/lock-free/browse_frm/thread/55df71b87acb8201

And I don't think I can provide an example of a queue which is both slow and does not provide forward progress guarantees :)


Dmitry_Vyukov
Valued Contributor I
If the data structure is not "wait free" then it actually contains the equivalent of a spin-lock mutex: if one thread is performing an operation then others will have to wait/spin until the first thread has completed its operation.

Indeed. Moreover, a blocking data structure may not even contain loops itself: it returns 'false' and forces the user code to loop. At first glance such a data structure looks wait-free; it is blocking, though.
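As an illustration of this point, here is a one-slot SPSC "queue" (a sketch; names are mine). The structure itself contains no loops, yet a consumer that needs an item ends up spinning on try_pop in its own code:

```cpp
#include <atomic>

// One-slot mailbox: try_push/try_pop return false instead of looping.
// SPSC only -- one producer thread, one consumer thread.
class Mailbox {
public:
    bool try_push(int v) {
        if (full_.load(std::memory_order_acquire))
            return false;                           // occupied
        value_ = v;
        full_.store(true, std::memory_order_release);
        return true;
    }
    bool try_pop(int& out) {
        if (!full_.load(std::memory_order_acquire))
            return false;                           // empty
        out = value_;
        full_.store(false, std::memory_order_release);
        return true;
    }
private:
    int value_{};
    std::atomic<bool> full_{false};
};
```

A caller that must have an item writes `while (!mb.try_pop(v)) { /* spin */ }`: the loop moved into user code, so the overall behavior is still blocking even though the data structure looks loop-free.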
gallus2
Beginner
Thanks for the answers. I think such an overview of the topic is what I was looking for. For me, it turns out that queues are the most important building blocks in efficient multithreaded apps. Not locks, atomic variables or algorithms, but FIFO queues. For example, queues can be used as a replacement for locks. When an operation has to be executed by at most one thread at a time, you can either synchronize the operation or pass it as a request to one special, selected thread through a FIFO queue.
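That lock-replacement pattern can be sketched like this (a minimal illustration; SerialExecutor and its members are my own names, and the internal queue here is a plain mutex-protected one for clarity -- a lock-free queue could replace it without changing the pattern):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Instead of locking shared state from many threads, funnel all
// operations through a FIFO to one dedicated worker thread. The
// operations then run strictly one at a time without a lock around
// the shared state itself.
class SerialExecutor {
public:
    SerialExecutor() : worker_([this] { run(); }) {}
    ~SerialExecutor() {                     // drains the queue, then stops
        { std::lock_guard<std::mutex> g(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }
    void submit(std::function<void()> op) { // callable from any thread
        { std::lock_guard<std::mutex> g(m_); q_.push(std::move(op)); }
        cv_.notify_one();
    }
private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return done_ || !q_.empty(); });
            if (q_.empty() && done_) return;
            auto op = std::move(q_.front());
            q_.pop();
            lk.unlock();
            op();          // runs on exactly one thread: no data race
            lk.lock();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
    bool done_ = false;
    std::thread worker_;
};
```

Because every operation executes on the single worker thread, the state those operations touch needs no synchronization of its own.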

PS. The question was generally referring to C++.

Jim: your work on the topic is outstanding (I'm thinking about the QTP and its cache/NUMA sharing awareness). Thanks for sharing information about the wait-free MP-MC queue. The information that such a queue is possible to create has value in itself. Can the queue be downloaded or purchased somehow?
jimdempseyatthecove
Honored Contributor III
Quoting - gallus2

We can discuss this off forum. Use

jim at quick thread programming dot com

with the usual substitutions and space squishing.

Jim Dempsey
ninhngt
Beginner
Quoting - gallus2

Thank you for posting such interesting questions.