I've posted a novel multi-producer/multi-consumer queue algorithm here:
http://software.intel.com/en-us/blogs/2009/08/11/multi-producermulti-consumer-seh-based-queue/
The algorithm has quite appealing characteristics:
- intrusive
- producers: 1 XCHG, wait-free
- consumers: 1 CAS on common path, mostly lock-free
- producers and consumers do not contend with each other (until queue is empty)
- no need for safe memory reclamation
I bet it will beat the Michael-Scott queue to death in every way.
If you are interested we may discuss it here.
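To give a feel for the producer side, here is a rough sketch of what a 1-XCHG, wait-free enqueue looks like (illustrative names and memory ordering only, not the exact code from the blog post):
[cpp]
#include <atomic>

struct node_t
{
    std::atomic<node_t*> next_;
};

void enqueue(std::atomic<node_t*>& tail_, node_t* n)
{
    n->next_.store(0, std::memory_order_relaxed);
    // a single atomic exchange on the tail -- this is the 1 XCHG,
    // and since it cannot fail the producer is wait-free
    node_t* prev = tail_.exchange(n, std::memory_order_acq_rel);
    // link the predecessor to the new node; a consumer that arrives
    // between the exchange and this store sees a not-yet-linked node,
    // which has to be handled on the consumer side
    prev->next_.store(n, std::memory_order_release);
}
[/cpp]
The consumer side, and how it copes with a not-yet-linked node, is described in the blog post.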
Quoting - Dmitriy Vyukov
I've posted a novel multi-producer/multi-consumer queue algorithm here:
http://software.intel.com/en-us/blogs/2009/08/11/multi-producermulti-consumer-seh-based-queue/
The algorithm has quite appealing characteristics:
- intrusive
- producers: 1 XCHG, wait-free
- consumers: 1 CAS on common path, mostly lock-free
- producers and consumers do not contend with each other (until queue is empty)
- no need for safe memory reclamation
I bet it will beat the Michael-Scott queue to death in every way.
If you are interested we may discuss it here.
FWIW, here is a link to a previous discussion in `comp.programming.threads':
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/c43d1197c8168b4e
The intrusive nature of the queue is extremely attractive indeed. Also, the wait-free nature of push is basically unbeatable. No PDR == very NICE! With regard to comparing it to the Michael-Scott queue, well, it's going to devastate it on all fronts. I am sure of that, no testing is even needed!
;^)
Also, FWIW, here is the version that I created which does require memory management and is not intrusive:
http://software.intel.com/en-us/forums/showthread.php?t=66573
And here is a full-blown example implementation in C#:
http://cpt.pastebin.com/f72cc3cc1
The non-intrusive nature of my version makes it simpler; however, that does not mean it's better...
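For anybody following along, the distinction boils down to something like this (illustrative types only, not the actual code of either queue):
[cpp]
#include <atomic>

struct node_t { std::atomic<node_t*> next_; };  // the intrusive link

// intrusive: the user's message embeds the link, so the queue itself
// never allocates and never has to reclaim memory
struct my_msg : node_t
{
    int payload_;
};

// non-intrusive: the queue wraps the user's data in its own cell, so it
// must allocate/free cells and therefore needs safe memory reclamation
struct cell_t
{
    std::atomic<cell_t*> next_;
    void*                data_;
};
[/cpp]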
Quoting - Dmitriy Vyukov
I've posted a novel multi-producer/multi-consumer queue algorithm here:
http://software.intel.com/en-us/blogs/2009/08/11/multi-producermulti-consumer-seh-based-queue/
The algorithm has quite appealing characteristics:
- intrusive
- producers: 1 XCHG, wait-free
- consumers: 1 CAS on common path, mostly lock-free
- producers and consumers do not contend with each other (until queue is empty)
- no need for safe memory reclamation
I bet it will beat the Michael-Scott queue to death in every way.
If you are interested we may discuss it here.
Please note that there is a race condition in the non-intrusive version of the algorithm I posted IF, and ONLY IF, you read the tail pointer BEFORE the tail ABA counter. Here is an explanation and Relacy source code that proves it:
http://groups.google.com/group/comp.programming.threads/msg/62527c65a440516a
http://relacy.pastebin.com/f4b57bda2
I am pretty sure that this will affect Dmitriy's algorithm as well, since he loads the pointer BEFORE the ABA counter. However, I need to model his algorithm in Relacy before I give a definitive answer.
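To make the hazard concrete, the anchor in question is roughly a pointer plus ABA-counter pair that gets swapped with DWCAS (member names as in the Relacy test below; the layout is illustrative):
[cpp]
struct node_t;

// illustrative layout of the DWCAS-ed anchor
struct head_t
{
    node_t*  ptr_;  // current head/tail node
    unsigned cnt_;  // ABA counter, bumped on every successful DWCAS
};

// the race: if ptr_ is loaded first and cnt_ second, other threads can
// complete operations in between the two loads, so the (ptr_, cnt_) pair
// handed to DWCAS may be a combination that never existed together
[/cpp]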
Quoting - Chris M. Thomasson
I am pretty sure that this will affect Dmitriy's algorithm as well, since he loads the pointer BEFORE the ABA counter. However, I need to model his algorithm in Relacy before I give a definitive answer.
Here is the test I used for verification:
http://relacy.pastebin.com/f5df9dcfa
The test does not detect any problems.
Unfortunately it uses new, not-yet-published syntax, but at least you may use it as a base.
I express the load as:
[cpp]
head_t h1 = head_.next_.load(std::memory_order_acquire);
head_t h2 = head_.next_.load(std::memory_order_acquire);
head_t h (0, 0);
// randomly mix the two loads to simulate reading ptr_ and cnt_ out of order
if (rl::rand(2) == 0)
{
    h.ptr_ = h1.ptr_;
    h.cnt_ = h2.cnt_;
}
else
{
    h.ptr_ = h2.ptr_;
    h.cnt_ = h1.cnt_;
}
[/cpp]
Quoting - Dmitriy Vyukov
Here is the test I used for verification:
http://relacy.pastebin.com/f5df9dcfa
The test does not detect any problems.
Unfortunately it uses new, not-yet-published syntax, but at least you may use it as a base.
I express the load as:
[cpp]
head_t h1 = head_.next_.load(std::memory_order_acquire);
head_t h2 = head_.next_.load(std::memory_order_acquire);
head_t h (0, 0);
if (rl::rand(2) == 0)
{
    h.ptr_ = h1.ptr_;
    h.cnt_ = h2.cnt_;
}
else
{
    h.ptr_ = h2.ptr_;
    h.cnt_ = h1.cnt_;
}
[/cpp]
I will try and convert the example code to Relacy 1.5.0. BTW, how did you eliminate the need for `$'? That's really cool! Anyway, I have one comment. I notice that you are performing the "out-of-order" load only one time, then you enter the "DWCAS loop" and never perform it again upon DWCAS failure. That will not give much of a window for the race, if any, to occur. Might I suggest that you rework the algorithm such that the "out-of-order" load occurs on every iteration of the loop? Something like:
[cpp]
node_t* dequeue()
{
    for (;;)
    {
        head_t h1 = head_.next_.load(std::memory_order_acquire);
        head_t h2 = head_.next_.load(std::memory_order_acquire);
        head_t h (0, 0);
        if (rl::rand(2) == 0)
        {
            h.ptr_ = h1.ptr_;
            h.cnt_ = h2.cnt_;
        }
        else
        {
            h.ptr_ = h2.ptr_;
            h.cnt_ = h1.cnt_;
        }
        // [...]
    }
    // [...]
}
[/cpp]
Or perhaps even:
[cpp]
node_t* dequeue()
{
    rl::backoff b;
    for (;;)
    {
        head_t h (0, 0);
        head_t h1 = head_.next_.load(std::memory_order_acquire);
        h.ptr_ = h1.ptr_;
        b.yield();  // widen the window between the two halves of the load
        head_t h2 = head_.next_.load(std::memory_order_acquire);
        h.cnt_ = h2.cnt_;
        // [...]
    }
    // [...]
}
[/cpp]
Quoting - Chris M. Thomasson
I will try and convert the example code to Relacy 1.5.0. BTW, how did you eliminate the need for `$'? That's really cool!
#define memory_order_XXX mo_XXX, $
The same for all WinAPI/POSIX functions, i.e.:
#define WaitForSingleObject(obj, timeout) rl::rl_WaitForSingleObject(obj, timeout, $)
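For example (illustrative expansion only, not a quote from the actual Relacy headers):
[cpp]
#define memory_order_acquire mo_acquire, $

// the user writes the usual C++0x-style call:
//     int v = x.load(memory_order_acquire);
// which after preprocessing becomes:
//     int v = x.load(mo_acquire, $);
// so the '$' (the debug-info argument) is injected without the user
// ever spelling it out
[/cpp]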
There are still some places where one has to use macros:
1. Declaration/accesses to plain vars:
DECL_VAR(int) x;
VAR(x) = 1;
2. Declaration/accesses to thread local vars:
DECL_TLS(int) y;
TLS(y) = 1;
3. Memory management functions:
NEW(int) (0);
DEL(p);
(hmm... I think I may redefine new/delete too)
Btw, I am going to publish the next version under a BSD-style licence w/o any additional clauses.
Quoting - Chris M. Thomasson
Anyway, I have one comment. I notice that you are performing the "out-of-order" load only one time, then you enter the "DWCAS loop" and never perform it again upon DWCAS failure. That will not give much of a window for the race, if any, to occur. Might I suggest that you rework the algorithm such that the "out-of-order" load occurs on every iteration of the loop?
Yes, of course, this will be better. I was just lazy :)