I would like to understand how much code is shared between the 2 concurrent_queue implementations. From what I understand the bounded variant supports blocking semantics (I am not so interested in the bounded capacity feature).
What I would like to understand is, does the bounded_queue variant take care of the "thundering herd" problem? From what I can understand from the code, when the queue is empty and a thread tries to do a blocking pop() it is put to a waiting state (WaitForSingleObject). But what happens when X items are pushed to the queue - are all waiting threads woken up or only X threads are woken up? I think it is the latter but I want to confirm.
Also when I am using the default concurrent_bounded_queue (unbounded) are there any performance differences between the concurrent_queue and concurrent_bounded_queue for general push and pop ?
What I am interested to know is how concurrent_bounded_queue handles waking up of threads when there are threads blocked at pop(). The reason I ask is because I want to build a NUMA aware application where I maintain 1 queue for each NUMA node. But I also want to support work-stealing between queues, i.e. I want threads from one NUMA node to pickup jobs from other NUMA queues if its own queue is empty.
For historical reasons, there is not much of code sharing between concurrent_queue (CQ) and concurrent_bounded_queue (CBQ) but the general design is quite the same.
Push and pop operations with CBQ only unblock a single waiting thread. In fact, due to the specifics of internal design (necessary to provide sufficient concurrency and yet give certain ordering guarantees), waiting threads are in a sense"ordered", and only one of them can pop the new item or use a freed slot in the queue; so waking all but that one makes no sense.
As for performance difference between CQ and CBQ, in case of no waiting it should be minimal (if any). In principle, since the CQ code is fully in headers, optimizing compilers might be able to inline some of its code and reduce overheads. Other than that, I see no reasons for CBQ to be slower: hot paths are essentially equivalent, and all overheads associated with waiting are on a cold path in CBQ.
Thanks for your reply.
Another thing that I wanted to confirm was what is the order in which we wake up threads? For example if there are 10 threads which are blocked on pop(), after the first push() operation which of these 10 waiting threads are going to be woken up: the one that started waiting the earliest or the one which came in last?
From the code it seems like it is the earliest one but can you please confirm. If yes, then is there at better way where we can start waking up threads which entered the waiting list last for thread locality etc. and other performance reasons?
Yes, the earliest thread will be woken up first. It's a FIFO queue, after all :)
The only other way I could suggest is to wake up all waiting threads with abort(), but I guess this is not what you'd want to do. In theory I could imagine an extension to abort only the last coming thread, but I do not know whether it's doable in practice, and again this is probably not what you want.