Is the number of types known in advance? If so, I'd be inclined to have a concurrent_queue+spin_mutex per type. A thread looking for work could:
- Randomly pick a queue
- Try to acquire a lock on it using try_acquire.
- If not successful, go back to step 1.
- If successful, munch on the queue until it is empty
- Release the lock
New items would be appended to the appropriate queue without any locking.
TBB's concurrent_queue is overkill for this, because the lock restricts the queue to the single-consumer multiple-producer case. There are efficient ways to implement such a queue with little or no locking. E.g., inside the TBB scheduler, class mail_outbox implements such a queue with only a tiny bit of blocking. (It uses a little locking to avoid some headaches that full lock free would introduce.)
The single-consumer multiple-producer restriction enables the use of some structures that would otherwise suffer the
ABA problem. For example, the stack cited in that article actually works when restricted to a single consumer.
A variant on the stack that might be more efficient in your case is to replace the queue with a singly-linked list like the stack example, but have the consumer swipe the entire list instead of popping an individual element.That reduces the number of compare-and-swap operations that the consumer has to execute. The TBB scheduler uses this technique to return blocks of memory to the thread that allocated them (look for "return_list").