The scalable allocator might help, but the only way to know for sure is to try it.
In this situation the scalable allocator allocates the block of memory from the producer's heap. When the consumer frees it, the allocator sees that the block came from the producer thread and sends it back to the producer to be recycled there.
Is the nature of your producer-consumer setup such that you can't reuse the buffers when you're finished with them? If you set up another concurrent_queue to hold the free buffers already processed by the consumer, the producer could reuse them. In general that would cost less than freeing and then reallocating them (presuming lots of things), which could shave some time when there are lots of buffers in use. Reuse the buffers if they're available, else allocate new ones.
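To make the reuse idea concrete, here is a minimal sketch of a free list shared between producer and consumer. The names BufferPool, SimpleQueue, acquire and release are hypothetical, made up for illustration. To keep the sketch self-contained, SimpleQueue is a mutex-guarded std::queue standing in for tbb::concurrent_queue; it is an assumption, not the TBB implementation, and in real code you would use concurrent_queue and its non-blocking pop instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

using Buffer = std::vector<char>;

// Stand-in for tbb::concurrent_queue (assumption for this sketch):
// a mutex-guarded queue with a non-blocking try_pop.
template <typename T>
class SimpleQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(std::move(value));
    }

    // Returns false immediately if the queue is empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<T> items_;
};

// Hypothetical pool: the consumer returns buffers, the producer reuses them.
class BufferPool {
public:
    explicit BufferPool(std::size_t buffer_size) : buffer_size_(buffer_size) {}

    // Producer side: reuse a returned buffer if one is available,
    // else allocate a new one.
    std::unique_ptr<Buffer> acquire() {
        std::unique_ptr<Buffer> buf;
        if (free_list_.try_pop(buf)) return buf;
        ++allocations_;
        return std::make_unique<Buffer>(buffer_size_);
    }

    // Consumer side: hand a processed buffer back instead of freeing it.
    void release(std::unique_ptr<Buffer> buf) {
        assert(buf != nullptr);
        free_list_.push(std::move(buf));
    }

    // Number of buffers actually allocated so far.
    std::size_t allocations() const { return allocations_.load(); }

private:
    std::size_t buffer_size_;
    SimpleQueue<std::unique_ptr<Buffer>> free_list_;
    std::atomic<std::size_t> allocations_{0};
};
```

The producer calls acquire() before filling a buffer and the consumer calls release() when it is done, so in steady state the allocator is only hit when the free list runs dry. Note that the pool never shrinks in this sketch; whether that matters depends on how bursty your workload is.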
The most recent release of Intel Thread Checker for Linux works on Fedora Core 6, according to the release notes. How long has it been since you last tried it? Did you pursue those failures with the product support groups (threading forum or Premier.intel.com)?
I'm glad to hear of your success, both in sidestepping the Debian issue and finding a reasonable solution to your problems.
If you're referring to this forum thread, I'm intimately familiar with that conversation. As noted in the documentation and as rediscovered in your experiments, concurrent_queue is not well suited to holding locks for any significant duration because of the spin-lock implementation, but it works faster than other methods when the queue is being actively accessed. The CPU utilization numbers you quote are typical.
And thanks for letting us know how it all turned out.