The next question, however, was whether or not the cache_aligned_allocator implements pooling. I examined the source code, and found that malloc() was called directly rather than scalable_malloc(). I would take this to imply that in fact the cache_aligned_allocator does not operate from a memory pool.
What is the reason for this?
It's probably not obvious from the code, but cache_aligned_allocator uses scalable_malloc if it is available; otherwise (and in some other cases) it falls back to malloc. The reason is the same as I explained here: making sure TBB is able to work even if the scalable allocator library is absent. You shouldn't just take my word for it :) - look for MallocHandler and see how it is used.
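To make the fallback idea concrete, here is a minimal sketch of the general pattern: allocation goes through a function pointer that defaults to malloc and is redirected only if a scalable allocator is found. This is not TBB's actual code. Every name below (allocate_handler, install_scalable_handlers, and so on) is hypothetical, and the run-time library lookup is replaced by a function that simply receives whatever symbols were found.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical names throughout; this only illustrates the pattern.
using alloc_fn = void* (*)(std::size_t);
using free_fn  = void  (*)(void*);

// Thin wrappers around the C runtime, used as the safe default.
static void* crt_allocate(std::size_t n) { return std::malloc(n); }
static void  crt_free(void* p)           { std::free(p); }

// Handlers start out pointing at malloc/free, so everything works
// even when no scalable allocator library is present.
static alloc_fn allocate_handler = &crt_allocate;
static free_fn  free_handler     = &crt_free;

// Stand-in for the run-time lookup (assumption: a real version would
// load the library dynamically and resolve the symbols). The caller
// passes the symbols it found, or nullptr if the lookup failed.
bool install_scalable_handlers(alloc_fn a, free_fn f) {
    if (a == nullptr || f == nullptr) {
        return false;  // lookup failed: keep the malloc/free defaults
    }
    allocate_handler = a;
    free_handler     = f;
    return true;
}

// All allocations go through the handlers, never through malloc directly.
void* handler_allocate(std::size_t n) { return allocate_handler(n); }
void  handler_free(void* p)           { free_handler(p); }
```

Seen this way, a malloc call in the source does not by itself tell you which allocator is used at run time; you have to follow what the handler pointer ends up referring to.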
aj.guillon: Perhaps this is computationally too difficult to do... finding chunks of memory in the pool that are the right size might be too much computation to do in an allocation call?
We plan improvements to the TBB allocators, and support for aligned allocation in scalable_malloc is among them. And yes, it won't come for free; either it will be slower, or (more likely) it will pad the memory block. As you might guess, cache_aligned_allocator also adds some padding, so it will need to be customized to avoid excessive padding when used together with scalable_malloc.
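For anyone wondering what "pad the memory block" means in practice, here is a minimal sketch of the classic over-allocate-and-round-up technique for getting an aligned block from an allocator that has no aligned interface. It is an illustration under stated assumptions, not the TBB implementation: the 64-byte line size and the names padded_aligned_alloc and padded_aligned_free are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed cache line size; real hardware may differ.
constexpr std::size_t kCacheLine = 64;

// Return `bytes` of storage starting on a cache-line boundary by
// over-allocating and remembering the original pointer just in front
// of the aligned block. (Hypothetical helper, not a TBB function.)
void* padded_aligned_alloc(std::size_t bytes) {
    // The padding: up to one cache line plus room for the saved pointer.
    void* base = std::malloc(bytes + kCacheLine + sizeof(void*));
    if (base == nullptr) {
        return nullptr;
    }
    // Leave room for the saved pointer, then round up to the next boundary.
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(base) + sizeof(void*);
    std::uintptr_t aligned =
        (raw + kCacheLine - 1) & ~(static_cast<std::uintptr_t>(kCacheLine) - 1);
    // Stash the pointer malloc returned so the free routine can recover it.
    reinterpret_cast<void**>(aligned)[-1] = base;
    return reinterpret_cast<void*>(aligned);
}

void padded_aligned_free(void* p) {
    if (p != nullptr) {
        std::free(reinterpret_cast<void**>(p)[-1]);
    }
}
```

Note the cost: a 10-byte request asks the underlying allocator for 10 + 64 + sizeof(void*) bytes. If the underlying allocator also pads internally, the two layers of padding stack up, which is the excess mentioned above.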
By the way, the scalable allocator takes its own measures to reduce false sharing; namely, different threads cannot allocate from the same cache line. The same thread, however, can allocate a few small objects from the same cache line. This is different from the cache_aligned_allocator behaviour; the latter is thread-oblivious and treats each allocation as deserving separate cache line(s).
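A rough back-of-the-envelope comparison of the two policies, as a sketch only. It assumes a 64-byte cache line and ignores allocator metadata and size classes, so the numbers are idealized; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cache line size; real hardware may differ.
constexpr std::size_t kLine = 64;

// Round a byte count up to a whole number of cache lines.
constexpr std::size_t round_up_to_line(std::size_t bytes) {
    return (bytes + kLine - 1) / kLine * kLine;
}

// Thread-oblivious policy: every object gets its own cache line(s).
constexpr std::size_t footprint_line_per_object(std::size_t count,
                                                std::size_t object_size) {
    return count * round_up_to_line(object_size);
}

// Per-thread packing: objects allocated by one thread may share lines,
// so only that thread's total is rounded up to a line boundary.
constexpr std::size_t footprint_packed_per_thread(std::size_t count,
                                                  std::size_t object_size) {
    return round_up_to_line(count * object_size);
}
```

With these assumptions, eight 16-byte objects allocated by one thread occupy 8 * 64 = 512 bytes under the line-per-object policy and 128 bytes when packed. Both policies keep different threads off the same line; they differ only in how much memory they spend to do it.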