I remember reading somewhere that if you link TBBMalloc, or possibly use the Scalable Allocator, TBB preallocates some amount of memory per thread to avoid implicit synchronization. But I can't find this anymore. I thought I had found it in the TBB book, but it looks like it wasn't there.
Does any per-thread preallocation happen in the Scalable Allocator or in TBBMalloc?
In many cases, Scalable Allocator and TBBMalloc are the same thing: they are different names/interfaces for the Intel TBB scalable allocator (libtbbmalloc.so, tbbmalloc.dll, or libtbbmalloc.dylib). The internal logic is highly complex, but put simply, per-thread and global caches are used to avoid or reduce implicit synchronization, and some preallocation does take place.
If you have some specific questions, feel free to ask.
Thanks a lot for clarifying that. Any suggestions about where I can find details on the preallocation you mentioned, specifically the amount of memory it claims per thread? At the moment I have an application that uses this library, and in some cases it uses excessive heap. I have also observed that the amount of memory varies depending on the thread/core count of my computer.
>>At the moment I have an application that uses this library, and in some cases it uses excessive heap
This may happen when the thread that allocates the memory is not the same thread that frees it.
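To make that pattern concrete, here is a minimal sketch of a cross-thread handoff, where one thread allocates and a different thread frees. It assumes plain C++11 threads and ordinary new/delete rather than any TBB-specific API. The names Block and run_handoff are hypothetical and exist only for illustration. If your application has this shape, it is a reasonable place to start looking.

```cpp
#include <cstddef>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical payload type, used only for illustration.
struct Block {
    char data[256];
};

// Allocates `count` blocks on a producer thread and frees them on a
// separate consumer thread. Returns the number of blocks the consumer freed.
std::size_t run_handoff(std::size_t count) {
    std::queue<Block*> pending;   // blocks waiting to be freed
    std::mutex m;                 // protects `pending` and `done`
    std::condition_variable cv;
    bool done = false;
    std::size_t freed = 0;        // written only by the consumer thread

    std::thread consumer([&] {
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return done || !pending.empty(); });
            if (pending.empty()) return;  // producer finished, queue drained
            Block* b = pending.front();
            pending.pop();
            lock.unlock();
            delete b;                     // freed on a thread that did not allocate it
            ++freed;
        }
    });

    std::thread producer([&] {
        for (std::size_t i = 0; i < count; ++i) {
            Block* b = new Block;         // allocated on the producer thread
            {
                std::lock_guard<std::mutex> lock(m);
                pending.push(b);
            }
            cv.notify_one();
        }
        {
            std::lock_guard<std::mutex> lock(m);
            done = true;
        }
        cv.notify_one();
    });

    producer.join();
    consumer.join();
    return freed;  // safe to read: the consumer has been joined
}
```

Calling run_handoff(1000) from main allocates 1000 blocks on one thread and releases all of them on another. Comparing the heap usage of such a run against a variant where each thread frees its own blocks would be one way to check whether the handoff is what drives the growth you are seeing.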