I am seeing excessive memory consumption when using the scalable_malloc/scalable_free "C" routines and TBB 4.1 (as part of Parallel Studio) that I do not see when using malloc()/free() or the mkl memory allocation routines.
In a loop, I create and destroy threads that make many calls into scalable_malloc and scalable_free. There are no scalable_ calls "across threads" or from the main thread. These calls are all balanced so no allocated memory is being left dangling.
Each time through the loop, memory consumption seems to increase, as if some thread-specific buffers are not being returned when the threads are destroyed.
MKL has a function MKL_Free_Thread_Buffers that I can call at the end of a thread, just before it dies. Does TBB need a similar call?
As Vladimir mentioned, there is a call similar to MKL_Free_Thread_Buffers(), but there is no need for it at thread’s termination time, as all per-thread buffers have to be released automatically. Is the sequence of allocations different between iterations of your outer loop (we have to understand whether it is memory fragmentation or a memory leak)? How big is the regression in memory consumption in comparison to the system allocator?
I’d love to see the reproducer, if the regression is big.
but there is no need for it at thread’s termination time, as all per-thread buffers have to be released automatically.
How is that possible? How can TBB memory allocators "know" a particular thread has died and that particular thread's buffers can be released? I am using a non TBB threading library (boost::threads) on Windows.
Interestingly, as a side note, I was using OMP threading and this was not an issue. That's because OMP starts up a thread pool and reuses the same threads during program execution, so threads are not being repeatedly created and destroyed...
How is that possible? How can TBB memory allocators "know" a particular thread has died and that particular thread's buffers can be released?
Under Windows, DllMain is called with the DLL_THREAD_DETACH argument on thread termination for each DLL.
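To illustrate the general idea of a library being notified when a thread exits, here is a minimal portable sketch. It is not TBB's implementation and it does not use DllMain: it swaps in a C++11 thread_local object whose destructor runs at thread exit, which is an analogous mechanism. The names ThreadCache, local_cache, and run_short_lived_threads are hypothetical and exist only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <thread>
#include <vector>

// Counts how many per-thread caches have been torn down (illustration only).
static std::atomic<int> g_caches_released{0};

// Hypothetical per-thread cache, standing in for an allocator's
// thread-local buffers. This is an assumption, not TBB's data structure.
struct ThreadCache {
    std::vector<void*> blocks;  // blocks held on behalf of the owning thread

    // Runs automatically when the owning thread exits, so the thread
    // does not need to call an explicit "free my buffers" function.
    ~ThreadCache() {
        for (void* p : blocks) std::free(p);
        g_caches_released.fetch_add(1);
    }
};

// The cache is created for the calling thread on first use.
inline ThreadCache& local_cache() {
    thread_local ThreadCache cache;
    return cache;
}

// Create and destroy `n` short-lived threads, each of which populates
// its own cache. Returns how many caches were released at thread exit.
inline int run_short_lived_threads(int n) {
    g_caches_released = 0;
    for (int i = 0; i < n; ++i) {
        std::thread t([] {
            void* p = std::malloc(64);
            assert(p != nullptr);
            local_cache().blocks.push_back(p);
        });
        t.join();  // the thread has exited once join() returns
    }
    return g_caches_released.load();
}
```

Calling run_short_lived_threads(4) should report four released caches, one per terminated thread. If a hook like this is not invoked for some reason, the per-thread buffers stay allocated, which would match the growth described above.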
Your observation about OpenMP is important. Interesting that there were no known issues (and so, fixes) related to memory leaks during thread termination.
Can you encapsulate your use of boost create thread/exit thread such that it uses a pool?
YourCreateThread :: if(ThreadAvailableInPool) takeFromPool else createThread
YourEndThread :: returnThreadContextToYourPool
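The pseudocode above could be sketched roughly as follows. This is only a minimal illustration using std::thread rather than boost::thread, and the class name SimplePool and its submit method are hypothetical. The point is that a fixed set of worker threads is created once and reused, so any per-thread allocator state is created once per worker instead of once per task.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: threads are created once and reused.
class SimplePool {
public:
    explicit SimplePool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    // Drains any queued tasks, then stops and joins all workers.
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    SimplePool(const SimplePool&) = delete;
    SimplePool& operator=(const SimplePool&) = delete;

    // Takes the place of "create a thread to run this work".
    void submit(std::function<void()> task) {
        assert(task);  // reject empty function objects
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: started after the state above exists
};
```

With something like this, the outer loop would call pool.submit(...) each iteration instead of creating and destroying threads, which is closer to what the OpenMP runtime was doing in the case where the problem did not appear.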