I am seeing excessive memory consumption when using the scalable_malloc/scalable_free "C" routines from TBB 4.1 (as part of Parallel Studio) that I do not see when using malloc()/free() or the MKL memory allocation routines.
In a loop, I create and destroy threads that make many calls into scalable_malloc and scalable_free. There are no scalable_ calls "across threads" or from the main thread. These calls are all balanced, so no allocated memory is being left dangling.
Each time through the loop, memory consumption seems to increase, as if some thread-specific buffers are not being returned when the threads are destroyed.
MKL has a function MKL_Free_Thread_Buffers that I can call at the end of a thread, just before it dies. Does TBB need a similar call?
Hello,
Intel TBB 4.2 introduced the scalable_allocation_command() function, which can clean up either per-thread buffers or all buffers.
More details are here: http://software.intel.com/en-us/node/468118
--Vladimir
As Vladimir mentioned, there is a call similar to MKL_Free_Thread_Buffers(), but there is no need for it at thread termination time, as all per-thread buffers are supposed to be released automatically. Is the sequence of allocations different between iterations of your outer loop? (We need to understand whether this is memory fragmentation or a memory leak.) How big is the regression in memory consumption compared with the system allocator?
I’d love to see a reproducer if the regression is big.
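For anyone who wants to try this, a reproducer along the lines described above might look like the sketch below. It is only a sketch under assumptions: it uses std::thread as a stand-in for boost::thread, and the AllocatorHooks struct, worker() and run_iterations() are hypothetical names made up for illustration. The allocator is passed in as function pointers so the same loop can be run with malloc/free and then with scalable_malloc/scalable_free for comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical hooks so the same loop can be run against different allocators.
struct AllocatorHooks {
    void* (*alloc)(std::size_t);   // e.g. std::malloc or scalable_malloc
    void  (*release)(void*);       // e.g. std::free or scalable_free
    void  (*thread_cleanup)();     // optional end-of-thread hook; may be nullptr
};

// Body of one short-lived thread: balanced allocations and frees,
// all made on this thread, so nothing is left dangling.
void worker(const AllocatorHooks& hooks, std::size_t blocks, std::size_t block_size) {
    std::vector<void*> ptrs;
    ptrs.reserve(blocks);
    for (std::size_t i = 0; i < blocks; ++i) {
        void* p = hooks.alloc(block_size);
        if (p != nullptr) ptrs.push_back(p);
    }
    for (void* p : ptrs) hooks.release(p);
    // With TBB 4.2 or later, a small wrapper around
    // scalable_allocation_command(TBBMALLOC_CLEAN_THREAD_BUFFERS, 0)
    // could be plugged in here to test whether explicit cleanup matters.
    if (hooks.thread_cleanup != nullptr) hooks.thread_cleanup();
}

// Creates and destroys `threads_per_iter` threads in each iteration.
// Returns the total number of threads that were run.
std::size_t run_iterations(const AllocatorHooks& hooks, std::size_t iterations,
                           std::size_t threads_per_iter,
                           std::size_t blocks, std::size_t block_size) {
    assert(hooks.alloc != nullptr && hooks.release != nullptr);
    std::size_t total = 0;
    for (std::size_t it = 0; it < iterations; ++it) {
        std::vector<std::thread> threads;
        for (std::size_t t = 0; t < threads_per_iter; ++t) {
            threads.emplace_back([&hooks, blocks, block_size] {
                worker(hooks, blocks, block_size);
            });
        }
        for (std::thread& t : threads) t.join();
        total += threads.size();
        // Sample the process memory here (Task Manager or an OS API)
        // to see whether it keeps growing from one iteration to the next.
    }
    return total;
}
```

Calling run_iterations() with hooks set to { std::malloc, std::free, nullptr } gives the baseline. Swapping in scalable_malloc and scalable_free (from tbb/scalable_allocator.h, linked against tbbmalloc) should show whether the growth is specific to the scalable allocator.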
I am seeing many megabytes of extra memory usage when using the scalable allocators.
I can try to come up with a reproducer. But it seems 4.2 addresses this issue.
"...but there is no need for it at thread’s termination time, as all per-thread buffers have to be released automatically."
How is that possible? How can the TBB memory allocators "know" that a particular thread has died and that its buffers can be released? I am using a non-TBB threading library (boost::threads) on Windows.
Interestingly, as a side note, I was using OMP threading and this was not an issue. That's because OMP starts up a thread pool and reuses the same threads during program execution, so threads are not being repeatedly created and destroyed...
"How is that possible? How can TBB memory allocators "know" a particular thread has died and that particular thread's buffers can be released?"
Under Windows, DllMain is called with the DLL_THREAD_DETACH argument on thread termination for each DLL.
Your observation about OpenMP is important. Interestingly, there were no known issues (and hence no fixes) related to memory leaks during thread termination.
Can you encapsulate your use of the boost thread create/exit calls such that it uses a pool?
YourCreateThread :: if(ThreadAvailableInPool) takeFromPool else createThread
YourEndThread :: returnThreadContextToYourPool
Jim Dempsey
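To make the pool idea above more concrete, here is a minimal sketch. It assumes standard C++11 threads (std::thread) instead of boost::thread, and SimplePool and submit() are hypothetical names, not part of TBB or Boost. The worker threads are created once and reused for every task, so any per-thread allocator buffers stay attached to a fixed set of long-lived threads instead of being set up and torn down on each loop iteration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: threads are created once and reused.
class SimplePool {
public:
    explicit SimplePool(std::size_t n) {
        assert(n > 0);
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Finishes all queued tasks, then joins the worker threads.
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    SimplePool(const SimplePool&) = delete;
    SimplePool& operator=(const SimplePool&) = delete;

    // Takes the place of "create a thread": the task is queued and
    // picked up by an existing worker.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock; the thread then returns to the pool
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

With something like this, the outer loop would call pool.submit(...) where it previously created a thread, which mirrors what the OpenMP runtime was doing in the case where the problem did not show up.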
