Intel® oneAPI Threading Building Blocks

TBB Malloc memory consumption always rising in our application

danlavoie
Beginner
1,874 Views

Hi,

First of all, I must say that I am really enjoying the TBB library. I modified a few of our single-threaded algorithms to take advantage of the parallel_for and parallel_sort constructs, and I am really impressed by the results, especially on 8- and 16-core servers.
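
(For context, a minimal sketch of the kind of change described above, using the lambda form of parallel_for and parallel_sort available in current TBB; the vector of doubles and the square-root work are placeholders, not the actual algorithms.)

#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <tbb/parallel_sort.h>
#include <cmath>
#include <cstddef>
#include <vector>

// Transform every element in parallel, then sort the result in parallel.
void process(std::vector<double>& data) {
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, data.size()),
        [&](const tbb::blocked_range<std::size_t>& r) {
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                data[i] = std::sqrt(data[i]);   // placeholder per-element work
        });
    tbb::parallel_sort(data.begin(), data.end());
}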

I also decided to use TBB Malloc as our default allocator. I read the technical article, as well as the source code, to try to better understand how it works and what we should expect in terms of speedup on many cores, etc.
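
(A minimal sketch of what using TBB Malloc can look like in code; the container typedef and the 1 KB allocation are illustrative only. Process-wide replacement of malloc/free, with no code changes, is also possible by linking or preloading the tbbmalloc proxy library.)

#include <tbb/scalable_allocator.h>
#include <vector>

// STL container whose storage comes from tbbmalloc instead of the default malloc.
typedef std::vector<int, tbb::scalable_allocator<int> > IntVector;

int main() {
    IntVector v;
    v.push_back(42);

    // Raw allocations can go through the C interface as well.
    void* p = scalable_malloc(1024);
    scalable_free(p);
    return 0;
}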

I know that memory is never returned to the OS, as doing so would require locking and thus would remove most if not all of the scalability benefits. However, in our server application, I see a constant rise in memory consumption, and unfortunately, even on 64-bit systems, it can lead to an out-of-memory condition (without TBB, our application in my test setup uses 1 to 1.2 GB of memory; with TBB, it goes over 2 GB and keeps rising).

I looked at the code to try to understand what we were doing that could possibly be causing TBB Malloc to enter a pattern where memory was not re-used effectively.

As I understand it, TBB Malloc tries to allocate memory from the TLS structures first, then from the publicly freed objects, and lastly from the other threads' structures. Our application has a lot of threads running at the same time, over 100 in a typical setup. Also, memory is often allocated in one thread and freed in another. Finally, threads are created and destroyed from time to time. Is it possible that TBB Malloc favors allocating locally, but, to avoid locking when an object is freed by a different thread, puts that memory on the public free list? Is it possible that this list is seldom reused with our way of allocating and destroying objects, so it grows into a very large list of allocations? Finally, is there a way to observe this at runtime, besides recompiling with statistics?
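
(To make that pattern concrete, a minimal sketch, not the actual application code, of one thread allocating blocks that a different thread frees, i.e. the cross-thread free case in question; it uses C++11 threads for brevity, and the block size and count are arbitrary.)

#include <tbb/concurrent_queue.h>
#include <tbb/scalable_allocator.h>
#include <thread>

int main() {
    tbb::concurrent_bounded_queue<void*> handoff;

    std::thread producer([&] {
        for (int i = 0; i < 100000; ++i)
            handoff.push(scalable_malloc(256)); // allocated in this thread...
        handoff.push(nullptr);                  // sentinel: no more blocks
    });

    std::thread consumer([&] {
        for (;;) {
            void* p;
            handoff.pop(p);                     // blocking pop
            if (!p) break;
            scalable_free(p);                   // ...freed in a different thread
        }
    });

    producer.join();
    consumer.join();
    return 0;
}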

I will try to analyze our pattern of allocations better, but any help or tips about TBB Malloc's inner workings in this situation would be greatly appreciated.

Thank you for your time,

Daniel Lavoie

22 Replies
sadbhaw
Beginner
177 Views

I do not agree. If every next transaction is served by roughly the same set of threads, which allocate objects of roughly the same size(s), then the memory freed on the previous step will be reused, no matter whether it was freed locally or remotely. The problem that Dmitry outlined in that "possible allocator problem" thread will only happen when a thread no longer allocates memory of a given size, and so it does not reclaim remotely freed memory of that size. I believe this is not the case in the discussed setup, since every transaction requires roughly the same memory as the previous one, so over time memory should be reclaimed more or less fully.

Sadbhaw, could you try returning the dead objects back to the creator thread in a source-sink fashion? This way the memory will get properly recycled (hopefully).
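
(A minimal sketch of this source-sink idea, assuming a simple creator/consumer pair; the queues, block size, and counts are illustrative and not part of TBB itself. Dead objects travel back to the creator, which reuses them instead of having them freed on a different thread.)

#include <tbb/concurrent_queue.h>
#include <tbb/scalable_allocator.h>
#include <thread>

int main() {
    tbb::concurrent_bounded_queue<void*> to_consumer; // live objects
    tbb::concurrent_bounded_queue<void*> to_creator;  // dead objects going home

    std::thread creator([&] {
        for (int i = 0; i < 100000; ++i) {
            void* obj;
            if (!to_creator.try_pop(obj))   // reuse a returned block if one is waiting...
                obj = scalable_malloc(256); // ...otherwise allocate a fresh one locally
            to_consumer.push(obj);
        }
        to_consumer.push(nullptr);          // sentinel: no more work
    });

    std::thread consumer([&] {
        for (;;) {
            void* obj;
            to_consumer.pop(obj);           // blocking pop
            if (!obj) break;
            /* ... use obj ... */
            to_creator.push(obj);           // hand it back instead of freeing it here
        }
    });

    creator.join();
    consumer.join();

    // Whatever is still in the return queue is freed once both workers are done.
    void* obj;
    while (to_creator.try_pop(obj))
        scalable_free(obj);
    return 0;
}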

I might not understand what "source-sink fashion" means, but in the TBB allocator, deallocated objects are returned back to the creator thread every time. It's just up to the creator when to reclaim them. By the way, in that forum thread I proposed a possible way to further minimize the possibility of hoarding unused blocks; nobody has commented on it yet.

As for Sadbhaw's problem, to me it looks more like the effect of excessive padding for 8K+ sizes, which is another known issue.

Hi Alexey,

Just to confirm our assumption: while the main process is still running, if the creator thread itself dies, the memory is returned, right?

Thank you,

Sadbhaw

Alexey-Kukanov
Employee
177 Views
Quoting - sadbhaw
Hi Alexey,

Just to confirm our assumption: while the main process is still running, if the creator thread itself dies, the memory is returned, right?

Thank you,

Sadbhaw

Yes and no. The memory is made available to other threads that use the allocator so it can be reused, therefore "yes". But the memory is never unmapped (i.e., never returned to the underlying OS, so your application's memory usage doesn't fall back from its peak), therefore "no". That's the current behavior, which may change in the future.
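
(Note for readers finding this thread later: subsequent TBB releases added explicit controls for this. The two calls sketched below exist in current oneTBB's tbb/scalable_allocator.h but were not available when this thread was written, so treat their availability and exact behavior as version-dependent.)

#include <tbb/scalable_allocator.h>

// Ask tbbmalloc to return cached, unused memory to the OS where possible.
void trim_tbbmalloc_caches() {
    scalable_allocation_command(TBBMALLOC_CLEAN_ALL_BUFFERS, 0);
}

// Ask tbbmalloc to release freed memory back to the OS more aggressively
// once its internal buffering exceeds roughly the given soft limit (1 GB here).
void cap_tbbmalloc_footprint() {
    scalable_allocation_mode(TBBMALLOC_SET_SOFT_HEAP_LIMIT, 1024LL * 1024 * 1024);
}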
