Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.
2464 Discussions

Leaking memory with scalable_malloc/free when closing threads

mdprice
Beginner
427 Views
We've had an issue leaking memory when using scalable_malloc/free on Windows. The fix was to manually call mallocThreadShutdownNotification when the thread was exiting. I never saw any documentation nor was this in Reinders' book. I ended up reading the code in MemoryAllocator.cpp and saw that this was being registered in the pthreads implementation but not the winthreads implementation. To use it I had to include src bbmallocTypeDefinitions.h out of the source code and not the include directories.

I would suggest there being a note in scalable_allocator and that mallocThreadShutdownNotification should be exposed there.

I'm using tbb21_20080605oss with Microsoft Visual Studio 2005.


0 Kudos
10 Replies
Alexey-Kukanov
Employee
427 Views

If you look at the end of src/tbbmalloc/tbbmalloc.cpp source file, you will find there that mallocThreadShutdownNotification is actually called from DllMain under DLL_THREAD_DETACH. Thus it should be called in case you use tbbmalloc.dll, and there is no need to expose the internal callbacks that are not supposed to be called by users.

I'd guess you use the allocator some other way than via tbbmalloc.dll, probably by compiling from sources and linking into your application directly. In such case, you had to figure it out by yourself, and I am glad you did. To make it easier for others, I might leave a comment in the code, say,near DllMain; do you think it would be a proper place for it?

As far as I know, there is no other "official" callback on thread exit than DLL_THREAD_DETACH (if I am mistaken and there is some, please pardon my ignorance and tell me the name). This is just one amongst the variety of reasons why TBB in general and tbbmalloc in particular do not support static linkage.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
427 Views
MADakukanov:

As far as I know, there is no other "official" callback on thread exit than DLL_THREAD_DETACH (if I am mistaken and there is some, please pardon my ignorance and tell me the name). This is just one amongst the variety of reasons why TBB in general and tbbmalloc in particular do not support static linkage.



There is a neat way to solve this problem w/o DLLs. I've described it here:
http://www.rsdn.ru/forum/message/2895408.1.aspx

Well, it's not 100% official. But it will not be changed in future, because too many developers already relying on this. For example, boost.thread uses this trick to allow static linking.

Personally I think that possibility of static linking is serious advantage for any library. It allows more optimizations, it simplifies build process, it simplifies deployment.

0 Kudos
mdprice
Beginner
427 Views
Ok, I think I understand what happened. Because we have a memory leak when threads were being stopped and started and we recently added the scalable_malloc I was seeing if it was causing our problems. So in order to better debug it I staticly linked in the tbbmalloc code into a simple driver. This actually introduced my issue from above which I ended up fixing by calling the cleanup code manually. I now see that when linked as a dll it all the calls to DllMain work fine and the memory is not leaked.

And I concure on the DLL_THREAD_DETACH as the only way I know to get called when a thread exits.

Thanks for your comments, they helped me understand what happened better. And scalable_malloc certianly helped our scaling issues.

On another front I needed a cache aligned allocator for one paticular piece of code to avoid some false sharing. I used the tbb::internal::NFS_Allocate/Free call. I know in general it is not recomended to call things in tbb::internal. What should I do here? Am I supposed to create a cache_aligned_allocator and use it's methods? Am I ok just calling NFS_Allocate/Free directly (it's worked so far)?
0 Kudos
Alexey-Kukanov
Employee
427 Views

Dmitry, thanks for the reference. Now I recall I have read that CodeGuru article before. I will think more whether it is worth using within the allocator.

While generally I agree with you that statically linkable libraries are good, for some libraries it leads to significant issues. Particularly, this is true for libraries that control system-wide resources, because several instances of a static library (and so several contemporary points of resource control) can exist in the same process. And this is exactly the case for both TBB and tbbmalloc; in case of TBB, it could lead to system oversubscription; in case of tbbmalloc, it could lead to excessive memory use and issues when pointers allocated in one module are destroyed in another one. I know the good old advice to always free the memory in the same module that allocated it; but if there would be a really sole memory allocation point, would the advice exist?

I also know the mantra that we should trust the user and etc. Unfortunately it does not work, as our previous experience showed; once we start providing static version of TBB, we will end up dealing with cases where a few instances are mixed but nevertheless expected to work perfectly. Thus we do not support static libraries, and do not plan to do it at least until the point where we know how to deal with multiple instances of the library in the same process.

0 Kudos
Alexey-Kukanov
Employee
427 Views

mdprice,

I agree that the C++ allocator interface is not the most convenient thing to use; but it is not much worse either. In particular, as tbb::cache_aligned_allocator is stateless, you might use temporary instances of it to allocate and free the memory; I believe the following should work though I did not run it through a compiler:

void* p = tbb::cache_aligned_allocator().allocate(required_size);

The other alternative is to use scalable_allocator but request memory in sizes multiple to cache line size. Though this is not documented, the scalable_malloc aligns such allocations (if smaller than 4K) on cache line boundary, and this property will remain. The following should do the trick (assuming cache_line_size is a power of 2):

void* p = scalable_malloc((required_size+cache_line_size-1)&~(cache_line_size-1));
0 Kudos
Dmitry_Vyukov
Valued Contributor I
427 Views
MADakukanov:

I also know the mantra that we should trust the user and etc. Unfortunately it does not work, as our previous experience showed; once we start providing static version of TBB, we will end up dealing with cases where a few instances are mixed but nevertheless expected to work perfectly. Thus we do not support static libraries, and do not plan to do it at least until the point where we know how to deal with multiple instances of the library in the same process.


My personal opinion is that it's better to provide to end user both possibilities (dynamic and static linking). Caveats of both variants must be clearly documented. Dynamic linking can be 'the default'. And to allow static linking user will have to define special macro TBB_I_DO_UNDERSTAND_CAVEATS_OF_STATIC_LINKING.

Frankly, I several times used trick with static linking and manual calls to DllMain's of both TBB and TBB malloc. It's quite inconvenient. But works.


0 Kudos
Alexey-Kukanov
Employee
427 Views

I have recently read some good quote; I can't remember the place and the technology being discussed (something related to C++), but the expressed opinion was that if a user can't figure out how to overcome usage obstacles for the technology, that user should not even try to use this technology because the chances are minimalthat its caveats are well understood. I feel the same is true for static linkage of TBB. Those who need it, will be able to do by themselves, as proved in practice by both of you in this thread. Those who can't figure out how to do that, shouldn't be given a convenient knob :) That's just my opinion, of course.

A later addition, to not be misunderstood: the opinion I quoted above was not about any technology in general, but a particular technology (I just can't recall which one)that has some caveats and should be used carefully; and I consider static linkage of TBB being yet another example of such thing. In most cases, convenient interfaces and documentation should be provided, or course.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
427 Views
MADakukanov:

I have recently read some good quote; I can't remember the place and the technology being discussed (something related to C++), but the expressed opinion was that if a user can't figure out how to overcome usage obstacles for the technology, that user should not even try to use this technology because the chances are minimalthat its caveats are well understood. I feel the same is true for static linkage of TBB. Those who need it, will be able to do by themselves, as proved in practice by both of you in this thread. Those who can't figure out how to do that, shouldn't be given a convenient knob :) That's just my opinion, of course.



Great Point!

Hmm... but you can provide a little support for those who need it and able to do it by themselves... for example rename some functions.... because now I must manually patch TBB source on every machine I use to rename one of the DllMain functions to DllMain2 :)

0 Kudos
Alexey-Kukanov
Employee
427 Views

You can contribute the patch, and we will consider :)

0 Kudos
Dmitry_Vyukov
Valued Contributor I
427 Views
MADakukanov:

You can contribute the patch, and we will consider :)



I see there is already such support in the latest release.
tbbmalloc contains functions:
extern "C" void mallocThreadShutdownNotification(void *);
extern "C" void mallocProcessShutdownNotification(void);
which are described in TypeDefinitions.h file.

And TBB itself contains __TBB_TASK_CPP_DIRECTLY_INCLUDED macro, which allows static linking of TBB.

0 Kudos
Reply