Hello! I've been evaluating the TBB allocator to see if it would provide any performance gains for our project, and I discovered something interesting. I wrote a stress test that constantly allocates and frees memory, allocating slightly more than it frees, until the program crashes. The goal is to see how much useful allocation space the allocator can provide to the user under fragmentation-prone conditions. In addition, a giant 64MB allocation is periodically made and freed, to make the crash occur at a well-defined moment (i.e., when there isn't 64MB of contiguous address space left).
What I found was that the TBB allocator fails at only around 0.5GB! The default CRT malloc makes it up to ~3.6GB, successfully utilizing most of the space available to the 32-bit LARGEADDRESSAWARE process.
My searches on this board uncovered two threads of interest:
The first did not have a repro-case and seemed to be basically unsolved. The second has a lot of comments, and I'm honestly not sure exactly where it landed--but it did seem to be touching on an issue similar to mine.
This problem only occurs on Windows--I performed the same test on Mac OS X and found that scalable_malloc performed roughly on par with CRT malloc (2.8GB vs 3.1GB). This makes me wonder if perhaps I've just exposed a bug in the Windows TBB build? (This is mainly why I'm writing--if this is by design then it may just be that the TBB allocator doesn't meet our needs, which is fine).
The 'tbb_heapfragment.zip' file attached has my repro. If you have a Visual Studio 2010 command prompt, you should be able to extract it, run the two batch files, and then run the newly-built exe to see the problem (full steps in the readme.txt).
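For readers without the attachment, the kind of stress loop described at the top of this post can be sketched roughly as follows. This is not the code from the zip file: the `Allocator` struct, `run_stress`, the 3-allocations-to-2-frees ratio, the 16 to 4096 byte block sizes, the probe interval, and the `byte_cap` safety limit are all assumptions made for illustration. The original test runs until the process crashes; this sketch stops at a cap or at the first failed allocation instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <vector>

// Hypothetical allocator interface so the same loop can drive CRT malloc,
// scalable_malloc, or a test stub.
struct Allocator {
    void* (*alloc)(std::size_t);
    void (*release)(void*);
};

void* crt_alloc(std::size_t n) { return std::malloc(n); }
void crt_release(void* p) { std::free(p); }

struct StressResult {
    std::size_t live_bytes;  // small-block bytes held when the run stopped
    bool big_block_failed;   // true if the large probe allocation failed
};

// Allocates slightly more than it frees until an allocation fails or
// byte_cap is reached. All sizes and ratios are illustrative assumptions.
StressResult run_stress(const Allocator& a, std::size_t byte_cap,
                        std::size_t big_block = 64u * 1024 * 1024,
                        unsigned seed = 12345) {
    assert(a.alloc && a.release);
    struct Block { void* p; std::size_t n; };
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> size_dist(16, 4096);
    std::vector<Block> live;
    std::size_t live_bytes = 0;
    bool big_failed = false;

    for (std::size_t iter = 0; live_bytes < byte_cap; ++iter) {
        // Three small allocations per round...
        bool failed = false;
        for (int i = 0; i < 3; ++i) {
            std::size_t n = size_dist(rng);
            void* p = a.alloc(n);
            if (!p) { failed = true; break; }
            live.push_back({p, n});
            live_bytes += n;
        }
        if (failed) break;

        // ...and two frees of randomly chosen live blocks, leaving holes.
        for (int i = 0; i < 2 && !live.empty(); ++i) {
            std::uniform_int_distribution<std::size_t> pick(0, live.size() - 1);
            std::size_t k = pick(rng);
            a.release(live[k].p);
            live_bytes -= live[k].n;
            live[k] = live.back();
            live.pop_back();
        }

        // Periodically probe for a large contiguous block, then free it.
        if (iter % 1024 == 0) {
            void* big = a.alloc(big_block);
            if (!big) { big_failed = true; break; }
            a.release(big);
        }
    }

    for (const Block& b : live) a.release(b.p);  // clean up before returning
    return {live_bytes, big_failed};
}
```

Calling `run_stress(Allocator{crt_alloc, crt_release}, cap)` exercises CRT malloc; swapping in wrappers around scalable_malloc and scalable_free would exercise the TBB allocator with the same access pattern, so the two `live_bytes` figures can be compared.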
Thanks for the responses; we are using a 32-bit server, in large part because it shares lots of code with our clients, which still don't have 64-bit chips as a minimum requirement. Converting the server to 64-bit is certainly a possibility we've talked about--obviously it would make address space congestion much less of a worry!
Of course CRT malloc is using the LFH heap by default, and that is kicking ass--beyond that I'm not quite sure what you mean, Jim. Does TBB create thread-specific private heaps? If so, do I have to twiddle something to tell TBB: "Please use the LFH setting when creating your heaps?"
The TBB scalable allocator works somewhat independently of the C++ heap manager. If you have chosen to overload operator new and delete (all allocations going through the scalable allocator) then you are subject to its fragmentation quirks. Should this present a problem, then consider:
a) not overloading new and delete with TBB scalable allocator, and adding specific scalable allocations for your high flux objects
b) overloading new and delete with TBB scalable allocator, and adding specific non-scalable allocations for your large low-flux objects.
This may reduce the tendency to fragment on a 32-bit system.
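Option (a) could look something like the sketch below, which gives one high-flux class its own operator new and delete and leaves everything else on the default heap. The class names, `pool_alloc`/`pool_free`, and the counters are hypothetical. To keep the sketch compilable without TBB, std::malloc and std::free stand in for the allocator; in a real build those two calls would be replaced with scalable_malloc and scalable_free from tbb/scalable_allocator.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Counters exist only so the routing can be observed in a test.
static std::size_t g_pool_allocs = 0;
static std::size_t g_pool_frees = 0;

// Stand-ins (assumption): replace std::malloc/std::free with
// scalable_malloc/scalable_free when building against TBB.
void* pool_alloc(std::size_t n) {
    ++g_pool_allocs;
    return std::malloc(n);
}
void pool_free(void* p) {
    ++g_pool_frees;
    std::free(p);
}

// Hypothetical high-flux object: small, created and destroyed constantly.
// Its class-specific operator new/delete route only this type to the pool.
struct HighFluxMessage {
    int id;
    char payload[60];

    static void* operator new(std::size_t n) {
        void* p = pool_alloc(n);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) noexcept {
        if (p) pool_free(p);
    }
};

// Hypothetical large, low-flux object: no overloads, so it keeps using
// the default global operator new/delete (the CRT heap).
struct LargeBuffer {
    char data[1 << 20];
};
```

With this in place, `new HighFluxMessage()` goes through `pool_alloc` while `new LargeBuffer` does not. Only the scalar forms are overloaded here; `new HighFluxMessage[n]` would still use the global operator new[]. Option (b) is the mirror image: replace the global operators and give the large, low-flux types class-specific overloads that call the CRT allocator.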