Re: Developing a TBB Memory Pool

AJ13 · ‎12-07-2007

Hey,

Memory has become a bottleneck in YetiSim execution (in particular the use of boost::shared_ptr), and I've decided to deal with it now rather than later. The strategy that I've decided upon is to implement a singleton MemoryManager class, which is responsible for allocation / deallocation of all objects. This is fine for a simulator, and rather custom to my project.

Another feature I want is a memory pool so that I can reuse objects. This should improve performance, but only if there is no mutex on the pool. I've looked at boost::pool, and from a quick glance it seems that the entire pool is mutexed to make it thread safe. This is fine, but I think we can do better with a more carefully thought-out approach. I avoid mutexes except in sections of code where they are strictly required. A memory pool could be a highly contended resource, and could lead to incorrect or inefficient code if not implemented properly. I would prefer an approach in which the algorithm itself makes mutexes unnecessary.
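To make the concern concrete, here is a minimal sketch of the single-lock design described above. It is not boost::pool's actual implementation; `LockedPool` and its member names are hypothetical, chosen only for illustration. Every `acquire()` and `release()` from every thread goes through the same mutex, which is the contention point the post wants to avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical illustration: one pool, one lock, shared by all threads.
template <typename T>
class LockedPool {
public:
    // Hand out a recycled object if one is idle, otherwise build a new one.
    std::unique_ptr<T> acquire() {
        std::lock_guard<std::mutex> guard(mutex_);  // every caller serializes here
        if (free_.empty()) {
            return std::make_unique<T>();
        }
        std::unique_ptr<T> obj = std::move(free_.back());
        free_.pop_back();
        return obj;
    }

    // Return an object to the pool instead of destroying it.
    void release(std::unique_ptr<T> obj) {
        assert(obj != nullptr);
        std::lock_guard<std::mutex> guard(mutex_);  // and here
        free_.push_back(std::move(obj));
    }

    // Number of objects currently waiting to be reused.
    std::size_t idle() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return free_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::unique_ptr<T>> free_;
};
```

The design is correct and simple, but with many threads allocating at once the lock can become the bottleneck rather than the allocation itself.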

I will be reading papers today, and thinking on the design of the pool. Thoughts are welcome. The pool will of course be contributed back to TBB for inclusion.

Thanks,

AJ

Alexey-Kukanov · ‎12-07-2007

TBB provides a general-purpose memory allocator that was specially designed to work well with multiple threads allocating and freeing memory at the same time. It is fast enough (at least for allocation sizes <1K) and tries hard to reuse memory efficiently. I suggest you look at it and try it to see if it solves your memory issues before you start implementing your own memory manager. See http://www.intel.com/technology/itj/2007/v11i4/5-foundations/5-memory.htm for some information about the allocator and some benchmarking data, and ask here for any additional information you need.

AJ13 · ‎12-07-2007

I am using both the scalable allocator to override new / delete, and cache_aligned_allocator. The problem with my program is that I could have a huge number (> 100,000) of objects being used for a while, then no longer needed, then used, then no longer needed.... so I get a lot of wasted effort in allocation / deallocation of these objects when they could be recycled.

I probably should have said object pool, or object cache, since I only need to be able to pool objects... although a generic memory pool could be helpful too. I'm investigating both right now.

AJ13 · ‎12-07-2007

Hrrm, I didn't know scalable allocator did that already!

I still think an object pool, or object cache, in which objects can be "recycled" by implementing a recycle() function would be useful, if only to minimize the effort spent calling constructors, on top of allocating memory.
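A minimal sketch of that recycle() idea, assuming a single thread and ignoring the cross-thread issues raised in the rest of this thread. `Event`, `ObjectCache`, and all member names are hypothetical and not part of TBB or YetiSim. The cache counts how many objects it had to construct, so the saving from reuse is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical simulation object; the name and fields are illustrative only.
struct Event {
    int id = 0;
    std::vector<int> payload;

    // Reset to a reusable state. clear() drops the elements but leaves
    // the vector's buffer allocated, so reuse avoids a reallocation too.
    void recycle() {
        id = 0;
        payload.clear();
    }
};

// Single-threaded object cache: T must provide a recycle() member.
template <typename T>
class ObjectCache {
public:
    std::unique_ptr<T> acquire() {
        if (idle_.empty()) {
            ++created_;  // only counted when a constructor actually runs
            return std::make_unique<T>();
        }
        std::unique_ptr<T> obj = std::move(idle_.back());
        idle_.pop_back();
        return obj;
    }

    void release(std::unique_ptr<T> obj) {
        assert(obj != nullptr);
        obj->recycle();  // reset state instead of destroying the object
        idle_.push_back(std::move(obj));
    }

    std::size_t created() const { return created_; }

private:
    std::vector<std::unique_ptr<T>> idle_;
    std::size_t created_ = 0;
};
```

If 100 objects are acquired and released over several rounds, only the first round runs constructors; later rounds are served entirely from the cache.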

Alexey-Kukanov · ‎12-07-2007

Yes, if you allocate all 100000 objects nearly at once and then free them nearly at once, and then allocate again, etc., keeping them in a pool in between for faster reallocation seems useful, but...

... but if your objects can be allocated and deallocated by different threads, and you want to avoid excessive synchronization, you would need thread-specific object pools,..

... and then you find out that TBB does not give you enough information to implement thread-specific pools, and you at least need a cross-platform abstraction over thread-local storage,..

... and then you find out your objects can migrate between threads and be deallocated in a different thread than they were allocated in, and you would need to provide some mechanism to compensate for such migration and prevent unused objects from being hoarded in consumer pools, and you would need some synchronization for that...

...and if the initial design in your mind is a contiguous array of objects that transfers into a singly-linked FIFO list as objects get allocated and freed,..

then I suggest you rethink now :) because the scalable allocator did all that already, with relatively little additional overhead, in many cases in a non-blocking manner.
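The migration problem in the list above can be sketched as follows. This is an illustration only, using standard C++ `thread_local` as the thread-local storage; it is not how the TBB scalable allocator is implemented, and `Node`, `local_pool`, `acquire`, `release`, and `run_migration` are hypothetical names. One thread allocates, another frees, and every recycled object ends up in the consumer's pool, where the producer can never reuse it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

struct Node { int value = 0; };  // hypothetical pooled object

// One free list per thread: no lock is needed to touch it.
inline std::vector<std::unique_ptr<Node>>& local_pool() {
    thread_local std::vector<std::unique_ptr<Node>> pool;
    return pool;
}

// Take from the calling thread's pool, or allocate if it is empty.
std::unique_ptr<Node> acquire() {
    auto& pool = local_pool();
    if (pool.empty()) {
        return std::make_unique<Node>();
    }
    std::unique_ptr<Node> n = std::move(pool.back());
    pool.pop_back();
    return n;
}

// Recycle into the calling thread's pool, whoever allocated the object.
void release(std::unique_ptr<Node> n) {
    local_pool().push_back(std::move(n));
}

struct HoardingResult {
    std::size_t producer_idle;  // idle objects left in the producer's pool
    std::size_t consumer_idle;  // idle objects piled up in the consumer's pool
};

// The producer thread allocates `count` objects; the consumer thread frees them.
HoardingResult run_migration(std::size_t count) {
    std::vector<std::unique_ptr<Node>> in_flight;
    HoardingResult result{0, 0};

    std::thread producer([&] {
        for (std::size_t i = 0; i < count; ++i) {
            in_flight.push_back(acquire());
        }
        result.producer_idle = local_pool().size();
    });
    producer.join();  // join orders the hand-off; no other synchronization used

    std::thread consumer([&] {
        for (auto& n : in_flight) {
            release(std::move(n));
        }
        result.consumer_idle = local_pool().size();
    });
    consumer.join();
    return result;
}
```

In this sketch the producer's pool stays empty while the consumer's pool holds every object, so the producer keeps allocating fresh memory. Fixing that requires handing objects back across threads, which brings back the synchronization the thread-specific pools were meant to avoid.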

In fact, the TBB task scheduler uses internal pools of free task objects. But we do that on purpose, e.g. for the sake of customers who don't want to distribute yet another DLL (I mean tbbmalloc.dll) with their product, so TBB ends up using vanilla malloc(), which might be slow with multiple threads. And we have fought object hoarding (often perceived as leaking memory) enough already for me to warn you :)

AJ13 · ‎12-07-2007

It's nice when something you think you have to design is hidden away in a library ;-) I'll leave the object pool idea aside for now then, and trust scalable_allocator to take care of it. I'll take some time to review the code of this magic to see how it works.

One final thought.... does the cache_aligned_allocator extend the scalable_allocator? That is, if I'm using the cache_aligned_allocator in STL containers, will this allocator also have the benefits of the scalable_allocator, on top of preventing false sharing?

Thanks for saving me a few days of work!

Alexey-Kukanov · ‎12-10-2007

I answered in the new thread.