Memory allocation has become a bottleneck in YetiSim execution (in particular the use of boost::shared_ptr), and I've decided to deal with it now rather than later. The strategy I've settled on is to implement a singleton MemoryManager class, which is responsible for allocating and deallocating all objects. This is fine for a simulator, though it is rather specific to my project.
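Roughly what I have in mind is something like this (a sketch only; the class and method names are placeholders, not the final YetiSim interface):

    #include <cstddef>
    #include <new>

    // Illustrative sketch only: a singleton that funnels every allocation
    // through one place, so the underlying strategy (plain new, a pool,
    // a scalable allocator) can be swapped without touching the rest of
    // the simulator.
    class MemoryManager {
    public:
        static MemoryManager& instance() {
            static MemoryManager mgr;            // created on first use
            return mgr;
        }

        void* allocate(std::size_t bytes) {
            return ::operator new(bytes);        // plain new for now; pool later
        }

        void deallocate(void* p) {
            ::operator delete(p);
        }

    private:
        MemoryManager() {}
        MemoryManager(const MemoryManager&);             // non-copyable
        MemoryManager& operator=(const MemoryManager&);  // non-assignable
    };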
Another feature I want is a memory pool so that I can reuse objects. This should improve performance, but only if the pool is not guarded by a single mutex. From a quick glance at boost::pool, it seems the entire pool is locked to make it thread-safe. That works, but I think a more carefully thought-out approach can do better. I avoid mutexes except in sections of code where they are genuinely required. A memory pool could be a highly contended resource, and could become incorrect or inefficient if not implemented properly. I would prefer an approach in which the algorithm itself makes mutexes unnecessary.
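To make that concrete, here is the mutex-free direction I'm leaning toward (just a sketch; PerThreadPool is a made-up name, thread_local is used only to keep it short, and it punts on the hard part, namely objects freed by a different thread):

    #include <vector>

    // Sketch: each thread keeps its own free list, so acquire/release on the
    // hot path never take a lock. Cross-thread frees are the unsolved part.
    template <typename T>
    class PerThreadPool {
    public:
        static T* acquire() {
            std::vector<T*>& pool = freeList();
            if (pool.empty())
                return new T();                  // cold path: real allocation
            T* obj = pool.back();                // warm path: reuse
            pool.pop_back();
            return obj;
        }

        static void release(T* obj) {
            freeList().push_back(obj);           // only valid from the owning thread
        }

    private:
        static std::vector<T*>& freeList() {
            thread_local std::vector<T*> pool;   // one free list per thread
            return pool;
        }
    };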
I will be reading papers today and thinking about the design of the pool. Thoughts are welcome. The pool will, of course, be contributed back to TBB for inclusion.
I probably should have said object pool, or object cache, since I only need to be able to pool objects... although a generic memory pool could be helpful too. I'm investigating both right now.
I still think an object pool, or object cache, in which objects can be "recycled" by implementing a recycle() function, would be useful, if only to minimize the cost of re-running constructors on top of the cost of allocating memory.
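By "recycled" I mean something like this (Event and its members are made-up names, purely to illustrate the idea):

    // Illustration only: a pooled object resets its own state via recycle()
    // instead of going through a destructor/constructor pair on every reuse.
    class Event {
    public:
        Event() : time_(0.0), target_(0) {}

        void recycle() {        // called by the pool before handing the object out again
            time_   = 0.0;
            target_ = 0;
        }

    private:
        double time_;
        int    target_;
    };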
Yes, if you allocate all 100,000 objects nearly at once, then free them nearly at once, then allocate again, etc., keeping them in a pool in between for faster reallocation seems useful, but...
... but if your objects can be allocated and deallocated by different threads, and you want to avoid excessive synchronization, you would need thread-specific object pools,..
... and then you find out that TBB does not provide enough information for you to implement thread-specific pools, so you at least need a cross-platform abstraction over thread-local storage,..
... and then you find out your objects can migrate between threads and be deallocated in a different thread than the one they were allocated in, so you would need some mechanism to compensate for such migration and prevent unused objects from piling up in consumer-side pools, and you would need some synchronization for that...
...and if the initial design in your mind is a contiguous array of objects that turns into a singly-linked FIFO list as objects get allocated and freed,..
then I suggest you re-think now :) because the scalable allocator already does all of that, with relatively little additional overhead, and in many cases in a non-blocking manner.
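For comparison, getting that behaviour is mostly a matter of routing allocations through the scalable allocator; e.g. (a sketch, with a made-up object type):

    #include <vector>
    #include "tbb/scalable_allocator.h"

    // Sketch only: Particle is a made-up type. The scalable allocator keeps
    // per-thread free lists internally and copes with frees coming from other
    // threads, so no hand-written pool or mutex is needed here.
    struct Particle { float x, y, z; };

    int main() {
        // C-style interface:
        Particle* p = static_cast<Particle*>(scalable_malloc(sizeof(Particle)));
        scalable_free(p);

        // STL-compatible interface:
        std::vector<Particle, tbb::scalable_allocator<Particle> > particles;
        particles.resize(100000);
        return 0;
    }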
In fact, the TBB task scheduler uses internal pools of free task objects. But we do that on purpose, e.g. for the sake of customers who don't want to distribute yet another DLL (I mean tbbmalloc.dll) with their product, in which case TBB ends up using vanilla malloc(), which can be slow with multiple threads. And we have fought object hoarding (often perceived as a memory leak) enough already for me to warn you :)
One final thought... does the cache_aligned_allocator extend the scalable_allocator? That is, if I'm using the cache_aligned_allocator in STL containers, will it also have the benefits of the scalable_allocator, on top of preventing false sharing?
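For reference, here is how I'm using them; both are drop-in STL allocators (Foo is just a placeholder type):

    #include <vector>
    #include "tbb/cache_aligned_allocator.h"
    #include "tbb/scalable_allocator.h"

    // Sketch only: Foo is a placeholder. cache_aligned_allocator aligns (and
    // pads) each allocation to a cache-line boundary to avoid false sharing;
    // whether it also routes through the scalable allocator is exactly the
    // question above.
    struct Foo { int value; };

    std::vector<Foo, tbb::cache_aligned_allocator<Foo> > aligned_values;
    std::vector<Foo, tbb::scalable_allocator<Foo> >      scalable_values;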
Thanks for saving me a few days of work!