Intel® oneAPI Threading Building Blocks

Two questions about tbb::memory_pool< tbb::scalable_allocator<char> >

bryan_f_2
Beginner

For tbb::memory_pool< tbb::scalable_allocator<char> > shared_memory_pool_: am I correct that it pre-allocates a block of memory to avoid malloc system calls at runtime? For example, after we call shared_memory_pool_.malloc(15000000), it would not call the system malloc again, but would just allocate from the pre-allocated memory until the pool runs out of space (over 15000000 bytes) and needs to be extended?

Another question: if the pool is instantiated in the main thread, and I then call shared_memory_pool_.malloc(sizeof(my_class)) in a worker thread, will TBB allocate that memory from the main thread's heap, or will it allocate from the worker thread's own "domain", so that the lock contention caused by a normal malloc() is still avoided?
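For concreteness, a minimal sketch of the usage in question (my_class and the sizes here are illustrative assumptions; in classic TBB the memory pools are a preview feature enabled by the TBB_PREVIEW_MEMORY_POOL macro):

    #define TBB_PREVIEW_MEMORY_POOL 1
    #include "tbb/memory_pool.h"
    #include <thread>
    #include <new>

    struct my_class { int id; };  // illustrative payload

    // Pool instantiated in the main thread.
    tbb::memory_pool<tbb::scalable_allocator<char>> shared_memory_pool_;

    int main() {
        // Question 1: does this reserve ~15 MB up front for later requests?
        void* big = shared_memory_pool_.malloc(15000000);

        // Question 2: allocating from a worker thread.
        std::thread worker([] {
            void* raw = shared_memory_pool_.malloc(sizeof(my_class));
            my_class* m = new (raw) my_class();  // placement-construct in pool memory
            m->~my_class();
            shared_memory_pool_.free(raw);
        });
        worker.join();

        shared_memory_pool_.free(big);
        return 0;
    }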

Alexandr_K_Intel1

bryan_f_2 wrote:

Am I correct that it pre-allocates a block of memory to avoid malloc system calls at runtime? For example, after we call shared_memory_pool_.malloc(15000000), it would not call the system malloc again, but would just allocate from the pre-allocated memory until the pool runs out of space (over 15000000 bytes) and needs to be extended?

Not exactly. The purpose of memory pools in our implementation is to give the user control over which memory is used for dynamic allocation, and to provide additional bookkeeping of the memory in use. If all you want is to request a huge memory block from the OS and then split it up to satisfy smaller user allocations (to reduce system-call overhead), that is ordinary scalable_malloc() functionality.
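In other words, if the only goal is to reduce system-call overhead, the plain allocator entry points already behave that way. A minimal sketch (the size is illustrative):

    #include "tbb/scalable_allocator.h"  // scalable_malloc / scalable_free

    // The scalable allocator requests large chunks from the OS and carves
    // user allocations out of them, so most calls never reach the OS.
    int main() {
        void* p = scalable_malloc(15000000);
        scalable_free(p);
        return 0;
    }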

bryan_f_2 wrote:

If the pool is instantiated in the main thread, and I then call shared_memory_pool_.malloc(sizeof(my_class)) in a worker thread, will TBB allocate that memory from the main thread's heap, or will it allocate from the worker thread's own "domain", so that the lock contention caused by a normal malloc() is still avoided?

Here, TBB’s memory pools again work like the TBB scalable allocator: per-thread caches of freed objects are kept to prevent lock contention and to return, upon request, an object that was recently released by the same thread (as it is probably still hot in the CPU cache).

May I ask why you are interested in pools? Could you describe your goals a little?

bryan_f_2
Beginner

Alexandr Konovalov (Intel) wrote:

May I ask why you are interested in pools? Could you describe your goals a little?

Thank you for your reply, Alexandr. Basically, my program has a main loop that reads message packets from the network and then posts them to one or more asynchronous message_handlers.

Therefore, first, what I want is a pool of pre-allocated memory in the main loop to hold the messages from the socket (to eliminate the malloc() overhead) and to post them to the message_handler by message pointer. The message_handler will free the message pointer once it is done with the message (by the way, will this free cause lock contention?).

Second, each message_handler (more than one message_handler may exist, each with its own thread) should have its own memory pool for its own use (with tbb::scalable_allocator, to avoid false sharing and to eliminate the malloc() overhead). A sketch of the first part follows below.
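A minimal sketch of the pipeline described above (the message layout, sizes, and queue choice are illustrative assumptions): the reader thread allocates each message from the scalable allocator, and a handler thread frees it.

    #include "tbb/scalable_allocator.h"
    #include "tbb/concurrent_queue.h"
    #include <thread>
    #include <cstddef>

    struct message { std::size_t len; char data[1500]; };  // illustrative

    tbb::concurrent_bounded_queue<message*> inbox;

    void reader_loop() {                 // main loop: read packets, post them
        for (int i = 0; i < 1000; ++i) {
            message* m = static_cast<message*>(scalable_malloc(sizeof(message)));
            m->len = 0;                  // ... fill from the socket ...
            inbox.push(m);
        }
        inbox.push(nullptr);             // sentinel: no more messages
    }

    void handler_loop() {                // a message_handler thread
        message* m;
        while (inbox.pop(m), m != nullptr) {
            // ... process the message ...
            scalable_free(m);            // freeing on another thread is allowed
        }
    }

    int main() {
        std::thread r(reader_loop), h(handler_loop);
        r.join(); h.join();
        return 0;
    }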

How would you suggest I achieve these goals (eliminating malloc overhead, lock contention, and false sharing)?

Alexandr_K_Intel1

Thanks, I hope I understand you better now.

Do you know the peak memory consumption of your pools? If yes, fixed_pool can be used (all of its memory is provided by you at pool creation, but such pools are not growable). If not, you can look at the keepAllMemory mode in the low-level pool interface; there, memory is allocated only when needed and is not released back, so you see no malloc calls once memory consumption stabilizes.
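For the fixed-capacity case, a minimal sketch using tbb::fixed_pool (the buffer size is an illustrative assumption; fixed_pool sits behind the same TBB_PREVIEW_MEMORY_POOL macro):

    #define TBB_PREVIEW_MEMORY_POOL 1
    #include "tbb/memory_pool.h"

    static char arena[16 * 1024 * 1024];  // 16 MB, sized to peak consumption

    int main() {
        // All memory comes from the user-supplied buffer; the pool cannot grow.
        tbb::fixed_pool message_pool(arena, sizeof(arena));

        void* msg = message_pool.malloc(1500);  // e.g., one network packet
        // ... hand msg to a handler; it calls message_pool.free(msg) when done
        message_pool.free(msg);
        return 0;
    }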

bryan_f_2 wrote:

by the way, will this free cause lock contention?

Generally, no. We have per-thread caches, so for common workloads the hot objects live in the per-thread caches and there is no contention on the hot path.

Talking about false sharing: is anything known about your typical allocation sizes? Our large objects (>=8129 B) are cache-line aligned, so no false sharing is possible for them. For small objects (<8129 B), different objects can share the same cache line if they are the same size and were allocated by the same thread. That might be your case, since you have a single packet-reader/memory-allocation thread. Of course, objects from different pools cannot share a cache line, but a multiple-pool solution may have other sources of inefficiency (e.g., the working set may become larger).
