Difference between scalable_memory and classic memory allocation

azmodai · 05-22-2012

Hello,

I am currently using scalable_malloc instead of classic malloc to allocate the memory I need for my program. It works nicely, and I get a very interesting speedup from it. But I don't think I understand what scalable memory allocation really is. What does it mean?

I've read the explanation in the book, but I'm not sure I understand how it works. The book says that the scalable allocator allocates and frees memory in a way that scales with the number of processors. I'm not sure I understand: is it about the caches of the processors?

Thank you for your enlightenment.

jimdempseyatthecove · 05-22-2012

Standard malloc uses a shared heap. For allocation/deallocation in a multi-threaded application this requires that the threads enter a critical section, perform the allocation/deallocation, then leave the critical section. This causes a serialization of allocation/deallocation (one thread at a time through the critical section).

scalable_malloc, without getting too technical, is like each thread having a private heap. When an allocation occurs, and the available memory is in the private heap (to the thread), the allocation occurs without a critical section. Should the private heap have insufficient resources, then the standard heap is called (with critical section) to add another hunk of memory to the private heap. This technique reduces the number of times the application allocation/deallocation passes through the critical section (permits parallel allocations).

An optimization of the private heap is to maintain pools of similar sized allocations (generally in 16/32/64 byte increments).
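To make the shared-heap versus private-heap description concrete, here is a minimal toy sketch. It is not how TBB's scalable_malloc is actually implemented; the names (`shared_alloc`, `PrivateHeap`, `run_workers`), the single 64-byte size class, and the slab size are all assumptions made for illustration. Each thread takes the lock only when its private free list is empty, so the number of critical-section entries stays small no matter how many blocks are allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Toy model only; this is NOT how TBB's scalable_malloc is implemented.
constexpr std::size_t kBlockSize  = 64;   // a single size class, for simplicity
constexpr std::size_t kSlabBlocks = 256;  // blocks carved out of each slab

std::mutex  g_heap_mutex;        // stands in for the shared heap's critical section
std::size_t g_locked_calls = 0;  // critical-section entries (guarded by g_heap_mutex)

// Shared-heap path: every call is serialized through the mutex.
void* shared_alloc(std::size_t bytes) {
    std::lock_guard<std::mutex> lock(g_heap_mutex);
    ++g_locked_calls;
    return std::malloc(bytes);
}

// Per-thread pool of fixed-size blocks (the "private heap").
struct PrivateHeap {
    struct Node { Node* next; };
    Node* free_list = nullptr;
    std::vector<void*> slabs;  // remembered so they can be released at thread exit

    ~PrivateHeap() { for (void* s : slabs) std::free(s); }

    // Slow path: fetch one slab from the shared heap and split it into blocks.
    void refill() {
        char* slab = static_cast<char*>(shared_alloc(kBlockSize * kSlabBlocks));
        if (!slab) std::abort();
        slabs.push_back(slab);
        for (std::size_t i = 0; i < kSlabBlocks; ++i)
            deallocate(slab + i * kBlockSize);
    }
    // Fast path: pop from the private free list, no lock taken.
    void* allocate() {
        if (!free_list) refill();
        Node* n = free_list;
        free_list = n->next;
        return n;
    }
    // Only valid on the thread that owns this heap.
    void deallocate(void* p) {
        Node* n = static_cast<Node*>(p);
        n->next = free_list;
        free_list = n;
    }
};

thread_local PrivateHeap t_heap;

// Runs `threads` workers that each allocate and free `per_thread` blocks,
// and returns how many times the shared critical section was entered.
std::size_t run_workers(unsigned threads, std::size_t per_thread) {
    assert(threads > 0);
    {
        std::lock_guard<std::mutex> lock(g_heap_mutex);
        g_locked_calls = 0;
    }
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t)
        workers.emplace_back([per_thread] {
            for (std::size_t i = 0; i < per_thread; ++i) {
                void* p = t_heap.allocate();
                t_heap.deallocate(p);
            }
        });
    for (auto& w : workers) w.join();
    std::lock_guard<std::mutex> lock(g_heap_mutex);
    return g_locked_calls;
}
```

In this toy, four threads doing 10,000 allocate/free pairs each enter the critical section only once per thread (one slab refill), whereas routing every call through `shared_alloc` would enter it 40,000 times. The unused blocks left in each slab are also where the extra memory cost comes from.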

scalable_malloc is not a "free lunch". The cost is in having a larger memory requirement.

Jim Dempsey

RafSchietekat · 05-23-2012

#0 "I'm not sure I understand: is it about the caches of the processors?"
That's a plus.

#1 "scalable_malloc is not a "free lunch". The cost is in having a larger memory requirement."
Often true, but maybe not in general.

(Edited)

azmodai · 05-23-2012

Thanks for your answers, it's now pretty clear.

jimdempseyatthecove · 05-28-2012

Quoting Raf Schietekat

#1 "scalable_malloc is not a "free lunch". The cost is in having a larger memory requirement."
Often true, but maybe not in general.

On a 64-bit platform there is generally no issue (of a larger memory requirement) with using scalable_malloc.

On a 32-bit platform (or for 32-bit applications run on a 64-bit platform), it may be advisable to NOT overload new/delete, and then selectively use the scalable_malloc/scalable_free routines for the few high-frequency malloc/free objects. (On Windows, you may also want to enable the Low Fragmentation Heap feature.)
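The selective approach can be sketched with class-specific operator new/delete, leaving the global operators alone. This is only an illustration: `Message` and `Config` are hypothetical classes, and `hot_alloc`/`hot_free` are stand-ins that wrap std::malloc/std::free so the sketch builds without TBB. In a TBB build they would presumably forward to scalable_malloc and scalable_free from tbb/scalable_allocator.h.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-ins so the sketch compiles without TBB (assumption: in a real build
// these would call scalable_malloc / scalable_free instead).
static std::atomic<std::size_t> g_hot_allocs{0};  // counts opt-in allocations

inline void* hot_alloc(std::size_t n) {
    ++g_hot_allocs;
    return std::malloc(n);
}
inline void hot_free(void* p) { std::free(p); }

// Hypothetical high-frequency class: opts in through class-specific operators.
struct Message {
    int  id;
    char payload[48];

    static void* operator new(std::size_t n) {
        if (void* p = hot_alloc(n)) return p;
        throw std::bad_alloc();
    }
    static void operator delete(void* p) noexcept { hot_free(p); }
};

// Hypothetical rarely allocated class: keeps the default global new/delete.
struct Config {
    int verbosity = 0;
};

// Allocates one object of each kind and reports how many went through hot_alloc.
std::size_t demo_opt_in() {
    g_hot_allocs = 0;
    Message* m = new Message{};  // routed through Message::operator new
    Config*  c = new Config{};   // routed through the global operator new
    delete m;
    delete c;
    return g_hot_allocs.load();
}
```

Only the `Message` allocation is counted, so `demo_opt_in()` returns 1. The opposite policy (redirect everything, then opt individual classes back out) would use the same mechanism with the roles reversed.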

There is an additional issue when an object is allocated by the scalable allocator in one thread and deallocated by a different thread. This may cause either memory consumption issues or additional latencies. This is not an issue where allocations/deallocations are performed on a call stack (e.g. ctor/dtor of stack frame objects). But it can be a problem when an arbitrary thread can delete an object pointed to/referenced by a concurrent queue of object pointers/references.

Jim Dempsey

RafSchietekat · 05-29-2012

The scalable memory allocator is more wasteful for some sizes than for others, and indeed it goes to extremes in efficiency (maybe less so since, e.g., 3.0 update 1?), whereas standard malloc() probably has O(1) overhead over a wide or even the entire range of sizes, but effort has been spent on improving efficiency since early implementations (I haven't kept track, myself). Using it selectively is probably good advice, but it takes (a lot?) more attention and work. I think it depends on usage (distribution of allocation sizes), and maybe it's the other way around: redirect new/delete to the scalable allocator (C++ objects tend to be in the "good" range), and explicitly use malloc() for anything substantial (and perhaps redirect a few atypically large classes back to malloc()). It's worth a try, anyway, and if it's good enough I wouldn't do anything more, unless I see hard data to indicate otherwise.

I don't really know how much impact there is for inter-thread (de)allocation patterns, but maybe somebody could advise/remind us on whether the issue has been solved where memory could be exhausted by a thread that would only deallocate memory from other threads (if that was indeed the situation): it didn't seem impossible to solve, some time has passed since I first remember it being discussed, and there have been some changes in the meantime (perhaps it was fixed in 4.0?).

Did I misinterpret or overlook anything?

(Edited) As indicated (plus anything I forgot to mark).

SergeyKostrov · 05-29-2012

Thanks for the explanations. The subject is very interesting and I'll try to compare the performance of scalable_malloc and malloc.

Quoting jimdempseyatthecove

Quoting Raf Schietekat

#1 "scalable_malloc is not a "free lunch". The cost is in having a larger memory requirement."
Often true, but maybe not in general.

...
On 32-bit platform (or 32-bit applications run on 64-bit platform), it may be advisable to NOT overload new/delete...

Almost all well-known libraries do this! There are many cases when the new/delete C++ operators are needed for C++ classes, and the most common is built-in memory leak detection.

Best regards,
Sergey

RafSchietekat · 05-29-2012

An updated comparison would be quite useful, especially if it includes other promising contenders next to standard malloc (whatever that means to different people).

I'm sure that Jim didn't mean that C++ new/delete should never be redirected, but rather questioned the usefulness of redirecting all C++ (de)allocation requests to the TBB scalable allocator regardless of size. I think that we only differ on how cautious you need to be (opt-in vs. opt-out, so to speak), but, again, my intuition would easily yield to hard data.

jimdempseyatthecove · 05-30-2012

>> Jim didn't mean that C++ new/delete should never be redirected

Correct. You might find it convenient to experiment with overloading new/delete and then see if your (test) application fails on memory allocations. Then back off the use of overloaded new/delete as warranted.

*** Keep in mind that the developer of a program is not necessarily the user of the program, and the user (some user) of the program is likely to use the program in a manner the developer overlooks (or feels is stupid). This has to be factored into the trade-off evaluation of performance vs. memory requirements.

In the QuickThread scalable allocator, you can overload new/delete (malloc/free)... and then optionally use a global flag to disable/enable the scalable allocator. This means the programmer can:

a) For allocations that occur once (via static object ctor) or once/infrequently you can start with the scalable allocator flag disabled. The advantage of this is that scalable allocators (generally) use thread-by-thread pools of similar sized nodes. When a pool is empty (or on first use), a slab of memory is allocated and fragmented into a new pool of similar sized nodes. When these static/once/infrequent allocations are of sizes not generally used elsewhere in the application, then disabling the scalable allocator for these allocations conserves the memory that would have been left unused in the pool(s) had they been scalably allocated.

b) For long lived objects (where critical section contention is a very small fraction of 1% of overhead) and where the object size may constitute a pool not used elsewhere, you can elect to temporarily disable the scalable allocator.

c) You can add a feature, say enabled via environment variable, which the user (or your customer service rep) can use to disable scalable allocation on those systems (or applications) that experience memory allocation failures. IOW for those situations where an application's memory requirement is too large when using scalable allocator, you can disable the scalable allocation feature and possibly manage to run.
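The flag idea in (a) through (c) could look roughly like the sketch below. It does not show QuickThread's actual mechanism, which is not described here. The environment variable name `APP_DISABLE_SCALABLE_ALLOC`, the functions `app_alloc`/`app_free`, and the one-byte header tag are all hypothetical, and the "scalable" path is a stand-in that calls std::malloc so the sketch builds without any allocator library. The tag is there because a block must be returned to the allocator it came from, even if the flag has changed in the meantime.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Global switch; true means "use the scalable allocator".
std::atomic<bool> g_use_scalable{true};

// Interprets the value of a (hypothetical) environment variable.
bool scalable_disabled_by_env(const char* value) {
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Call once at startup, e.g. APP_DISABLE_SCALABLE_ALLOC=1 ./app
void init_allocator_from_env() {
    const char* v = std::getenv("APP_DISABLE_SCALABLE_ALLOC");
    g_use_scalable.store(!scalable_disabled_by_env(v));
}

// Stand-ins for a real scalable allocator (assumption, for illustration only).
void* stand_in_scalable_malloc(std::size_t n) { return std::malloc(n); }
void  stand_in_scalable_free(void* p) { std::free(p); }

// Header large enough to keep the user pointer maximally aligned.
constexpr std::size_t kHeader = alignof(std::max_align_t);

// Allocates from whichever allocator the flag selects and tags the block.
void* app_alloc(std::size_t n) {
    const bool scalable = g_use_scalable.load(std::memory_order_relaxed);
    void* raw = scalable ? stand_in_scalable_malloc(n + kHeader)
                         : std::malloc(n + kHeader);
    if (!raw) return nullptr;
    unsigned char* bytes = static_cast<unsigned char*>(raw);
    bytes[0] = scalable ? 1 : 0;  // remember where the block came from
    return bytes + kHeader;
}

// Reports which allocator produced a block returned by app_alloc.
bool was_scalable(const void* p) {
    return (static_cast<const unsigned char*>(p) - kHeader)[0] != 0;
}

// Frees through the allocator recorded in the tag, not the current flag.
void app_free(void* p) {
    if (!p) return;
    unsigned char* raw = static_cast<unsigned char*>(p) - kHeader;
    if (raw[0] != 0) stand_in_scalable_free(raw);
    else             std::free(raw);
}
```

A program could start with the flag off for static or one-time allocations, switch it on for the hot path, and let a support engineer force it off through the environment when a customer runs out of address space. The per-block header costs extra memory, which is itself a trade-off in the 32-bit case being discussed.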

Jim Dempsey
