My name is Chris M. Thomasson, and I have developed a non-blocking algorithm that allows one to transform an inherently single-threaded memory allocator into a scalable multi-threaded allocator. This invention allows a programmer to just create an allocator algorithm without worrying about any multi-threading issues. Once you're done with your single-threaded allocator, you apply my non-blocking algorithm, and presto! It's now a scalable multi-threaded allocator! Is that cool or what?
Okay, here is an initial proof of concept: I have transformed the Windows Heap object created with the HEAP_NO_SERIALIZE flag. This flag disables the internal lock used by the heap, which makes it a heck of a lot more efficient. However, since the heap is no longer serialized, one cannot use it as a general allocator in a multi-threaded application; until now... The algorithm I created is fairly simple; basically, it goes like this:
___________________________________________________________________________
I create a thread-local user allocator in every thread.
An allocation request is forwarded directly to the calling thread's user allocator.
If a free request comes from the thread that allocated the block, the request is forwarded to that thread-local user allocator.
If a free request comes from another thread, the block is accumulated in a per-thread, stack-based freelist.
Blocks on this freelist are actually freed in batches when the owning thread allocates or deallocates another block.
___________________________________________________________________________
I mentioned this algorithm a while back:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/eb87f9cb2cb62d84
Well, I have released a pre-alpha version under the GPL which demonstrates how to transform the Windows Heap object; the example source code can be found here:
http://webpages.charter.net/appcore/vzoom/malloc/vzmalloc_win_v000_c.html
Can you please run the test as-is, which uses the stock malloc implementation, and record the output... Then uncomment the following line:
#define USE_MEM_ALLOC
which engages my algorithm; run the test again; and post the differences in timings between the two. I think you will be fairly surprised at the performance enhancement. The lock in the Windows Heap object that `malloc' is based on severely damages its scalability when used in a multi-threaded environment. When `USE_MEM_ALLOC' is defined, ALL of the locking is removed and the heaps are created on a per-thread basis. This per-thread distribution, together with the support algorithm I invented, allows the Windows Heap to scale quite nicely indeed.
I am going to tweak the algorithm so that one could plug in different allocators on a per-thread basis! In other words, thread A can use a different allocation algorithm than thread B, and so on. My algorithm actually supports this. I think the possibilities are truly amazing. It even allows a block allocated in thread A to be freed in thread B, even though the two threads use completely different allocation algorithms. That's pretty darn cool. Some may think it's impossible; it's not!
BTW, I am thinking that this algorithm might be able to transform other single-threaded things besides allocators... Hmm, need to think some more on that.
P.S.
The test portion of the code uses the pthread-win32 library which can be downloaded from here:
http://sourceware.org/pthreads-win32
I am thinking of removing the dependence on it and using pure Windows primitives instead.
Any thoughts?
On the other hand, if it can be implemented efficiently (especially without additional overhead for other usage patterns), then it's always nice to provide efficient support for one more usage pattern...
I was confused because you didn't show the sorting and organization into cohorts. I think it's OK to sort only in mflush(). I think it's even better than my variant.
Though I think you should employ something like this (the "hashed cache array" variant):
http://groups.google.ru/group/comp.programming.threads/msg/547b4edb494beeb9?pli=1
And if we had dense thread ids, that would help with the sorting and organizing into cohorts.