tbb::fixed_pool performance issue when 'full'

AndrewC · ‎11-10-2014

I have been using tbb::fixed_pool as an allocator for "small objects". When the fixed_pool is full (malloc returns NULL), I fall back to a normal dynamic allocator.

I have found, however, that performance of tbb::fixed_pool drops very dramatically when the pool is "full". It appears that rather than having a simple internal flag to indicate this state (no more space to allocate), the fixed_pool allocator must be doing a 'lot of work' to decide it cannot allocate.

AndrewC · ‎11-10-2014

Forgot to mention: this is TBB 4.2.

Alexandr_K_Intel1 · ‎11-11-2014

This is a valid point, thanks! To better understand your usage model, could you explain why you want to switch to (scalable_)malloc on NULL instead of asking for more memory in the pool's callback?

AndrewC · ‎11-11-2014

I am not exactly sure what you mean. Perhaps I am missing the point, but I assumed a fixed pool had a 'fixed' size. I had found that using a fixed_pool for small allocations improved performance overall. What I also found when testing was that in the 'edge' case where the pool is completely (or nearly) full, performance drops off very badly (10x at least).

static char buf[POOL_SIZE];
static tbb::fixed_pool my_pool(buf, POOL_SIZE);

	if(size<=POOL_ALLOCATION_MIN){
		ptr=my_pool.malloc(  size);
		// fixed pool full!
		if(ptr==NULL){
			ptr=::scalable_malloc( size);
		}
	}else{
		ptr=::scalable_malloc( size);
	}
	return ptr;
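
Until the pool itself tracks the 'full' state, one caller-side workaround is to remember that the pool returned NULL and skip it until something is freed back into it. The following is only a minimal sketch under assumptions: `FlaggedPool` is a hypothetical wrapper (not part of TBB) around any type with `malloc(size_t)` and `free(void*)` members, `BumpPool` is a toy stand-in for tbb::fixed_pool so the example compiles without TBB, and std::malloc stands in for scalable_malloc.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for tbb::fixed_pool: hands out bytes from a fixed
// buffer and never recycles them. Not thread-safe, no alignment handling.
class BumpPool {
    char*       base_;
    std::size_t size_;
    std::size_t used_;
public:
    int failed_calls;  // how many times the "pool is full" path was taken
    BumpPool(char* buf, std::size_t size)
        : base_(buf), size_(size), used_(0), failed_calls(0) {}
    void* malloc(std::size_t n) {
        if (n > size_ - used_) { ++failed_calls; return nullptr; }
        void* p = base_ + used_;
        used_ += n;
        return p;
    }
    void free(void*) {}  // a real pool would make the block reusable
};

// Hypothetical wrapper: caches the "full" state in a flag so that repeated
// allocations skip the pool instead of re-discovering that it is exhausted.
template <class Pool>
class FlaggedPool {
    Pool&             pool_;
    std::atomic<bool> full_;
public:
    explicit FlaggedPool(Pool& pool) : pool_(pool), full_(false) {}

    void* allocate(std::size_t n) {
        if (!full_.load(std::memory_order_relaxed)) {
            if (void* p = pool_.malloc(n)) return p;
            full_.store(true, std::memory_order_relaxed);  // remember failure
        }
        return std::malloc(n);  // fallback; scalable_malloc in the thread
    }

    // Call only for pointers that came from the pool (the caller decides
    // ownership). Freeing may have made room, so the pool is tried again.
    void release_to_pool(void* p) {
        assert(p != nullptr);
        pool_.free(p);
        full_.store(false, std::memory_order_relaxed);
    }
};
```

The flag is only a hint: after a free the next allocation may still fail once (for example if the freed block is too small), which costs one slow attempt before the pool is skipped again.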

Alexandr_K_Intel1 · ‎11-11-2014

Is the performance of a growable memory pool too low for your workload? In your approach, I see a problem with ownership detection during free, i.e. on release we must find out whether ptr belongs to the pool or to scalable_malloc. It's solvable, but possibly not cheap.

    #define TBB_PREVIEW_MEMORY_POOL 1   // memory pools are a preview feature
    #include <tbb/memory_pool.h>

    tbb::memory_pool<tbb::scalable_allocator<char> > pool;

    void *ptr = pool.malloc(12);
    pool.free(ptr);

AndrewC · ‎11-11-2014

I would need to experiment with tbb::memory_pool<tbb::scalable_allocator<char> > pool before making a comment.

Regarding ownership detection during free: if 'p' has been allocated from the fixed_pool ('buf'), then it is easy to detect. For example:

	if(p>=buf && p<(buf+POOL_SIZE) ){
		my_pool.free(p);
	}else{
		::scalable_free(p);
	}
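
One caveat on the range check: in standard C++ the result of comparing pointers into different objects with the built-in `<` and `>=` is unspecified, although it works on common flat-memory platforms. std::less is guaranteed to give a total order over pointers, so a stricter version of the same test could look like the sketch below. The helper name `in_buffer` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns true if p points into the half-open range [buf, buf + size).
// std::less<const char*> yields a strict total order even for pointers
// that do not belong to the same array.
inline bool in_buffer(const void* p, const char* buf, std::size_t size) {
    assert(buf != nullptr);
    const char* c = static_cast<const char*>(p);
    std::less<const char*> before;
    return !before(c, buf) && before(c, buf + size);
}
```

With this helper the dispatch above would read `if (in_buffer(p, buf, POOL_SIZE)) my_pool.free(p); else ::scalable_free(p);`.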

AndrewC · ‎11-11-2014

My experiments with tbb::memory_pool<tbb::scalable_allocator<char> > show that it is slower than the fixed_pool.

Alexandr_K_Intel1 · ‎11-12-2014

This is interesting. So far, I see two possible reasons for the tbb::memory_pool<tbb::scalable_allocator<char> > regression relative to fixed_pool: premature release of memory, and a small initial buffer in the scalable_allocator solution. The first can be checked by switching to a special pool that never releases memory until destruction.

#include <tbb/scalable_allocator.h>

static void *getMem(intptr_t /*pool_id*/, size_t &bytes)
{
    return scalable_malloc(bytes);
}

static int putMem(intptr_t /*pool_id*/, void *ptr, size_t /*bytes*/)
{
    scalable_free(ptr);
    return 0;
}

    rml::MemoryPool *pool;
    rml::MemPoolPolicy pol(getMem, putMem, /*granularity=*/0, /*fixedPool=*/0,
                           /*keepAllMemory=*/true);

    rml::pool_create_v1(0, &pol, &pool);

    void *p1 = rml::pool_malloc(pool, 32);
    rml::pool_free(pool, p1);
    rml::pool_destroy(pool);

The second can be checked by switching to a pool that has a user-controlled initial buffer.

#include <tbb/scalable_allocator.h>
#include <tbb/atomic.h>

static const size_t POOL_SIZE = 10*1024*1024;
static tbb::atomic<bool> start_buf_used;
static char start_buf[POOL_SIZE];

static void *getMem(intptr_t /*pool_id*/, size_t &bytes)
{
    if (0 == start_buf_used.compare_and_swap(1, 0)) {
        bytes = POOL_SIZE;
        return start_buf;
    }
    return scalable_malloc(bytes);
}

static int putMem(intptr_t /*pool_id*/, void *ptr, size_t /*bytes*/)
{
    if (ptr == start_buf)
        start_buf_used = 0;
    else
        scalable_free(ptr);
    return 0;
}

    rml::MemoryPool *pool;
    rml::MemPoolPolicy pol(getMem, putMem);

    rml::pool_create_v1(0, &pol, &pool);

    void *p1 = rml::pool_malloc(pool, 32);
    rml::pool_free(pool, p1);

    rml::pool_destroy(pool);
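
For readers on newer toolchains, where tbb::atomic is deprecated, the one-shot handout of the initial buffer can also be written with std::atomic. This is a sketch only: the callback signatures are copied from the snippet above, the names `getMemStd`, `putMemStd`, `std_start_buf` and `kStartBufSize` are made up, std::malloc/std::free stand in for scalable_malloc/scalable_free so that it builds without TBB, and the size check in front of the exchange is an addition that is not in the original callback.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

static const std::size_t kStartBufSize = 1024 * 1024;  // illustrative size
static char              std_start_buf[kStartBufSize];
static std::atomic<bool> std_start_buf_used(false);

// Hands out the static buffer to the first caller whose request fits,
// then falls back to the heap for every later request.
static void* getMemStd(std::intptr_t /*pool_id*/, std::size_t& bytes) {
    // exchange() returns the previous value: false means we just claimed it.
    if (bytes <= kStartBufSize && !std_start_buf_used.exchange(true)) {
        bytes = kStartBufSize;  // report the real size of the region
        return std_start_buf;
    }
    return std::malloc(bytes);
}

// Returns a region: the static buffer is marked free, heap blocks are freed.
static int putMemStd(std::intptr_t /*pool_id*/, void* ptr, std::size_t /*bytes*/) {
    assert(ptr != nullptr);
    if (ptr == std_start_buf)
        std_start_buf_used.store(false);
    else
        std::free(ptr);
    return 0;
}
```

These two functions would be passed to rml::MemPoolPolicy in place of getMem and putMem.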

AndrewC · ‎11-12-2014

I am not so worried about the 'slightly' slower speed of the dynamic pool vs. fixed_pool. It was simply unexpected that fixed_pool would have such chronically bad performance when full, so this is just a heads-up on that one. Otherwise the scalable allocators are extremely fast compared to anything else I have tried. You might want to suggest to the Intel MKL team that they use the TBB allocators, as the MKL allocators are much slower!