Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Assert in initMemoryManager under PPC64

The line "MALLOC_ASSERT( CACHE_LINE_SIZE == sizeof(Block), ASSERT_TEXT );" in initMemoryManager (MemoryAllocator.cpp) fails. I am using a G5 with Leopard 10.5.2 (powerpc-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5465)). The TBB version I have is tbb20_20080319oss.

The Block structure in MemoryAllocator.cpp should not be padded for PPC64. I added _ARCH_PPC64 to the padding check. This got rid of the assert, but I believe that, for a more complete solution, the architecture defines in TypeDefinitions.h should include a define for PPC64 as well.

Thanks for such a great product.

Orhun Birsoy
4 Replies


Thanks for reporting the problem! You are right, the _ARCH part in tbbmalloc should be improved for better platform support. I will take care of this.

Black Belt
char pad[CACHE_LINE_SIZE-6*sizeof(void*)-4*sizeof(int)];
/* verified by initMemoryManager() */
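One thing to watch with an explicit pad like the one above: if CACHE_LINE_SIZE is 64, pointers are 8 bytes and ints are 4 bytes, the expression evaluates to 64 - 48 - 16 = 0, and a zero-length array is not valid standard C++ (GCC accepts it as an extension). A minimal sketch of one alternative, assuming a C++11 compiler, is to let alignas do the padding. The struct and its field layout here are hypothetical stand-ins (six pointers and four ints, matching the counts in the expression), not the real Block from MemoryAllocator.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Assumed: 64 is the scalable allocator's CACHE_LINE_SIZE quoted in this thread.
const std::size_t kLine = 64;

// Hypothetical header: six pointers and four ints, as in the pad expression.
// alignas rounds sizeof up to a multiple of 64, so the compiler supplies
// 24 bytes of tail padding with 4-byte pointers and none with 8-byte pointers.
struct alignas(64) PaddedHeader {
    void* ptrs[6];
    int   ints[4];
};

// Compile-time counterpart of the runtime check in initMemoryManager().
// Assumes 4-byte int and 4- or 8-byte pointers.
static_assert(sizeof(PaddedHeader) == kLine,
              "header must fill exactly one cache line");

// Number of bytes the explicit pad expression would request on this target.
inline std::size_t explicitPadBytes() {
    return kLine - 6 * sizeof(void*) - 4 * sizeof(int);
}
```

With this approach no per-architecture padding check is needed for the size itself, although the allocator may still want architecture defines for other reasons.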

(Added) cache_aligned_allocator uses twice the value of CACHE_LINE_SIZE (128 vs. 64), which might need to be reconciled or explained somehow (I haven't looked any further yet). And isn't it strange to assume that all hardware agrees on the same value?

Yes, I am going to do padding in a similar way though not exactly the same.

The cache_aligned_allocator sets the constant to 128 for the sake of platforms where cache lines are 128 bytes (e.g. Intel Itanium processor). Again, you are right that ideally it should be determined at runtime for each HW (and I hope it will be reworked this way); but as most current CPUs (well, at least Intel processors) have either 64 or 128 bytes in a cache line, the larger of the two was chosen for simplicity.
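To illustrate what a runtime query with a conservative fallback could look like, here is a rough sketch. It is not what TBB does. It assumes a system whose unistd.h provides _SC_LEVEL1_DCACHE_LINESIZE (a glibc extension that is not available everywhere, and that may report 0 on some machines); the function names and the fallback constant are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__unix__) || defined(__APPLE__)
#include <unistd.h>
#endif

// Fallback mirrors the choice described above: the larger of 64 and 128.
const std::size_t kFallbackCacheLine = 128;

// Hypothetical helper: ask the OS for the L1 data cache line size and
// fall back to the conservative constant when the query is unavailable
// or returns a non-positive value.
std::size_t detectCacheLineSize() {
#if defined(_SC_LEVEL1_DCACHE_LINESIZE)
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0) return static_cast<std::size_t>(n);
#endif
    return kFallbackCacheLine;
}

// Round a request up to a whole number of cache lines so that two
// allocations never share a line.
std::size_t roundUpToLine(std::size_t bytes, std::size_t line) {
    return (bytes + line - 1) / line * line;
}
```

A caller would typically query the line size once at startup and cache it, since the value does not change while the process runs.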

In the scalable allocator, this was considered less important; but my current view is that good cache behavior matters more than smaller memory overhead, so I will rework it soon. The block header will be 128 bytes; the fields changed by "foreign" threads will reside in the second half, while the fields changed only by the owning thread will reside in the first half. I expect this will improve performance on the hot path for systems with 64 bytes per cache line, and on Itanium processors it will at least eliminate the situation where the block header shares its cache line with data.
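The split header described here might look roughly like the sketch below. It assumes C++11 alignas, and every field name is illustrative only, not taken from the allocator source. The idea is that owner-only fields start at offset 0, fields touched by other threads start at offset 64, and the whole header occupies 128 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 128-byte block header split into two 64-byte halves.
struct alignas(128) SplitHeader {
    // First half: written only by the owning thread (hot path).
    void* freeList;
    void* bumpPtr;
    int   allocatedCount;

    // Second half: written by "foreign" threads returning objects.
    alignas(64) void* publicFreeList;
    void* nextPrivatizable;
};

static_assert(sizeof(SplitHeader) == 128, "header must span 128 bytes");
static_assert(offsetof(SplitHeader, publicFreeList) == 64,
              "foreign-thread fields must start in the second half");

// True when two addresses fall on the same cache line of the given size.
inline bool sameCacheLine(const void* a, const void* b, std::size_t line) {
    return reinterpret_cast<std::uintptr_t>(a) / line ==
           reinterpret_cast<std::uintptr_t>(b) / line;
}
```

With 64-byte lines the two halves land on different lines, so foreign writes do not invalidate the line the owner uses. With 128-byte lines both halves share one line, but no user data shares it with the header.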

Valued Contributor II
There might also be some implications for the older Intel NetBurst micro-architecture, which uses a 128-byte sectored L2 cache. Some BIOSes allow Adjacent Sector Prefetch to be disabled; otherwise this might be a source of thrashing for the foreign block.