Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

cache_aligned_allocator alignment?

renorm
Beginner
575 Views
The reference says it is typically 128 bytes. Is there a way to detect it at compile time or run time?

Thanks in advance.
7 Replies
RafSchietekat
Valued Contributor III
Not officially, I think. It might help if you would state your purpose, because there are several issues related to this that may be relevant or not.
renorm
Beginner
I need to build an array (a pool) of objects of some class. The class alignment needs to be at least as strict as the cache line alignment to avoid false sharing, but it can't be stricter than the array alignment.

Btw, what is the smallest alignment to avoid false sharing on Core architecture - 64 or 128 bytes?
RafSchietekat
Valued Contributor III
TBB is really a bit inconsistent about that, which is somewhat confusing me at the moment, and the value may depend on the actual hardware, so you should probably do your own research instead of looking to TBB for answers.

But are you certain that you need to avoid false sharing in an array? Unless you're certain that you've already got good parallelism with low scheduling overhead and no memory bandwidth issues, this may well be just premature optimisation. If the pool is small, you might as well allocate the members individually or always go with 128 for an array, and if the pool is large, what will be the real degree of false sharing going on anyway? Do you have any numbers to substantiate that the alignment matters on your development system?
renorm
Beginner
I have no numbers, but the false sharing is a real issue. The objects are random bit generators used in a Monte Carlo program.

I would probably go with individual allocation. That way I don't need to bother imposing alignment restrictions on the class itself, since the objects aren't stored contiguously.
RafSchietekat
Valued Contributor III
For a relatively small number of intensively used objects, simply not worrying about potentially wasting 64 or so bytes per object seems to be the obvious solution, and with individual allocations you also don't need to bother with explicit padding.
renorm
Beginner
Contiguous allocation with padding has its own advantages. All frequently updated variables would be packed inside one contiguous memory block. Once the frequently written variables are isolated, there is no need to align the read-only variables.

Btw, using STL vector with cache_aligned_allocator doesn't prevent false sharing of vector's elements. One could try to align the contained class itself:
[cpp]struct __declspec(align(CACHE_LINE_SIZE)) MyStruct;

// this won't compile
std::vector<MyStruct> x;

// this won't compile either
std::vector<MyStruct, tbb::cache_aligned_allocator<MyStruct> > y;[/cpp]

Structures with aligned members, such as the SSE primitive __m128, don't work with STL containers either.

Anyway, imposing an alignment requirement on the class is a draconian solution, bound to fail after a hardware or library update.

A better solution (maybe new TBB feature?) would be a vector like container with automatic padding:
[cpp]// no need to align
struct RNG;

padded_vector<RNG> v(10);

for (int i = 0; i < 10; ++i)
    assert(size_t(&v[i]) % CACHE_LINE_SIZE == 0);[/cpp]
RafSchietekat
Valued Contributor III
"Contiguous allocation with padding has its own advantages. All frequently updated variables would be packed inside one contiguous memory block. Once the frequently written variables are isolated, there is no need to align the read-only variables."
I don't see the difference between contiguous and non-contiguous in this regard. And if you were to allocate the objects sequentially at the start of the program most of them would in fact be contiguous, with almost zero overhead, c/o TBB's scalable allocator.

The idea of such a container had crossed my mind, and compared to a std::vector with cache_aligned_allocator and padded elements it has the theoretical advantage of being able to adapt to a different cache line size at run time, but I've stated my reservations about large numbers of aligned objects above.