The reference says the cache line size is typically 128 bytes. Is there a way to detect it at compile time or run time?
Thanks in advance.
Not officially, I think. It might help if you would state your purpose, because there are several issues related to this that may be relevant or not.
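For reference, the line size can be queried at run time outside of TBB on common platforms. A minimal sketch, assuming Linux/glibc (the _SC_LEVEL1_DCACHE_LINESIZE name is glibc-specific); Windows and newer C++ alternatives are only noted in comments:
[cpp]#include <unistd.h>
#include <cstdio>

int main() {
    // glibc-specific sysconf query for the L1 data cache line size
    long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (line <= 0)
        line = 64;  // fall back to a common default when the query is unsupported
    std::printf("cache line: %ld bytes\n", line);
    // On Windows, GetLogicalProcessorInformation reports LineSize per cache level;
    // C++17 adds std::hardware_destructive_interference_size as a compile-time hint.
    return 0;
}[/cpp]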
I need to build an array (a pool) of objects of some class. The class alignment needs to be at least as strict as the cache line alignment to avoid false sharing, but it can't be stricter than the array alignment.
Btw, what is the smallest alignment to avoid false sharing on Core architecture - 64 or 128 bytes?
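To illustrate the constraint, here is a minimal sketch of a cache-line-aligned element type for such a pool, assuming a 64-byte line and C++11 alignas; the Worker name and pool size are made up:
[cpp]#include <cassert>
#include <cstddef>

const std::size_t CACHE_LINE_SIZE = 64;  // assumed, not detected

// alignas rounds sizeof(Worker) up to a multiple of the alignment, so array
// elements cannot share a cache line (pre-C++11: __declspec(align(64)) or
// __attribute__((aligned(64))))
struct alignas(CACHE_LINE_SIZE) Worker {
    unsigned long counter;  // frequently written member
};

int main() {
    static Worker pool[8];  // static storage honours the element alignment
    for (int i = 0; i != 8; ++i)
        assert(reinterpret_cast<std::size_t>(&pool[i]) % CACHE_LINE_SIZE == 0);
    return 0;
}[/cpp]
Because sizeof of an over-aligned type is rounded up to a multiple of its alignment, a plain array of such elements already keeps each object on its own line.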
TBB is really a bit schizophrenic about that, which is somewhat confusing me at the moment, and the value may depend on the actual hardware, so you should probably do your own research instead of looking to TBB for answers.
But are you certain that you need to avoid false sharing in an array? Unless you're certain that you've already got good parallelism with low scheduling overhead and no memory bandwidth issues, this may well be just premature optimisation. If the pool is small, you might as well allocate the members individually or always go with 128 for an array, and if the pool is large, what will be the real degree of false sharing going on anyway? Do you have any numbers to substantiate that the alignment matters on your development system?
I have no numbers, but false sharing is a real issue. The objects are random bit generators used in a Monte Carlo program.
I would probably go with individual allocation. That way I don't need to bother imposing alignment restrictions on the class itself, since the objects aren't stored contiguously.
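A minimal sketch of that individual-allocation approach, assuming TBB's cache_aligned_allocator and a placeholder RNG type:
[cpp]#include "tbb/cache_aligned_allocator.h"
#include <vector>
#include <cstddef>
#include <new>

struct RNG { unsigned long state; };  // placeholder generator

int main() {
    tbb::cache_aligned_allocator<RNG> alloc;
    std::vector<RNG*> pool;  // only the pointers are contiguous
    for (int i = 0; i != 8; ++i)
        pool.push_back(new (alloc.allocate(1)) RNG());  // each object gets its own cache line
    for (std::size_t i = 0; i != pool.size(); ++i) {
        pool[i]->~RNG();
        alloc.deallocate(pool[i], 1);
    }
    return 0;
}[/cpp]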
For a relatively small number of intensively used objects, simply not worrying about potentially wasting 64 or so bytes per object seems to be the obvious solution, and with individual allocations you also don't need to bother with explicit padding.
Contiguous allocation with padding has its own advantages. All frequently updated variables would be locked up inside one contiguous memory block. Once frequently written variables are isolated, there is no need to align read only variables.
Btw, using an STL vector with cache_aligned_allocator doesn't prevent false sharing between the vector's elements. One could try to align the contained class itself:
[cpp]struct __declspec(align(CACHE_LINE_SIZE)) MyStruct;

// this won't compile
std::vector<MyStruct> x;

// this won't compile too
std::vector<MyStruct, tbb::cache_aligned_allocator<MyStruct> > y;[/cpp]
Structures with aligned members, such as the SSE primitive __m128, don't work with STL containers either.
Anyway, imposing an alignment requirement on the class itself is a draconian solution, bound to fail after a hardware or library update.
A better solution (maybe a new TBB feature?) would be a vector-like container with automatic padding:
[cpp]// no need to align
struct RNG;

padded_vector<RNG> v(10);
for (int i = 0; i < 10; ++i)
    assert(size_t(&v[i]) % CACHE_LINE_SIZE == 0);[/cpp]
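For comparison, here is a sketch of roughly that behaviour built from existing pieces rather than a new container: pad each element to a whole number of cache lines and let cache_aligned_allocator align the buffer. CACHE_LINE_SIZE is still a compile-time assumption here, which is exactly the weakness mentioned above, and padded_element is an invented helper, not a TBB type:
[cpp]#include "tbb/cache_aligned_allocator.h"
#include <vector>
#include <cstddef>
#include <cassert>

const std::size_t CACHE_LINE_SIZE = 64;  // assumed, not detected

// wraps T so that sizeof(padded_element<T>) is a multiple of the line size
// (note: wastes a full line if sizeof(T) is already a multiple of it)
template <typename T>
struct padded_element {
    T value;
    char pad[CACHE_LINE_SIZE - sizeof(T) % CACHE_LINE_SIZE];
};

struct RNG { unsigned long state; };  // placeholder element type

int main() {
    // the allocator aligns the buffer; the padding keeps elements on separate lines
    std::vector<padded_element<RNG>, tbb::cache_aligned_allocator<padded_element<RNG> > > v(10);
    for (std::size_t i = 0; i != v.size(); ++i)
        assert(reinterpret_cast<std::size_t>(&v[i].value) % CACHE_LINE_SIZE == 0);
    return 0;
}[/cpp]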
"Contiguous allocation with padding has its own advantages. All
frequently updated variables would be locked up inside one contiguous
memory block. Once frequently written variables are isolated, there is
no need to align read only variables."
I don't see the difference between contiguous and non-contiguous in this regard. And if you were to allocate the objects sequentially at the start of the program most of them would in fact be contiguous, with almost zero overhead, c/o TBB's scalable allocator.
The idea of such a container had crossed my mind, and compared to a std::vector with cache_aligned_allocator and padded elements it has the theoretical advantage of being able to adapt to a different cache line size at run time, but I've stated my reservations about large numbers of aligned objects above.
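For completeness, a sketch of that run-time adaptation: compute the element stride from the detected line size and place the objects by hand. It assumes a POSIX system (sysconf, posix_memalign) and a trivially destructible RNG placeholder:
[cpp]#include <unistd.h>
#include <cstdlib>
#include <cstddef>
#include <cassert>
#include <new>

struct RNG { unsigned long state; };  // placeholder, trivially destructible

int main() {
    long query = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);        // run-time query (glibc)
    std::size_t line = query > 0 ? std::size_t(query) : 64;  // fall back to a common default
    std::size_t stride = ((sizeof(RNG) + line - 1) / line) * line;  // round up to whole lines

    const std::size_t n = 10;
    void* raw = 0;
    if (posix_memalign(&raw, line, stride * n) != 0)
        return 1;
    char* base = static_cast<char*>(raw);

    for (std::size_t i = 0; i != n; ++i)
        new (base + i * stride) RNG();                       // placement-construct each element
    for (std::size_t i = 0; i != n; ++i)
        assert(reinterpret_cast<std::size_t>(base + i * stride) % line == 0);

    std::free(raw);                                          // RNG needs no destructor call
    return 0;
}[/cpp]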
