- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I'm trying to modernize some of my codes, and face to a strange warning when compiling this simplified version of a pimpl implementation using unique_ptr.
The message is
icpc -xHOST pimpl.cpp pimpl.cpp(42): warning #2261: __assume expression with side effects discarded __assume_aligned(static_cast<const void*>(pimpl->data), ALIGN_SIZE); ^
and the code:
#include <memory> #define NCHANNELS 100 #define ALIGN_SIZE 64 // Main class definition (normally in header) class LT { public: LT(); ~LT(); void run(); private: struct pimpl_LT; std::unique_ptr<pimpl_LT> pimpl; }; // pimpl definition (normally in source file) struct LT::pimpl_LT { double *data; pimpl_LT(){ // Alloc data data = reinterpret_cast<double*>(_mm_malloc(NCHANNELS * sizeof(double), ALIGN_SIZE)); }; ~pimpl_LT() { // Free data _mm_free(data); } }; LT::LT() : pimpl(new pimpl_LT) {} LT::~LT() = default; void LT::run() { #pragma ivdep for (int i = 0; i < NCHANNELS; ++i) { __assume_aligned(static_cast<const void*>(pimpl->data), ALIGN_SIZE); pimpl->data = i; } } //peruse of LT in program int main() { LT lt; lt.run(); }
As long as I'm not using unique_ptr no error message. In my full code, I need the assume_aligned for vectorization to take place. I don't see why alignment should not be satisfied in unique_ptr elements or I'm missing some important point on the usage of __assume_aligned.
Any enlightenment would be appreciated.
Daniel
- Tags:
- CC++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>> I was a bit afraid that adding a new variable would be consuming some resources
Just take a moment. Your original code had (ignoring the cast):
for (int i = 0; i < NCHANNELS; ++i) { pimpl->data = i; }
Ignoring the cast. Pseudo code (unoptimized)
for (int i = 0; i < NCHANNELS; ++i) { mov [pimpl] to register // [] == memory location of... mov [register + offset to data] to register convert i to double in temp add (i scaled to sizeof double) to register mov temp [register] }
Optimized:
mov [pimpl] to register // [] == memory location of... mov [register + offset to data] to register for (int i = 0; i < NCHANNELS; ++i) { convert i to double in temp mov temp [register+i*4] }
IOW the compiler will use a register in any event. Explicitly declaring the temporary variable *** within the narrow scope of the run() will permit the variable to exist only in register form.
Note, assume run is more complex
void LT::run() { ... some major code here for (int i = 0; i < NCHANNELS; ++i) { pimpl->data = i; } } void LT::run() { ... some major code here { // narrow the scope double*data = pimpl->data for (int i = 0; i < NCHANNELS; ++i) { data = i; } } }
What you do is to scope the temporary next to the for(... (as opposed to placing it at the top of the run() function.
Jim Dempsey
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Is your cast wrong?
__assume_aligned(static_cast<const void**>(pimpl->data), ALIGN_SIZE); // target with sizeof(intptr_t) or __assume_aligned(static_cast<const double*>(pimpl->data), ALIGN_SIZE);
As originally written
pimpl->data
is an lvalue of type void
The sizeof void has no size. How is this to be vectorized across an iteration space?
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jim,
By construction (line 21 of original code), pimpl->data is double*.
I don't think casting is wrong... __assume_aligned is asking for a generic pointer (void*) as first argument. (Actually I could remove the cast, it won't change the problem).
As far as I understand, __assume_align is just telling the compiler that the pointer argument is located in memory at a multiple of ALIGN_SIZE (which is true because I'm using _mm_malloc accordingly). Then the compiler is able to vectorize the loop. This is what happens when the pimpl is not a unique_ptr but a regular pointer.
Daniel
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Would this work?
void LT::run() { double* data = pimpl->data; #pragma ivdep for (int i = 0; i < NCHANNELS; ++i) { __assume_aligned(static_cast<const double*>(data), ALIGN_SIZE); data = i; } }
Note, the generated (optimized) code should be the same. IOW pimpl->data would have been registerized.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes it is working. I was a bit afraid that adding a new variable would be consuming some resources.
Actually this is a reduced size model of what I'm using, so would the optimized version always "be the same"?
By the way, doing the loop in the pimpl works also fine. It looks like the compiler has some trouble looking inside the unique_ptr structure.
eg:
struct LT::pimpl_LT { ... void run() { #pragma ivdep for (int i = 0; i < NCHANNELS; ++i) { __assume_aligned(static_cast<const double*>(data), ALIGN_SIZE); data = i; } } ... } .... void LT::run() { pimpl->run(); }
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>> I was a bit afraid that adding a new variable would be consuming some resources
Just take a moment. Your original code had (ignoring the cast):
for (int i = 0; i < NCHANNELS; ++i) { pimpl->data = i; }
Ignoring the cast. Pseudo code (unoptimized)
for (int i = 0; i < NCHANNELS; ++i) { mov [pimpl] to register // [] == memory location of... mov [register + offset to data] to register convert i to double in temp add (i scaled to sizeof double) to register mov temp [register] }
Optimized:
mov [pimpl] to register // [] == memory location of... mov [register + offset to data] to register for (int i = 0; i < NCHANNELS; ++i) { convert i to double in temp mov temp [register+i*4] }
IOW the compiler will use a register in any event. Explicitly declaring the temporary variable *** within the narrow scope of the run() will permit the variable to exist only in register form.
Note, assume run is more complex
void LT::run() { ... some major code here for (int i = 0; i < NCHANNELS; ++i) { pimpl->data = i; } } void LT::run() { ... some major code here { // narrow the scope double*data = pimpl->data for (int i = 0; i < NCHANNELS; ++i) { data = i; } } }
What you do is to scope the temporary next to the for(... (as opposed to placing it at the top of the run() function.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Jim for the clarification. That saves me a lot.
Daniel

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page