Solved: Trouble understanding warning message

Daniel_H · ‎05-13-2019

Hello,

I'm trying to modernize some of my codes, and face to a strange warning when compiling this simplified version of a pimpl implementation using unique_ptr.

The message is

icpc -xHOST pimpl.cpp
pimpl.cpp(42): warning #2261: __assume expression with side effects discarded
      __assume_aligned(static_cast<const void*>(pimpl->data), ALIGN_SIZE);
                                                              ^

and the code:

#include <memory>

#define NCHANNELS 100
#define ALIGN_SIZE 64

// Main class definition (normally in header)
class LT {
public:
  LT();
  ~LT();
  
  void run();
  
private:
  struct pimpl_LT;
  std::unique_ptr<pimpl_LT> pimpl;
};

// pimpl definition (normally in source file)
struct LT::pimpl_LT {
  double *data;
  
  pimpl_LT(){
    // Alloc data
    data = reinterpret_cast<double*>(_mm_malloc(NCHANNELS * sizeof(double), ALIGN_SIZE));
  };
  
  ~pimpl_LT() {    
    // Free data
    _mm_free(data);
  }
};


LT::LT() : pimpl(new pimpl_LT) {}

LT::~LT() = default;

void LT::run() {
  #pragma ivdep
  for (int i = 0; i < NCHANNELS; ++i) {
    __assume_aligned(static_cast<const void*>(pimpl->data), ALIGN_SIZE);
    pimpl->data = i;
  }
}


//peruse of LT in program
int main() {
 LT lt;

 lt.run(); 
}

As long as I'm not using unique_ptr no error message. In my full code, I need the assume_aligned for vectorization to take place. I don't see why alignment should not be satisfied in unique_ptr elements or I'm missing some important point on the usage of __assume_aligned.

Any enlightenment would be appreciated.

Daniel

jimdempseyatthecove · ‎05-13-2019

>> I was a bit afraid that adding a new variable would be consuming some resources

Just take a moment. Your original code had (ignoring the cast):

  for (int i = 0; i < NCHANNELS; ++i) {
    pimpl->data = i;
  }

Ignoring the cast. Pseudo code (unoptimized)

  for (int i = 0; i < NCHANNELS; ++i) {
mov [pimpl] to register // [] == memory location of...
mov [register + offset to data] to register
convert i to double in temp
add (i scaled to sizeof double) to register
mov temp [register]
}

Optimized:

mov [pimpl] to register // [] == memory location of...
mov [register + offset to data] to register
  for (int i = 0; i < NCHANNELS; ++i) {
convert i to double in temp
mov temp [register+i*4]
}

IOW the compiler will use a register in any event. Explicitly declaring the temporary variable *** within the narrow scope of the run() will permit the variable to exist only in register form.

Note, assume run is more complex

void LT::run() {
... some major code here
  for (int i = 0; i < NCHANNELS; ++i) {
    pimpl->data = i;
  }
}

void LT::run() {
... some major code here
  { // narrow the scope
    double*data = pimpl->data
    for (int i = 0; i < NCHANNELS; ++i) {
      data = i;
    }
  }
}

What you do is to scope the temporary next to the for(... (as opposed to placing it at the top of the run() function.

Jim Dempsey

View solution in original post

jimdempseyatthecove · ‎05-13-2019

Is your cast wrong?

      __assume_aligned(static_cast<const void**>(pimpl->data), ALIGN_SIZE); // target with sizeof(intptr_t)
or
      __assume_aligned(static_cast<const double*>(pimpl->data), ALIGN_SIZE);

As originally written

pimpl->data

is an lvalue of type void

The sizeof void has no size. How is this to be vectorized across an iteration space?

Jim Dempsey

Daniel_H · ‎05-13-2019

Hi Jim,

By construction (line 21 of original code), pimpl->data is double*.

I don't think casting is wrong... __assume_aligned is asking for a generic pointer (void*) as first argument. (Actually I could remove the cast, it won't change the problem).

As far as I understand, __assume_align is just telling the compiler that the pointer argument is located in memory at a multiple of ALIGN_SIZE (which is true because I'm using _mm_malloc accordingly). Then the compiler is able to vectorize the loop. This is what happens when the pimpl is not a unique_ptr but a regular pointer.

Daniel

jimdempseyatthecove · ‎05-13-2019

Would this work?

void LT::run() {
  double* data = pimpl->data;
  #pragma ivdep
  for (int i = 0; i < NCHANNELS; ++i) {
    __assume_aligned(static_cast<const double*>(data), ALIGN_SIZE);
    data = i;
  }
}

Note, the generated (optimized) code should be the same. IOW pimpl->data would have been registerized.

Jim Dempsey

Daniel_H · ‎05-13-2019

Yes it is working. I was a bit afraid that adding a new variable would be consuming some resources.

Actually this is a reduced size model of what I'm using, so would the optimized version always "be the same"?

By the way, doing the loop in the pimpl works also fine. It looks like the compiler has some trouble looking inside the unique_ptr structure.

eg:

struct LT::pimpl_LT {

...

  void run() {
    #pragma ivdep
    for (int i = 0; i < NCHANNELS; ++i) {
      __assume_aligned(static_cast<const double*>(data), ALIGN_SIZE);
      data = i;
    }
  }

...
}

....


void LT::run() {
  pimpl->run();
}

jimdempseyatthecove · ‎05-13-2019

>> I was a bit afraid that adding a new variable would be consuming some resources

Just take a moment. Your original code had (ignoring the cast):

  for (int i = 0; i < NCHANNELS; ++i) {
    pimpl->data = i;
  }

Ignoring the cast. Pseudo code (unoptimized)

  for (int i = 0; i < NCHANNELS; ++i) {
mov [pimpl] to register // [] == memory location of...
mov [register + offset to data] to register
convert i to double in temp
add (i scaled to sizeof double) to register
mov temp [register]
}

Optimized:

mov [pimpl] to register // [] == memory location of...
mov [register + offset to data] to register
  for (int i = 0; i < NCHANNELS; ++i) {
convert i to double in temp
mov temp [register+i*4]
}

IOW the compiler will use a register in any event. Explicitly declaring the temporary variable *** within the narrow scope of the run() will permit the variable to exist only in register form.

Note, assume run is more complex

void LT::run() {
... some major code here
  for (int i = 0; i < NCHANNELS; ++i) {
    pimpl->data = i;
  }
}

void LT::run() {
... some major code here
  { // narrow the scope
    double*data = pimpl->data
    for (int i = 0; i < NCHANNELS; ++i) {
      data = i;
    }
  }
}

What you do is to scope the temporary next to the for(... (as opposed to placing it at the top of the run() function.

Jim Dempsey

Daniel_H · ‎05-13-2019

Thanks Jim for the clarification. That saves me a lot.

Daniel