!dir$ vector aligned vs. !$omp simd aligned(....)

TimP · 12-27-2015

This question arose on a webinar recently. Having looked into it, I find there is more to it than I recognized at first.

Besides evident differences, such as the "legacy" Intel-specific directive being applicable to array assignments as well as DO loops, omp simd frequently shows a performance deficit. In the generated code, the biggest difference appears to be more frequent use of AVX vmaskmov instructions with omp simd.

The OpenMP directive is a bit more trouble, as it involves checking the opt-report to see whether all the alignment assertions are taken. It also does not (to my knowledge) allow for cases where arrays are aligned at elements other than the first.

Could anyone explain why the older directive (what is it called nowadays, surely not Cilk(tm) Plus?) should produce more efficient code when both are applicable? I would see a stronger argument for portable code if it were as efficient. It would also help with the idea of controlling all vectorization by omp simd directives, if anyone buys into that idea.

I suspect a similar situation arises between #pragma vector aligned and omp simd with C for loops; I'll check further if it's of interest. Some C compilers require one or the other pragma to be protected by ifdef or the like, considering them as erroneous syntax. Fortran is more forgiving.
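For the C case, a minimal sketch of the kind of guarding meant here might look like the following. The function name `axpy_aligned`, the 32-byte alignment, and the choice of macros to test are illustrative assumptions, not taken from any test suite mentioned in this thread.

```c
/* Hypothetical kernel: y[i] += a * x[i].
 * The caller promises that x and y are 32-byte aligned (an assumption
 * chosen to match AVX register width). */
void axpy_aligned(int n, float a, const float *x, float *y)
{
#if defined(__INTEL_COMPILER)
    /* Intel-specific: asserts that all array references in the loop are aligned. */
#pragma vector aligned
#elif defined(_OPENMP) && _OPENMP >= 201307
    /* OpenMP 4.0 or later: alignment is asserted only for the listed pointers. */
#pragma omp simd aligned(x, y : 32)
#endif
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

With neither macro defined, the loop compiles as plain C, so a compiler that rejects either pragma never sees it. Note that both pragmas are assertions rather than requests: if the arrays are not actually aligned as promised (for example, not obtained from an aligned allocation), the behavior is undefined.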

gfortran doesn't appear to take advantage of the directive, so I'm not surprised that it has the problem with use of vmaskmov (as it has to deal with more misalignments), as well as not always vectorizing cases where ifort needs directives.

I could post some examples or try to submit an IPS case if it's of interest.

Steven_L_Intel1 · 12-27-2015

Tim, an IPS case would be best in this instance.

TimP · 12-28-2015

IPS 6000147679

TimP · 12-28-2015

To add to the list of legitimate differences between vector aligned and omp simd aligned, the latter prevents a useful fusion in one of the examples, but I don't see that as accounting for much of the deficit.

The test suite also has a case where omp simd is used intentionally as a portable replacement for the Intel-specific !dir$ nofusion, to avoid a vector misalignment problem with fusion. The alternative solution, peeling and fusing loops explicitly so as to avoid vector misalignment, is also demonstrated.