In a recent project I have encountered numerous code sections that the compiler fails to vectorize. I recently posted a message about this relating to AVX2 and the (lack of) vectorization of multi-dimensional arrays, but I did not make a feature suggestion.
It is not unusual to have a multi-dimensional array whose first dimension is a power of 2, principally 4, 8 or 16. As such, expressions containing array(:,i,j) should be fully vectorizable.
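To make the pattern concrete, here is a minimal sketch of the kind of code I mean. The array names, the extents and the arithmetic are made up for illustration; only the array(:,i,j) shape and the !dir$ vector aligned directive come from the description above. It assumes single-precision data with a first dimension of 8, so each column is 32 bytes, which matches an AVX2 register.

```fortran
program first_dim_vector
  implicit none
  ! Hypothetical extents: first dimension is a power of 2 (8 reals = 32 bytes)
  integer, parameter :: nlanes = 8, ni = 4, nj = 3
  real :: a(nlanes, ni, nj), b(nlanes, ni, nj)
  ! Intel-specific: align the start of each array on a 64-byte boundary,
  ! so every column a(:,i,j) starts on a 32-byte boundary
  !dir$ attributes align : 64 :: a, b
  integer :: i, j

  call random_number(b)
  do j = 1, nj
    do i = 1, ni
      ! Intel-specific: assert aligned accesses for the statement that follows
      !dir$ vector aligned
      a(:, i, j) = 2.0 * b(:, i, j) + 1.0
    end do
  end do
  print *, sum(a)
end program first_dim_vector
```

Other compilers treat the !dir$ lines as plain comments, so the program still builds and runs there, just without the alignment hints.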
I've noticed that...
When compiling without IPO, compiler directives such as !dir$ vector aligned always are honored.
When compiling with IPO, these directives are not honored (IOW, scalar code is generated).
IOW, using IPO is counter-productive when you have code that explicitly directs vectorization.
This would make sense if the loop nest contains calls to external subroutines or functions and those routines have not been explicitly declared with '!$omp declare simd' so that vector versions are available. If you have a non-simd routine call, IPO could inline it and then decide not to vectorize the loop after the inline. What does the opt-report inline phase show?
Without IPO, the call can be made non-simd while the rest of the loop is vectorized: inside the vectorized loop, the routine is called once per vector lane, each call with a different loop index. This is how loops with non-simd external calls can still vectorize. Or you can explicitly declare the routines simd and provide them in a module or interface, so the compiler knows that the call has a vectorized version to insert.
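A rough sketch of the declare simd route, in case it helps. The module, the function scale_add and its arithmetic are invented for illustration and are not taken from your code. The OpenMP directives only take effect when OpenMP SIMD support is enabled (for example with -qopenmp-simd); otherwise they are treated as comments and the code runs as scalar.

```fortran
module scale_mod
  implicit none
contains
  ! Hypothetical elemental-style routine; declare simd asks the compiler
  ! to also generate a vector version callable from a simd loop
  function scale_add(x, s) result(y)
    !$omp declare simd(scale_add) uniform(s)
    real, intent(in) :: x, s
    real :: y
    y = s * x + 1.0
  end function scale_add
end module scale_mod

program use_declare_simd
  use scale_mod          ! the module makes the simd declaration visible
  implicit none
  integer, parameter :: n = 16
  real :: a(n), b(n)
  integer :: i

  b = [(real(i), i = 1, n)]
  !$omp simd
  do i = 1, n
    a(i) = scale_add(b(i), 2.0)   ! may use the vector version of scale_add
  end do
  print *, a(1), a(n)
end program use_declare_simd
```

Because the function lives in a module, the caller sees the explicit interface along with the declare simd, which is what lets the compiler pick the vector variant instead of serializing the call.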
I suspect there is a function call in there. Tell me if I'm right or wrong.
I'd also try using !$omp simd instead of !dir$ vector, but that would require the -qopenmp-simd option, which is a bit of a nuisance.
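Something along these lines is what I have in mind for the !$omp simd variant. The names and extents are placeholders, and the array section is written as an explicit inner loop because !$omp simd has to be followed by a do loop. Note that this sketch carries no alignment assertion, so it is not an exact replacement for !dir$ vector aligned.

```fortran
program omp_simd_sketch
  implicit none
  ! Hypothetical extents; first dimension is the vector length
  integer, parameter :: nlanes = 8, ni = 4, nj = 3
  real :: a(nlanes, ni, nj), b(nlanes, ni, nj)
  integer :: i, j, k

  call random_number(b)
  do j = 1, nj
    do i = 1, ni
      ! Request vectorization of the loop over the first dimension
      !$omp simd
      do k = 1, nlanes
        a(k, i, j) = 2.0 * b(k, i, j) + 1.0
      end do
    end do
  end do
  print *, sum(a)
end program omp_simd_sketch
```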
In this particular case I have, in a module, routines that are vectorized (they take vectors in, perform vector operations, and emit vector results).
This is not the issue.
The issue is the caller. It has loops that can be vectorized, or straight-line code that can be vectorized, and these are in fact vectorized when compiled without IPO. Those same sections of code become scalar when IPO'd (regardless of whether the "inlined" IPO code is vectorized). IOW, it appears as if the use of multi-file IPO causes the compiler to forget the alignment attributes of the containing code (the code making the call to the subroutine/function that is IPO'd into it).