SIMD directive with -O2 / -O3 flags produces different results.

kang__myeongseok · 08-08-2018

While working on my code, I found that using !DIR$ SIMD to vectorize an outer loop together with the -O3 flag produces incorrect results.

To track down the cause, I wrote an example source file that reproduces the problem. The one difference is that this example produces incorrect results not only with -O3 but also with -O2, which suggests that the problem is related to the vectorization process.

The loop part is shown below, and the full source file is attached.

subroutine loop_A
   use mod_A
   implicit none
   integer :: j,i, i_f,i_l, idx_1,idx_2

      do j = 1, nj
         i_f = NI(j-1) + 1
         i_l = NI(j)
!DIR$ SIMD
         do i = i_f, i_l

            idx_1 = i_stuff%i2idx(1,i)
            idx_2 = i_stuff%i2idx(2,i)

            a0(i) = &
              ( 1.d0 - d0(i) )*A(idx_1) + &
                       d0(i)  *A(idx_2) + &
              dot_product(e0(1:3,i), AA(1:3,idx_1))
         enddo ! i loop
      enddo ! j loop

! check if the result is correct
      write(*,*) 'sum of all vars :',sum(a0)
end subroutine loop_A

What I have is a nested loop, and the second loop (the "do i" loop on the 10th line of the listing above) is the one I want to vectorize.

Interestingly, whenever the code works incorrectly (with -O2 or -O3 in the example above, and with -O3 in my original code), I see "Preprocess Loopnests: Moving Out Store" reported at the outermost loop. For instance, the vec-report for the example above says:

LOOP BEGIN at test_ver2.f90(96,15) inlined into test_ver2.f90(107,9)

remark #25084: Preprocess Loopnests: Moving Out Store [ test_ver2.f90(85,10) ]

which refers to the 7th line of the listing above (i_f = NI(j-1) + 1).

To sum up: in my original code, with -O2 the message "Preprocess Loopnests: Moving Out Store" does not show up and the code works fine, but with -O3 the message shows up and the code produces wrong results. In the example above, both -O2 and -O3 lead to the message "Preprocess Loopnests: Moving Out Store" in the vec-report and to incorrect results.

Any help in properly vectorizing the second loop (the 10th line of the listing) would be deeply appreciated.

Juergen_R_R · 08-08-2018

So either you are editing the text of this thread every 1-5 minutes, or the notification system of Intel broke down.

kang__myeongseok · 08-08-2018

So sorry about that.. I did not know about it. I won't modify the thread again.

TimP · 08-09-2018

I don't see anything in your example as quoted now which might be expected to result in !dir$ simd breaking your code. In cases where your code depends on firstprivate, lastprivate, or reduction clauses, omitting those will break it frequently and inconsistently. I think this directive is deprecated, and was recommended only in the past prior to the availability of OpenMP 4, thus your bug report may not produce action. !$omp simd is safer, in part because it can't depend on firstprivate, in part because it should be maintained better. There is much more good advice on the web about OpenMP 4.x; unfortunately, much is specific to C.

The directives tell the compiler to vectorize without considering whether it is good for performance. If you believe that non-vectorization is due to the compiler assuming a shorter loop count (e.g. 300) than you want, you should read up on the LOOP COUNT directive. If you used a module procedure with module data, the compiler should optimize automatically for loop counts close to the declared array size.

kang__myeongseok · 08-09-2018

Thank you so much for your helpful reply Tim.

I tried !$OMP SIMD with the -qopenmp-simd option for ifort and, regrettably, the result was the same as with !DIR$ SIMD. This may suggest that the problem I'm having here is not related to the firstprivate clause.
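For reference, here is a minimal sketch of what the !$OMP SIMD form of the loop looks like. The array names follow the listing above, but the program wrapper, the array sizes, and the fill values are invented for illustration and are not taken from the attached file. The private clause is also an addition that is not in the original code; nothing in this thread establishes that it changes the outcome.

```fortran
! Sketch only: sizes and fill values are hypothetical, not from test_ver2.f90.
program omp_simd_sketch
   implicit none
   integer, parameter :: n = 8, m = 4
   integer :: i, idx_1, idx_2
   integer :: i2idx(2,n)
   double precision :: a0(n), d0(n), e0(3,n), A(m), AA(3,m)

   ! Hypothetical test data.
   do i = 1, n
      i2idx(1,i) = mod(i-1, m) + 1
      i2idx(2,i) = mod(i, m) + 1
      d0(i)      = 0.1d0*i
      e0(:,i)    = (/ 1.d0, 2.d0, 3.d0 /)*i
   enddo
   do i = 1, m
      A(i)    = dble(i)
      AA(:,i) = (/ 0.5d0, 0.25d0, 0.125d0 /)*i
   enddo

   ! idx_1 and idx_2 are assigned in every iteration, so they are listed as
   ! private here (an addition; the original loop has no clauses).
!$omp simd private(idx_1, idx_2)
   do i = 1, n
      idx_1 = i2idx(1,i)
      idx_2 = i2idx(2,i)

      a0(i) = &
        ( 1.d0 - d0(i) )*A(idx_1) + &
                 d0(i)  *A(idx_2) + &
        dot_product(e0(1:3,i), AA(1:3,idx_1))
   enddo

   write(*,*) 'sum of all vars :', sum(a0)
end program omp_simd_sketch
```

With ifort, the directive is only honored when -qopenmp-simd (or full OpenMP) is enabled; without it the line is treated as an ordinary comment.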

A workaround I have found for this trouble with the -O3 flag is to unroll any inner loops inside the target loop that I want to vectorize.

For example, the compiler treats the 18th line of the listing (the dot_product call) as an inner loop relative to the 10th line (the "do i" loop), and I do not understand why. If I unroll the dot_product manually (I do not know of any way to apply !DIR$ UNROLL=3 to the 18th line, since it is not an explicit loop), the 10th-line loop becomes the innermost loop to be vectorized. If I then enforce vectorization with either !DIR$ SIMD or !$OMP SIMD, the compiler vectorizes the 10th-line loop without any problem, up to the -O3 flag.

This method still applies when there is an explicit do loop inside the 10th-line loop, except that !DIR$ UNROLL=3 must not be used: it seems that as soon as the compiler sees an inner loop inside the 10th-line loop, it produces incorrect results again. So instead of using the unroll directive, I unrolled the explicit loop manually, and vectorization worked fine.

I honestly do not know why this problem occurs when I try to vectorize the second loop of this kind of imperfectly nested triple loop.

The only way out that I have found is to turn it into a doubly nested loop, so that the compiler does not see any possible inner loop inside the second loop. Then either !DIR$ SIMD or !$OMP SIMD works fine.

The file attached here is the one I tested; you can compile it with the command "ifort -O3 -qopenmp-simd -qopt-report=5 test_ver2.f90".

Any clue as to why this problem happens in the first place would be a great help.
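The manual unrolling described above can be sketched as follows. This is an illustration under assumptions, not the attached file: the program wrapper, the array sizes, the fill values, the a0_ref array, and the tolerance are all hypothetical, and the private clause is an addition that is not in the original code. The sketch computes the loop once with dot_product and no directive, then again with the three-term product written out under !$omp simd, and stops with an error if the two results differ.

```fortran
! Sketch only: sizes, fill values, a0_ref and the tolerance are hypothetical.
program manual_unroll_sketch
   implicit none
   integer, parameter :: n = 8, m = 4
   integer :: i, idx_1, idx_2
   integer :: i2idx(2,n)
   double precision :: a0(n), a0_ref(n), d0(n), e0(3,n), A(m), AA(3,m)

   ! Hypothetical test data.
   do i = 1, n
      i2idx(1,i) = mod(i-1, m) + 1
      i2idx(2,i) = mod(i, m) + 1
      d0(i)      = 0.1d0*i
      e0(:,i)    = (/ 1.d0, 2.d0, 3.d0 /)*i
   enddo
   do i = 1, m
      A(i)    = dble(i)
      AA(:,i) = (/ 0.5d0, 0.25d0, 0.125d0 /)*i
   enddo

   ! Reference: original formulation with dot_product, no SIMD directive.
   do i = 1, n
      idx_1 = i2idx(1,i)
      idx_2 = i2idx(2,i)
      a0_ref(i) = ( 1.d0 - d0(i) )*A(idx_1) + d0(i)*A(idx_2) + &
                  dot_product(e0(1:3,i), AA(1:3,idx_1))
   enddo

   ! Workaround: dot_product written out term by term, so the "do i" loop
   ! contains no implicit inner loop. The private clause is an addition.
!$omp simd private(idx_1, idx_2)
   do i = 1, n
      idx_1 = i2idx(1,i)
      idx_2 = i2idx(2,i)
      a0(i) = ( 1.d0 - d0(i) )*A(idx_1) + d0(i)*A(idx_2) + &
              e0(1,i)*AA(1,idx_1) + &
              e0(2,i)*AA(2,idx_1) + &
              e0(3,i)*AA(3,idx_1)
   enddo

   ! Compare with a small tolerance, since the summation order may differ.
   if (maxval(abs(a0 - a0_ref)) > 1.d-12) error stop 'unrolled result differs'
   write(*,*) 'max abs difference :', maxval(abs(a0 - a0_ref))
end program manual_unroll_sketch
```

Comparing against a non-vectorized reference like this is also a more direct check than the sum of a0 alone, because it shows which elements differ.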