Wrong results with -O2 and OpenMP SIMD SIMDLEN(length) clause

themos · ‎07-05-2022

Hello,

I see wrong results for most SIMDLEN lengths, at optimization level -O2 and above.

Regards

Themos Tsikas

Ron_Green · ‎07-08-2022

@themos I've analyzed this extensively and modified the test case to simplify it and make it easier to debug.

The result: IFORT requires users to declare reduction variables with REDUCTION. S(1:VL) is a reduction variable and needs to be declared as such. This has always been true for our compiler. Now one could argue, as I have, that IFORT "should" be smart enough to recognize the reduction and magically declare S as such. But it doesn't. So if you add a REDUCTION(+:s) clause, your code will work. Here is my reproducer:

Module kiki
    Integer, Parameter ::wp=kind(1.0d0), wip=kind(1)
    Integer, Parameter ::qp=selected_real_kind(R=Range(1.0_wp)+1)

contains
Function sumx128(x,n) Result(suma)
    implicit none
    Integer, Parameter :: VL = 128
    Integer(wip),Intent(in) :: n
    Real(wp),Intent(in) :: x(n)
    Real(wp) :: suma,s(1:VL)
    Integer(wip) :: i
    real(wp) :: partialsum

    s = 0
    If (n>=VL) Then
      !$OMP SIMD SIMDLEN(VL) REDUCTION(+:s)
      do i=1,n-VL+1,VL
        s(1:VL) = s(1:VL)+x(i:i+VL-1)
      End Do
    Else
      i=1
    Endif
  
    partialsum = sum(s) 
    if (partialsum /= 1024.0_wp) then
      print*, "OMP SIMD ERROR!"
      print*, "ERROR: Last value of i ", i, "  value of N ", n 
      print*, "ERROR: partial sum before remainder summation ", sum(s)
      print*, "ERROR: partial sum at this point should be 1024.0"
    end if

    !...compute sum of remaining elements < VL
    If (i<=n) Then
        s(1:n+1-i) = s(1:n+1-i) + x(i:n)
    End If  
    suma = sum(s)
End Function
end module kiki

Program test
  Use kiki
  implicit none
  Real(wp) , Allocatable:: x(:)
  Integer, Parameter :: N=2065
  Integer :: i
  real(wp) :: rez

  Allocate(x(N), SOURCE=[(Real(i,wp)*(-1)**i,i=1,N)])
  Print *, "size of array ",  N
  rez = sumx128(x,N)
  if (rez /= SUM(x) ) then
    print *, "ERROR: !! these should match !!"
    print *, "ERROR:   user summation of X ", rez 
    print *, "CORRECT: intrinsic  SUM of X ", SUM(x)
  else
    print *, "CORRECT: these should match "
    print *, "CORRECT: user summation of X ", rez
    print *, "CORRECT: intrinsic  SUM of X ", SUM(x)
  end if

End

So adding the REDUCTION(+:s) clause fixes the error: with it, IFORT gives correct results.

However, IFX gets wrong results for this AND seg faults at program exit. So I entered an IFX bug; the ID is CMPLRLLVM-38873.

Theodore_T_1 · ‎07-11-2022

Thanks for the workaround.

I guess I don't understand why !$OMP SIMD cannot be applied to the array assignment itself (the "s(1:VL) = s(1:VL)+x(i:i+VL-1)" statement), so that I had to insert it before the DO construct instead. I think we can blame the OpenMP specs for that.

Also, I am not convinced that S is a reduction variable. I am specifying exactly which elements of X are to be summed to exactly which elements of S. As far as the !$OMP SIMD directive is concerned, I may not even be doing the "partialsum=sum(s)" statement.

Regards

Themos Tsikas

Ron_Green · ‎07-11-2022

Our OMP Standards rep has been looking at the case of applying OMP directives to array syntax, as has the OMP committee. I am not sure why the committee has not allowed this. I have wondered myself. Let me ask and see if there is a valid technical reason for not allowing OMP directives on array syntax.

Not convinced of S needing to be a Reduction var? Consider the case of VL=1. Then the expression reduces to the VERY ROUGH equivalent of:

S = S + array( index )

S is an array in our example, but you get the idea I hope.

You'd agree in this case that S is a reduction variable. No question about this syntax. Reduction variables can be arrays. S in this example is indeed a reduction variable, it just happens to be an array.
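For readers following along, the scalar case alluded to above can be written out as a minimal sketch. It is not taken from the reproducer: the program name `scalar_reduction` and the array size are assumptions chosen for illustration. Here `s` is updated in every iteration of the loop, which is why it is declared in a REDUCTION clause.

```fortran
! Hypothetical illustration, not part of the original reproducer.
program scalar_reduction
  implicit none
  integer, parameter :: wp = kind(1.0d0)
  integer, parameter :: n = 2048        ! assumed size, chosen for the example
  real(wp) :: x(n), s
  integer :: i

  ! Same alternating-sign data as the reproducer: x(i) = i * (-1)**i
  x = [(real(i, wp) * (-1)**i, i = 1, n)]

  s = 0.0_wp
  ! s is read and written in every iteration, so it is a reduction variable
  !$omp simd reduction(+:s)
  do i = 1, n
    s = s + x(i)
  end do

  ! The two values printed should agree
  print *, "loop sum ", s, " intrinsic SUM ", sum(x)
end program scalar_reduction
```

The array version in the reproducer has the same shape: each element of S plays the role that the scalar `s` plays here.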

themos · ‎07-12-2022

Hello,

I was still thinking that !$OMP SIMD SIMDLEN(vl) would be applying to the array assignment statement (where no element of S is a reduction variable). But it applies to the loop, so S(1), say, is updated in each iteration and that makes it a reduction variable. I think that is the right way to think about it. Ultimately, I should not be using !$OMP SIMD there, but the compiler should either reject the code or produce correct executable code (and not skip some iterations that would have been performed if the OpenMP directive was disregarded).
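One way to express the "no element of S is a reduction variable" intent, sketched here under assumptions and not proposed anywhere in this thread, is to keep the outer loop over blocks serial and put the directive on an explicit inner loop over the VL lanes. Each inner iteration then touches a different `s(j)`, so no REDUCTION clause is involved. The program name `lane_sums` is hypothetical, and the SIMDLEN clause is left out for simplicity.

```fortran
! Hypothetical illustration; whether this vectorizes well is compiler dependent.
program lane_sums
  implicit none
  integer, parameter :: wp = kind(1.0d0)
  integer, parameter :: vl = 128, n = 2065   ! same sizes as the reproducer
  real(wp) :: x(n), s(vl)
  integer :: i, j

  x = [(real(i, wp) * (-1)**i, i = 1, n)]
  s = 0.0_wp

  ! Outer loop over blocks stays serial: s(j) is updated once per block
  do i = 1, n - vl + 1, vl
    ! Inner loop over lanes: iteration j only reads and writes s(j),
    ! so the iterations are independent and nothing is a reduction here
    !$omp simd
    do j = 1, vl
      s(j) = s(j) + x(i + j - 1)
    end do
  end do

  ! On exit from the DO loop, i indexes the first element not yet summed
  if (i <= n) s(1:n+1-i) = s(1:n+1-i) + x(i:n)

  ! The two values printed should agree
  print *, "lane sum ", sum(s), " intrinsic SUM ", sum(x)
end program lane_sums
```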

Regards

Themos Tsikas

jimdempseyatthecove · ‎07-12-2022

>> "s(1:VL) = s(1:VL)+x(i:i+VL-1)"

Try using

!dir$ vector always

s(1:VL) = s(1:VL)+x(i:i+VL-1)

You may also see some benefit with (insert at declaration of s(1:VL))

!dir$ attributes align : 64 :: s

And if x is known to be aligned (insert at statement in question)

!dir$ assume_aligned X : 64 ! or what ever is the byte alignment for x

Note, the !dir$ ... directives are not portable; this is the motivation behind the push to put these into OpenMP, which imho is misplaced. Instead, there should be a different standard adopted, perhaps call it OpenVP (Open Vector Programming), because vector programming is distinctly different from multi-process(or) programming.

Jim Dempsey

jimdempseyatthecove · ‎07-09-2022

>> IFORT requires users to declare reduction variables with REDUCTION. S(1:VL) is a reduction variable and needs to be declared as such.

vs

>>

!$OMP SIMD SIMDLEN(VL) REDUCTION(+:s)

Ron,

The code example should not require the !$OMP SIMD to facilitate the use of SIMD instructions for that loop. By examining the loop, the compiler should be able to (it used to) SIMDize and temporary registerize the partial sum.

Also, the sample code does not make use of OpenMP, and very well should be capable of being compiled without -qopenmp/Qopenmp, in which case the !$OMP SIMD... would be ignored.

Jim Dempsey

Ron_Green · ‎07-11-2022

Jim: "the compiler should be able to (it used to) SIMDize and temporary registerize the partial sum."

I tend to have that view myself. But in the past when I've said "SHOULD" about a compiler, the developers have come up with some corner case where SHOULD would create incorrect code. Now then, why does gfortran not require the REDUCTION clause?? Hmmm, another thing to take back to our Standards rep. This is a great little example that raises 2 solid questions about the OMP Standard and our implementation.

Ron

Ron_Green · ‎07-12-2022

Jim,

I mostly agree with you. I was not in favor of this change to make openmp-simd on by default at O1 and above. Like you, if I WANT openmp to be enabled, I'll use a -qopenmp* option! That fits with my view of compiler options and behaviors: if you want a behavior out of the compiler you should explicitly ask for it. This is why I find the -fast option so vulgar: it's a collection of a bunch of junk that can and does change over time depending on what makes the latest processor family run fast. I've said many times, it's like a sausage: you don't know exactly what is inside it and some of these things can be really bad for you.

Now those of us "with some years of Fortran experience" who THOUGHT we knew what -O1, -O2, -O3 gave us, granted that these are also macro collections of behaviors (options), now have to rethink what these mean! Yes, I'm not a fan of this. And to add to this argument: OpenMP is NOT part of the Fortran Language. I can understand if -O options change how the compiler optimizes my Fortran code and maybe change expression evaluation order, order of reductions, etc. BUT I do NOT expect an optimization option to PULL IN extensions to the core Language like OpenMP. To me, I've always thought of OpenMP as "Mostly Harmless". I can add the OpenMP directives/clauses to my code and by DEFAULT they are harmless. But when I enable them SOMEHOW then they change behavior of my program in well understood ways. The SOMEHOW that enables these mostly harmless directives has traditionally been using -qopenmp* (or -fopenmp* or -whatever). That is to say, options with "openmp" in the name of the option. But now SOMEHOW is not just one family of options but is creeping into the -O options, rendering them sausage options like -fast! Ack! Very unsavory, in my humble opinion.

I'll escalate this. I'm sure it's coming from the performance and benchmarking team on the C++ side of the house. We used to have proprietary #pragma SIMD and !dir$ SIMD with the Classic compilers. The new LLVM compilers have dropped the Intel proprietary SIMD pragma/directive and told users to use OMP directives instead. In this mindset, the old SIMD directives WERE enabled by default at O1 and above. I'm sure some wise person decided that therefore OMP SIMD should likewise be recognized by the upper -O options. Because surely if they put in SIMD options they'd want those pulled in at O1 and above. Actually, the likely decision was some argument along the lines of: Intel processors usually can get better performance with vectorization, hence we'll do whatever we can to vectorize your code as aggressively as possible.

I'll let you know the outcome.

Ron

Barbara_P_Intel · ‎07-11-2022

-qopenmp-simd (/Qopenmp-simd) is automatically enabled with ifort at -O2 and with ifx at -O1, according to the Fortran DGR (Developer Guide and Reference).

To disable the OpenMP SIMD directives at higher opt levels use -qno-openmp-simd (/Qopenmp-simd-).

jimdempseyatthecove · ‎07-12-2022

Ron,

I think this/these actions were the result of the purge (forced retirement) of the old stalwarts of the design teams and committees. This left the noobs to have more weigh-in in getting a hack introduced. IMHO

#pragma simd ...

!dir$ simd ...

Would have been the right way of handling this.

Note, this is not to say OpenMP SIMD directives are not needed. You will, at times, need to specify where iteration space slicing is to occur in order to take full advantage of the SIMD instructions across thread workspace. To make a chimera out of the directive is like a cross-breed pooch.

Jim Dempsey

Ron_Green · ‎01-13-2023

This is fixed in IFX v2023.0. Note that you do not need the reduction clause in IFX.
