Then !$OMP FLUSH(t2) - Page 2

Ryan_S_3 · ‎04-06-2017

We recently upgraded to ifort version 17.0.2, and a piece of code that had worked for decades suddenly broke. I traced the problem to a call to FLOOR() with a REAL*4, and was surprised at what I found.

FLOOR(), by definition, should return the greatest integer less than or equal to its argument (http://software.intel.com/en-us/node/679278). Thus, (x-FLOOR(x)) should never be negative. But I have an example that violates this. Here is a very simple subroutine:

SUBROUTINE floortst(x,n)
  REAL*4 :: x, y
  INTEGER*4 :: n

  y = x*n - FLOOR(x*n)
  IF (y < 0.e0) THEN
     PRINT '("v-FLOOR(v) is negative!")'
  ELSE IF (y > 1.e0) THEN
     PRINT '("v-FLOOR(v) is >1!")'
  ELSE
     PRINT '("v-FLOOR(v) lies within [0,1]")'
  END IF
  RETURN

END SUBROUTINE floortst

It should always print "v-FLOOR(v) lies within [0,1]". However, when I compile this code with "-O1 -xAVX2" using ifort 17.0.2, and call the routine with arguments n=400000 and x=0.3496874869e0, it prints "v-FLOOR(v) is negative!". I get the expected behavior if I remove -xAVX2 or change -O1 to -O0, so the issue appears to be an AVX2-specific optimization. (I am running the code on an i7-7700.)

To be clear, I understand that the optimizer can make some modifications that impact floating point accuracy, but an optimization that causes x-FLOOR(x) to be negative seems spectacularly unsafe. Am I missing something, or is this really the expected behavior?

By the way, the above code works as expected for all versions of ifort and GCC that I have access to, except for ifort 17.0.2.

jimdempseyatthecove · ‎04-14-2017

Martyn,

Thanks. does that produce a CALL, or is the compiler smart enough to "inline" that into either a) an MFENCE + compiler memory barrier, or b) compiler memory barrier alone. This may work too:

t1 = x*N
FLUSH(t1)
t2 = FLOOR(t1)
y = t1 - t2

or

t1 = x*N
t2 = FLOOR(t1)
FLUSH(t2)
y = t1 - t2

Jim

Steve_Lionel · ‎04-14-2017

It doesn't produce a call - it is effectively the same as the C intrinsics and is interpreted by the code generator. This is an undocumented feature that works with most of the "instruction intrinsics" that take normal register arguments - not those that require AVX or MMX registers. The trick is figuring out what name to use in the NAME=.

jimdempseyatthecove · ‎04-15-2017

What about using the FLUSH as an equivalent to a C compiler memory barrier?

Jim Dempsey

Steve_Lionel · ‎04-15-2017

You mean the Fortran FLUSH statement? That affects I/O only.

jimdempseyatthecove · ‎04-17-2017

Then !$OMP FLUSH(t2)

(requires OpenMP compiler enables...)

The point being to stay within the realm of Fortran.

Jim Dempsey

Martyn_C_Intel · ‎04-17-2017

You'd need to test that this works outside a threaded context, but the documentation implies that you'd need to give t2 the SAVE attribute. This method seems a bit heavy unless you are already compiling with OpenMP.

jimdempseyatthecove · ‎04-18-2017

>> but the documentation implies that you'd need to give t2 the SAVE attribute.

To be safe, it would also have to apply to any SHARED variable as well (SHARED is default). The !$OMP FLUSH(var) should generate no code other than assuring that the specified variable(s) are (would be) visible to other threads should there be other threads which there need not be any other threads. The concept is to gently coerce the compiler to produce the stored variant (complete with rounding) value for use in subsequent instructions.

If Ryan's affected code is in a small subroutine/function he might consider !DIR$ NOOPTIMIZE-ing the program unit.

Jim Dempsey

Ryan_S_3 · ‎04-19-2017

Thank you, everyone, for contributing to such an interesting discussion. The fact that the discussion has become lengthy, and now touches on some mildly-obscure compiler and language features, makes me wonder if I am perhaps asking the wrong question.

I would like Intel's answer to this: I have two REAL*4 variables, x and y. I know they both satisfy 0 < x,y < 500. Let z be their REAL*4 product, z=x*y. I need to decompose z into two parts: an INTEGER*4 part iz, and a fractional REAL*4 part fz. It is essential that fz satisfy 0 <= fz < 1. What is the correct way to do this? For decades, I have done the obvious thing:

z=x*y
iz=FLOOR(z)
fz=z-iz

But this code no longer works with ifort, as demonstrated in this thread. Mr. Corden says that ifort is working as intended, and that my code is incorrect. What I am trying to do is perfectly well defined -- surely there is a portable and reliable way to do this in standard Fortran, without resorting to OMP, memfences, compiler flags that badly reduce performance, and the like: what is it? And if the answer is that no such method exists, is Intel sure that this particular optimization is really such a good idea?

Martyn_C_Intel · ‎04-19-2017

I'm not saying that your code is incorrect, so much as that it is subject to the finite precision of floating-point operations.

To keep fz within your bounds, I would do the most simple-minded thing:

z=x*y 
iz=FLOOR(z) 
fz=z-iz 
fz=min(max(fz,0.0),0.99999995)

assuming single precision and that a value of 1.0 is not acceptable.

There are lots of occasions in floating-point code where this sort of protection against rounding effects is a good idea.

Surprising rounding behavior with FLOOR() with code optimized for AVX2