IFX flags allowing fast math, but keeping floating point exceptions

hakostra1 · ‎01-10-2023

I have a CFD code. When I compile it with GFortran, I usually use flags such as "-O3 -march=x86-64-v2 -ffpe-trap=invalid,zero,overflow -g -fbacktrace". Floating-point trapping is extremely useful when debugging, and this works great: if there is a divide by zero or use of Inf or NaN, the program stops and lets me know where the problem is.

For those not familiar with GFortran: by default it sets "-ffp-contract=fast", which allows optimizations that violate strict floating-point semantics, such as FMA. However, it does not set "-ffast-math" by default, and it does not generate any instructions that break floating-point exception trapping.

Performance matters very much to me, and I do not care if the compiler chooses more efficient but less precise math function implementations, evaluates expressions in a different order, or applies other optimizations that change the last decimals of my answers. But I still want the floating-point exception traps to work!

Now, how do I achieve this in IFX?

I tried to read up on the IFX documentation for -fp-model. It is clear that it sets "-fp-model=fast" by default. If I turn on "-fpe0", I get lots of false-positive floating-point errors, so my program does not work. I tried "-fp-model=strict", which gives me a huge performance penalty. I believe the "strict" floating-point mode is far stricter than I need. As previously mentioned, I am fine with additional round-off errors, re-ordering, FMA, etc.

Can anyone give me any hints towards flags that can give best possible performance, yet still not generate invalid floating point numbers in IFX?

jimdempseyatthecove · ‎01-10-2023

Until you get an answer and/or fix....

Object files are interoperable between ifort and ifx.

First possible (interim) workaround: compile the main program (the PROGRAM unit) with ifort using the -ffpe... option, and compile the remainder using ifx. This presumes that the FP error state is set once at program initialization.

Second possible workaround: if the FP error state is set (or reset) elsewhere, which would negate the first workaround, compile the potentially problematic files using ifort and the remainder with ifx.

Third possible work around:

Using ifx alone, make use of the IEEE_EXCEPTIONS intrinsic module to manipulate the exception conditions.
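A minimal sketch of what manipulating the exception conditions with the standard IEEE_EXCEPTIONS module could look like: the halting mode for divide-by-zero is saved, switched off around an operation that is known to be harmless, and then restored. The program name and variables are made up for illustration, and whether ifx honours these calls reliably under "-fp-model=fast" is an assumption that should be checked against the compiler documentation.

```fortran
program halting_demo
    use, intrinsic :: ieee_exceptions
    implicit none

    logical :: was_halting, raised
    real, volatile :: zero   ! volatile to discourage compile-time evaluation
    real :: r

    if (.not. ieee_support_halting(ieee_divide_by_zero)) then
        print *, 'halting control is not supported on this processor'
        stop
    end if

    ! Remember whether divide-by-zero currently halts (e.g. because of -fpe0).
    call ieee_get_halting_mode(ieee_divide_by_zero, was_halting)

    ! Switch the trap off around code that is known to be safe.
    call ieee_set_halting_mode(ieee_divide_by_zero, .false.)

    zero = 0.0
    r = 1.0 / zero           ! raises the flag, but no longer halts

    call ieee_get_flag(ieee_divide_by_zero, raised)
    print *, 'result =', r, ' divide-by-zero flag raised:', raised

    ! Clear the flag first, then restore the previous halting mode.
    call ieee_set_flag(ieee_divide_by_zero, .false.)
    call ieee_set_halting_mode(ieee_divide_by_zero, was_halting)
end program halting_demo
```

In a real code the save/disable/restore calls would wrap only the loop that is known to produce harmless exceptions, so that trapping stays active everywhere else.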

Jim Dempsey

hakostra1 · ‎01-10-2023

The problem is not setting "-fpe0" in IFX; the problem is that the code IFX generates has false positives and triggers errors in places where there should not be any. However, I believe this is in the "spirit" of "-fp-model=fast".

Consider the following example:

DO i = 1, icells
    nxi = area(1, i)
    nyi = area(2, i)
    nzi = area(3, i)

    nn = SQRT(nxi**2 + nyi**2 + nzi**2)

    IF (nn > TINY(1.0)) THEN
        nvecs(1, i) = nxi/nn
        nvecs(2, i) = nyi/nn
        nvecs(3, i) = nzi/nn
    ELSE
        nvecs(1, i) = 0.0
        nvecs(2, i) = 0.0
        nvecs(3, i) = 0.0
    END IF
END DO

With "-O3 -xSSE4.2" and nothing else, the square root is compiled into the "rsqrtps" instruction, which computes an approximate reciprocal of the square root, 1/sqrt(x), in a single instruction and thereby saves three divisions later. This is of course a very good optimization, and exactly the kind of thing I expect "-fp-model=fast" to do.

The problem occurs when area, and therefore nxi, nyi and nzi, are zero, so that the argument to the square root function is zero. This is perfectly valid: the square root of zero is zero. The code that follows is also perfectly valid: it takes into account that a zero might appear and skips the divide by zero in that case. So the code is OK.

However, since ifx aggressively compiles the square root into "rsqrtps", i.e. the reciprocal, we have a 1/0, the result of the rsqrtps is Inf, and the resulting operation raises a floating-point error, which I think is also perfectly valid, since the result actually is Inf...
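The arithmetic behind this can be sketched at the source level with the standard IEEE_ARITHMETIC module. This is only an illustration under assumptions: it does not reproduce the ifx-generated rsqrtps code path, and which machine instruction actually raises the exception there is not established here. It shows that sqrt(0) is harmless, that the reciprocal 1/sqrt(0) is Inf, and that a multiply standing in for the division (0 * Inf) yields NaN. The program and variable names other than nxi are hypothetical.

```fortran
program reciprocal_sqrt_demo
    use, intrinsic :: ieee_arithmetic   ! also makes the IEEE_EXCEPTIONS entities available
    implicit none

    real, volatile :: sumsq, nxi   ! volatile to discourage compile-time evaluation
    real :: root, recip, scaled
    logical :: div_flag, inv_flag

    ! Disable halting and clear the flags so they can be inspected afterwards.
    call ieee_set_halting_mode(ieee_divide_by_zero, .false.)
    call ieee_set_halting_mode(ieee_invalid, .false.)
    call ieee_set_flag(ieee_divide_by_zero, .false.)
    call ieee_set_flag(ieee_invalid, .false.)

    nxi = 0.0
    sumsq = 0.0                  ! nxi**2 + nyi**2 + nzi**2 for a zero area vector

    root = sqrt(sumsq)           ! what the source asks for
    recip = 1.0 / sqrt(sumsq)    ! what a reciprocal-square-root rewrite needs
    scaled = nxi * recip         ! a multiply standing in for nxi/nn

    call ieee_get_flag(ieee_divide_by_zero, div_flag)
    call ieee_get_flag(ieee_invalid, inv_flag)

    print *, 'sqrt(0)       =', root
    print *, '1/sqrt(0)     =', recip, ' finite:', ieee_is_finite(recip)
    print *, '0 * 1/sqrt(0) =', scaled, ' NaN:', ieee_is_nan(scaled)
    print *, 'divide-by-zero flag:', div_flag, ' invalid flag:', inv_flag
end program reciprocal_sqrt_demo
```

With trapping enabled, either of the last two operations would stop the program even though the original source never divides by zero.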

jimdempseyatthecove · ‎01-10-2023

Consider:

DO i = 1, icells
    nxi = area(1, i)
    nyi = area(2, i)
    nzi = area(3, i)

    nn = nxi**2 + nyi**2 + nzi**2

    IF (nn > 0.0) THEN
        nn = SQRT(nn) ! note sqrt(TINY(1.0)) ~= 1.0842022E-19
    ELSE
        nn = HUGE(1.0) ! force n?i / nn to 0.0
    END IF
    nvecs(1, i) = nxi/nn
    nvecs(2, i) = nyi/nn
    nvecs(3, i) = nzi/nn
END DO

I suspect the above (untested) code will not generate the rsqrtps (but you should check).

Jim Dempsey

jimdempseyatthecove · ‎01-10-2023

By the way, instead of producing the zero vector [0.0, 0.0, 0.0] in place of a unit vector, consider whether your results might be better served by producing a random unit vector. For example, a collision of particles will rebound in some arbitrary direction (conserving momentum) as opposed to collecting into a point location (not conserving momentum).

Jim Dempsey