OpenMP SIMD produce wrong result (ifx bug??)

hakostra1 · ‎04-18-2024

I have found another possible bug in ifx.

Steps to reproduce:

Put the following in simdfunctions_mod.F90:

MODULE simdfunctions_mod
    IMPLICIT NONE (type, external)
    PRIVATE

    PUBLIC :: divide0
CONTAINS
    PURE ELEMENTAL REAL FUNCTION divide0(a, b) RESULT(res)
        !$omp declare simd(divide0)
        REAL, INTENT(in) :: a, b

        IF (b == 0.0) THEN
            res = 0.0
        ELSE
            res = a/b
        END IF
    END FUNCTION divide0
END MODULE simdfunctions_mod

And put the following in efvisc.F90:

MODULE efvisc_mod
    USE simdfunctions_mod

    IMPLICIT NONE (type, external)
    PRIVATE

    PUBLIC :: efvisc

CONTAINS
    SUBROUTINE efvisc(kk, jj, ii, bu, graduz)
        ! Subroutine arguments
        INTEGER, INTENT(in) :: kk, jj, ii
        REAL, INTENT(in) :: bu(kk, jj, ii)
        REAL, INTENT(in) :: graduz(kk, jj, ii)

        ! Local variables
        INTEGER :: k, j, i
        REAL :: dudz(kk-4)
        REAL :: dd1, dd2, dd3, dd4, ddd

        DO i = 3, ii-2
            DO j = 3, jj-2
                !$omp simd private(dd1, dd2, dd3, dd4, ddd)
                DO k = 3, kk-2
                    ! dU/dZ
                    dd1 = bu(k+1, j, i)*bu(k, j, i)
                    dd2 = bu(k, j, i)*bu(k-1, j, i)
                    dd3 = bu(k+1, j, i-1)*bu(k, j, i-1)
                    dd4 = bu(k, j, i-1)*bu(k-1, j, i-1)
                    ddd = dd1 + dd2 + dd3 + dd4
                    dudz(k-2) = divide0(graduz(k, j, i)*dd1 &
                        + graduz(k-1, j, i)*dd2 &
                        + graduz(k, j, i-1)*dd3 &
                        + graduz(k-1, j, i-1)*dd4, ddd)
                END DO

                IF (i == 18 .AND. j == 18) THEN
                    k = 33
                    WRITE(*, *) "dudz = ", dudz(k-2)
                END IF
            END DO
        END DO
    END SUBROUTINE efvisc
END MODULE efvisc_mod


PROGRAM main
    USE efvisc_mod
    IMPLICIT NONE (type, external)

    INTEGER, PARAMETER :: ii = 36, jj = 36, kk = 36

    REAL :: bu(kk, jj, ii)
    REAL :: graduz(kk, jj, ii)

    ! Initialize the RNG to give the same (compiler-dependent) random
    ! sequence every time
    CALL RANDOM_INIT(.TRUE., .FALSE.)

    ! Initialize variables
    bu = 1.0
    CALL RANDOM_NUMBER(graduz)

    CALL efvisc(kk, jj, ii, bu, graduz)
END PROGRAM main

It is essential that these two pieces of code go in different files, otherwise the bug won't show up. Now compile with ifx, using -O0 optimization level. The output of this is the reference (which is assumed to be correct):

$ ifx -O0 simdfunctions_mod.F90 efvisc2.F90 && ./a.out
 dudz =   0.4440139

Compile over again with -O1 or higher and the result changes completely. This is wrong:

$ ifx -O1 simdfunctions_mod.F90 efvisc2.F90 && ./a.out
 dudz =   0.5154935

Please note that the graduz array is filled with random numbers, and the results on your computer/architecture might differ. The key point is that it should produce the same result with both -O0 and -O1 (and other optimization levels).

Analysis:

The critical source file is efvisc.F90, when this is compiled with -O0 everything works and when this is compiled with -O1 or higher it does not work. If you compile simdfunctions_mod.F90 separately with for instance -O3 it will still works as long as efvisc.F90 is compiled with -O0.

Compiling efvisc.F90 with -qno-openmp-simd will also solve the problem and produce the correct result with all optimization levels (independent on how simdfunctions_mod.F90 is compiled).

ifort produce the expected output in all cases (ifort does not enable openmp-simd until -O2, this differ from ifx that enable it at -O1):

$ ifort -O3 -diag-disable=10448 simdfunctions_mod.F90 efvisc2.F90 && ./a.out
 dudz =   0.4440139

Since the program rely on a random (but seeded in a predictable way) generated field, the results with different compilers will vary, but trying with gfortran I cannot find any errors in the results and my program execute just fine, the results are the same independent on openmp-simd and optimization level.

Workaround:

Change the efvisc-subroutine to:

    SUBROUTINE efvisc(kk, jj, ii, bu, graduz)
        ! Subroutine arguments
        INTEGER, INTENT(in) :: kk, jj, ii
        REAL, INTENT(in) :: bu(kk, jj, ii)
        REAL, INTENT(in) :: graduz(kk, jj, ii)

        ! Local variables
        INTEGER :: k, j, i
        REAL :: dudz(kk-4)
        REAL :: dd1, dd2, dd3, dd4, dsum, ddd

        DO i = 3, ii-2
            DO j = 3, jj-2
                !$omp simd private(dd1, dd2, dd3, dd4, dsum, ddd)
                DO k = 3, kk-2
                    ! dU/dZ
                    dd1 = bu(k+1, j, i)*bu(k, j, i)
                    dd2 = bu(k, j, i)*bu(k-1, j, i)
                    dd3 = bu(k+1, j, i-1)*bu(k, j, i-1)
                    dd4 = bu(k, j, i-1)*bu(k-1, j, i-1)

                    dsum = graduz(k, j, i)*dd1 &
                        + graduz(k-1, j, i)*dd2 &
                        + graduz(k, j, i-1)*dd3 &
                        + graduz(k-1, j, i-1)*dd4
                    ddd = dd1 + dd2 + dd3 + dd4
                    
                    dudz(k-2) = divide0(dsum, ddd)
                END DO

                IF (i == 18 .AND. j == 18) THEN
                    k = 33
                    WRITE(*, *) "dudz = ", dudz(k-2)
                END IF
            END DO
        END DO
    END SUBROUTINE efvisc

The difference is that the new variable dsum (which is also declared private in the simd construct) contains the expression that was previously in the function call to divide0. This seems to solve the problem.

ifx version:

$ ifx --version
ifx (IFX) 2024.1.0 20240308
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

Final remark:

I'm always careful to conclude that something is a compiler bug, the Fortran programming language and associated OpenMP specification has way too many caveats and pitfalls for me to know them all. But the behavior here seems fishy and I cannot see any obvious mistakes in my code.

I really appreciate all comments, feedback and help!

Barbara_P_Intel · ‎04-22-2024

Thank you for the nice triage of this issue! Usually when there's a runtime failure at -O1 and above, but works ok at -O0 it's an issue with the compiler.

I filed a bug report, CMPLRLLVM-57890. We'll let you know when it's fixed.

hakostra1 · ‎04-23-2024

Thanks, looking forward to the fix.

Devorah_H_Intel · ‎04-03-2025

This issue was fixed in ifx 2025.1

Get Intel® oneAPI HPC Toolkit