Wrong result with ifx -O2 -qopenmp -xHost and static arrays

m_vdb · ‎04-05-2024

Hi! I've observed some odd behavior with the latest releases of the Intel Fortran Compiler (ifx).

Minimal working example #1

program test_program
implicit none


integer, parameter :: dp = kind(0.d0)                                           
integer, dimension(2) :: dims

dims = [10, 2]
call test_procedure(dims)


contains


subroutine test_procedure(dims)
    integer, dimension(2), intent(in) :: dims

    integer :: i
    real(dp) :: tot
    real(dp), dimension(dims(1), dims(2)) :: mat                                
#ifdef STATIC
    real(dp), dimension(dims(1)) :: tmp
#else
    real(dp), dimension(:), allocatable :: tmp
    allocate(tmp(dims(1)))
#endif

    do i = 1, dims(2)
        mat(:, i) = real(i-1, kind=dp)
    end do

    tot = 0._dp

    !$omp parallel do default(none), private(tmp), shared(mat, dims), &
    !$omp&            reduction(+:tot)
    do i = 1, dims(2)
        tmp = mat(:, i)
        tot = tot + sum(tmp)
    end do
    !$omp end parallel do

    print *, 'tot:', tot, 'check:', sum(mat)
end subroutine test_procedure

end program test_program

This simply calculates a matrix sum, possibly with OpenMP parallelization. In this example the sum is expected to be equal to 10, but in the following scenario a wrong value is produced with the latest ifx release (2024.1.0):

~$ ifx -O2 -xHost -qopenmp -DSTATIC test.F90
~$ OMP_NUM_THREADS=1 ./a.out
tot: 20.0000000000000 check: 10.0000000000000

The correct result is recovered when either:

* lowering the optimization to e.g. O1

* removing the -xHost flag

* removing the -qopenmp flag

* removing the -DSTATIC directive

* not using the 'tmp' array (obviously)

* using two OpenMP threads instead of one

* using ifort instead of ifx 2024.1.0

* using an older ifx version (i.e. 2024.0.0 and earlier)

Minimal working example #2

A related example, however, suggests that also some older ifx releases are affected:

program test_program
implicit none


integer, parameter :: dp = kind(0.d0)                                           
integer, dimension(3) :: dims

dims = [10, 2, 27]
call test_procedure(dims)


contains


subroutine test_procedure(dims)
    integer, dimension(3), intent(in) :: dims

    integer :: i, j, k
    real(dp) :: tot
    real(dp), dimension(dims(1), dims(2)) :: mat2d
    real(dp), dimension(dims(1), dims(2), dims(3)) :: mat3d                                
#ifdef STATIC
    real(dp), dimension(dims(1)) :: tmp
#else
    real(dp), dimension(:), allocatable :: tmp
    allocate(tmp(dims(2)))
#endif

    mat2d = 1._dp
    mat3d = 1._dp
    tot = 0._dp

    !$omp parallel do default(none), private(tmp, j, k), &
    !$omp&            shared(mat2d, mat3d, dims), reduction(+:tot)
    do i = 1, dims(3)
        tmp = 0._dp
        do j = 1, dims(2)                                                       
            do k = 1, dims(1)
               tmp(j) = tmp(j) + mat2d(k, j) * mat3d(k, j, i)
            end do                                                              
        end do                                                                  

        tot = tot + sum(tmp)
    end do
    !$omp end parallel do

    print *, 'tot:', tot, 'check:', sum(mat2d * sum(mat3d, dim=3))
end subroutine test_procedure

end program test_program

Here the result is expected to be 540, whereas compiling with "ifx -O2 -xHost -qopenmp -DSTATIC" and executing with either OMP_NUM_THREADS=1 or 2 yields different values. As in the first example, changing the -O2, -xHost, -qopenmp or -DSTATIC flags does give the expected result. This time also e.g. ifx 2023.2.0 and ifx 2024.0.0 show this behavior (so not just ifx 2024.1.0). The corresponding ifort releases work fine and also at least one older ifx release (2022.2.1) works as expected.

So even though there are several possible workarounds in both cases, I suppose that this is worth looking into. I suspect it's a compiler bug, but definitely tell in case the code itself is flawed.

Some more notes:

* OS: Rocky Linux 8.9

* CPUs: Intel Xeon Platinum 8360Y (Icelake), Intel Xeon Gold 6140 (Skylake)

Ron_Green · ‎04-05-2024

I can reproduce what you are seeing in the 2024.1.0 compiler. But our main branch from last night is producing correct results. This code will be in 2024.2, that is, Update 2 around end of June or early July.

I will see if I can isolate this more. It's impossible to say what bug was fixed. We release around 200 fixes in each update, so I can't point to one and say "it was this fix".

$ module purge
$ module load oneapi/2024.1.0
$ echo $OMP_NUM_THREADS
1
$ ifx -O2 -xhost -qopenmp -DSTATIC test.F90 -what -V ; ./a.out
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240308
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

 Intel(R) Fortran 24.0-1472.3
GNU ld version 2.39-9.fc38
 tot:   20.0000000000000      check:   10.0000000000000     
$ 
$ module purge ; module load oneapi/nightly
$ 
$ ifx -O2 -xhost -qopenmp -DSTATIC test.F90 -what -V ; ./a.out
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version dev.x.0 Mainline Build 20240404
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

 Intel(R) Fortran 24.0-1656
GNU ld version 2.39-9.fc38
 tot:   10.0000000000000      check:   10.0000000000000     
$

For example, the "edit number" on the 2024.1 compiler is "24.0-1472". Important number here is 1472. Our main branch which will EVENTUALLY go into 2024.2 is at "24.0-1656", edit 1656 TODAY (more edits coming before code freeze). that's 184 edits as of today that are in the main branch over and above what we released for 2024.1.0.

Ron_Green · ‎04-05-2024

@m_vdb same on 2nd example - currently working with nightly build that will eventually get into 2024.2.0 builds


$ ifx -O2 -xhost -qopenmp -DSTATIC test2.F90 -what -V ; ./a.out
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240308
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

 Intel(R) Fortran 24.0-1472.3
GNU ld version 2.39-9.fc38
 tot:   14580.0000000000      check:   540.000000000000     

$ module purge
$ module load oneapi/nightly
$ ifx -O2 -xhost -qopenmp -DSTATIC test2.F90 -what -V ; ./a.out
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version dev.x.0 Mainline Build 20240404
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

 Intel(R) Fortran 24.0-1656
GNU ld version 2.39-9.fc38
 tot:   540.000000000000      check:   540.000000000000

Ron_Green · ‎04-05-2024

@m_vdb one possible workaround until the 2024.2 release this summer.: use openmp stub calls. You use OMP_NUM_THREADS=1 to easily test the behavior at 1 thread. But you have OMP api calls in the code perhaps, so you need -qopenmp option. I noted that the error you see is only when OMP_NUM_THREADS=1, right?

If you have not used stubs, it is -qopenmp-stubs option. Like this:

BEFORE - you see the error

$ ifx -O2 -xhost -qopenmp -DSTATIC test.F90 -what -V ; ./a.out
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240308
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

 Intel(R) Fortran 24.0-1472.3
GNU ld version 2.39-9.fc38
 tot:   20.0000000000000      check:   10.0000000000000   

STUB the OpenMP calls and correct results
  
$ ifx -O2 -xhost -qopenmp-stubs -DSTATIC test.F90 -what -V ; ./a.out
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240308
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

 Intel(R) Fortran 24.0-1472.3
GNU ld version 2.39-9.fc38
 tot:   10.0000000000000      check:   10.0000000000000

m_vdb · ‎04-07-2024

Thanks for looking into it, glad to hear that this will be fixed in the next release!

So far I've been wrapping OpenMP API related parts in preprocessor directives, so for a serial executable I can drop -qopenmp without needing to add -qopenmp-stubs.

There is one more example that I would like to get checked. Again a variation on the same theme, but now e.g. with reduction on an array:

program test_program

implicit none


integer, parameter :: dp = kind(0.d0)
integer, parameter :: d1 = 9
integer, parameter :: d2 = 27
real(dp), dimension(d1) :: tot

tot = test_function(d1, d2)


contains


function test_function(d1, d2) result(tot)
    integer, intent(in) :: d1
    integer, intent(in) :: d2
    real(dp), dimension(d1) :: tot

    integer :: i, j
    real(dp), dimension(d1) :: vec
    real(dp), dimension(d1, d2) :: mat
#ifdef STATIC
    real(dp), dimension(d1) :: tmp
#else
    real(dp), dimension(:), allocatable :: tmp
    allocate(tmp(d1))
#endif

    vec = 1._dp
    mat = 1._dp
    tot = 0._dp

    !$omp parallel do default(none), private(j, tmp) &
    !$omp&            shared(d1, d2, mat, vec), reduction(+:tot)
    do i = 1, d2
        tmp = 0._dp
        do j = 1, d1
            tmp(j) = tmp(j) + vec(j) * mat(j, i)
        end do

        tot = tot + tmp
    end do
    !$omp end parallel do

    print *, 'tot:', sum(tot), 'check:', sum(vec * sum(mat, dim=2))
end function test_function

end program test_program

For the latest ifx release (2024.1.0), it's a similar story as the second example, e.g.:

~$ ifx -O2 -xHost -qopenmp -DSTATIC test.F90
~$ OMP_NUM_THREADS=1 ./a.out
tot: 6561.00000000000 check: 243.000000000000
~$ OMP_NUM_THREADS=2 ./a.out
tot: 3285.00000000000 check: 243.000000000000

Could you verify that also this example is working correctly with the nightly build?

Additionally, this example cannot be compiled with the latest ifort release (2021.12.0) and OpenMP:

~$ ifort -DSTATIC -O2 -xHost -qopenmp test.F90
/tmp/ifortSdscAo.i90: catastrophic error: **Internal compiler error: segmentation violation signal raised** Please report this error along with the circumstances in which it occurred in a Software Problem Report. Note: File and line given may not be explicit cause of this error.
compilation aborted for test.F90 (code 1)

I can only make it compile by dropping -qopenmp (or substituting it with -qopenmp-stubs of course).

Barbara_P_Intel · ‎04-08-2024

@m_vdb, you won the trifecta! The third one works with the nightly compiler, too!

ifx -what -O2 -xHost -qopenmp -DSTATIC m_vdb.F90
 Intel(R) Fortran 24.0-1660
OMP_NUM_THREADS=1 ./a.out
 tot:   243.000000000000      check:   243.000000000000
OMP_NUM_THREADS=2 ./a.out
 tot:   243.000000000000      check:   243.000000000000

Unfortunately, ifort still ICEs (internal compiler error) with the nightly. Since ifort is deprecated and ifx will work shortly, I doubt the compiler team will work on it. There's only one more ifort release. BTW, ifort compiles and runs OK without -qopenmp.

*trifecta -a situation in which you achieve three things

m_vdb · ‎04-09-2024

Thanks for checking @Barbara_P_Intel, now I'm reasonably confident that the next ifx version will work correctly for cases like these.