- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi! I've observed some odd behavior with the latest releases of the Intel Fortran Compiler (ifx).
Minimal working example #1
program test_program
implicit none
integer, parameter :: dp = kind(0.d0)
integer, dimension(2) :: dims
dims = [10, 2]
call test_procedure(dims)
contains
subroutine test_procedure(dims)
integer, dimension(2), intent(in) :: dims
integer :: i
real(dp) :: tot
real(dp), dimension(dims(1), dims(2)) :: mat
#ifdef STATIC
real(dp), dimension(dims(1)) :: tmp
#else
real(dp), dimension(:), allocatable :: tmp
allocate(tmp(dims(1)))
#endif
do i = 1, dims(2)
mat(:, i) = real(i-1, kind=dp)
end do
tot = 0._dp
!$omp parallel do default(none), private(tmp), shared(mat, dims), &
!$omp& reduction(+:tot)
do i = 1, dims(2)
tmp = mat(:, i)
tot = tot + sum(tmp)
end do
!$omp end parallel do
print *, 'tot:', tot, 'check:', sum(mat)
end subroutine test_procedure
end program test_program
This simply calculates a matrix sum, possibly with OpenMP parallelization. In this example the sum is expected to be equal to 10, but in the following scenario a wrong value is produced with the latest ifx release (2024.1.0):
~$ ifx -O2 -xHost -qopenmp -DSTATIC test.F90
~$ OMP_NUM_THREADS=1 ./a.out
tot: 20.0000000000000 check: 10.0000000000000
The correct result is recovered when either:
* lowering the optimization to e.g. O1
* removing the -xHost flag
* removing the -qopenmp flag
* removing the -DSTATIC directive
* not using the 'tmp' array (obviously)
* using two OpenMP threads instead of one
* using ifort instead of ifx 2024.1.0
* using an older ifx version (i.e. 2024.0.0 and earlier)
Minimal working example #2
A related example, however, suggests that also some older ifx releases are affected:
program test_program
implicit none
integer, parameter :: dp = kind(0.d0)
integer, dimension(3) :: dims
dims = [10, 2, 27]
call test_procedure(dims)
contains
subroutine test_procedure(dims)
integer, dimension(3), intent(in) :: dims
integer :: i, j, k
real(dp) :: tot
real(dp), dimension(dims(1), dims(2)) :: mat2d
real(dp), dimension(dims(1), dims(2), dims(3)) :: mat3d
#ifdef STATIC
real(dp), dimension(dims(1)) :: tmp
#else
real(dp), dimension(:), allocatable :: tmp
allocate(tmp(dims(2)))
#endif
mat2d = 1._dp
mat3d = 1._dp
tot = 0._dp
!$omp parallel do default(none), private(tmp, j, k), &
!$omp& shared(mat2d, mat3d, dims), reduction(+:tot)
do i = 1, dims(3)
tmp = 0._dp
do j = 1, dims(2)
do k = 1, dims(1)
tmp(j) = tmp(j) + mat2d(k, j) * mat3d(k, j, i)
end do
end do
tot = tot + sum(tmp)
end do
!$omp end parallel do
print *, 'tot:', tot, 'check:', sum(mat2d * sum(mat3d, dim=3))
end subroutine test_procedure
end program test_program
Here the result is expected to be 540, whereas compiling with "ifx -O2 -xHost -qopenmp -DSTATIC" and executing with either OMP_NUM_THREADS=1 or 2 yields different values. As in the first example, changing the -O2, -xHost, -qopenmp or -DSTATIC flags does give the expected result. This time also e.g. ifx 2023.2.0 and ifx 2024.0.0 show this behavior (so not just ifx 2024.1.0). The corresponding ifort releases work fine and also at least one older ifx release (2022.2.1) works as expected.
So even though there are several possible workarounds in both cases, I suppose that this is worth looking into. I suspect it's a compiler bug, but definitely tell in case the code itself is flawed.
Some more notes:
* OS: Rocky Linux 8.9
* CPUs: Intel Xeon Platinum 8360Y (Icelake), Intel Xeon Gold 6140 (Skylake)
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I can reproduce what you are seeing in the 2024.1.0 compiler. But our main branch from last night is producing correct results. This code will be in 2024.2, that is, Update 2 around end of June or early July.
I will see if I can isolate this more. It's impossible to say what bug was fixed. We release around 200 fixes in each update, so I can't point to one and say "it was this fix".
$ module purge
$ module load oneapi/2024.1.0
$ echo $OMP_NUM_THREADS
1
$ ifx -O2 -xhost -qopenmp -DSTATIC test.F90 -what -V ; ./a.out
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240308
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
Intel(R) Fortran 24.0-1472.3
GNU ld version 2.39-9.fc38
tot: 20.0000000000000 check: 10.0000000000000
$
$ module purge ; module load oneapi/nightly
$
$ ifx -O2 -xhost -qopenmp -DSTATIC test.F90 -what -V ; ./a.out
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version dev.x.0 Mainline Build 20240404
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
Intel(R) Fortran 24.0-1656
GNU ld version 2.39-9.fc38
tot: 10.0000000000000 check: 10.0000000000000
$
For example, the "edit number" on the 2024.1 compiler is "24.0-1472". Important number here is 1472. Our main branch which will EVENTUALLY go into 2024.2 is at "24.0-1656", edit 1656 TODAY (more edits coming before code freeze). that's 184 edits as of today that are in the main branch over and above what we released for 2024.1.0.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@m_vdb same on 2nd example - currently working with nightly build that will eventually get into 2024.2.0 builds
$ ifx -O2 -xhost -qopenmp -DSTATIC test2.F90 -what -V ; ./a.out
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240308
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
Intel(R) Fortran 24.0-1472.3
GNU ld version 2.39-9.fc38
tot: 14580.0000000000 check: 540.000000000000
$ module purge
$ module load oneapi/nightly
$ ifx -O2 -xhost -qopenmp -DSTATIC test2.F90 -what -V ; ./a.out
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version dev.x.0 Mainline Build 20240404
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
Intel(R) Fortran 24.0-1656
GNU ld version 2.39-9.fc38
tot: 540.000000000000 check: 540.000000000000
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@m_vdb one possible workaround until the 2024.2 release this summer.: use openmp stub calls. You use OMP_NUM_THREADS=1 to easily test the behavior at 1 thread. But you have OMP api calls in the code perhaps, so you need -qopenmp option. I noted that the error you see is only when OMP_NUM_THREADS=1, right?
If you have not used stubs, it is -qopenmp-stubs option. Like this:
BEFORE - you see the error
$ ifx -O2 -xhost -qopenmp -DSTATIC test.F90 -what -V ; ./a.out
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240308
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
Intel(R) Fortran 24.0-1472.3
GNU ld version 2.39-9.fc38
tot: 20.0000000000000 check: 10.0000000000000
STUB the OpenMP calls and correct results
$ ifx -O2 -xhost -qopenmp-stubs -DSTATIC test.F90 -what -V ; ./a.out
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.1.0 Build 20240308
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
Intel(R) Fortran 24.0-1472.3
GNU ld version 2.39-9.fc38
tot: 10.0000000000000 check: 10.0000000000000
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for looking into it, glad to hear that this will be fixed in the next release!
So far I've been wrapping OpenMP API related parts in preprocessor directives, so for a serial executable I can drop -qopenmp without needing to add -qopenmp-stubs.
There is one more example that I would like to get checked. Again a variation on the same theme, but now e.g. with reduction on an array:
program test_program
implicit none
integer, parameter :: dp = kind(0.d0)
integer, parameter :: d1 = 9
integer, parameter :: d2 = 27
real(dp), dimension(d1) :: tot
tot = test_function(d1, d2)
contains
function test_function(d1, d2) result(tot)
integer, intent(in) :: d1
integer, intent(in) :: d2
real(dp), dimension(d1) :: tot
integer :: i, j
real(dp), dimension(d1) :: vec
real(dp), dimension(d1, d2) :: mat
#ifdef STATIC
real(dp), dimension(d1) :: tmp
#else
real(dp), dimension(:), allocatable :: tmp
allocate(tmp(d1))
#endif
vec = 1._dp
mat = 1._dp
tot = 0._dp
!$omp parallel do default(none), private(j, tmp) &
!$omp& shared(d1, d2, mat, vec), reduction(+:tot)
do i = 1, d2
tmp = 0._dp
do j = 1, d1
tmp(j) = tmp(j) + vec(j) * mat(j, i)
end do
tot = tot + tmp
end do
!$omp end parallel do
print *, 'tot:', sum(tot), 'check:', sum(vec * sum(mat, dim=2))
end function test_function
end program test_program
For the latest ifx release (2024.1.0), it's a similar story as the second example, e.g.:
~$ ifx -O2 -xHost -qopenmp -DSTATIC test.F90
~$ OMP_NUM_THREADS=1 ./a.out
tot: 6561.00000000000 check: 243.000000000000
~$ OMP_NUM_THREADS=2 ./a.out
tot: 3285.00000000000 check: 243.000000000000
Could you verify that also this example is working correctly with the nightly build?
Additionally, this example cannot be compiled with the latest ifort release (2021.12.0) and OpenMP:
~$ ifort -DSTATIC -O2 -xHost -qopenmp test.F90
/tmp/ifortSdscAo.i90: catastrophic error: **Internal compiler error: segmentation violation signal raised** Please report this error along with the circumstances in which it occurred in a Software Problem Report. Note: File and line given may not be explicit cause of this error.
compilation aborted for test.F90 (code 1)
I can only make it compile by dropping -qopenmp (or substituting it with -qopenmp-stubs of course).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@m_vdb, you won the trifecta! The third one works with the nightly compiler, too!
ifx -what -O2 -xHost -qopenmp -DSTATIC m_vdb.F90
Intel(R) Fortran 24.0-1660
OMP_NUM_THREADS=1 ./a.out
tot: 243.000000000000 check: 243.000000000000
OMP_NUM_THREADS=2 ./a.out
tot: 243.000000000000 check: 243.000000000000
Unfortunately, ifort still ICEs (internal compiler error) with the nightly. Since ifort is deprecated and ifx will work shortly, I doubt the compiler team will work on it. There's only one more ifort release. BTW, ifort compiles and runs OK without -qopenmp.
*trifecta -a situation in which you achieve three things
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for checking @Barbara_P_Intel, now I'm reasonably confident that the next ifx version will work correctly for cases like these.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page