Title: Bug when using IFX with OpenMP SIMD directive
System: Windows 10 22H2 with VS2022 or Windows WSL2
CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
OneAPI version: 2024.1
I have a numerical computation program parallelized with OpenMP; it runs correctly when built with ifort or gfortran using any commonly used compile options. For the attached code, the correct output iteration count is 1259 (FP64 should be used). The iteration count is only one indicator that the result is correct: when the output is visualized, the solutions differ significantly whenever the iteration count differs.
Recently I have been trying to compile it with IFX, and I have made some odd observations.
For the 3 loops that I marked with the comment "! BUG with IFX" in the file "AWENO_solver.f90":
1. If I apply an OMP PARALLEL DO SIMD directive, or simply an OMP SIMD directive, to any of them, then: (1) compiled with ifort or gfortran, with any options, everything is fine; (2) compiled with IFX using "-O0 -qopenmp -r8" or "-On -qopenmp-stubs -r8" (n can be 1, 2, 3), everything is fine; (3) compiled with IFX using "-O2 -qopenmp -r8", the result is wrong (the iteration count becomes 1250 and the plotted solution is greatly different).
2. If I just use OMP PARALLEL DO on them, the result is always correct regardless of the compiler and options.
It seems there is something wrong with IFX + OpenMP SIMD? A minimal sketch of the directive variants I am comparing is below.
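For reference, this is how the variants look on the body of the first flagged loop from "AWENO_solver.f90" (the variables are those of the attached code; only the directives differ):
! Variant A: SIMD forms -- wrong result (1250 steps) when built with "ifx -O2 -qopenmp -r8"
!$omp parallel do simd
do i = 1-ste_r, Nx+ste_r
   sonics(i) = SQRT( gamma * abs( u_pri(3,i) / u_pri(1,i) ) )
end do
!$omp end parallel do simd
! (a plain "!$omp simd" / "!$omp end simd" pair around the same loop behaves the same way)
! Variant B: threading only, no SIMD -- always gives the correct 1259 steps
!$omp parallel do schedule(static)
do i = 1-ste_r, Nx+ste_r
   sonics(i) = SQRT( gamma * abs( u_pri(3,i) / u_pri(1,i) ) )
end do
!$omp end parallel do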
What version of ifx do you use?
I cannot reproduce the issue. I replaced the OMP directives in the 3 loops with SIMD directives as shown below.
I compiled with the 2024.2.0 compiler, which is the same as 2024.2.1; there was no change to the compiler between those two versions.
rm -Rf *.o *.mod a.out
ifx -what -V -O2 -r8 -qopenmp -c io.f90
ifx -what -V -O2 -r8 -qopenmp -c weno.f90
ifx -what -V -O2 -r8 -qopenmp -c Euler_PDE.f90
ifx -what -V -O2 -r8 -qopenmp -c Euler_nflux.f90
ifx -what -V -O2 -r8 -qopenmp -c AWENO_solver.f90
ifx -what -V -O2 -r8 -qopenmp -c problem.f90
ifx -what -V -O2 -r8 -qopenmp main.F90 io.o weno.o Euler_PDE.o Euler_nflux.o AWENO_solver.f90 problem.o
! BUG with IFX:
! If this loop uses OMP DO SIMD or simply OMP SIMD, and the program is compiled with IFX with args containing -qopenmp -O2, then the total num of iterations will be 1250, and the result is wrong!
! Under the above condition, if compiled with IFX with -O0 or compiled with IFORT or GFORTRAN, the total num of iterations will be 1259, and the result is correct.
!rwg !$omp parallel do schedule(static)
!$omp do simd
do i = 1-ste_r, Nx+ste_r
   sonics(i) = SQRT( gamma * abs( u_pri(3,i) / u_pri(1,i) ) ) ! stable implementation
end do
!$omp end do simd
!rwg !$omp end parallel do
if (disp_correction .or. use_flux_limiter) then
! compute the exact flux (can be and should be done from outside)
! BUG with IFX:
! If this loop uses OMP DO SIMD or simply OMP SIMD, and the program is compiled with IFX with args containing -qopenmp -O2, then the total num of iterations will be 1250, and the result is wrong!
! Under the above condition, if compiled with IFX with -O0 or compiled with IFORT or GFORTRAN, the total num of iterations will be 1259, and the result is correct.
!rwg !$omp parallel do schedule(static)
!$omp do simd
do i = 1-ste_r, Nx+ste_r
   FF(:,i) = Euler_advective_flux(u_con(:,i), u_pri(3,i), [1.0], 1)
end do
!$omp end do simd
!rwg !$omp end parallel do
end if
if (interp_method == CH_RI) then
! compute the Riemann invariants (can be and should be done from outside)
! BUG with IFX:
! If this loop uses OMP DO SIMD or simply OMP SIMD, and the program is compiled with IFX with args containing -qopenmp -O2, then the total num of iterations will be 1250, and the result is wrong!
! Under the above condition, if compiled with IFX with -O0 or compiled with IFORT or GFORTRAN, the total num of iterations will be 1259, and the result is correct.
!rwg !$omp parallel do schedule(static) firstprivate(gamma_coef)
!$omp do simd
do i = 1-ste_r, Nx+ste_r
   RIs(1,i) = u_pri(2,i) - gamma_coef * sonics(i)
   RIs(2,i) = sqrt(u_pri(3,i)**(1.0/gamma) / u_pri(1,i))
   RIs(3,i) = u_pri(2,i) + gamma_coef * sonics(i)
end do
!$omp end do simd
!rwg !$omp end parallel do
Results, trimmed down, are below. But I am on Red Hat Linux; I will try this on Windows when my server comes back up.
more /etc/redhat-release
Red Hat Enterprise Linux release 8.6 (Ootpa)
model name : Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz
t= 3.7989637457480699E-02
t= 3.7989637457480699E-02
Solving completed.
total number of time steps= 1259
cpu time= 1.6401E+00 s
OMP_num_threads= 8
Program terminates successfully.
My Windows server shows the same result, 1259 time steps, as expected.
Windows options: /O2 /real-size:64 /Qopenmp
Ran on 72 threads on a 2-processor Xeon Gold 6140 under Windows.
Hi Ron,
Thank you very much for your effort. I can now get the correct results using "-O3 -r8 -qopenmp -fpp". By the way, sorry that I forgot to include all of my compile options earlier. I tried several combinations and found that the problem appears to be with -ipo; my results are listed below. My current opinion is that the problem I encountered has NOTHING to do with OpenMP or SIMD, because the same error also appears when I compile without OpenMP.
Moreover, I suspect that this problem is caused by inlining, because when I test on Windows with "/Qipo /fpp /Qopenmp /real-size:64" (the equivalent of -ipo on Linux), I get correct results with the additional option /Ob0 and wrong results with /Ob2.
Interestingly, EVERY time I get the "wrong" result, the iteration count and the plotted solution are always the same, so it behaves like a deterministic bug.
Options | correct plot and correct iteration count (== 1259) |
-O1 -ipo -r8 -qopenmp -fpp | yes |
-O2 -ipo -r8 -qopenmp -fpp | NO |
-O3 -ipo -r8 -qopenmp -fpp | NO |
-O1 -r8 -qopenmp -fpp | yes |
-O2 -r8 -qopenmp -fpp | yes |
-O3 -r8 -qopenmp -fpp | yes |
-O1 -xHost -r8 -qopenmp -fpp | yes |
-O2 -xHost -r8 -qopenmp -fpp | yes |
-O3 -xHost -r8 -qopenmp -fpp | yes |
-O1 -ipo -r8 -fpp | yes |
-O2 -ipo -r8 -fpp | NO |
-O3 -ipo -r8 -fpp | NO |
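For completeness, the Windows check mentioned above looked roughly like this (same source files as in Ron's commands; /Ob0 disables inlining, /Ob2 enables aggressive inlining; 1250 is the same wrong step count as before):
ifx /Qipo /Ob0 /fpp /Qopenmp /real-size:64 io.f90 weno.f90 Euler_PDE.f90 Euler_nflux.f90 AWENO_solver.f90 problem.f90 main.F90
(gives the correct result, 1259 steps)
ifx /Qipo /Ob2 /fpp /Qopenmp /real-size:64 io.f90 weno.f90 Euler_PDE.f90 Euler_nflux.f90 AWENO_solver.f90 problem.f90 main.F90
(gives the wrong result, 1250 steps)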
This is great news. IPO changes the vectorization behavior through inlining. I can check the -qopt-report output to confirm, but I suspect ABS and maybe SQRT get inlined with the IPO option.
This code runs very quickly with OpenMP. Do you see any need for IPO? Maybe it could simply be avoided if the performance without it is good enough.
I will run some tests with IPO along with -fp-model options and timings to see whether we can keep IPO and maintain the same convergence time steps.
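Something along these lines should show which calls get inlined; I have not verified the exact report contents, and with -ipo I believe most of the interprocedural detail only appears in the report generated at the link step:
ifx -O2 -ipo -r8 -qopenmp -fpp -qopt-report=3 -c AWENO_solver.f90
ifx -O2 -ipo -r8 -qopenmp -fpp -qopt-report=3 main.F90 io.o weno.o Euler_PDE.o Euler_nflux.o AWENO_solver.o problem.o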
Yes. I am trying -ipo because this code is a building block of a future "big" computation code.
I think -fp-model is not crucial (I have actually tested different -fp-model settings). The reason is that if I compile with -r16 (i.e. /real-size:128) and disable OpenMP, then:
"-O3 -xHost -fpp" gives the correct plot and the correct iteration count;
"-O3 -ipo -xHost -fpp" gives an incorrect plot (almost the same as the incorrect ones obtained with -r8) and an incorrect iteration count.
My opinion is that some structural or logical error, not a floating-point one, occurs when -ipo is used together with -O2 or -O3.
Remark: this error does not happen with ifort.
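If it helps, one way to narrow this down further might be to exclude individual candidate routines (e.g. the flux function) from inlining and rebuild with -ipo. A minimal sketch on a stand-in routine, assuming ifx honors Intel's ATTRIBUTES NOINLINE directive the same way ifort does:
module demo_noinline_mod
   implicit none
contains
   ! Stand-in for a routine suspected of being mis-inlined under -ipo
   ! (in the real code this would be, e.g., Euler_advective_flux).
   ! The directive asks the compiler not to inline this procedure.
   function scaled_abs(x, s) result(y)
   !DIR$ ATTRIBUTES NOINLINE :: scaled_abs
      real, intent(in) :: x, s
      real             :: y
      y = s * abs(x)
   end function scaled_abs
end module demo_noinline_mod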
