Solved: ifx Produces Incorrect Results with OpenMP and O3 Optimization Level on Linux

lins2 · 09-23-2024

Hello,

I would like to report an issue with the Intel Fortran Compiler (ifx) when OpenMP is used together with the -O3 optimization level on Linux.

Description:

When a 12x12 matrix multiplication is computed inside an OpenMP parallel loop and compiled with -O3, the result produced by ifx is incorrect.

Version Used:

ifx (IFX) 2023.2.0
ifort (IFORT) 2021.10.0

Steps to Reproduce:

Save the following fixed-form code as openmp_parallel_operations.for:

      SUBROUTINE CalculateMatrix12x12(A, B, C, NTHREAD)
      INTEGER, INTENT(IN) :: NTHREAD
      REAL*8, DIMENSION(12, 12), INTENT(IN) :: A, B
      REAL*8, DIMENSION(12, 12), INTENT(OUT) :: C
      INTEGER :: I, J, K
C$OMP PARALLEL DO PRIVATE(I, J, K) DEFAULT(SHARED) NUM_THREADS(NTHREAD)
      DO I = 1, 12
        DO J = 1, 12
          C(I, J) = 0.0
          DO K = 1, 12
            C(I, J) = C(I, J) + A(I, K) * B(K, J)
          ENDDO
        ENDDO
      ENDDO
C$OMP END PARALLEL DO
      END SUBROUTINE CalculateMatrix12x12

Save the following driver program as a second file, test_openmp_accuracy.f90:

program test_openmp_accuracy
   implicit none

   real*8 :: A(12,12), B(12,12), C(12,12)
   integer :: NTHREAD, i, j

   NTHREAD = 4

   ! Generate random matrices
   call generate_random_matrix_real8(A)
   call generate_random_matrix_real8(B)

   ! Calculate the result matrix
   call CalculateMatrix12x12(A, B, C, NTHREAD)

   ! Print the result matrix to the console in 12x12 format
   print *, 'Result matrix C:'
   do i = 1, 12
      write(*, '(12F10.4)') (C(i, j), j = 1, 12)
   end do

contains
   ! Fill the matrix with uniform random values in [-1, 1)
   subroutine generate_random_matrix_real8(matrix)
       real*8, intent(out) :: matrix(12,12)
       call random_number(matrix)
       matrix = 2.0 * matrix - 1.0
   end subroutine generate_random_matrix_real8

end program test_openmp_accuracy

Compile using ifx and ifort with the following commands:

ifx -O3 -fp-model=precise -qopenmp -o ifx_executable test_openmp_accuracy.f90 openmp_parallel_operations.for

ifort -O3 -fp-model=precise -qopenmp -o ifort_executable test_openmp_accuracy.f90 openmp_parallel_operations.for

Run both executables and note the significant difference between the two results.
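Rather than comparing the printed matrices by eye, the driver could check C against the MATMUL intrinsic. A minimal sketch, assuming double precision (`real*8` in the driver corresponds to `kind(1.0d0)` with default compiler options); the module `matmul_check` and the function `max_abs_error` are hypothetical helpers, not part of the original reproducer:

```fortran
! Hypothetical helper for an automated accuracy check (not part of the
! original reproducer).
module matmul_check
   implicit none
   integer, parameter :: dp = kind(1.0d0)
contains
   ! Largest absolute element-wise difference between C and the
   ! reference product computed by the MATMUL intrinsic.
   function max_abs_error(A, B, C) result(err)
      real(dp), intent(in) :: A(:,:), B(:,:), C(:,:)
      real(dp) :: err
      err = maxval(abs(C - matmul(A, B)))
   end function max_abs_error
end module matmul_check
```

To use it, compile the module ahead of the driver, add `use matmul_check` before `implicit none` in test_openmp_accuracy.f90, and print `max_abs_error(A, B, C)` after the call to `CalculateMatrix12x12`. MATMUL may sum in a different order, so exact equality is not expected; a correct result should differ only by rounding error (on the order of 1e-15 for these inputs), so a wrong result stands out immediately.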

Note:

This issue does not occur on Windows or in Docker on Windows.
This issue does not occur on Linux when OpenMP is used with other optimization levels (-O0, -O1, -O2).
This issue does not occur with smaller matrices (e.g., a 6x6 matrix).

Ron_Green · 09-23-2024

Yes, this seems to be a bug in last year's compiler. It no longer reproduces with the most recent compilers (ifx 2024.2.0 and ifort 2021.13.0):


$ ifx -what -V -O3 -fp-model=precise -qopenmp -o ifx_executable test_openmp_accuracy.f90 openmp_parallel_operations.for
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

 Intel(R) Fortran 24.0-1693.2
 Intel(R) Fortran 24.0-1693.2
GNU ld version 2.41-34.fc40



$ ifort -what -V -O3 -fp-model=precise -qopenmp -o ifort_executable test_openmp_accuracy.f90 openmp_parallel_operations.for
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.13.0 Build 20240602_000000
Copyright (C) 1985-2024 Intel Corporation.  All rights reserved.

ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message.
 Intel(R) Fortran 2021.13.0-1693
 Intel(R) Fortran 2021.13.0-1693
GNU ld version 2.41-34.fc40
$ 

$ ./ifort_executable 
 Result matrix C:
    1.2557    0.0991   -1.3688   -1.7122   -0.6708    0.4373    0.6591   -0.0344   -1.9416   -0.6624   -0.5902    0.7481
    1.7409   -2.4182    0.3304   -0.0638   -0.2765    0.4845   -0.4137   -0.6930   -1.9433    1.6292    0.8591    1.0151
   -1.5535   -0.4692    0.8686   -0.6040   -0.8230   -1.7682   -0.7766    0.4733    0.8394    0.0063   -0.8128   -0.3116
   -0.3513   -0.8747    1.3415    0.3133   -0.0994   -0.2417   -0.2725   -0.8401    0.9738    0.4297   -1.3580   -1.3184
   -0.2910    0.3369    2.0200   -0.4847   -0.9144   -1.3964   -0.4151   -0.7440    1.2311    1.1842   -1.3972    0.5298
    0.4564    2.6547   -2.7408    0.5782    0.8219    2.0366    2.2464    0.5847    1.1539   -2.1728   -0.0567   -1.3700
    0.2926   -0.9777   -0.5853    1.3546    0.2526    1.3688    0.3736    0.7067   -1.7196    1.5659    2.2514   -0.3523
    2.0301    2.6029    0.2292   -1.4828    1.0922   -0.3179   -0.2188    0.8242   -0.7364    0.5274    1.6330    1.9936
    0.7644    2.0724   -1.5471   -1.0306    0.0433    0.3247    1.7963    1.7618   -0.9914    0.0393   -0.4635   -1.5764
    0.3055    1.7673   -1.8686    0.1983    1.2646   -0.0156    0.8488    1.1951    0.3812   -1.0473    0.7903    0.7391
    1.3497   -0.9203   -0.3968   -0.3713    1.6635   -0.3593   -0.1516    0.2105   -0.8938    1.2786    1.5356   -0.2852
    0.0651    1.1726    1.1340   -0.0836    2.0617   -0.6213   -0.9471    1.8397   -0.4343    1.1522    2.1557   -1.4057

$ ./ifx_executable 
 Result matrix C:
    1.2557    0.0991   -1.3688   -1.7122   -0.6708    0.4373    0.6591   -0.0344   -1.9416   -0.6624   -0.5902    0.7481
    1.7409   -2.4182    0.3304   -0.0638   -0.2765    0.4845   -0.4137   -0.6930   -1.9433    1.6292    0.8591    1.0151
   -1.5535   -0.4692    0.8686   -0.6040   -0.8230   -1.7682   -0.7766    0.4733    0.8394    0.0063   -0.8128   -0.3116
   -0.3513   -0.8747    1.3415    0.3133   -0.0994   -0.2417   -0.2725   -0.8401    0.9738    0.4297   -1.3580   -1.3184
   -0.2910    0.3369    2.0200   -0.4847   -0.9144   -1.3964   -0.4151   -0.7440    1.2311    1.1842   -1.3972    0.5298
    0.4564    2.6547   -2.7408    0.5782    0.8219    2.0366    2.2464    0.5847    1.1539   -2.1728   -0.0567   -1.3700
    0.2926   -0.9777   -0.5853    1.3546    0.2526    1.3688    0.3736    0.7067   -1.7196    1.5659    2.2514   -0.3523
    2.0301    2.6029    0.2292   -1.4828    1.0922   -0.3179   -0.2188    0.8242   -0.7364    0.5274    1.6330    1.9936
    0.7644    2.0724   -1.5471   -1.0306    0.0433    0.3247    1.7963    1.7618   -0.9914    0.0393   -0.4635   -1.5764
    0.3055    1.7673   -1.8686    0.1983    1.2646   -0.0156    0.8488    1.1951    0.3812   -1.0473    0.7903    0.7391
    1.3497   -0.9203   -0.3968   -0.3713    1.6635   -0.3593   -0.1516    0.2105   -0.8938    1.2786    1.5356   -0.2852
    0.0651    1.1726    1.1340   -0.0836    2.0617   -0.6213   -0.9471    1.8397   -0.4343    1.1522    2.1557   -1.4057

jimdempseyatthecove · 09-24-2024

While @Ron_Green states this has been fixed...

Be aware, though, that your code is inefficient.

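Consider swapping the order of the I and J loops, so that the parallelized outer loop runs over J. Because Fortran stores arrays in column-major order, each thread then writes its own contiguous columns of C, which should reduce cache-line evictions (false sharing) between cores.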
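A minimal sketch of the suggested loop order, written in free form with hypothetical module and routine names; apart from swapping the I and J loops, it is the same computation as CalculateMatrix12x12:

```fortran
! Hypothetical free-form variant of CalculateMatrix12x12 with the I and J
! loops swapped, so the parallel loop runs over columns of C.
module matmul_swapped
   implicit none
   integer, parameter :: dp = kind(1.0d0)
contains
   subroutine calculate_matrix_12x12_jfirst(A, B, C, nthread)
      real(dp), intent(in)  :: A(12,12), B(12,12)
      real(dp), intent(out) :: C(12,12)
      integer, intent(in)   :: nthread
      integer :: i, j, k
      ! J (the column index) is now the outer, parallelized loop, so each
      ! thread writes a contiguous block of columns of C.
      !$omp parallel do private(i, k) default(shared) num_threads(nthread)
      do j = 1, 12
         do i = 1, 12
            C(i, j) = 0.0_dp
            do k = 1, 12
               C(i, j) = C(i, j) + A(i, k) * B(k, j)
            end do
         end do
      end do
      !$omp end parallel do
   end subroutine calculate_matrix_12x12_jfirst
end module matmul_swapped
```

Each column of C holds 12 doubles (96 bytes), so column boundaries do not line up with typical 64-byte cache lines, and neighbouring threads can still share a line at the edges of their column blocks. That is still far less sharing than with I as the parallel loop, where every thread writes into every column.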

Also, a 12x12 product executes the innermost statement only 12^3 = 1728 times. That may not be enough work to benefit from parallelization, since starting and synchronizing the threads of a parallel region has its own overhead.
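One way to act on this is the OpenMP `IF` clause, which makes the loop run serially when its condition is false. This is a substitute technique, not one proposed above. A sketch, assuming a hypothetical threshold `min_parallel_work` that you would tune by measurement; all names are illustrative:

```fortran
! Hypothetical sketch: only open a parallel region when the product is
! large enough to amortize the threading overhead.
module matmul_if_clause
   use iso_fortran_env, only: int64
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   ! Assumed threshold in multiply-adds; tune it by measurement.
   integer(int64), parameter :: min_parallel_work = 1000000_int64
contains
   ! Assumes conforming shapes: size(B,1) == size(A,2) and
   ! C is size(A,1) x size(B,2).
   subroutine matmul_maybe_parallel(A, B, C)
      real(dp), intent(in)  :: A(:,:), B(:,:)
      real(dp), intent(out) :: C(:,:)
      integer :: i, j, k, n, m, p
      integer(int64) :: work
      n = size(A, 1)   ! rows of A and C
      p = size(A, 2)   ! columns of A = rows of B
      m = size(B, 2)   ! columns of B and C
      ! Count multiply-adds in 64-bit to avoid overflow for large matrices.
      work = int(n, int64) * int(m, int64) * int(p, int64)
      !$omp parallel do private(i, k) default(shared) if(work >= min_parallel_work)
      do j = 1, m
         do i = 1, n
            C(i, j) = 0.0_dp
            do k = 1, p
               C(i, j) = C(i, j) + A(i, k) * B(k, j)
            end do
         end do
      end do
      !$omp end parallel do
   end subroutine matmul_maybe_parallel
end module matmul_if_clause
```

With this threshold, a 12x12 product (1728 multiply-adds) always runs serially, while large products still use all available threads.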

That said, if your program has many such 12x12 arrays to multiply, lift the parallelization to the level where the individual arrays are selected for multiplication, and let each thread multiply whole arrays serially.
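A minimal sketch of that structure, assuming the matrices are stored as a batch in rank-3 arrays where `A(:,:,m)` and `B(:,:,m)` form the m-th pair; the module and routine names are hypothetical:

```fortran
! Hypothetical sketch: parallelize over independent 12x12 products
! instead of inside each product.
module batched_matmul
   implicit none
   integer, parameter :: dp = kind(1.0d0)
contains
   subroutine multiply_batch(A, B, C)
      ! A(:,:,m) and B(:,:,m) are the m-th pair; C(:,:,m) receives their product.
      real(dp), intent(in)  :: A(:,:,:), B(:,:,:)
      real(dp), intent(out) :: C(:,:,:)
      integer :: m
      ! One parallel region for the whole batch; each iteration is serial.
      !$omp parallel do default(shared)
      do m = 1, size(A, 3)
         C(:, :, m) = matmul(A(:, :, m), B(:, :, m))
      end do
      !$omp end parallel do
   end subroutine multiply_batch
end module batched_matmul
```

Each iteration multiplies one complete pair with the MATMUL intrinsic, so threads never write to the same result matrix, and the parallel region is entered once for the whole batch instead of once per product.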

Jim Dempsey
