Hello,
I would like to report an issue with the Intel Fortran Compiler (ifx) when OpenMP is combined with the -O3 optimization level on Linux.
Description:
When a 12x12 matrix multiplication is parallelized with OpenMP and compiled at -O3, the result produced by ifx is incorrect.
Version Used:
- ifx (IFX) 2023.2.0
- ifort (IFORT) 2021.10.0
Steps to Reproduce:
- Insert the following code into openmp_parallel_operations.for:
      SUBROUTINE CalculateMatrix12x12(A, B, C, NTHREAD)
      INTEGER, INTENT(IN) :: NTHREAD
      REAL*8, DIMENSION(12, 12), INTENT(IN) :: A, B
      REAL*8, DIMENSION(12, 12), INTENT(OUT) :: C
      INTEGER :: I, J, K
C$OMP PARALLEL DO PRIVATE(I, J, K) DEFAULT(SHARED) NUM_THREADS(NTHREAD)
      DO I = 1, 12
        DO J = 1, 12
          C(I, J) = 0.0
          DO K = 1, 12
            C(I, J) = C(I, J) + A(I, K) * B(K, J)
          ENDDO
        ENDDO
      ENDDO
C$OMP END PARALLEL DO
      END SUBROUTINE CalculateMatrix12x12
- Insert the following code into another file test_openmp_accuracy.f90:
program test_openmp_accuracy
  implicit none
  real*8 :: A(12,12), B(12,12), C(12,12)
  integer :: NTHREAD, i, j
  NTHREAD = 4
  ! Generate random matrices
  call generate_random_matrix_real8(A)
  call generate_random_matrix_real8(B)
  ! Calculate the result matrix
  call CalculateMatrix12x12(A, B, C, NTHREAD)
  ! Print the result matrix to the console in 12x12 format
  print *, 'Result matrix C:'
  do i = 1, 12
    write(*, '(12F10.4)') (C(i, j), j = 1, 12)
  end do
contains
  ! Fill the matrix with uniform random values in [-1, 1).
  subroutine generate_random_matrix_real8(matrix)
    real*8, intent(out) :: matrix(12,12)
    call random_number(matrix)        ! uniform values in [0, 1)
    matrix = 2.0 * matrix - 1.0       ! rescale to [-1, 1)
  end subroutine generate_random_matrix_real8
end program test_openmp_accuracy
- Compile using ifx and ifort with the following commands:
ifx -O3 -fp-model=precise -qopenmp -o ifx_executable test_openmp_accuracy.f90 openmp_parallel_operations.for
ifort -O3 -fp-model=precise -qopenmp -o ifort_executable test_openmp_accuracy.f90 openmp_parallel_operations.for
- Run both executables and compare the output: the results of the two compilations differ significantly.
Note:
- This issue does not occur on Windows or in Docker on Windows.
- This issue does not occur when using OpenMP with other optimization levels (O0, O1, O2) on Linux.
- This issue does not occur with smaller matrices (e.g., a 6x6 matrix).
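To put a number on the difference rather than comparing printed matrices by eye, a check along the following lines could be added to the test program. This is only a sketch: the module name matmul_check and the function max_abs_error are hypothetical names that are not part of the code above, and the MATMUL reference is built by the same compiler, so it is a sanity check rather than an independent reference.

```fortran
! Hypothetical helper, not part of the original reproducer.
module matmul_check
  implicit none
contains
  ! Largest absolute element-wise difference between C and the
  ! serial MATMUL intrinsic applied to A and B.
  function max_abs_error(A, B, C) result(err)
    real*8, intent(in) :: A(12,12), B(12,12), C(12,12)
    real*8 :: err
    err = maxval(abs(C - matmul(A, B)))
  end function max_abs_error
end module matmul_check
```

With `use matmul_check` added after the program statement, a line such as `print *, 'max abs error:', max_abs_error(A, B, C)` after the call to CalculateMatrix12x12 would report the deviation. Values close to machine precision would be ordinary rounding, while a much larger value would point to a wrong result.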
Yes, this seems to be a bug in last year's compiler. It no longer reproduces with the most recent compilers (ifx 2024.2.0 and ifort 2021.13.0):
$ ifx -what -V -O3 -fp-model=precise -qopenmp -o ifx_executable test_openmp_accuracy.f90 openmp_parallel_operations.for
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
Intel(R) Fortran 24.0-1693.2
Intel(R) Fortran 24.0-1693.2
GNU ld version 2.41-34.fc40
$ ifort -what -V -O3 -fp-model=precise -qopenmp -o ifort_executable test_openmp_accuracy.f90 openmp_parallel_operations.for
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.13.0 Build 20240602_000000
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message.
Intel(R) Fortran 2021.13.0-1693
Intel(R) Fortran 2021.13.0-1693
GNU ld version 2.41-34.fc40
$
$ ./ifort_executable
Result matrix C:
1.2557 0.0991 -1.3688 -1.7122 -0.6708 0.4373 0.6591 -0.0344 -1.9416 -0.6624 -0.5902 0.7481
1.7409 -2.4182 0.3304 -0.0638 -0.2765 0.4845 -0.4137 -0.6930 -1.9433 1.6292 0.8591 1.0151
-1.5535 -0.4692 0.8686 -0.6040 -0.8230 -1.7682 -0.7766 0.4733 0.8394 0.0063 -0.8128 -0.3116
-0.3513 -0.8747 1.3415 0.3133 -0.0994 -0.2417 -0.2725 -0.8401 0.9738 0.4297 -1.3580 -1.3184
-0.2910 0.3369 2.0200 -0.4847 -0.9144 -1.3964 -0.4151 -0.7440 1.2311 1.1842 -1.3972 0.5298
0.4564 2.6547 -2.7408 0.5782 0.8219 2.0366 2.2464 0.5847 1.1539 -2.1728 -0.0567 -1.3700
0.2926 -0.9777 -0.5853 1.3546 0.2526 1.3688 0.3736 0.7067 -1.7196 1.5659 2.2514 -0.3523
2.0301 2.6029 0.2292 -1.4828 1.0922 -0.3179 -0.2188 0.8242 -0.7364 0.5274 1.6330 1.9936
0.7644 2.0724 -1.5471 -1.0306 0.0433 0.3247 1.7963 1.7618 -0.9914 0.0393 -0.4635 -1.5764
0.3055 1.7673 -1.8686 0.1983 1.2646 -0.0156 0.8488 1.1951 0.3812 -1.0473 0.7903 0.7391
1.3497 -0.9203 -0.3968 -0.3713 1.6635 -0.3593 -0.1516 0.2105 -0.8938 1.2786 1.5356 -0.2852
0.0651 1.1726 1.1340 -0.0836 2.0617 -0.6213 -0.9471 1.8397 -0.4343 1.1522 2.1557 -1.4057
$ ./ifx_executable
Result matrix C:
1.2557 0.0991 -1.3688 -1.7122 -0.6708 0.4373 0.6591 -0.0344 -1.9416 -0.6624 -0.5902 0.7481
1.7409 -2.4182 0.3304 -0.0638 -0.2765 0.4845 -0.4137 -0.6930 -1.9433 1.6292 0.8591 1.0151
-1.5535 -0.4692 0.8686 -0.6040 -0.8230 -1.7682 -0.7766 0.4733 0.8394 0.0063 -0.8128 -0.3116
-0.3513 -0.8747 1.3415 0.3133 -0.0994 -0.2417 -0.2725 -0.8401 0.9738 0.4297 -1.3580 -1.3184
-0.2910 0.3369 2.0200 -0.4847 -0.9144 -1.3964 -0.4151 -0.7440 1.2311 1.1842 -1.3972 0.5298
0.4564 2.6547 -2.7408 0.5782 0.8219 2.0366 2.2464 0.5847 1.1539 -2.1728 -0.0567 -1.3700
0.2926 -0.9777 -0.5853 1.3546 0.2526 1.3688 0.3736 0.7067 -1.7196 1.5659 2.2514 -0.3523
2.0301 2.6029 0.2292 -1.4828 1.0922 -0.3179 -0.2188 0.8242 -0.7364 0.5274 1.6330 1.9936
0.7644 2.0724 -1.5471 -1.0306 0.0433 0.3247 1.7963 1.7618 -0.9914 0.0393 -0.4635 -1.5764
0.3055 1.7673 -1.8686 0.1983 1.2646 -0.0156 0.8488 1.1951 0.3812 -1.0473 0.7903 0.7391
1.3497 -0.9203 -0.3968 -0.3713 1.6635 -0.3593 -0.1516 0.2105 -0.8938 1.2786 1.5356 -0.2852
0.0651 1.1726 1.1340 -0.0836 2.0617 -0.6213 -0.9471 1.8397 -0.4343 1.1522 2.1557 -1.4057
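While @Ron_Green states that this has been fixed, you should be aware that your code is inefficient.
Consider swapping the order of the I and J loops so that J becomes the outer (parallel) loop. Fortran stores arrays in column-major order, so each thread then writes whole columns of C, which should reduce cache line evictions between cores.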
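A minimal free-form sketch of that loop swap, assuming the same interface as the original routine. The name CalculateMatrix12x12_JI is hypothetical and used only for illustration:

```fortran
! Hypothetical variant of the original routine with the I and J loops swapped.
subroutine CalculateMatrix12x12_JI(A, B, C, NTHREAD)
  implicit none
  integer, intent(in) :: NTHREAD
  real*8, intent(in)  :: A(12,12), B(12,12)
  real*8, intent(out) :: C(12,12)
  integer :: I, J, K
  ! J is now the parallel loop: each thread owns complete columns of C,
  ! which are contiguous in memory.
  !$omp parallel do private(I, J, K) default(shared) num_threads(NTHREAD)
  do J = 1, 12
    do I = 1, 12
      C(I, J) = 0.0d0
      do K = 1, 12
        C(I, J) = C(I, J) + A(I, K) * B(K, J)
      end do
    end do
  end do
  !$omp end parallel do
end subroutine CalculateMatrix12x12_JI
```

The summation over K runs in the same order as before, so the numerical result should match the original loop nest. Only the distribution of work across threads changes.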
Also, a 12x12 multiplication needs only 12^3 (1728) iterations of the innermost statement. This might not be enough work to benefit from parallelization.
That said, if your program has many such 12x12 arrays that need multiplication, then lift the parallelization to the level where the individual arrays are selected for multiplication.
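As a rough illustration of lifting the parallelization, the sketch below assumes the matrices are stored as 12x12xN stacks. The routine name MultiplyBatch12x12, the argument N, and this storage layout are assumptions made for the example and do not come from the original code:

```fortran
! Hypothetical batch routine: one OpenMP iteration per 12x12 product.
subroutine MultiplyBatch12x12(A, B, C, N, NTHREAD)
  implicit none
  integer, intent(in) :: N, NTHREAD
  real*8, intent(in)  :: A(12,12,N), B(12,12,N)
  real*8, intent(out) :: C(12,12,N)
  integer :: M
  ! Threads split the N independent multiplications between them;
  ! each individual 12x12 product runs serially inside one thread.
  !$omp parallel do private(M) default(shared) num_threads(NTHREAD)
  do M = 1, N
    C(:, :, M) = matmul(A(:, :, M), B(:, :, M))
  end do
  !$omp end parallel do
end subroutine MultiplyBatch12x12
```

Each thread then works on its own contiguous 12x12 slabs of C, and the thread start-up cost is paid once per batch instead of once per multiplication.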
Jim Dempsey
