Dear community,
I am testing a small program with ifort 2021.6.0 that shows strange behaviour with the compilation flag -O3.
The code essentially performs matrix multiplication with explicit loops and accumulates the result in another matrix, starting from two random matrices.
When I compile with ifort -O3 main.f90 and run the program, the final result is all zeros, which it shouldn't be. Compiling with -O2 gives the expected behaviour.
The terminal output I get on my machine is:
a
3.920868194323862E-007 2.548044275764261E-002 0.352516161261067
0.666914481524251 0.963055531894656 0.838288203465982
0.335355043646496 0.915327203368213 0.795863676652503
b
0.832693143644796 0.345042693116063 0.871183932316783
8.991835668825542E-002 0.888283839684037 0.700978902440147
0.734552583860683 0.300175817923128 4.971772349719251E-002
c
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
Somehow, if I change the order of the loops from jki to any other permutation, it runs perfectly.
Lastly, I have also tried with ifx 2025.2.0 and it runs normally.
Does anyone have an idea what is going on?
program main
  implicit none
  integer :: nrows, ncols, nintermediate, nrepeat, i
  real(8), allocatable, dimension(:,:) :: a, b, c

  nrows = 3
  ncols = 3
  nintermediate = 3
  nrepeat = 1000000

  allocate(a(nrows, nintermediate))
  allocate(b(nintermediate, ncols))
  allocate(c(nrows, ncols), source = 0.0d0)

  call random_number(a)
  call random_number(b)
  c = 0.0d0

  do i = 1, nrepeat
    call multiply_add_jki_loop(a, b, c)
  end do

  write(*,*) "a"
  write(*,*) a
  write(*,*) "b"
  write(*,*) b
  write(*,*) "c"
  write(*,*) c

contains

  subroutine multiply_add_jki_loop(a, b, c)
    real(8), dimension(:,:), intent(in) :: a, b
    real(8), dimension(:,:), intent(inout) :: c
    integer :: i, j, k

    do j = 1, size(c, 2)
      do k = 1, size(a, 2)
        do i = 1, size(c, 1)
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine multiply_add_jki_loop

end program main
Good catch! This looks very much like -O3 is "optimizing" out the whole loop:
- If you print the c matrix at the end of every iteration of the repeat loop, you get the correct result.
- The -O3 run time is unaffected by the nrepeat size.
Unfortunately, we aren't going to get any more ifort fixes/releases.
And some of us need to keep using ifort due to ifx bugs or for 32-bit builds.
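For anyone who wants to repeat the run-time observation above, here is a minimal sketch that times the repeat loop for two values of nrepeat using the standard system_clock intrinsic. The program name time_repeat and the variables pass, t0, t1 and rate are illustrative and not from the original post, and I have not verified how any particular compiler version treats this variant. If the elapsed time barely changes when nrepeat grows tenfold, or sum(c) prints as zero, the loop has probably been dropped.

```fortran
program time_repeat
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer :: n, nrepeat, i, pass
  integer(int64) :: t0, t1, rate
  real(real64), allocatable, dimension(:,:) :: a, b, c

  n = 3
  nrepeat = 1000000

  allocate(a(n, n))
  allocate(b(n, n))
  allocate(c(n, n), source = 0.0_real64)

  call random_number(a)
  call random_number(b)
  call system_clock(count_rate = rate)

  ! Two passes: the second uses ten times as many repeats as the first.
  do pass = 1, 2
    c = 0.0_real64
    call system_clock(t0)
    do i = 1, nrepeat
      call multiply_add_jki_loop(a, b, c)
    end do
    call system_clock(t1)

    write(*,*) "nrepeat =", nrepeat, &
               " seconds =", real(t1 - t0, real64) / real(rate, real64), &
               " sum(c) =", sum(c)

    nrepeat = nrepeat * 10
  end do

contains

  ! Same jki loop nest as in the original post.
  subroutine multiply_add_jki_loop(a, b, c)
    real(real64), dimension(:,:), intent(in) :: a, b
    real(real64), dimension(:,:), intent(inout) :: c
    integer :: i, j, k

    do j = 1, size(c, 2)
      do k = 1, size(a, 2)
        do i = 1, size(c, 1)
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine multiply_add_jki_loop

end program time_repeat
```

Since the thread reports that adding output inside the loop changes the behaviour, the timing and the write here are deliberately kept outside the inner repeat loop.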
I understand. In any case, I wanted to leave this documented for future reference, in case anyone needs it.
Thank you for the answer.
This looks like a compiler bug with -O3 in ifort. Since it works fine with -O2 and ifx, you can either use -O2 or try flags like -no-vec or -fp-model precise as a fix.
I suspected it would be something around that, but I was far from sure.
Thank you for the answer and recommendation.
Disabling vectorization (-no-vec or /Qvec-) isn't a work-around, but I was surprised to find that using any floating point model other than fast is. I naively assumed that the logic that decides whether a loop can be omitted would be independent of the floating point model. I guess it is more complex than that under the hood, and the branch used for the fast floating point model is separate and has this bug. Fun stuff!
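To check whether a given set of flags (for example -O3 together with -fp-model precise, as mentioned in this thread) actually avoids the problem, a self-checking variant of the reproducer can help. The sketch below compares the accumulated c against nrepeat * matmul(a, b), which is what the loop should produce up to rounding. The program name check_result, the variables expected, rel_err and tol, and the tolerance value 1.0e-8 are my own assumptions, not part of the original code.

```fortran
program check_result
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer :: n, nrepeat, i
  real(real64), allocatable, dimension(:,:) :: a, b, c, expected
  real(real64) :: rel_err
  ! Assumed tolerance: loose enough to absorb rounding from a million
  ! accumulations, tight enough to catch an all-zero result.
  real(real64), parameter :: tol = 1.0e-8_real64

  n = 3
  nrepeat = 1000000

  allocate(a(n, n))
  allocate(b(n, n))
  allocate(expected(n, n))
  allocate(c(n, n), source = 0.0_real64)

  call random_number(a)
  call random_number(b)

  do i = 1, nrepeat
    call multiply_add_jki_loop(a, b, c)
  end do

  ! Reference value: the same product computed once and scaled.
  expected = real(nrepeat, real64) * matmul(a, b)
  rel_err = maxval(abs(c - expected)) / maxval(abs(expected))

  write(*,*) "max relative error =", rel_err
  if (rel_err > tol) then
    write(*,*) "FAIL: c does not match nrepeat * matmul(a, b)"
    error stop 1
  else
    write(*,*) "OK"
  end if

contains

  ! Same jki loop nest as in the original post.
  subroutine multiply_add_jki_loop(a, b, c)
    real(real64), dimension(:,:), intent(in) :: a, b
    real(real64), dimension(:,:), intent(inout) :: c
    integer :: i, j, k

    do j = 1, size(c, 2)
      do k = 1, size(a, 2)
        do i = 1, size(c, 1)
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine multiply_add_jki_loop

end program check_result
```

With an all-zero c, the relative error is 1, so the program stops with a non-zero exit status. That makes it easy to script a comparison across flag combinations.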
