The following code does not yield the correct result with openmp enabled (version 220.127.116.11). Bug?
program test implicit none real :: val call sumarray(10,val) print *,val contains subroutine sumarray(n,val) real :: y(n),val integer :: n,i y=0.0 !$omp parallel do reduction(+:y) do i=1,100 y(mod(i,n)+1)=y(mod(i,n)+1)+1 enddo val=sum(y) end subroutine end program
As n==10, you have every 10th element of y incremented 10 times. Unless restricted to 1 thread, there is a race condition, as more than 1 thread could be updating some elements of y. If you ran one of the parallel checking tools and this didn't raise a warning, that would be a bug.
Array reduction means that each element of an array has its own reduction, with the final assignment performed by one thread. Even with the expected restrictions, it tends to be inefficient beyond a small number of threads. Perhaps the restrictions aren't evident in the standard, and I sympathize with not trying to read the source code of an openmp implementation to see whether such a thing is supported.
You could simplify what you have written perhaps using i=1,100,10 (without mod) and then have it work as expected, but it would not depend on the reduction clause.
Thanks for your answer. I don't really understand it, though. With the reduction clause, there shouldn't be any race condition involving updates of y--that is its very raison-d'etre, no? I would have thought that if an array is in the reduction clause, each thread has a private array y, collecting partial sums, and these private y's are then combined into the overall sum on exit of the parallel region. What am I missing?
(The code is obviously useless and inefficient, the point here is only to document an unexpected behavior).
I should add that everything works as expected if the array y in the subroutine has fixed size 10, that is with
subroutine sumarray(n,val) real :: y(10),val integer :: n,i y=0.0 !$omp parallel do reduction(+:y) do i=1,100 y(mod(i,n)+1)=y(mod(i,n)+1)+1 enddo val=sum(y) end subroutine
which is further evidence that this is a compiler bug...
In fact, even the following fails to yield the correct result in all optimization levels.
program test implicit none real :: val call sumarray(10,val) print *,val contains subroutine sumarray(n,val) real :: y(n),val integer :: n y=0.0 !$omp parallel do reduction(+:y) do i=1,100 y=y+1 enddo val=sum(y) end subroutine end program
I suspect that if an array is automatic with a size that is an argument to a subroutine, the private versions in a reduction clause are not correctly initialized.
I would appreciate if somebody from Intel could weigh in. Thank you.
I agree that this issue is a bug. It occurs in compiler versions 18.0.0 and 18.0.1 (and even when running on a single thread).
The good news is that both your examples run correctly with the recently released update 18.0.2 (and also with the earlier version 17 compiler). Please could you give the latest compiler a try and see if you agree?