- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The following code does not yield the correct result with openmp enabled (version 18.0.1.156). Bug?
program test implicit none real :: val call sumarray(10,val) print *,val contains subroutine sumarray(n,val) real :: y(n),val integer :: n,i y=0.0 !$omp parallel do reduction(+:y) do i=1,100 y(mod(i,n)+1)=y(mod(i,n)+1)+1 enddo val=sum(y) end subroutine end program
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
As n==10, you have every 10th element of y incremented 10 times. Unless restricted to 1 thread, there is a race condition, as more than 1 thread could be updating some elements of y. If you ran one of the parallel checking tools and this didn't raise a warning, that would be a bug.
Array reduction means that each element of an array has its own reduction, with the final assignment performed by one thread. Even with the expected restrictions, it tends to be inefficient beyond a small number of threads. Perhaps the restrictions aren't evident in the standard, and I sympathize with not trying to read the source code of an openmp implementation to see whether such a thing is supported.
You could simplify what you have written perhaps using i=1,100,10 (without mod) and then have it work as expected, but it would not depend on the reduction clause.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your answer. I don't really understand it, though. With the reduction clause, there shouldn't be any race condition involving updates of y--that is its very raison-d'etre, no? I would have thought that if an array is in the reduction clause, each thread has a private array y, collecting partial sums, and these private y's are then combined into the overall sum on exit of the parallel region. What am I missing?
(The code is obviously useless and inefficient, the point here is only to document an unexpected behavior).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I should add that everything works as expected if the array y in the subroutine has fixed size 10, that is with
subroutine sumarray(n,val) real :: y(10),val integer :: n,i y=0.0 !$omp parallel do reduction(+:y) do i=1,100 y(mod(i,n)+1)=y(mod(i,n)+1)+1 enddo val=sum(y) end subroutine
which is further evidence that this is a compiler bug...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In fact, even the following fails to yield the correct result in all optimization levels.
program test implicit none real :: val call sumarray(10,val) print *,val contains subroutine sumarray(n,val) real :: y(n),val integer :: n y=0.0 !$omp parallel do reduction(+:y) do i=1,100 y=y+1 enddo val=sum(y) end subroutine end program
I suspect that if an array is automatic with a size that is an argument to a subroutine, the private versions in a reduction clause are not correctly initialized.
I would appreciate if somebody from Intel could weigh in. Thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I agree that this issue is a bug. It occurs in compiler versions 18.0.0 and 18.0.1 (and even when running on a single thread).
The good news is that both your examples run correctly with the recently released update 18.0.2 (and also with the earlier version 17 compiler). Please could you give the latest compiler a try and see if you agree?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes, appears fixed in 18.0.2. Thank you for your response!

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page