Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
28459 Discussions

OpenMP Reduction on Array in COMMON block being set in non inlined subroutine.

Olivier_C_
New Contributor I
625 Views
Hello,

I need to perform a reduction on an array that is inside a named COMMON block.
When the reduction is in the same unit as the one where the parallel reduction is declared, everything goes fine.
When the reduction has to occur in subroutines that are called from the parallel region, then the result is not predictable.

The problem is rooted in the fact that a reduction induces a per thread array, but when the program reaches a subroutine it sees the global array only. If the array is declared threadprivate in the subroutine, then the output is not gathered.

If I mix-up threadprivate/reduction, then I have to use 2 different names for the common, which I would like to avoid.
Is there a design pattern or an OpenMP instruction to make as though the subroutines were inlined ?

It can be funny to look at the provided code in redko.f90, and play with the subroutine being in the same file or not as the main program.

Of course, I would be very grateful to any help, since I dearly need to overcome that tricky problem.

Oliver.
0 Kudos
1 Solution
pbkenned1
Employee
625 Views

I took a closer look at making the common block 'reduction' array threadprivate:

subroutine setvalue(i,j)

IMPLICIT NONE

integer :: redarray(2)

common /comarray/ redarray

!$OMP threadprivate(/comarray/)

integer :: i,j

redarray(j) = redarray(j) + i

end subroutine setvalue


That works fine, you just need to add up the partial sums after the OpenMP loop calling the subroutine (similar to what Jim suggested). Something like:

integer :: redarray(2), redarray_copy(2)

common /comarray/ redarray

!$OMP threadprivate(/comarray/)

....

redarray = 0

!$OMP PARALLEL COPYIN(redarray)

!$OMP DO

do i=1, n

call setvalue(i,indir(i))

enddo

!$OMP END DO

!$OMP SINGLE

j = omp_get_num_threads()

redarray_copy = 0

!$OMP END SINGLE

!$OMP DO REDUCTION(+:redarray_copy)

do i=1,j

do k=1,2

redarray_copy(k) = redarray_copy(k) + redarray(k)

enddo

enddo

!$OMP END DO

!$OMP END PARALLEL

redarray = redarray_copy


Patrick

View solution in original post

0 Kudos
10 Replies
TimP
Honored Contributor III
625 Views
If you are storing a reduction into an element of an array in common, I would think you would declare a local scalar reduction variable, rather than naming the entire array in the reduction. In your example, however, it doesn't look like the context is appropriate for reduction. Besides, it looks like you have intentionally created a race condition, so no advice we give can make the result "predictable" other than to eliminate the race.
0 Kudos
pbkenned1
Employee
625 Views
When you try to do the reduction on common block array member 'redarray' via the subroutine, there's no way for the compiler to know that redarray is a reduction variable, and so as Tim says, there's a data race on statement 'redarray(j) = redarray(j) + i'

If redarray was not a member of a common block, then you could pass it in the argument list of the subroutine, and the compiler would then know that it's a reduction variable, and the race conditional wouldelminated.

So, make a copy of redarray, pass the copy in the subroutine call argument list, and then update the common block array after the parallel region:

redarray_copy = 0

!$OMP PARALLEL

!$OMP DO REDUCTION(+:redarray_copy)

do i=1, n

call setvalue(i,indir(i),redarray_copy)

enddo

!$OMP END DO

!$OMP END PARALLEL

redarray = redarray_copy

The subroutine would then become

subroutine setvalue(i,j,subarray)

IMPLICIT NONE

integer, intent(inout) :: subarray(2)

integer :: i,j

subarray(j) = subarray(j) + i

end subroutine setvalue

This implementation is kinda ugly, but correct (race-free).

Patrick

0 Kudos
Olivier_C_
New Contributor I
625 Views
Thanks for the answers Patrick, but I would like to minimize the code change - of course I provide here a reproducer from a much bigger old code - and passing the array as an argument to the subroutine would make too much change - path to the subroutine is a bit long and there are many paths to it, and there are many "setvalue" routines.
I wished there were some other means to tell setvalue that it handles a reduction array...
The closest thing to what I would like in terms of minimal code change looks like the threadprivate/reduction stuff as in reduc.f90/setval.f90 below, but I found no way to use only one common block. In the real code, setval may be in parallel or non parallel context, as being called from different code sections.
Oliver.


0 Kudos
pbkenned1
Employee
625 Views
I can appreciate that you don't want to modify the argument list for a whole bunch of subroutines, although trying to make abig legacy code work with a modern compiler might require that. But anyway, if you can't/won't modify the calls, then I think you're stuck with you're double common block implementation. Of course you couldput a CRITICAL section around the update of the common block array in setvalue(i,j), but that just serializes things.

Patrick
0 Kudos
jimdempseyatthecove
Honored Contributor III
625 Views
Oliver,

After a short look at your code, you really do not have what constitutes a "reduction array".

The fly in your ointment is your indir(n) array containing 1,2,1,2,1,2,...

Resulting in a parallel loop, using x threads, xnot equal2, stomping on each others redarray(indir(i))

Consider something along the line of:

!$OMP PARALLEL
numThreads = omp_get_num_threads()
myThread = omp_get_thread_num()
if(myThread .lt. n-1) then
do i=1, n
if(mod(indir(i), numThreads) .eq. myThread) then
redarray(indir(i)) = redarray(indir(i))+ 1
endif
end do
endif
!$OMP END PARALLEL

Jim Dempsey
0 Kudos
pbkenned1
Employee
626 Views

I took a closer look at making the common block 'reduction' array threadprivate:

subroutine setvalue(i,j)

IMPLICIT NONE

integer :: redarray(2)

common /comarray/ redarray

!$OMP threadprivate(/comarray/)

integer :: i,j

redarray(j) = redarray(j) + i

end subroutine setvalue


That works fine, you just need to add up the partial sums after the OpenMP loop calling the subroutine (similar to what Jim suggested). Something like:

integer :: redarray(2), redarray_copy(2)

common /comarray/ redarray

!$OMP threadprivate(/comarray/)

....

redarray = 0

!$OMP PARALLEL COPYIN(redarray)

!$OMP DO

do i=1, n

call setvalue(i,indir(i))

enddo

!$OMP END DO

!$OMP SINGLE

j = omp_get_num_threads()

redarray_copy = 0

!$OMP END SINGLE

!$OMP DO REDUCTION(+:redarray_copy)

do i=1,j

do k=1,2

redarray_copy(k) = redarray_copy(k) + redarray(k)

enddo

enddo

!$OMP END DO

!$OMP END PARALLEL

redarray = redarray_copy


Patrick
0 Kudos
Olivier_C_
New Contributor I
625 Views
That's great ! I wish I could have thought doing so before !

To initialize, do you reckon a COPYIN or a mere:
!$OMP DO
do i=1,j
redarray = 0
enddo

To finalize, do you reckon a REDUCTION or a
!$OMP DO
do i=1, j
!$OMP CRITICAL
redarray_copy = redarray_copy + redarray
!$OMP END CRITICAL
enddo

Thansk a lot
0 Kudos
Olivier_C_
New Contributor I
625 Views
Thanks Jim, but indir is in fact not initialized this way: it 'is a cell to material non injective mapping.
Sorry, I should have pointed it out before. Cheers.
0 Kudos
pbkenned1
Employee
625 Views
Hello Oliver,
I'd recommend both COPYIN and REDUCTION over loop-based initialization/finalization, sincethey might be a little bit more efficient, especially the REDUCTION. REDUCTIONs are optimised for the target machine and don't have the global overhead of unnamed CRITICAL sections. If you prefer a CRITICAL section, I suggest using a named CRITICAL section...but don't use the same name you've given to the COMMON block (comarray). That's because CRITICAL section namesare in the same namespace as COMMON block names, and conflicts can result in undefined behaviour.

Patrick Kennedy
0 Kudos
Olivier_C_
New Contributor I
625 Views
Clear.
Thanks !
Oliver.
0 Kudos
Reply