Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Reduction clause in OpenMP with ifort

gottoomanyaccounts
Dear forum,

I am trying to use the 'reduction' clause with ifort version 11.1.038, but I get a segmentation fault. I had also tried with ifort 10.1 in the past, without luck.

I am attaching the code at the end; it was compiled with:
ifort -warn all -check all -openmp -o reduction.x reduction.f90

Any help is much appreciated. Thanks.


file reduction.f90:
===============
program test_reduction

integer, parameter :: n=100
real :: r = 0.45
real, dimension(n,n,n) :: a, summ
integer :: i,j,k


summ = 0.

!$OMP PARALLEL DO &
!$OMP default(shared) &
!$OMP reduction(+: summ) &
!$OMP private(i,j,k)
do i = 1,n
  do j = 1,n
    do k = 1,n
      a(i,j,k) = (i+j)/k
      summ(i,j,k) = summ(i,j,k) + a(i,j,k)*r
    enddo
  enddo
enddo
!$OMP END PARALLEL DO

write(*,*) summ

end program
TimP
Honored Contributor III
Your reduction variable must be a scalar. Alternatively, you could remove the reduction clause, and your code would still be meaningful, just not a reduction.
Without OpenMP, you might expect ifort to optimize the loop nesting, at least at -O3, but the OpenMP directives would probably prevent that optimization.
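For reference, a minimal sketch of what a scalar reduction looks like (the program and the accumulator `total` here are hypothetical, not taken from the posted code):

```fortran
program scalar_reduction
  implicit none
  integer, parameter :: n = 100
  integer :: i, j, k
  real :: total

  total = 0.0
!$OMP PARALLEL DO default(shared) private(i,j,k) reduction(+: total)
  do i = 1, n
    do j = 1, n
      do k = 1, n
        ! each thread accumulates into its own private copy of total;
        ! OpenMP combines the per-thread copies when the loop ends
        total = total + real(i+j) / real(k)
      enddo
    enddo
  enddo
!$OMP END PARALLEL DO

  write(*,*) total
end program scalar_reduction
```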
gottoomanyaccounts
Quoting - tim18
Your reduction variable must be a scalar. Alternatively, you could remove the reduction clause, and your code would still be meaningful, just not a reduction.
Without OpenMP, you might expect ifort to optimize the loop nesting, at least at -O3, but the OpenMP directives would probably prevent that optimization.

Thanks for the answer. Is 'the reduction variable must be a scalar' part of the OpenMP standard? My impression is that I have seen examples using an array in a reduction clause.

The above code is just what I used to test the reduction clause. In my real code, different loop iterations are likely to sum into the same array position; in that case, simply removing the reduction clause would result in a race condition, right? What is the recommended solution in this situation? Thanks a lot.


TimP
Honored Contributor III
When each thread stores to different array elements, as you have written it, there is no race and no reduction. Reduction is for combining results from multiple threads where, as you say, if multiple threads tried to add directly to a single variable without a reduction or critical section, there would be a race.
You could in effect do a reduction from rank 3 to rank 2, but I don't see why you would use an OpenMP reduction operation for that, since you would want each thread to have its own results, with no race against the other threads.
It sounds like what you show here doesn't represent your actual problem, which may be one that isn't practical to parallelize reliably.
gottoomanyaccounts
Quoting - tim18
It sounds like what you show here doesn't represent your actual problem, which may be one that isn't practical to parallelize reliably.

That's correct. My actual code looks something like this:

do m = 1, N
  i = f(m)  ! i is a function of m
  j = g(m)  ! j is another function of m
  s(i,j) = s(i,j) + a(i,j)
enddo

Different m values may result in the same (i,j); what should I do in this case?

Thanks.
jimdempseyatthecove
Honored Contributor III

Quoting - gottoomanyaccounts
That's correct. My actual code looks something like this:

do m = 1, N
  i = f(m)  ! i is a function of m
  j = g(m)  ! j is another function of m
  s(i,j) = s(i,j) + a(i,j)
enddo

Different m values may result in the same (i,j), so what should I do in this case?

Thanks.

Do your own reduction of the array:

s = 0.0
!$omp parallel private(i, j, m, sLocal)
sLocal = 0.0
!$omp do
do m = 1, N
  i = f(m)  ! i is a function of m
  j = g(m)  ! j is another function of m
  sLocal(i,j) = sLocal(i,j) + a(i,j)  ! accumulate into the thread-private copy
enddo
!$omp end do
!$omp critical
s = s + sLocal  ! combine each thread's partial sums, one thread at a time
!$omp end critical
!$omp end parallel
Jim Dempsey
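When updates are relatively sparse, an alternative to the per-thread copy is to guard just the update statement with `!$omp atomic` (a sketch, using the same hypothetical `f`, `g`, `s`, and `a` from the post above):

```fortran
!$omp parallel do private(i, j, m)
do m = 1, N
  i = f(m)
  j = g(m)
  ! atomic serializes only this one update, so two threads that
  ! hit the same (i,j) cannot lose an increment
!$omp atomic
  s(i,j) = s(i,j) + a(i,j)
enddo
!$omp end parallel do
```

An atomic update is typically cheaper than a critical section around each update, but it is still slower than accumulating into a thread-private copy when collisions are frequent.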
jimdempseyatthecove
Honored Contributor III

Note: if s is large, replace the critical section with nested loops containing a barrier, with each thread working on a different section of the array.

me = omp_get_thread_num()
do j = 1, jMax
  mej = me + j
  if (mej .gt. jMax) mej = mej - jMax
  do i = 1, iMax
    s(i,mej) = s(i,mej) + sLocal(i,mej)
  end do
!$omp barrier
end do

Jim Dempsey