Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
Announcements
FPGA community forums and blogs have moved to the Altera Community. Existing Intel Community members can sign in with their current credentials.

Incorrect summation of values using openMP threads

Ajay_N_
Beginner
2,392 Views

Hi,

My openMP thread tries to do the below:

arr(1) = arr(1) + (a(j) + b(j))

Where j corresponds to the each openmp thread being executed and arr(1) being the shared memory location. The above expression being associative, after the completion of thread it should ideally give me arr(1) equal to serial execution. Unfortunately that does not seem to be the case and I get different results in the arr(1)  everytime I execute (and occasionally the correct value as well). I have also verified the contents of a(j) and b(j) are exactly the same. Is there anything that I am missing here?
Any insights to the above issue?

Thanks
Ajay


 

0 Kudos
16 Replies
Ajay_N_
Beginner
2,392 Views

Quick edit: the above expression is a multiplication and not addition arr(1) = arr(1) + (a(j) * b(j))

0 Kudos
Steven_L_Intel1
Employee
2,392 Views

Try adding /assume:protect_parens and see what happens.

0 Kudos
Roman1
New Contributor I
2,392 Views

If arr(1) is in a shared memory location, then the summation should be in a critical section.  Is it?  You can't have more than one thread modifying the same memory location, otherwise you can get unpredictable results.

 

0 Kudos
Ajay_N_
Beginner
2,392 Views

Hello Steve, 

Thanks for your reply. I am not aware of this directive, do I have to do this during compilation. I am working on Visual Studio.

- Ajay

0 Kudos
pbkenned1
Employee
2,392 Views

There's a data race on reading/writing arr(1) by multiple threads.  You should use an omp reduction to avoid this, although the clause doesn't accept a subobject like arr(1).  So copy arr(1) to a scalar and put that in the reduction clause.  Something like:

program foorace
use omp_lib

implicit none
integer a(100), b(100), arr(1), i, a1
arr(1) = 0

do i=1,100
 a(i) = i
 b(i) = i
end do

a1 = arr(1)

!$omp parallel do reduction(+:a1)
   do i = 1,100
      a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))
   end do
!$omp end parallel do

print *,'a1 = ',a1

end program foorace

Patrick

0 Kudos
Ajay_N_
Beginner
2,392 Views

Thanks for the replies. I understand the part that multiple threads try to access the same data, but given the expression even if there is a race condition wont the output come same.
So say a = a+b is the expression. So does openmp execute parallel threads on partial execution of the expression.
I mean does
thread 1 compute a = a+b and then thread 2 computes a = a+b then we dont have any issues even they are executed in any order.
Unless
thread 1 loads a into memory and now thread 2 also loads a into memory, and then they execute this is a problematic scenario !! 

So if latter case is how openmp executes, then I need a way to use the reduction in my case because the expression happens in a function

 do i = 1,100
      call subroutine_math(i)
   end do

and subroutine_math does the computation on the global variables. 
Any pointers on how to approach this problem in my context ?

Ajay
 

 

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,392 Views

Use reduction if multiple threads writing to arr(1).

arr(1) = 0.0
!$OMP PARALLEL DO REDUCTION(+:arr(1))
DO j=1,N
arr(1) = arr(1) + (a(j) * b(j))
end DO
 

Jim Dempsey

0 Kudos
Ajay_N_
Beginner
2,392 Views

Hi Jim,

Do you think reduction is applicable to my case ?

Thanks
Ajay

0 Kudos
TimP
Honored Contributor III
2,392 Views
To the extent you have described what you are doing reduction is clearly indicated. The reduction variable should be a local scalar as Patrick showed. If you are looking for performance you will use vector reduction in each thread e.g. dot_product as well as distributing work across threads.
0 Kudos
jimdempseyatthecove
Honored Contributor III
2,392 Views

Tim,

I could see where a reduction of arr(J) might be an issue in particular when J could be modified within the region, however, arr(1) is completely resolvable to a scalar (assuming arr is an array and not a pointer).

In any event, should the compiler complain, the programmer could use EQUIVILENCE arr(1), arrOne to get around this inconvenience.

Jim Dempsey

0 Kudos
Steven_L_Intel1
Employee
2,392 Views

If you want to try /assume:protect_parens, add it under Fortran > Command Line > Additional Options.

0 Kudos
John_Campbell
New Contributor II
2,392 Views

Patrick Kennedy (Intel) wrote:

There's a data race on reading/writing arr(1) by multiple threads.  You should use an omp reduction to avoid this, although the clause doesn't accept a subobject like arr(1).  So copy arr(1) to a scalar and put that in the reduction clause.  Something like:

program foorace
use omp_lib

implicit none
integer a(100), b(100), arr(1), i, a1
arr(1) = 0

do i=1,100
 a(i) = i
 b(i) = i
end do

a1 = arr(1)

!$omp parallel do reduction(+:a1)
   do i = 1,100
      a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))
   end do
!$omp end parallel do

print *,'a1 = ',a1

end program foorace

Patrick

Patrick,

I have not before seen the coding " a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))"

Could you explain what this achieves.
Could it be similar to the following which I think achieves what was intended:

[fortran]a1 = arr(1)
!$omp parallel do reduction(+:a1)
do i = 1,100
a1 = a1 + ( a(i) * b(i) )
end do
!$omp end parallel do
arr(1) = a1[/fortran]

John

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,392 Views

The above is incorrect or may have redundant statement.

!$omp parallel do reduction(+:a1)

The above makes a private (stack copy) of a1 and initialize to 0. Note, reduction operators of + or - are initialize to 0, * initializes to 1. Then at end of parallel region, each thread's value of a1 is thread safely accumulated into the outer scoped a1.

Therefore "a1 = arr(1)" is useless since the threads copies of a1 will be zeroed.
The final "arr(1) = a1" is wrong when the original (non-zero) value of arr(1) is to be included in the sum. In this case use arr(1) = arr(1) + a1.
However, should you want zero initialization then the "a1 = arr(1)" is correct.

Jim Dempsey

0 Kudos
John_Campbell
New Contributor II
2,392 Views

Jim,

Then, would the following best approach what was wanted ?

[fortran]    a1 = 0
!$omp parallel do reduction(+:a1)
    do i = 1,100
        a1 = a1 + ( a(i) * b(i) )
    end do
!$omp end parallel do
    arr(1) = arr(1) + a1[/fortran]

Also, is the following a misprint, or does this structure achieve something I am not aware of ?

   a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))

John

0 Kudos
pbkenned1
Employee
2,392 Views

>>>a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))

The initial description of the problem had this expression: arr(1) = arr(1) + (a(j) * b(j)), with a comment:

"Where j corresponds to the each openmp thread being executed and arr(1) being the shared memory location."

patrick

 

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,392 Views

The initial a1=0 is unnecessary.

!$omp parallel do
do j = 1,100
c(j) = a(j) * b(j)
...

In the above loop, assuming two threads in team and static scheduling, one thread will iterate j over 1:50 and the other over 51:100

You normally would not compute the indexes using a call to omp_get_thread_num(). This said, sometimes you may want to manipulate data based on the thread number. In these situations, you often use omp_get_thread_num() once storing the result into a private variable such as iThread. Note, "thread_num" is a misnomer. The function returns a 0-based thread number of the thread team constituted for the parallel region. If nested parallel regions are in effect, each region that runs concurrently, will have different thread teams, each with 0-based thread numbering.

Jim Dempsey
 

0 Kudos
Reply