Hi,
Each of my OpenMP threads tries to do the following:
arr(1) = arr(1) + (a(j) + b(j))
where j corresponds to each OpenMP thread being executed and arr(1) is a shared memory location. Since the expression is associative, once all the threads complete, arr(1) should ideally match the serial result. Unfortunately that does not seem to be the case: I get a different value in arr(1) every time I execute (and occasionally the correct value as well). I have also verified that the contents of a(j) and b(j) are exactly the same on every run. Is there anything I am missing here?
Any insights into this issue?
Thanks
Ajay
Quick edit: the term in parentheses is a multiplication, not an addition: arr(1) = arr(1) + (a(j) * b(j))
Try adding /assume:protect_parens and see what happens.
If arr(1) is in a shared memory location, then the summation should be in a critical section. Is it? You can't have more than one thread modifying the same memory location, otherwise you can get unpredictable results.
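A minimal sketch of that suggestion (the variable names come from the original post; the loop bound n and the directive placement are assumptions):
[fortran]!$omp parallel do shared(arr, a, b)
do j = 1, n
   !$omp critical
   arr(1) = arr(1) + (a(j) * b(j))   ! only one thread at a time updates arr(1)
   !$omp end critical
end do
!$omp end parallel do[/fortran]
For a single scalar update like this, an !$omp atomic directive is usually cheaper than a critical section, and a reduction cheaper still.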
Hello Steve,
Thanks for your reply. I am not aware of that option. Do I have to set it during compilation? I am working in Visual Studio.
- Ajay
There's a data race on reading/writing arr(1) by multiple threads. You should use an omp reduction to avoid this, although the clause doesn't accept a subobject like arr(1). So copy arr(1) to a scalar and put that in the reduction clause. Something like:
program foorace
use omp_lib
implicit none
integer a(100), b(100), arr(1), i, a1
arr(1) = 0
do i=1,100
a(i) = i
b(i) = i
end do
a1 = arr(1)
!$omp parallel do reduction(+:a1)
do i = 1,100
a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))
end do
!$omp end parallel do
print *,'a1 = ',a1
end program foorace
Patrick
Thanks for the replies. I understand that multiple threads access the same data, but given the expression, even if there is a race condition, won't the output come out the same?
Say a = a + b is the expression. Does OpenMP allow threads to interleave partial executions of the expression?
I mean, if thread 1 computes a = a + b and then thread 2 computes a = a + b, we don't have any issues, regardless of the order in which they execute.
But if thread 1 loads a into a register and thread 2 also loads a before either has stored its result, that is a problematic scenario!
If the latter is how OpenMP executes, then I need a way to use the reduction in my case, because the expression happens inside a subroutine:
do i = 1,100
call subroutine_math(i)
end do
where subroutine_math does the computation on global variables.
Any pointers on how to approach this problem in my context?
Ajay
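One way to handle the update happening inside a subroutine is an atomic update at the point of accumulation (a sketch: the name subroutine_math and the globals arr, a, b come from the post above, but the module globals_mod and the exact placement of the update are assumptions):
[fortran]subroutine subroutine_math(j)
   use globals_mod          ! assumed module holding the shared arr, a, b
   implicit none
   integer, intent(in) :: j
   !$omp atomic
   arr(1) = arr(1) + (a(j) * b(j))   ! thread-safe read-modify-write
end subroutine subroutine_math[/fortran]
With the atomic directive guarding the shared update inside the subroutine, the calling loop can stay as written. Another option is to have each thread accumulate into a private partial sum and combine the partials into arr(1) once after the loop.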
Use a reduction if multiple threads are writing to arr(1).
arr(1) = 0.0
!$OMP PARALLEL DO REDUCTION(+:arr(1))
DO j=1,N
arr(1) = arr(1) + (a(j) * b(j))
end DO
Jim Dempsey
Hi Jim,
Do you think reduction is applicable to my case ?
Thanks
Ajay
Tim,
I could see where a reduction of arr(J) might be an issue, in particular when J could be modified within the region; however, arr(1) is completely resolvable to a scalar (assuming arr is an array and not a pointer).
In any event, should the compiler complain, the programmer could use EQUIVALENCE (arr(1), arrOne) to get around this inconvenience.
Jim Dempsey
If you want to try /assume:protect_parens, add it under Fortran > Command Line > Additional Options.
Patrick Kennedy (Intel) wrote:
There's a data race on reading/writing arr(1) by multiple threads. You should use an omp reduction to avoid this, although the clause doesn't accept a subobject like arr(1). So copy arr(1) to a scalar and put that in the reduction clause. Something like:
program foorace
use omp_lib
implicit none
integer a(100), b(100), arr(1), i, a1
arr(1) = 0
do i=1,100
a(i) = i
b(i) = i
end do
a1 = arr(1)
!$omp parallel do reduction(+:a1)
do i = 1,100
a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))
end do
!$omp end parallel do
print *,'a1 = ',a1
end program foorace
Patrick
Patrick,
I have not seen the coding "a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))" before.
Could you explain what this achieves?
Could it be similar to the following, which I think achieves what was intended?
[fortran]a1 = arr(1)
!$omp parallel do reduction(+:a1)
do i = 1,100
a1 = a1 + ( a(i) * b(i) )
end do
!$omp end parallel do
arr(1) = a1[/fortran]
John
The above is incorrect, or at least contains a redundant statement.
!$omp parallel do reduction(+:a1)
The above makes a private (stack) copy of a1 in each thread and initializes it to 0. Note: for reduction operators + and - the private copies are initialized to 0; for * they are initialized to 1. Then, at the end of the parallel region, each thread's value of a1 is thread-safely accumulated into the outer-scoped a1.
Therefore "a1 = arr(1)" is useless, since the threads' copies of a1 will be zeroed.
The final "arr(1) = a1" is wrong when the original (non-zero) value of arr(1) is to be included in the sum. In that case use arr(1) = arr(1) + a1.
However, should you want zero initialization, then the final "arr(1) = a1" is correct.
Jim Dempsey
Jim,
Then, would the following come closest to what was wanted?
[fortran] a1 = 0
!$omp parallel do reduction(+:a1)
do i = 1,100
a1 = a1 + ( a(i) * b(i) )
end do
!$omp end parallel do
arr(1) = arr(1) + a1[/fortran]
Also, is the following a misprint, or does this structure achieve something I am not aware of?
a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))
John
>>>a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))
The initial description of the problem had this expression: arr(1) = arr(1) + (a(j) * b(j)), with a comment:
"Where j corresponds to the each openmp thread being executed and arr(1) being the shared memory location."
patrick
The initial a1=0 is unnecessary.
!$omp parallel do
do j = 1,100
c(j) = a(j) * b(j)
...
In the above loop, assuming two threads in the team and static scheduling, one thread will iterate j over 1:50 and the other over 51:100.
You normally would not compute the indexes using a call to omp_get_thread_num(). That said, sometimes you may want to manipulate data based on the thread number. In those situations, you typically call omp_get_thread_num() once, storing the result in a private variable such as iThread. Note: "thread_num" is something of a misnomer. The function returns the 0-based number of the thread within the team constituted for the parallel region. If nested parallel regions are in effect, each region that runs concurrently will have a different thread team, each with its own 0-based thread numbering.
Jim Dempsey
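Jim's iThread pattern might look like this (a sketch; the variable name iThread and the loop body are illustrative):
[fortran]use omp_lib
integer :: iThread
!$omp parallel private(iThread)
iThread = omp_get_thread_num()   ! query once per thread; 0-based within this team
! ... work on data partitioned by iThread ...
!$omp end parallel[/fortran]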
