Hi,
Each of my OpenMP threads tries to do the following:
arr(1) = arr(1) + (a(j) + b(j))
where j corresponds to each OpenMP thread being executed and arr(1) is a shared memory location. Since the expression is associative, once all the threads complete, arr(1) should ideally match the serial result. Unfortunately that does not seem to be the case: I get a different value in arr(1) every time I execute (and occasionally the correct value as well). I have also verified that the contents of a(j) and b(j) are exactly the same on every run. Is there anything I am missing here?
Any insights into this issue?
Thanks
Ajay
Quick edit: the term in parentheses is a multiplication, not an addition: arr(1) = arr(1) + (a(j) * b(j))
Try adding /assume:protect_parens and see what happens.
If arr(1) is in a shared memory location, then the summation should be in a critical section. Is it? You can't have more than one thread modifying the same memory location, otherwise you can get unpredictable results.
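A minimal sketch of that suggestion (the variable names come from the original post; the loop bound n and the directive placement are assumptions):
[fortran]!$omp parallel do shared(arr, a, b)
do j = 1, n
   !$omp critical
   arr(1) = arr(1) + (a(j) * b(j))   ! only one thread at a time updates arr(1)
   !$omp end critical
end do
!$omp end parallel do[/fortran]
For a single scalar update like this, an !$omp atomic directive is usually cheaper than a critical section, and a reduction cheaper still.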
Hello Steve,
Thanks for your reply. I am not aware of that option. Do I have to set it during compilation? I am working in Visual Studio.
- Ajay
There's a data race on reading/writing arr(1) by multiple threads. You should use an omp reduction to avoid this, although the clause doesn't accept a subobject like arr(1). So copy arr(1) to a scalar and put that in the reduction clause. Something like:
program foorace
use omp_lib
implicit none
integer a(100), b(100), arr(1), i, a1
arr(1) = 0
do i=1,100
a(i) = i
b(i) = i
end do
a1 = arr(1)
!$omp parallel do reduction(+:a1)
do i = 1,100
a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))
end do
!$omp end parallel do
print *,'a1 = ',a1
end program foorace
Patrick
Thanks for the replies. I understand that multiple threads access the same data, but given the expression, even if there is a race condition, won't the output come out the same?
Say a = a + b is the expression. Does OpenMP allow threads to interleave partial executions of the expression?
I mean, if thread 1 computes a = a + b and then thread 2 computes a = a + b, we don't have any issues, regardless of the order in which they execute.
But if thread 1 loads a into a register and thread 2 also loads a before either has stored its result, that is a problematic scenario!
If the latter is how OpenMP executes, then I need a way to use the reduction in my case, because the expression happens inside a subroutine:
do i = 1,100
call subroutine_math(i)
end do
where subroutine_math does the computation on global variables.
Any pointers on how to approach this problem in my context?
Ajay
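One way to handle the update happening inside a subroutine is an atomic update at the point of accumulation (a sketch: the name subroutine_math and the globals arr, a, b come from the post above, but the module globals_mod and the exact placement of the update are assumptions):
[fortran]subroutine subroutine_math(j)
   use globals_mod          ! assumed module holding the shared arr, a, b
   implicit none
   integer, intent(in) :: j
   !$omp atomic
   arr(1) = arr(1) + (a(j) * b(j))   ! thread-safe read-modify-write
end subroutine subroutine_math[/fortran]
With the atomic directive guarding the shared update inside the subroutine, the calling loop can stay as written. Another option is to have each thread accumulate into a private partial sum and combine the partials into arr(1) once after the loop.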
Use a reduction if multiple threads are writing to arr(1).
arr(1) = 0.0
!$OMP PARALLEL DO REDUCTION(+:arr(1))
DO j=1,N
arr(1) = arr(1) + (a(j) * b(j))
end DO
Jim Dempsey
Hi Jim,
Do you think reduction is applicable to my case ?
Thanks
Ajay
Tim,
I could see where a reduction of arr(J) might be an issue, in particular when J could be modified within the region; however, arr(1) is completely resolvable to a scalar (assuming arr is an array and not a pointer).
In any event, should the compiler complain, the programmer could use EQUIVALENCE (arr(1), arrOne) to get around this inconvenience.
Jim Dempsey
If you want to try /assume:protect_parens, add it under Fortran > Command Line > Additional Options.
Patrick Kennedy (Intel) wrote:
There's a data race on reading/writing arr(1) by multiple threads. You should use an omp reduction to avoid this, although the clause doesn't accept a subobject like arr(1). So copy arr(1) to a scalar and put that in the reduction clause. Something like:
program foorace
use omp_lib
implicit none
integer a(100), b(100), arr(1), i, a1
arr(1) = 0
do i=1,100
a(i) = i
b(i) = i
end do
a1 = arr(1)
!$omp parallel do reduction(+:a1)
do i = 1,100
a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))
end do
!$omp end parallel do
print *,'a1 = ',a1
end program foorace
Patrick
Patrick,
I have not seen the coding "a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))" before.
Could you explain what this achieves?
Could it be similar to the following, which I think achieves what was intended?
[fortran]a1 = arr(1)
!$omp parallel do reduction(+:a1)
do i = 1,100
a1 = a1 + ( a(i) * b(i) )
end do
!$omp end parallel do
arr(1) = a1[/fortran]
John
The above is incorrect, or at least contains a redundant statement.
!$omp parallel do reduction(+:a1)
The above makes a private (stack) copy of a1 in each thread and initializes it to 0. Note: for reduction operators + and - the private copies are initialized to 0; for * they are initialized to 1. Then, at the end of the parallel region, each thread's value of a1 is thread-safely accumulated into the outer-scoped a1.
Therefore "a1 = arr(1)" is useless, since the threads' copies of a1 will be zeroed.
The final "arr(1) = a1" is wrong when the original (non-zero) value of arr(1) is to be included in the sum. In that case use arr(1) = arr(1) + a1.
However, should you want zero initialization, then the final "arr(1) = a1" is correct.
Jim Dempsey
Jim,
Then, would the following come closest to what was wanted?
[fortran] a1 = 0
!$omp parallel do reduction(+:a1)
do i = 1,100
a1 = a1 + ( a(i) * b(i) )
end do
!$omp end parallel do
arr(1) = arr(1) + a1[/fortran]
Also, is the following a misprint, or does this structure achieve something I am not aware of?
a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))
John
>>>a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))
The initial description of the problem had this expression: arr(1) = arr(1) + (a(j) * b(j)), with a comment:
"Where j corresponds to the each openmp thread being executed and arr(1) being the shared memory location."
patrick
The initial a1=0 is unnecessary.
!$omp parallel do
do j = 1,100
c(j) = a(j) * b(j)
...
In the above loop, assuming two threads in the team and static scheduling, one thread will iterate j over 1:50 and the other over 51:100.
You normally would not compute the indexes using a call to omp_get_thread_num(). That said, sometimes you may want to manipulate data based on the thread number. In those situations, you typically call omp_get_thread_num() once, storing the result in a private variable such as iThread. Note: "thread_num" is something of a misnomer. The function returns the 0-based number of the thread within the team constituted for the parallel region. If nested parallel regions are in effect, each region that runs concurrently will have a different thread team, each with its own 0-based thread numbering.
Jim Dempsey
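Jim's iThread pattern might look like this (a sketch; the variable name iThread and the loop body are illustrative):
[fortran]use omp_lib
integer :: iThread
!$omp parallel private(iThread)
iThread = omp_get_thread_num()   ! query once per thread; 0-based within this team
! ... work on data partitioned by iThread ...
!$omp end parallel[/fortran]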
