Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

OpenMP truncation error

chat1983
Beginner
471 Views

Hi,
I'm quite new to OpenMP, and I found that the results of the serial and parallel versions of the following code differ at the last decimal place.

I use the Intel Fortran compiler on an Intel 64-bit machine under Fedora 10.

Can anybody please tell me how this can be eliminated?

----------------------------------------------------------------------------
program main
  implicit none
  integer :: i, j
  double precision :: a(9999)

! initialisation
  do i = 1, 9999
     a(i) = i**(.2)
  end do
----------------------------------------------------------
!$omp parallel do default(shared) private(i,j)
  do j = 1, 20
     do i = 1, 9999
        a(i) = a(i) + i/3.d0
     end do
  end do
!$omp end parallel do
-----------------------------------------------------------

  open(file='pxx', unit=30)
  do i = 1, 9999
     write(30,*) i, a(i)
  end do
  close(30)

end program main
-----------------------------------------------------------


Here is a sample comparison of the results (diff of serial vs. parallel output):
9959c9959
< 9959 66399.6377247174
---
> 9959 66399.6377247175
9973c9973
< 9973 66492.9728295009
---
> 9973 66492.9728295008
9980c9980
< 9980 66539.6403811772
---
> 9980 66539.6403811773
9985c9985
< 9985 66572.9743463199
---
> 9985 66572.9743463198
 

4 Replies
TimP
Honored Contributor III
There's a fair chance your j loop is optimized away (pushed inside the i loop and evaluated at compile time) when you disable OpenMP. Did you check for that, and for ways to prevent it, if it's not what you intended?
Also, are you taking care to disable transformations such as i/3.d0 => i*0.33333333333333333d0, if you don't intend them to happen in one case or the other (-prec-div vs. -no-prec-div)?
As pointed out in the other forum where you showed this example, your parallel code doesn't solve the same well-defined problem that the serial code does. The serial code updates each value in the array at each j iteration from the result of the previous j iteration. The parallelized code gives each thread a subset of the values of j, with an indeterminate number of updates from other threads writing into the shared array at indeterminate points during the computation (a race condition).
Even though the parallel version skips steps taken in the serial version (possibly at compile time), it probably takes longer. So you wouldn't run an experiment like this either as a way of speeding up the code or as an attempt to replicate the serial result.
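To illustrate the point about the race: one race-free way to parallelize this particular loop nest (a sketch, assuming the goal is simply to perform the same arithmetic as the serial code) is to put the parallel do on the inner i loop instead. Each thread then owns a disjoint slice of a, and each element sees its additions in the same order as in the serial run:

```fortran
! Sketch: parallelize the inner loop instead of the outer one.
! Each thread updates a disjoint range of a(i), so there is no race,
! and the per-element order of additions matches the serial code.
program fixed
  implicit none
  integer :: i, j
  double precision :: a(9999)

  do i = 1, 9999
     a(i) = i**(.2)
  end do

  do j = 1, 20
!$omp parallel do default(shared) private(i)
     do i = 1, 9999
        a(i) = a(i) + i/3.d0
     end do
!$omp end parallel do
  end do
end program fixed
```

Because a(i) depends only on its own previous value and on i, this version reproduces the serial result bit for bit (modulo the compile-time transformations mentioned above), at the cost of entering the parallel region once per j iteration.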
chat1983
Beginner

Hi Tim,

Thank you very much for your reply. As you both said, it is a data race condition. I was able to correct the error in my original program, the one I'm converting to parallel.

jimdempseyatthecove
Honored Contributor III
This is not a data race condition. The printed results indicate a single-bit difference in the least significant bit of the result. I suggest you consult the architecture manual to see what influences rounding behavior, in particular how a mantissa exactly halfway between two representable values is rounded:

always round up
always round down
alternately round up/down
pseudo randomly round up/down

When either of the last two methods is employed, varying the thread count (and hence the slice points) can produce different rounding results.
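More generally, floating-point addition is not associative, so any change in the order in which values are combined can flip the last bit even under a fixed rounding mode. A tiny self-contained illustration of this order sensitivity (not the poster's code, just the underlying effect):

```fortran
! Sketch: the same three doubles summed in two different orders
! round differently, so the results differ in the last bit.
program assoc
  implicit none
  double precision :: s1, s2
  s1 = (0.1d0 + 0.2d0) + 0.3d0
  s2 = 0.1d0 + (0.2d0 + 0.3d0)
  print *, s1 == s2            ! F : the two orders round differently
  print '(2es25.17)', s1, s2
end program assoc
```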

Jim Dempsey
Wendy_Doerner__Intel
Valued Contributor I

Please also see the article on obtaining floating-point consistency with our compiler.
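For reference, the usual starting point with the Intel compilers is the -fp-model family of options, which trade some optimization for reproducible floating-point behavior (a sketch; check the article for the exact recommendations for your compiler version):

```shell
# Favor run-to-run floating-point consistency over aggressive optimization.
# (Option names per Intel compiler documentation; verify for your version.)
ifort -fp-model precise -prec-div myprog.f90 -o myprog
```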

------

Wendy

