
OpenMP and Thread Checker question

Tim_Gallagher
New Contributor II
Hi,

I've just started using OpenMP in my code and the answers are correct (I guess that's a start!). But I ran the Thread Checker on it and I'm a bit confused by the results. I get many instances of things similar to:

Memory write of centroid at "structuredHex.f90":542 conflicts with a prior memory write of var$2919var$2921_dv_template.addr_a0 at "wenoAMR.F90":94 (output dependence)

The code at "wenoAMR.F90":94 is inside a parallel DO. The variable centroid is actually the RESULT of a function called from inside that parallel DO loop. Obviously var$2919var$2921_dv_template.addr_a0 is a compiler-created variable.

I'm not sure how to fix this, since I don't know what that compiler-created variable corresponds to.
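In case it helps to see the shape of it, here is a stripped-down illustration of the kind of pattern I mean (the names and the geometry are invented, not the actual wenoAMR.F90 code): an array-valued function whose RESULT gets assigned inside the parallel DO. My guess is that the temporary the compiler generates for that result is what shows up as var$2919var$2921_dv_template.addr_a0.

[fortran]
! Invented illustration only -- NOT the real wenoAMR.F90 code.
MODULE geom
   IMPLICIT NONE
CONTAINS
   FUNCTION cell_centroid(nodes) RESULT(centroid)
      REAL, INTENT(IN) :: nodes(:,:)   ! 3 x nnodes coordinates
      REAL :: centroid(3)
      centroid = SUM(nodes, DIM=2)/SIZE(nodes, DIM=2)
   END FUNCTION cell_centroid
END MODULE geom

PROGRAM pattern
   USE geom
   IMPLICIT NONE
   REAL :: nodes(3,8), centroids(3,100)
   INTEGER :: i
   nodes = 1.0
!$OMP PARALLEL DO SHARED(nodes, centroids)
   DO i = 1, 100
      ! the array-valued result comes back through a compiler-generated
      ! temporary before it is copied into centroids(:,i)
      centroids(:,i) = cell_centroid(nodes)
   END DO
!$OMP END PARALLEL DO
   PRINT *, centroids(:,1)
END PROGRAM pattern
[/fortran]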

A likely related question: how can I tell OpenMP that a shared variable is read-only during a certain section, so it doesn't need to be locked? For instance,

[fortran]
PROGRAM test
   IMPLICIT NONE

   REAL, DIMENSION(10,2) :: A
   INTEGER :: I, J

   A(:,1) = (/ (I, I=1,10) /)

!$OMP PARALLEL SHARED(A)
   DO J = 1, 10000000
!$OMP DO
   DO I = 1, SIZE(A)-1
      A(I,2) = A(I+1,1)-A(I-1,1)
   END DO
!$OMP END DO
   END DO
!$OMP END PARALLEL
   PRINT *, A(1:9,2)
END PROGRAM test
[/fortran]

In the first loop, I could understand the compiler not knowing that the accesses to A are independent, but they are. Thread Checker says there is a data race on the loop, but there isn't really one. As a side note, I ran some timing on this simple test and the OpenMP version takes 5.5 times longer than the serial version (compiled with -O0), so there's something strange there. Adding NOWAIT to the END DO cuts the run time in half, but it's still ~3 times slower with 2 threads. Maybe this is too simple a test case...
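To make the timing comparison concrete, here is a sketch of the kind of experiment I have in mind, using OMP_GET_WTIME (the array size and repeat count are placeholders, and it assumes the code is built with -openmp; the loop body is deliberately trivial, so thread startup and the implicit barrier at END DO dominate):

[fortran]
PROGRAM timing_sketch
   USE OMP_LIB
   IMPLICIT NONE

   INTEGER, PARAMETER :: N = 1000, NREP = 100000
   REAL, DIMENSION(N) :: A, B
   INTEGER :: I, J
   DOUBLE PRECISION :: T0, T1

   B = 1.0
   A = 0.0

   T0 = OMP_GET_WTIME()
!$OMP PARALLEL SHARED(A,B) PRIVATE(J)
   DO J = 1, NREP
!$OMP DO
      DO I = 1, N
         ! trivial work per iteration: the parallel overhead, not the
         ! arithmetic, is what gets measured here
         A(I) = 2.0*B(I)
      END DO
!$OMP END DO
   END DO
!$OMP END PARALLEL
   T1 = OMP_GET_WTIME()

   PRINT *, 'elapsed seconds:', T1 - T0
   PRINT *, A(1)
END PROGRAM timing_sketch
[/fortran]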

Thanks,

Tim
TimP
Honored Contributor III
If your usage of SIZE(A) means anything, it forces out-of-bounds access, so Thread Checker would be correct in pointing out the race condition. Did you mean SIZE(A,DIM=1)? An example of the distinction is shown in the ifort docs.
NOWAIT would aggravate the race condition; apparently it means processing of the next value of J can begin before the current J iteration is complete. As you apparently intend each value of J to overwrite the results from the previous value, whether or not you intended the out-of-bounds access, it's difficult to see what you wish to demonstrate.
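A minimal sketch of the distinction (not the docs example, just an illustration):

[fortran]
PROGRAM size_demo
   IMPLICIT NONE
   REAL, DIMENSION(10,2) :: A
   PRINT *, SIZE(A)          ! total element count: 20
   PRINT *, SIZE(A,DIM=1)    ! extent of the first dimension: 10
END PROGRAM size_demo
[/fortran]

With the 10x2 array, SIZE(A)-1 is 19, so the I+1 subscript runs well past the extent of the first dimension.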
Tim_Gallagher
New Contributor II
I suppose that's what happens when I write stuff really late at night... Yes, I meant SIZE(A,DIM=1).

What I'm trying to show is that the update of A(I,2) depends on the I+1 and I-1 entries of A(:,1), which would be a data race only if they were A(I+1,2) and A(I-1,2). But Thread Checker reports it as a data race anyway, even though each thread can safely update its own section of A(:,2) because A(:,1) is read-only.

You can ignore the J loop and the point remains; the J loop was just something I threw in so a timing run would take a measurable amount of time.

An example that Thread Checker reports no problems with is:

[fortran]
PROGRAM test
   IMPLICIT NONE

   REAL, DIMENSION(10) :: A, B
   INTEGER :: I

   B(:) = (/ (I, I=1,10) /)

   ! FIRSTPRIVATE gives every thread its own copy of B
!$OMP PARALLEL SHARED(A) FIRSTPRIVATE(B)
!$OMP DO
   DO I = 1, SIZE(A)-1
      A(I) = B(I+1)-B(I-1)
   END DO
!$OMP END DO NOWAIT
!$OMP END PARALLEL
   PRINT *, A(1:9)
END PROGRAM test
[/fortran]
This is functionally the same code (the answers will always be the same), but it is data-race free. The problem is making a copy of B in my actual application: copying the entire initial-conditions array for each thread would explode the memory usage to unacceptable levels.

Does that make more sense now that I'm a little less tired? Sorry for the bad example earlier...

Tim
Tim_Gallagher
New Contributor II
There's still an array out-of-bounds problem since I starts at 1; it should start at 2...

That's what happens when I compile without -C and the code never crashes...
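For completeness, here is a sketch of the version I think I'm after, with the lower bound fixed and B left SHARED (since it is only read, no per-thread copy should be needed):

[fortran]
PROGRAM test_fixed
   IMPLICIT NONE

   REAL, DIMENSION(10) :: A, B
   INTEGER :: I

   B(:) = (/ (I, I=1,10) /)
   A = 0.0

!$OMP PARALLEL SHARED(A,B)
!$OMP DO
   ! interior points only: B is read-only and each iteration writes
   ! a distinct A(I), so there is no data race
   DO I = 2, SIZE(A)-1
      A(I) = B(I+1)-B(I-1)
   END DO
!$OMP END DO
!$OMP END PARALLEL

   PRINT *, A(2:9)
END PROGRAM test_fixed
[/fortran]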

Tim