OpenMP parallel loop crashes (?)

Marios_G_ · 03-31-2014

Hello everybody,

I am trying to make this section of my code run in parallel:

....

      
[fortran]
EL=0.0d0

!$OMP parallel DO SHARED(S,COUL) PRIVATE(I1,J1,ID,JD) reduction(+:EL)
DO J1=1,NY
   DO I1=1,NX
      IF ((J1/=J.OR.I1/=I).AND.(J1/=J.OR.I1/=IP(I)).AND.(J1/=J.OR.I1/=IM(I)).AND. &
          (J1/=JP(J).OR.I1/=I).AND.(J1/=JM(J).OR.I1/=I)) THEN

         IF (ABS(FLOAT(I)-FLOAT(I1)) <= ABS(FLOAT(I)+LLEN-FLOAT(I1))) THEN
            ID= INT(ABS(FLOAT(I)-FLOAT(I1)))
         ELSE
            ID= INT(ABS(FLOAT(I)+LLEN-FLOAT(I1)))
         END IF

         IF (ABS(FLOAT(J)-FLOAT(J1)) <= ABS(FLOAT(J)+LLEN-FLOAT(J1))) THEN
            JD= INT(ABS(FLOAT(J)-FLOAT(J1)))
         ELSE
            JD= INT(ABS(FLOAT(J)+LLEN-FLOAT(J1)))
         END IF

         EL= EL + LAMDA*COUL(ID,JD)*dble(S(I1,J1))
!        Cen(I,J)= Cen(I,J) +  LAMDA*COUL(ID,JD)*dble(S(I1,J1))

      END IF
   END DO
END DO
!$OMP END PARALLEL DO
[/fortran]

...

where COUL is a matrix determined earlier in the code.

I get no compilation or build errors, but at run time the program exits as soon as it enters the parallel loop. It just crashes without any run-time error message!

Any ideas?

Thanks,

Marios

jimdempseyatthecove · 03-31-2014

Try turning on array subscript bounds checking.

If nothing is obvious, insert some PRINT statements to trace the progress.

I assume LLEN and LAMDA are defined.

Jim Dempsey
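A minimal sketch of the PRINT tracing suggested above, assuming the outer loop is the DO J1 loop from the question. The module and function names (trace_mod, traced_rows) are hypothetical. Each thread prints which J1 row it is starting and flushes the output immediately, so the last lines written before a crash are not lost in an I/O buffer. Lines starting with the `!$` sentinel are compiled only when OpenMP is enabled, so the same source also builds serially.

```fortran
! Hypothetical tracing helper: prints the thread number and J1 of each outer
! iteration and flushes stdout so the trace survives a segmentation fault.
module trace_mod
  use, intrinsic :: iso_fortran_env, only: output_unit
  !$ use omp_lib
  implicit none
contains
  ! Returns the number of J1 rows visited (NY if every row was reached).
  integer function traced_rows(ny) result(nvisited)
    integer, intent(in) :: ny
    integer :: j1, tid, n

    n = 0
    !$omp parallel do private(j1, tid) reduction(+:n)
    do j1 = 1, ny
      tid = 0
      !$ tid = omp_get_thread_num()
      ! One thread at a time, so trace lines do not interleave.
      !$omp critical (trace_io)
      print *, "thread", tid, " starting J1 =", j1
      flush (output_unit)
      !$omp end critical (trace_io)
      n = n + 1
      ! ... the body of the DO I1 loop would go here ...
    end do
    !$omp end parallel do
    nvisited = n
  end function traced_rows
end module trace_mod
```

If the program dies inside the loop, the last "starting J1 =" lines show which rows were in progress, which narrows down the indices to check.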

jimdempseyatthecove · 03-31-2014

Also,

If this is a release mode issue, then from VS click on

Debug | Start Without Debugging

This is different from Run.

Run will close the CMD window. If errors were displayed, you won't see them.

Start Without Debugging leaves the CMD window open after run. Any error messages displayed in the CMD window can then be read.

Jim Dempsey

Marios_G_ · 03-31-2014

I ran it on Linux with ifort:

ifort -O3 -warn all -xSSE4.2 -parallel -par-report[1] -openmp -o run.out Source1.f90

and got the run-time error message:

Segmentation fault (core dumped)

At least now I do get an error message! Any ideas about how to fix it?

Note that I ran the command ulimit -s unlimited before compiling.

Marios

John_Campbell · 03-31-2014

You might want to review the SHARED list or, as Jim has indicated, consider what to do if ID or JD = 0.

[fortran]

EL=0.0d0

!$OMP parallel DO SHARED(LAMDA,COUL,S,LLEN,I,J,IP,IM,JP,JM) PRIVATE(I1,J1,ID,JD) reduction(+:EL)

DO J1=1,NY
DO I1=1,NX

!   not sure if this test is sufficient
     IF ( (J1/=J    .OR.I1/=I)     .AND. &           ! .not. ( J1==j     .and. I1==I     )
          (J1/=J    .OR.I1/=IP(I)) .AND. &           ! .not. ( J1==j     .and. I1==IP(I) )
          (J1/=J    .OR.I1/=IM(I)) .AND. &           ! .not. ( J1==j     .and. I1==IM(I) )
          (J1/=JP(J).OR.I1/=I)     .AND. &           ! .not. ( J1==JP(J) .and. I1==I     )
          (J1/=JM(J).OR.I1/=I)           ) THEN       ! .not. ( J1==JM(J) .and. I1==I     )
!   could be
     if ( j1==j .and. (i1==i .or. i1==IP(i) .or. i1==IM(i)) ) cycle
     if ( i1==i .and. (j1==j .or. j1==JP(J) .or. j1==JM(j)) ) cycle

         IF (ABS(FLOAT(I)-FLOAT(I1)) <= ABS(FLOAT(I)+LLEN-FLOAT(I1))) THEN
           ID = INT (ABS(FLOAT(I)-FLOAT(I1)))
         ELSE
           ID = INT (ABS(FLOAT(I)+LLEN-FLOAT(I1)))
         END IF

         IF (ABS(FLOAT(J)-FLOAT(J1)) <= ABS(FLOAT(J)+LLEN-FLOAT(J1))) THEN
           JD = INT (ABS(FLOAT(J)-FLOAT(J1)))
         ELSE
           JD = INT (ABS(FLOAT(J)+LLEN-FLOAT(J1)))
         END IF
!
!      Could be written as
         ID = MIN ( ABS(I-I1), ABS(I+LLEN-I1) )
         JD = MIN ( ABS(J-J1), ABS(J+LLEN-J1) )
!        if (ID==0 .or. JD==0): how should COUL(ID,JD) be handled?

         EL = EL + LAMDA*COUL(ID,JD)*dble(S(I1,J1))
!        Cen(I,J)= Cen(I,J) + LAMDA*COUL(ID,JD)*dble(S(I1,J1))

END IF

END DO
END DO

!$OMP END PARALLEL DO

[/fortran]
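A minimal sketch of the same loop with the simplifications proposed above (IF-CYCLE tests and an integer MIN for the distances), written as a self-contained module. It rests on assumptions the thread does not confirm: LLEN is an integer, S is an integer array, IP/IM/JP/JM hold neighbour indices, and COUL is declared with lower bounds of 0 so that a zero distance is a legal subscript. The names coul_energy_mod and coul_energy are hypothetical. DEFAULT(NONE) forces the data-sharing attribute of every variable used in the loop to be stated explicitly, so a forgotten SHARED or PRIVATE entry becomes a compile-time error instead of a silent default.

```fortran
! Hypothetical module: the loop from the question with the IF-CYCLE and MIN
! simplifications. Assumes integer LLEN and COUL declared with lower bounds 0.
module coul_energy_mod
  implicit none
contains
  function coul_energy(i, j, nx, ny, llen, lamda, coul, s, ip, im, jp, jm) result(el)
    integer, intent(in) :: i, j, nx, ny, llen
    double precision, intent(in) :: lamda
    double precision, intent(in) :: coul(0:, 0:)    ! lower bounds 0: ID or JD = 0 is legal
    integer, intent(in) :: s(:, :)                  ! S(NX,NY), integer type assumed
    integer, intent(in) :: ip(:), im(:), jp(:), jm(:)
    double precision :: el
    integer :: i1, j1, id, jd
    double precision :: acc

    acc = 0.0d0
    !$omp parallel do default(none) &
    !$omp   shared(i, j, nx, ny, llen, lamda, coul, s, ip, im, jp, jm) &
    !$omp   private(i1, j1, id, jd) reduction(+:acc)
    do j1 = 1, ny
      do i1 = 1, nx
        ! Skip the site (I,J) and its four neighbours, as the original IF does.
        if (j1 == j .and. (i1 == i .or. i1 == ip(i) .or. i1 == im(i))) cycle
        if (i1 == i .and. (j1 == jp(j) .or. j1 == jm(j))) cycle
        ! Integer MIN replaces the FLOAT/INT round trip (same result for integer LLEN).
        id = min(abs(i - i1), abs(i + llen - i1))
        jd = min(abs(j - j1), abs(j + llen - j1))
        acc = acc + lamda * coul(id, jd) * dble(s(i1, j1))
      end do
    end do
    !$omp end parallel do
    el = acc
  end function coul_energy
end module coul_energy_mod
```

As a quick check: on a 4 x 4 periodic lattice with COUL = 1, S = 1, LAMDA = 2 and I = J = 2, five sites are skipped, the sum runs over the remaining 11, and the function returns EL = 22.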

jimdempseyatthecove · 04-01-2014

Do not use ulimit -s unlimited for multi-threaded programs. Pick a reasonable size.

Jim Dempsey

Marios_G_ · 04-01-2014

Dear John Campbell,

ID and JD can't both be zero, so that's OK. The OpenMP statement is correct; at least I don't get an error message.

The changes you proposed made my code run a bit faster, so thank you!

One note though:

the IF-CYCLE construct should be like this:

IF (J1==J .AND. (I1==I .OR. I1==IP(I) .OR. I1==IM(I))) CYCLE

IF (I1==I .AND. (J1==JP(J) .OR. J1==JM(J))) CYCLE

since the case I1==I, J1==J is already excluded by the first IF.
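To check that the two IF-CYCLE tests skip exactly the same sites as the original compound IF, here is a small sketch that compares both forms on every point of a grid. The module and function names are hypothetical; the arguments ipi, imi, jpj, jmj stand for the values IP(I), IM(I), JP(J), JM(J).

```fortran
! Hypothetical check: does the compound IF from the question keep exactly the
! same (I1,J1) points as the two IF-CYCLE tests? ipi, imi, jpj, jmj stand for
! IP(I), IM(I), JP(J), JM(J).
module exclusion_check_mod
  implicit none
contains
  ! .true. when the original IF condition lets (i1, j1) through.
  logical function keep_original(i1, j1, i, j, ipi, imi, jpj, jmj)
    integer, intent(in) :: i1, j1, i, j, ipi, imi, jpj, jmj
    keep_original = (j1 /= j .or. i1 /= i) .and. (j1 /= j .or. i1 /= ipi) .and. &
                    (j1 /= j .or. i1 /= imi) .and. (j1 /= jpj .or. i1 /= i) .and. &
                    (j1 /= jmj .or. i1 /= i)
  end function keep_original

  ! .true. when neither IF-CYCLE test fires.
  logical function keep_cycle(i1, j1, i, j, ipi, imi, jpj, jmj)
    integer, intent(in) :: i1, j1, i, j, ipi, imi, jpj, jmj
    keep_cycle = .not. (j1 == j .and. (i1 == i .or. i1 == ipi .or. i1 == imi)) .and. &
                 .not. (i1 == i .and. (j1 == jpj .or. j1 == jmj))
  end function keep_cycle

  ! Number of grid points on which the two forms disagree (0 means equivalent).
  integer function count_mismatches(nx, ny, i, j, ipi, imi, jpj, jmj) result(n)
    integer, intent(in) :: nx, ny, i, j, ipi, imi, jpj, jmj
    integer :: i1, j1
    n = 0
    do j1 = 1, ny
      do i1 = 1, nx
        if (keep_original(i1, j1, i, j, ipi, imi, jpj, jmj) .neqv. &
            keep_cycle(i1, j1, i, j, ipi, imi, jpj, jmj)) n = n + 1
      end do
    end do
  end function count_mismatches
end module exclusion_check_mod
```

By De Morgan's laws the two conditions are identical, so count_mismatches returns 0 for any choice of centre site and neighbour values.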

I get a stack overflow message when I execute it in parallel. If I turn on the heap arrays compiler option, the program runs normally, but it is slower than the sequential version. Any ideas about that?

John_Campbell · 04-01-2014

My concern related to the use of COUL(ID,JD) when ID or JD is zero, which depends on how COUL is declared. To avoid a problem, it would need to be declared as something like real COUL(0:md,0:md). (I am assuming COUL is an array and not a function.)

With regard to the !$OMP PARALLEL DO directive, my preference is to explicitly declare all variables as SHARED or PRIVATE.

Finally, !$OMP is only effective if the DO loops perform enough computation to overcome the overhead of setting up the threads. The code structure is effectively:

!$OMP parallel DO SHARED(S,COUL) PRIVATE(I1,J1,ID,JD) reduction(+:EL)

DO J1=1,NY

!   get an available thread

!   perform the inner loops with the allocated thread

END DO ! J1

!$OMP END PARALLEL DO

Here each pass of the inner loop is performed by the allocated thread, and all private variables must be allocated for that thread.
This loop is:
   DO I1=1,NX

   IF ((J1/=J.OR.I1/=I).AND.      &
       (J1/=J.OR.I1/=IP(I)).AND.  &
       (J1/=J.OR.I1/=IM(I)).AND.  &
       (J1/=JP(J).OR.I1/=I).AND.  &
       (J1/=JM(J).OR.I1/=I)) THEN

   IF (ABS(FLOAT(I)-FLOAT(I1)) <= ABS(FLOAT(I)+LLEN-FLOAT(I1))) THEN
   ID= INT(ABS(FLOAT(I)-FLOAT(I1)))
   ELSE
   ID= INT(ABS(FLOAT(I)+LLEN-FLOAT(I1)))
   END IF

   IF (ABS(FLOAT(J)-FLOAT(J1)) <= ABS(FLOAT(J)+LLEN-FLOAT(J1))) THEN
   JD= INT(ABS(FLOAT(J)-FLOAT(J1)))
   ELSE
   JD= INT(ABS(FLOAT(J)+LLEN-FLOAT(J1)))
   END IF
   EL= EL + LAMDA*COUL(ID,JD)*dble(S(I1,J1))
!   Cen(I,J)= Cen(I,J) + LAMDA*COUL(ID,JD)*dble(S(I1,J1))

END IF
END DO

In essence, this is only:
   DO I1=1,NX
      EL= EL + LAMDA*COUL(ID,JD)*dble(S(I1,J1))
   END DO

This loop could be vectorised much better.
If the percentage of IF tests that exclude the computation is very small, it might be better to replace the IF test with a zero factor in COUL, move LAMDA out of the loop and take the performance gain from vectorisation, although the indirect indexing through ID and JD could limit vectorisation.
(Could LAMDA*COUL(ID,JD) be converted to a vector coul_jd(1:NX) outside the DO I1 loop, and a DOT_PRODUCT then used for this loop?)
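A minimal sketch of the DOT_PRODUCT idea, assuming an integer LLEN, an integer array S, and COUL declared with lower bounds of 0. For a fixed J1, JD does not depend on I1, so the row of weights LAMDA*COUL(ID(I1),JD) can be gathered into a vector first; the excluded sites get zero weight instead of an IF inside the loop, and the I1 loop becomes a single DOT_PRODUCT. The names coul_dot_mod and coul_energy_dot are hypothetical, and no timings are claimed: whether this beats the plain loop depends on the compiler and on NX. Note that the private work vector W is an automatic array, so each thread needs NX doubles of its own stack.

```fortran
! Hypothetical module: DOT_PRODUCT variant of the same sum.
! Assumes integer LLEN, integer S, and COUL declared with lower bounds 0.
module coul_dot_mod
  implicit none
contains
  function coul_energy_dot(i, j, nx, ny, llen, lamda, coul, s, ip, im, jp, jm) result(el)
    integer, intent(in) :: i, j, nx, ny, llen
    double precision, intent(in) :: lamda
    double precision, intent(in) :: coul(0:, 0:)
    integer, intent(in) :: s(:, :)
    integer, intent(in) :: ip(:), im(:), jp(:), jm(:)
    double precision :: el
    integer :: id(nx), i1, j1, jd
    double precision :: w(nx), acc

    ! ID depends only on I1, so it is computed once, outside the J1 loop.
    do i1 = 1, nx
      id(i1) = min(abs(i - i1), abs(i + llen - i1))
    end do

    acc = 0.0d0
    !$omp parallel do default(none) &
    !$omp   shared(i, j, nx, ny, llen, lamda, coul, s, ip, im, jp, jm, id) &
    !$omp   private(j1, jd, w) reduction(+:acc)
    do j1 = 1, ny
      jd = min(abs(j - j1), abs(j + llen - j1))
      ! Vector subscript: w(k) = LAMDA*COUL(ID(k),JD) for k = 1..NX.
      w = lamda * coul(id, jd)
      ! Give the excluded sites zero weight instead of testing inside the loop.
      if (j1 == j) then
        w(i) = 0.0d0
        w(ip(i)) = 0.0d0
        w(im(i)) = 0.0d0
      else if (j1 == jp(j) .or. j1 == jm(j)) then
        w(i) = 0.0d0
      end if
      acc = acc + dot_product(w, dble(s(1:nx, j1)))
    end do
    !$omp end parallel do
    el = acc
  end function coul_energy_dot
end module coul_dot_mod
```

For the same inputs it should return the same EL as the loop in the question, up to rounding differences caused by the changed summation order.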

John

jimdempseyatthecove · 04-02-2014

In front of:

EL= EL + LAMDA*COUL(ID,JD)*dble(S(I1,J1))

Insert some asserts to bounds check the arrays.
The compiler has an option to do this; the symptom you were seeing looks like an array being indexed out of bounds.

IF(ID .LT. LBOUND(COUL, DIM=1)) PRINT *, "ID .LT. LBOUND(COUL, DIM=1)", ID, LBOUND(COUL, DIM=1)
...

*** Do not assume anything about the bounds and validity of COUL and S ***

Also, if COUL and S are DUMMY arguments with explicit shape or explicit size, make sure that the actual arguments (those of the caller) match the requirements of the DUMMY arguments.

If the above does not resolve anything then insert a PRINT in an appropriate place to trace the progress in hope of diagnosing the problem.

Jim Dempsey
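A minimal sketch of the bounds asserts described above (the automatic alternative is the compiler's bounds checking, for ifort on Linux the -check bounds option). One subtlety: if COUL is passed to a routine as an assumed-shape dummy such as coul(:,:), LBOUND inside that routine is always 1 whatever the caller declared. This hypothetical helper therefore takes the bounds as arguments, and the caller evaluates LBOUND(COUL) and UBOUND(COUL) where COUL is actually declared.

```fortran
! Hypothetical helper: reports subscripts outside the bounds supplied by the
! caller, e.g.  ok = index_ok(id, jd, lbound(coul), ubound(coul))
! LBOUND/UBOUND must be evaluated where COUL is declared: inside a routine that
! receives COUL as an assumed-shape dummy coul(:,:), LBOUND is always 1.
module bounds_check_mod
  implicit none
contains
  logical function index_ok(id, jd, lo, hi)
    integer, intent(in) :: id, jd
    integer, intent(in) :: lo(2), hi(2)   ! lower and upper bounds of COUL
    index_ok = .true.
    if (id < lo(1) .or. id > hi(1)) then
      print *, "ID out of bounds:", id, " allowed", lo(1), ":", hi(1)
      index_ok = .false.
    end if
    if (jd < lo(2) .or. jd > hi(2)) then
      print *, "JD out of bounds:", jd, " allowed", lo(2), ":", hi(2)
      index_ok = .false.
    end if
  end function index_ok
end module bounds_check_mod
```

Placed just before the EL update, a failing call prints the offending ID or JD together with the declared range, which is the information needed to decide whether COUL should start at index 0.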