Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

OMP Reduction failure

roy-dennington
Beginner
1,046 Views

MacOS (High Sierra) with intel compiler version 19.0.2.184 Build 20190117

I checked the documentation.  I either missed a limitation or it might be a bug.  I know OMP reduction of arrays have evolved, so maybe I missed something.  Any feedback is greatly appreciated.

Thanks,

Roy

I could not get the file uploader to work, so here is "test_hcore.F":

C Test example to demonstrate problem with subroutine hcore using
C an allocated array with reduction in an openmp parallel section.
C This does not reproduce the crash observed in our code, but does
C demonstrate a genuine problem.
C
C Compilation command:
C ifort -O3 test_hcore.F -qopenmp -o test_hcore.exe
C
C Expected results:
C ENUCLR = 5*NUMAT*(NUMAT-1)/2
C DBLKH(I) = (NUMAT-1) for all values of I
C
C Problem: For macos10 (High Sierra) with intel compiler version 19.0.2.184 Build 20190117
C the final elements of DBLKH are zero.  The problem likely existed in the 2018 intel compiler.
C
C This test works as expected for other compilers, including
C macos10 (El Capitan) with intel compiler version 15.0.1.108 Build 20141022.
C It also works if DBLKH is statically allocated or passed in as a dummy argument,
C or if openmp is not used.
C
      PROGRAM TEST
      IMPLICIT REAL*8 (A-H,O-Z)
      INTEGER, PARAMETER :: NUMAT=5
      DIMENSION NFIRST(NUMAT),NLAST(NUMAT),IBAND(NUMAT),NBAND(NUMAT)
      DIMENSION E1B(45),E2A(45)
      ALLOCATABLE :: DBLKH(:)
C
C     EXPLICITLY FILL IN NECESSARY ARRAYS
C
      DO I=1,NUMAT
         NFIRST(I) = (I-1)*4+1
         NLAST(I) = I*4
         IBAND(I) = (I-1)*10+1
         NBAND(I) = 10
      ENDDO !I
      NDSIZE = 10*NUMAT
C
      ALLOCATE( DBLKH(NDSIZE), STAT=IORR )
      IF( IORR.NE.0 ) STOP
      ENUCLR = 0D0
      DBLKH = 0D0
C
C$OMP PARALLEL DO SCHEDULE(AUTO)
C$OMP& PRIVATE(II,JJ,IA,IB,N1,IBND,ILEN,JA,JB,N2,JBND,JLEN)
C$OMP& PRIVATE(E1B,E2A,ENUC)
C$OMP& REDUCTION(+:ENUCLR,DBLKH)
      DO II=2,NUMAT
         IA = NFIRST(II)
         IB = NLAST(II)
         N1 = IB-IA+1
         IBND = IBAND(II)
         ILEN = NBAND(II)
         DO JJ=1,II-1
            JA = NFIRST(JJ)
            JB = NLAST(JJ)
            N2 = JB-JA+1
            JBND = IBAND(JJ)
            JLEN = NBAND(JJ)
C
C           REPLACE CALL TO INTEGRALS WITH ARBITRARY FILL VALUES
C
            E1B = 1.D0
            E2A = 1.D0
            ENUC = 5.D0
C
C           INCLUDE THE ELECTRON-NUCLEAR ATTRACTION TERMS FOR ATOMS II, THEN JJ (E1B, E2A)
C           AND THE NUCLEAR-NUCLEAR TERM ENUC.
C
            IF( ILEN.GT.0 )CALL VECADD(ILEN,DBLKH(IBND),E1B,DBLKH(IBND))
            IF( JLEN.GT.0 )CALL VECADD(JLEN,DBLKH(JBND),E2A,DBLKH(JBND))
            ENUCLR = ENUCLR + ENUC
         ENDDO !JJ
      ENDDO !II
C$OMP END PARALLEL DO
C
C     PRINT RESULT
C
      WRITE(6,*) 'PRINT DBLKH AT END',ENUCLR
      WRITE(6,'(6F12.6)') (DBLKH(I),I=1,NDSIZE)
      WRITE(6,*) 'EXPECTED RESULTS',5.D0*DBLE(NUMAT*(NUMAT-1)/2)
      WRITE(6,'(6F12.6)') (DBLE(NUMAT-1),I=1,NDSIZE)
      END
C**********************************************************************+
      SUBROUTINE VECADD ( N, A, B, C )
      IMPLICIT REAL*8 (A-H,O-Z)
      INTEGER, INTENT(IN) :: N
      REAL*8, INTENT(IN) :: A(N),B(N)
      REAL*8, INTENT(OUT) :: C(N)
C----------------------------------------------------------------------+
C     ADD COMPONENTS OF TWO VECTORS TO PRODUCE A THIRD VECTOR
C     C(I) = A(I) + B(I)
C----------------------------------------------------------------------+
      DO I=1,N
         C(I) = A(I) + B(I)
      ENDDO !I
      END

Output:

 PRINT DBLKH AT END   50.0000000000000     
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    0.000000    0.000000    0.000000    0.000000    0.000000
    0.000000    0.000000

 EXPECTED RESULTS   50.0000000000000     
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000    4.000000    4.000000    4.000000    4.000000
    4.000000    4.000000
 

0 Kudos
11 Replies
Umar__Sait
Novice
1,046 Views

I think the variables in the inner loop should not be in the PRIVATE statement. I tested it on 2019 version and got your result first but after removing those variables out of PRIVATE statement the problem was rectified.

0 Kudos
Umar__Sait
Novice
1,046 Views

If you remove the variables of the inner loop out of the PRIVATE sratement it works fine.

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,046 Views

It is unfortunate that Roy edited the original post to remove the error. After doing so, the comments by Umar are befuddling. In the forum, it is better to leave the bad code, and then post affirmative reply to the others that offer a solution. If you still desire to make the edit to the original (or referenced) post, then please leave an edit trail, such as using the strikethrough (S with bar on tool bar). Doing this will make the useful suggestions have meaning to the later readers of this thread.

Jim Dempsey

0 Kudos
roy-dennington
Beginner
1,046 Views

Jim, I did not edit the code, or make the change Umar suggested to the posted code.  The only change I made was to move the very last line ("Output") out of the code block.  That was before Umar posted.

Umar, thank you for your response.  I will test and post.

0 Kudos
roy-dennington
Beginner
1,046 Views

Umar,  I tried but CANNOT reproduce your findings.  Could you tell me specifically which variables you took out of the PRIVATE list?  In this context, shouldn't they be PRIVATE?

Thanks,

Roy

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,046 Views

You actually have a programming error

            IF( ILEN.GT.0 )CALL VECADD(ILEN,DBLKH(IBND),E1B,DBLKH(IBND))
            IF( JLEN.GT.0 )CALL VECADD(JLEN,DBLKH(JBND),E2A,DBLKH(JBND))

...

     SUBROUTINE VECADD ( N, A, B, C )
      IMPLICIT REAL*8 (A-H,O-Z)
      INTEGER, INTENT(IN) :: N
      REAL*8, INTENT(IN) :: A(N),B(N)
      REAL*8, INTENT(OUT) :: C(N)
In your subroutine, A and C, as called in your loop, are aliases this is in violation to the rules of Fortran.

You can get around this by making a generic VECADD, one that takes 3 arguments (adds 3rd arg to 2nd arg) and one that takes 4 arguments (what you currently have).

Jim Dempsey

0 Kudos
roy-dennington
Beginner
1,046 Views

Yes, I see the violation.  It is a carryover from Fortran 77 days when we played "fast and loose".

Here is version 2 of the program that simplifies the problem.  Changes are given in code.

C Test example to demonstrate problem with subroutine hcore using
C an allocated array with reduction in an openmp parallel section.
C This does not reproduce the crash observed in our code, but does
C demonstrate a genuine problem.
C
C Compilation command:
C ifort -O3 test_hcore.F -qopenmp -warn all -o test_hcore.exe
C
C Expected results:
C ENUCLR = 5*NUMAT*(NUMAT-1)/2
C DBLKH(I) = (NUMAT-1) for all values of I
C
C Problem: For macos10 (High Sierra) with intel compiler version 19.0.2.184 Build 20190117
C the final elements of DBLKH are zero.  The problem likely existed in the 2018 intel compiler.
C
C This test works as expected for other compilers, including
C macos10 (El Capitan) with intel compiler version 15.0.1.108 Build 20141022.
C It also works if DBLKH is statically allocated or passed in as a dummy argument,
C or if openmp is not used.
C 
C - Avoid VECADD and make add loops explicit.
C - Add printing of DBLKH at end of loop II.  
C - Print the number of threads and the individual thread IDs
C - Explicitly set the number of processors
C - Simplified by removing NFIRST and NLAST and related indices
C - Explicitly define shared/private in Openmp loop
C - Change loop structure to simplify contributions to DBLKH.
C   This makes the problem much clearer!
C
C * * * NOTE: Eliminating ENUCLR or listing it second instead of first
C       in the reduction statements seems to work!?!?
C
C       Expected results:
C       ENUCLR = 5*NUMAT*(NUMAT-1)/2
C       DBLKH(I) = NUMAT*(NUMAT-1) for all values of I
C
      PROGRAM TEST
      IMPLICIT NONE
      INTEGER, PARAMETER :: NUMAT=5
      INTEGER NBAND(NUMAT),I,J,NDSIZE,II,JJ,ILEN,JLEN,IORR
      INTEGER NPROC,OMP_GET_THREAD_NUM,OMP_GET_NUM_THREADS
      REAL*8 E1B(45),E2A(45),ENUCLR,ENUC
      REAL*8, ALLOCATABLE :: DBLKH(:)
C
C     EXPLICITLY FILL IN NECESSARY ARRAYS
C
      DO I=1,NUMAT
         NBAND(I) = 10
      ENDDO !I
      NDSIZE = 10
C
      NPROC = MIN(12,NUMAT-1)
      CALL OMP_SET_NUM_THREADS(NPROC)
C
      ALLOCATE( DBLKH(NDSIZE), STAT=IORR )
      IF( IORR.NE.0 ) STOP
      ENUCLR = 0D0
      DBLKH = 0D0
C
C$OMP PARALLEL DO SCHEDULE(AUTO)
C$OMP& DEFAULT(NONE) SHARED(NBAND,NDSIZE)
C$OMP& PRIVATE(II,JJ,ILEN,JLEN,I,J)
C$OMP& PRIVATE(E1B,E2A,ENUC)
C$OMP& REDUCTION(+:ENUCLR,DBLKH)         !<-- THIS FAILS!?!?
C#RDD C$OMP& REDUCTION(+:DBLKH,ENUCLR)   !<-- THIS WORKS!
      DO II=2,NUMAT
         ILEN = NBAND(II)
         DO JJ=1,II-1
            JLEN = NBAND(JJ)
C
C           REPLACE CALL TO INTEGRALS WITH ARBITRARY FILL VALUES
C
            E1B = 1.D0
            E2A = 1.D0
            ENUC = 5.D0
C
C           INCLUDE THE ELECTRON-NUCLEAR ATTRACTION TERMS FOR ATOMS II, THEN JJ (E1B, E2A)
C           AND THE NUCLEAR-NUCLEAR TERM ENUC.
C
            DO I=1,ILEN
               DBLKH(I) = E1B(I) + DBLKH(I)
            ENDDO !I
            DO J=1,JLEN
               DBLKH(J) = E2A(J) + DBLKH(J)
            ENDDO !J
            ENUCLR = ENUCLR + ENUC
         ENDDO !JJ
C
         WRITE(6,*) 'DBLKH AT END OF LOOP II',II,OMP_GET_THREAD_NUM(),
     .      OMP_GET_NUM_THREADS()
         WRITE(6,'(6F12.6)') (DBLKH(I),I=1,NDSIZE)
C
      ENDDO !II
C$OMP END PARALLEL DO
C
C     PRINT RESULT
C
      WRITE(6,*) 'PRINT DBLKH AT END',ENUCLR
      WRITE(6,'(6F12.6)') (DBLKH(I),I=1,NDSIZE)
      WRITE(6,*) 'EXPECTED RESULTS',5.D0*DBLE(NUMAT*(NUMAT-1)/2)
      WRITE(6,'(6F12.6)') (DBLE(NUMAT*(NUMAT-1)),I=1,NDSIZE)
      END

 

0 Kudos
roy-dennington
Beginner
1,046 Views

Notice the first seven entries are still zero for each thread.  They appeared to be shifted 7.  Then when the parallel loop ends they shift back and the seven numbers to the right are zero.

Here is the output:

 DBLKH AT END OF LOOP II           3           0           4
 DBLKH AT END OF LOOP II           4           1           4
 DBLKH AT END OF LOOP II           2           2           4
    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
    0.000000    4.000000    4.000000    4.000000
    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
    0.000000    2.000000    2.000000    2.000000
    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
    0.000000    6.000000    6.000000    6.000000
 DBLKH AT END OF LOOP II           5           3           4
    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
    0.000000    8.000000    8.000000    8.000000
 PRINT DBLKH AT END   50.0000000000000     
   20.000000   20.000000   20.000000    0.000000    0.000000    0.000000
    0.000000    0.000000    0.000000    0.000000
 EXPECTED RESULTS   50.0000000000000     
   20.000000   20.000000   20.000000   20.000000   20.000000   20.000000
   20.000000   20.000000   20.000000   20.000000
 

0 Kudos
roy-dennington
Beginner
1,046 Views

Switching the order of the variables in the REDUCTION statement:

064C$OMP& REDUCTION(+:ENUCLR,DBLKH)         !<-- THIS FAILS!?!?

065C#RDD C$OMP& REDUCTION(+:DBLKH,ENUCLR)   !<-- THIS WORKS!

By removing ENUCLR or changing the order, the code works as expected.

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,046 Views

>>Switching the order of the variables in the REDUCTION statement:

Woah something seriously wrong there.

Great detective work.

Did you submit this reproducer to the Intel support site?

Just ran your code IVF 2019 update 1, Windows x64 Debug mode.

The allocation of DBLKH in the serial section of the code shows DBLKH(1:10).
Running in Debug build with runtime checks (array bounds):

forrtl: severe (408): fort: (11): Subscript #1 of the array DBLKH has value 1
which is less than the lower bound of 8

IOW the array descriptor of the "private/reduction" copy of DBLKH is bonkers.

Jim Dempsey

 

0 Kudos
roy-dennington
Beginner
1,046 Views

I submitted a support ticket.  Jim, thank you for taking the time and effort to help me.

Roy

0 Kudos
Reply