OMP Reduction failure

roy-dennington · ‎03-21-2019

MacOS (High Sierra) with intel compiler version 19.0.2.184 Build 20190117

I checked the documentation. I either missed a limitation or it might be a bug. I know OMP reduction of arrays have evolved, so maybe I missed something. Any feedback is greatly appreciated.

Thanks,

Roy

I could not get the file uploader to work, so here is "test_hcore.F":

C Test example to demonstrate problem with subroutine hcore using
C an allocated array with reduction in an openmp parallel section.
C This does not reproduce the crash observed in our code, but does
C demonstrate a genuine problem.
C
C Compilation command:
C ifort -O3 test_hcore.F -qopenmp -o test_hcore.exe
C
C Expected results:
C ENUCLR = 5*NUMAT*(NUMAT-1)/2
C DBLKH(I) = (NUMAT-1) for all values of I
C
C Problem: For macos10 (High Sierra) with intel compiler version 19.0.2.184 Build 20190117
C the final elements of DBLKH are zero.  The problem likely existed in the 2018 intel compiler.
C
C This test works as expected for other compilers, including
C macos10 (El Capitan) with intel compiler version 15.0.1.108 Build 20141022.
C It also works if DBLKH is statically allocated or passed in as a dummy argument,
C or if openmp is not used.
C
      PROGRAM TEST
      IMPLICIT REAL*8 (A-H,O-Z)
      INTEGER, PARAMETER :: NUMAT=5
      DIMENSION NFIRST(NUMAT),NLAST(NUMAT),IBAND(NUMAT),NBAND(NUMAT)
      DIMENSION E1B(45),E2A(45)
      ALLOCATABLE :: DBLKH(:)
C
C     EXPLICITLY FILL IN NECESSARY ARRAYS
C
      DO I=1,NUMAT
         NFIRST(I) = (I-1)*4+1
         NLAST(I) = I*4
         IBAND(I) = (I-1)*10+1
         NBAND(I) = 10
      ENDDO !I
      NDSIZE = 10*NUMAT
C
      ALLOCATE( DBLKH(NDSIZE), STAT=IORR )
      IF( IORR.NE.0 ) STOP
      ENUCLR = 0D0
      DBLKH = 0D0
C
C$OMP PARALLEL DO SCHEDULE(AUTO)
C$OMP& PRIVATE(II,JJ,IA,IB,N1,IBND,ILEN,JA,JB,N2,JBND,JLEN)
C$OMP& PRIVATE(E1B,E2A,ENUC)
C$OMP& REDUCTION(+:ENUCLR,DBLKH)
      DO II=2,NUMAT
         IA = NFIRST(II)
         IB = NLAST(II)
         N1 = IB-IA+1
         IBND = IBAND(II)
         ILEN = NBAND(II)
         DO JJ=1,II-1
            JA = NFIRST(JJ)
            JB = NLAST(JJ)
            N2 = JB-JA+1
            JBND = IBAND(JJ)
            JLEN = NBAND(JJ)
C
C           REPLACE CALL TO INTEGRALS WITH ARBITRARY FILL VALUES
C
            E1B = 1.D0
            E2A = 1.D0
            ENUC = 5.D0
C
C           INCLUDE THE ELECTRON-NUCLEAR ATTRACTION TERMS FOR ATOMS II, THEN JJ (E1B, E2A)
C           AND THE NUCLEAR-NUCLEAR TERM ENUC.
C
            IF( ILEN.GT.0 )CALL VECADD(ILEN,DBLKH(IBND),E1B,DBLKH(IBND))
            IF( JLEN.GT.0 )CALL VECADD(JLEN,DBLKH(JBND),E2A,DBLKH(JBND))
            ENUCLR = ENUCLR + ENUC
         ENDDO !JJ
      ENDDO !II
C$OMP END PARALLEL DO
C
C     PRINT RESULT
C
      WRITE(6,*) 'PRINT DBLKH AT END',ENUCLR
      WRITE(6,'(6F12.6)') (DBLKH(I),I=1,NDSIZE)
      WRITE(6,*) 'EXPECTED RESULTS',5.D0*DBLE(NUMAT*(NUMAT-1)/2)
      WRITE(6,'(6F12.6)') (DBLE(NUMAT-1),I=1,NDSIZE)
      END
C**********************************************************************+
      SUBROUTINE VECADD ( N, A, B, C )
      IMPLICIT REAL*8 (A-H,O-Z)
      INTEGER, INTENT(IN) :: N
      REAL*8, INTENT(IN) :: A(N),B(N)
      REAL*8, INTENT(OUT) :: C(N)
C----------------------------------------------------------------------+
C     ADD COMPONENTS OF TWO VECTORS TO PRODUCE A THIRD VECTOR
C     C(I) = A(I) + B(I)
C----------------------------------------------------------------------+
      DO I=1,N
         C(I) = A(I) + B(I)
      ENDDO !I
      END

Output:

PRINT DBLKH AT END 50.0000000000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.000000 0.000000
EXPECTED RESULTS 50.0000000000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000 4.000000 4.000000 4.000000 4.000000
4.000000 4.000000

Umar__Sait · ‎03-24-2019

I think the variables in the inner loop should not be in the PRIVATE statement. I tested it on 2019 version and got your result first but after removing those variables out of PRIVATE statement the problem was rectified.

Umar__Sait · ‎03-24-2019

If you remove the variables of the inner loop out of the PRIVATE sratement it works fine.

jimdempseyatthecove · ‎03-26-2019

It is unfortunate that Roy edited the original post to remove the error. After doing so, the comments by Umar are befuddling. In the forum, it is better to leave the bad code, and then post affirmative reply to the others that offer a solution. If you still desire to make the edit to the original (or referenced) post, then please leave an edit trail, such as using the strikethrough (S with bar on tool bar). Doing this will make the useful suggestions have meaning to the later readers of this thread.

Jim Dempsey

roy-dennington · ‎03-26-2019

Jim, I did not edit the code, or make the change Umar suggested to the posted code. The only change I made was to move the very last line ("Output") out of the code block. That was before Umar posted.

Umar, thank you for your response. I will test and post.

roy-dennington · ‎03-26-2019

Umar, I tried but CANNOT reproduce your findings. Could you tell me specifically which variables you took out of the PRIVATE list? In this context, shouldn't they be PRIVATE?

Thanks,

Roy

jimdempseyatthecove · ‎03-26-2019

You actually have a programming error

IF( ILEN.GT.0 )CALL VECADD(ILEN,DBLKH(IBND),E1B,DBLKH(IBND))
IF( JLEN.GT.0 )CALL VECADD(JLEN,DBLKH(JBND),E2A,DBLKH(JBND))

...

     SUBROUTINE VECADD ( N, A, B, C )
      IMPLICIT REAL*8 (A-H,O-Z)
      INTEGER, INTENT(IN) :: N
      REAL*8, INTENT(IN) :: A(N),B(N)
      REAL*8, INTENT(OUT) :: C(N)
In your subroutine, A and C, as called in your loop, are aliases this is in violation to the rules of Fortran.

You can get around this by making a generic VECADD, one that takes 3 arguments (adds 3rd arg to 2nd arg) and one that takes 4 arguments (what you currently have).

Jim Dempsey

roy-dennington · ‎03-27-2019

Yes, I see the violation. It is a carryover from Fortran 77 days when we played "fast and loose".

Here is version 2 of the program that simplifies the problem. Changes are given in code.

C Test example to demonstrate problem with subroutine hcore using
C an allocated array with reduction in an openmp parallel section.
C This does not reproduce the crash observed in our code, but does
C demonstrate a genuine problem.
C
C Compilation command:
C ifort -O3 test_hcore.F -qopenmp -warn all -o test_hcore.exe
C
C Expected results:
C ENUCLR = 5*NUMAT*(NUMAT-1)/2
C DBLKH(I) = (NUMAT-1) for all values of I
C
C Problem: For macos10 (High Sierra) with intel compiler version 19.0.2.184 Build 20190117
C the final elements of DBLKH are zero.  The problem likely existed in the 2018 intel compiler.
C
C This test works as expected for other compilers, including
C macos10 (El Capitan) with intel compiler version 15.0.1.108 Build 20141022.
C It also works if DBLKH is statically allocated or passed in as a dummy argument,
C or if openmp is not used.
C 
C - Avoid VECADD and make add loops explicit.
C - Add printing of DBLKH at end of loop II.  
C - Print the number of threads and the individual thread IDs
C - Explicitly set the number of processors
C - Simplified by removing NFIRST and NLAST and related indices
C - Explicitly define shared/private in Openmp loop
C - Change loop structure to simplify contributions to DBLKH.
C   This makes the problem much clearer!
C
C * * * NOTE: Eliminating ENUCLR or listing it second instead of first
C       in the reduction statements seems to work!?!?
C
C       Expected results:
C       ENUCLR = 5*NUMAT*(NUMAT-1)/2
C       DBLKH(I) = NUMAT*(NUMAT-1) for all values of I
C
      PROGRAM TEST
      IMPLICIT NONE
      INTEGER, PARAMETER :: NUMAT=5
      INTEGER NBAND(NUMAT),I,J,NDSIZE,II,JJ,ILEN,JLEN,IORR
      INTEGER NPROC,OMP_GET_THREAD_NUM,OMP_GET_NUM_THREADS
      REAL*8 E1B(45),E2A(45),ENUCLR,ENUC
      REAL*8, ALLOCATABLE :: DBLKH(:)
C
C     EXPLICITLY FILL IN NECESSARY ARRAYS
C
      DO I=1,NUMAT
         NBAND(I) = 10
      ENDDO !I
      NDSIZE = 10
C
      NPROC = MIN(12,NUMAT-1)
      CALL OMP_SET_NUM_THREADS(NPROC)
C
      ALLOCATE( DBLKH(NDSIZE), STAT=IORR )
      IF( IORR.NE.0 ) STOP
      ENUCLR = 0D0
      DBLKH = 0D0
C
C$OMP PARALLEL DO SCHEDULE(AUTO)
C$OMP& DEFAULT(NONE) SHARED(NBAND,NDSIZE)
C$OMP& PRIVATE(II,JJ,ILEN,JLEN,I,J)
C$OMP& PRIVATE(E1B,E2A,ENUC)
C$OMP& REDUCTION(+:ENUCLR,DBLKH)         !<-- THIS FAILS!?!?
C#RDD C$OMP& REDUCTION(+:DBLKH,ENUCLR)   !<-- THIS WORKS!
      DO II=2,NUMAT
         ILEN = NBAND(II)
         DO JJ=1,II-1
            JLEN = NBAND(JJ)
C
C           REPLACE CALL TO INTEGRALS WITH ARBITRARY FILL VALUES
C
            E1B = 1.D0
            E2A = 1.D0
            ENUC = 5.D0
C
C           INCLUDE THE ELECTRON-NUCLEAR ATTRACTION TERMS FOR ATOMS II, THEN JJ (E1B, E2A)
C           AND THE NUCLEAR-NUCLEAR TERM ENUC.
C
            DO I=1,ILEN
               DBLKH(I) = E1B(I) + DBLKH(I)
            ENDDO !I
            DO J=1,JLEN
               DBLKH(J) = E2A(J) + DBLKH(J)
            ENDDO !J
            ENUCLR = ENUCLR + ENUC
         ENDDO !JJ
C
         WRITE(6,*) 'DBLKH AT END OF LOOP II',II,OMP_GET_THREAD_NUM(),
     .      OMP_GET_NUM_THREADS()
         WRITE(6,'(6F12.6)') (DBLKH(I),I=1,NDSIZE)
C
      ENDDO !II
C$OMP END PARALLEL DO
C
C     PRINT RESULT
C
      WRITE(6,*) 'PRINT DBLKH AT END',ENUCLR
      WRITE(6,'(6F12.6)') (DBLKH(I),I=1,NDSIZE)
      WRITE(6,*) 'EXPECTED RESULTS',5.D0*DBLE(NUMAT*(NUMAT-1)/2)
      WRITE(6,'(6F12.6)') (DBLE(NUMAT*(NUMAT-1)),I=1,NDSIZE)
      END

roy-dennington · ‎03-27-2019

Notice the first seven entries are still zero for each thread. They appeared to be shifted 7. Then when the parallel loop ends they shift back and the seven numbers to the right are zero.

Here is the output:

DBLKH AT END OF LOOP II 3 0 4
DBLKH AT END OF LOOP II 4 1 4
DBLKH AT END OF LOOP II 2 2 4
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.000000 4.000000 4.000000 4.000000
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.000000 2.000000 2.000000 2.000000
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.000000 6.000000 6.000000 6.000000
DBLKH AT END OF LOOP II 5 3 4
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.000000 8.000000 8.000000 8.000000
PRINT DBLKH AT END 50.0000000000000
20.000000 20.000000 20.000000 0.000000 0.000000 0.000000
0.000000 0.000000 0.000000 0.000000
EXPECTED RESULTS 50.0000000000000
20.000000 20.000000 20.000000 20.000000 20.000000 20.000000
20.000000 20.000000 20.000000 20.000000

roy-dennington · ‎03-27-2019

Switching the order of the variables in the REDUCTION statement:

064C$OMP& REDUCTION(+:ENUCLR,DBLKH) !<-- THIS FAILS!?!?

065C#RDD C$OMP& REDUCTION(+:DBLKH,ENUCLR) !<-- THIS WORKS!

By removing ENUCLR or changing the order, the code works as expected.

jimdempseyatthecove · ‎03-28-2019

>>Switching the order of the variables in the REDUCTION statement:

Woah something seriously wrong there.

Great detective work.

Did you submit this reproducer to the Intel support site?

Just ran your code IVF 2019 update 1, Windows x64 Debug mode.

The allocation of DBLKH in the serial section of the code shows DBLKH(1:10).
Running in Debug build with runtime checks (array bounds):

forrtl: severe (408): fort: (11): Subscript #1 of the array DBLKH has value 1
which is less than the lower bound of 8

IOW the array descriptor of the "private/reduction" copy of DBLKH is bonkers.

Jim Dempsey

roy-dennington · ‎03-28-2019

I submitted a support ticket. Jim, thank you for taking the time and effort to help me.

Roy