Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

OpenMP bug in fort

kay-diederichs
New Contributor I
2,366 Views

Dear all,

I have a larger program from which I distilled the following small reproducer. The program writes nbuf-sized chunks of output to "test.dat", but somehow the PRIVATE ibuf is set to 0 after the OMP DO loop of the parallel region, so that output lines that should go to "test.dat" are lost.

!$    USE OMP_LIB

      IMPLICIT NONE

      INTEGER, PARAMETER :: nbuf=100

      INTEGER nrow,ncol,i,j,ibuf

      REAL, ALLOCATABLE :: matrix(:,:)

      REAL cc,cct(nbuf)

      

      nrow=99

      ncol=65

      ALLOCATE(matrix(nrow,ncol))

      CALL RANDOM_NUMBER(matrix)

      OPEN(2,file='test.dat')

! calculate something, and write it out in batches of size nbuf 

!$OMP PARALLEL PRIVATE(i,j,cc,cct,ibuf) SHARED(nrow,ncol,matrix)

      ibuf=0

!$OMP DO 

      DO i=1,ncol-1

         DO j=i+1,ncol

           call get_stuff(matrix(1:nrow,i),matrix(1:nrow,j),nrow,cc)

           ibuf=ibuf+1

           cct(ibuf)=cc

           IF (ibuf==nbuf) THEN

!$OMP CRITICAL

             DO ibuf=1,nbuf

               WRITE(2,'(f0.3)') cct(ibuf) 

             END DO

!$OMP END CRITICAL

             ibuf=0

           END IF

         END DO

       END DO

!$OMP END DO   

! remaining buffers:

!$OMP CRITICAL

! if MOD(nount,nbuf)/=0 then the ibuf values should not be all 0

!$ print '(a,i3,a,i3,a)','thread=',omp_get_thread_num(),' ibuf=',ibuf,' (should not be all 0)'

      DO i=1,ibuf

        WRITE(2,'(f0.3)') cct(i) 

      END DO

!$OMP END CRITICAL

!$OMP END PARALLEL

 

CLOSE(2)

END

 

SUBROUTINE get_stuff(a,b,n,cc)

INTEGER, INTENT(IN) :: n

REAL, INTENT(IN) :: a(n),b(n)

REAL, INTENT(OUT) :: cc

cc=SUM(a+b)/n/2

END SUBROUTINE get_stuff

 

This, when compiled with ifort -fopenmp from oneAPI 2023.0.0, wrongly prints ibuf=0 for all threads. Consequently, the number of lines in test.dat is wrong.

Other compilers, like Fortran, get this right, and I'm surprised that ifort doesn't. Or am I overlooking something?

Thanks,

Kay

 

 

Labels (3)
0 Kudos
1 Solution
Ron_Green
Moderator
2,186 Views

It is an IFORT bug.  I can write it up.

I didn't like that get_stuff was an old school F77 External procedure, but it does not matter for the bug report.  I also created a version of the test with an INTERFACE for get_stuff - still wrong result.  Then I put get_stuff into a module and USE'd it - still wrong results.

 

Ifort and IFX handle OpenMP entirely differently.  

 

Is there any reason you cannot use IFX at this point?  We missed the code freeze for 2023 Update 1, which will be coming out in about a month.   Update 2 is many months out.  So no quick fix is possible and I cannot find a workaround.   

View solution in original post

0 Kudos
17 Replies
kay-diederichs
New Contributor I
2,362 Views

The spellchecker messed up my posting; sorry for that!
It should be "ifort" in the title ("OpenMP bug in ifort"), and "gfortran" rather than "Fortran" near the end ("Other compilers, like gfortran, get this right, ..".

And I should learn to use the formatting tool for the source code ...

0 Kudos
Arjen_Markus
Honored Contributor I
2,327 Views

As the variable IBUF is defined to be PRIVATE, it will actually consist of separate variables, one for each thread. As the threads finish in no predictable order, whatever the value is for the IBUF variable in the last thread to finish would be the value that might be passed on to the IBUF in the surrounding subprogram. In other words: it may be any value, unless the compiler is retaining the original value. I have not studied the source code in much detail, but I think you have to do it in another way using a shared variable, as this in general will not have a predictable outcome. To illustrate that, try a program that simply sets a private variable to the thread number and then does some work (so that threads get a chance to do something simultaneously). The end value of that private variable is bound to be arbitrary, unless of course I am making a huge thinking error (it is known to happen :)).

 

0 Kudos
kay-diederichs
New Contributor I
2,316 Views

@Arjen_Markus wrote:

As the variable IBUF is defined to be PRIVATE, it will actually consist of separate variables, one for each thread. As the threads finish in no predictable order, whatever the value is for the IBUF variable in the last thread to finish would be the value that might be passed on to the IBUF in the surrounding subprogram. In other words: it may be any value, unless the compiler is retaining the original value. I have not studied the source code in much detail, but I think you have to do it in another way using a shared variable, as this in general will not have a predictable outcome. To illustrate that, try a program that simply sets a private variable to the thread number and then does some work (so that threads get a chance to do something simultaneously). The end value of that private variable is bound to be arbitrary, unless of course I am making a huge thinking error (it is known to happen :)).

 


Hi Arjan,

thank you for looking into this. The point is that the PRIVATE variable IBUF is not used outside the PARALLEL region (lines 14-41; the properly formatted program is below) but only inside it. The OMP DO worksharing construct (lines 16-32) does not change this fact. Therefore each thread does have its own copy when it is needed in the PARALLEL region (namely within the OMP DO, and after it). 

Thus the fact that IBUF is 0 at line 36, for all the threads, is unexpected. 

I tried IFX instead of IFORT; IFX does not show this bug.

!$    USE OMP_LIB
      IMPLICIT NONE
      INTEGER, PARAMETER :: nbuf=100
      INTEGER nrow,ncol,i,j,ibuf
      REAL, ALLOCATABLE :: matrix(:,:)
      REAL cc,cct(nbuf)

      nrow=99
      ncol=65
      ALLOCATE(matrix(nrow,ncol))
      CALL RANDOM_NUMBER(matrix)
      OPEN(2,file='test.dat')
! calculate something, and write it out in batches of size nbuf 
!$OMP PARALLEL PRIVATE(i,j,cc,cct,ibuf) SHARED(nrow,ncol,matrix)
      ibuf=0
!$OMP DO 
      DO i=1,ncol-1
         DO j=i+1,ncol
           call get_stuff(matrix(1:nrow,i),matrix(1:nrow,j),nrow,cc)
           ibuf=ibuf+1
           cct(ibuf)=cc
           IF (ibuf==nbuf) THEN
!$OMP CRITICAL
             DO ibuf=1,nbuf
               WRITE(2,'(f0.3)') cct(ibuf) 
             END DO
!$OMP END CRITICAL
             ibuf=0
           END IF
         END DO
       END DO
!$OMP END DO   
! remaining buffers:
!$OMP CRITICAL
! if MOD(nount,nbuf)/=0 then the ibuf values should not be all 0
!$ print '(a,i3,a,i3,a)','thread=',omp_get_thread_num(),' ibuf=',ibuf,' (should not be all 0)'
      DO i=1,ibuf
        WRITE(2,'(f0.3)') cct(i) 
      END DO
!$OMP END CRITICAL
!$OMP END PARALLEL

CLOSE(2)
END
 
SUBROUTINE get_stuff(a,b,n,cc)
INTEGER, INTENT(IN) :: n
REAL, INTENT(IN) :: a(n),b(n)
REAL, INTENT(OUT) :: cc
cc=SUM(a+b)/n/2
END SUBROUTINE get_stuff

Thank you,

Kay

0 Kudos
Arjen_Markus
Honored Contributor I
2,308 Views

Ah! I had missed that all important part. Yes, there is no reason for IBUF to be set to zero at the end of the DO-loop.

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,288 Views

In looking at you code:

!$OMP PARALLEL PRIVATE(i,j,cc,cct,ibuf) SHARED(nrow,ncol,matrix)
      ibuf=0
!$OMP DO 
      DO i=1,ncol-1
         DO j=i+1,ncol
            ...

There is no indication of if the compiler would collapse the nested do loops or not, or if the runtime would use static scheduling or dynamic scheduling. This said, it is unknown if would or would not have a muiltiple of nbuf of data to process or not. In the case of "yes" then on exit of the !$OMP DO, you would expect each thread to see ibuf==0, and in the case of "no", you would expect each thread to show the residual date placed into its private buffer.

On the line following: !$OMP END DO insert:

print *, omp_get_thread_num(), ibuf

 

See what you get.

Jim Dempsey

0 Kudos
kay-diederichs
New Contributor I
2,286 Views

Jim, the line you suggest is already there - it is line 36. (Due to it starting with !$ , it is only compiled if -qopenmp is used.)

Try to compile (fort -qopenmp) and run the program. I get e.g.

-bash-4.2$ ifort -qopenmp t.f90
-bash-4.2$ ./a.out 
thread=  0 ibuf=  0 !(should not be all 0)
thread=  7 ibuf=  0 !(should not be all 0)
thread=  4 ibuf=  0 !(should not be all 0)
thread= 18 ibuf=  0 !(should not be all 0)
thread= 28 ibuf=  0 !(should not be all 0)
thread= 19 ibuf=  0 !(should not be all 0)
thread=  2 ibuf=  0 !(should not be all 0)
thread= 16 ibuf=  0 !(should not be all 0)
thread=  5 ibuf=  0 !(should not be all 0)
thread=  3 ibuf=  0 !(should not be all 0)
thread=  8 ibuf=  0 !(should not be all 0)
thread=  6 ibuf=  0 !(should not be all 0)
thread= 11 ibuf=  0 !(should not be all 0)
thread= 29 ibuf=  0 !(should not be all 0)
thread= 10 ibuf=  0 !(should not be all 0)
thread= 14 ibuf=  0 !(should not be all 0)
thread= 12 ibuf=  0 !(should not be all 0)
thread=  1 ibuf=  0 !(should not be all 0)
thread= 31 ibuf=  0 !(should not be all 0)
thread= 15 ibuf=  0 !(should not be all 0)
thread= 24 ibuf=  0 !(should not be all 0)
thread=  9 ibuf=  0 !(should not be all 0)
thread= 17 ibuf=  0 !(should not be all 0)
thread= 13 ibuf=  0 !(should not be all 0)
thread= 25 ibuf=  0 !(should not be all 0)
thread= 26 ibuf=  0 !(should not be all 0)
thread= 30 ibuf=  0 !(should not be all 0)
thread= 20 ibuf=  0 !(should not be all 0)
thread= 27 ibuf=  0 !(should not be all 0)
thread= 23 ibuf=  0 !(should not be all 0)
thread= 21 ibuf=  0 !(should not be all 0)
thread= 22 ibuf=  0 !(should not be all 0)

If I move the line

!$ print '(a,i3,a,i3,a)','thread=',omp_get_thread_num(),' ibuf=',ibuf,' !(should not be all 0)'

two lines upwards, i.e. before the !$ OMP CRITICAL, the output is the same (as it should). 

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,277 Views

To help investigate

integer :: iBegin, iEnd, nWrites
...
!$OMP PARALLEL PRIVATE(i,j,cc,cct,ibuf, iBegin, iEnd, nWrites) SHARED(nrow,ncol,matrix)
      ibuf=0
      iBegin = huge(iBegin)
      iEnd = 0
      nWrites = 0
!$OMP DO 
      DO i=1,ncol-1
         DO j=i+1,ncol
           iBegin = min(iBegin,i)
           iEnd = i
           call get_stuff(matrix(1:nrow,i),matrix(1:nrow,j),nrow,cc)
           ibuf=ibuf+1
           cct(ibuf)=cc
           IF (ibuf==nbuf) THEN
             nWrites = nWrites+1
!$OMP CRITICAL
             DO ibuf=1,nbuf
               WRITE(2,'(f0.3)') cct(ibuf) 
             END DO
!$OMP END CRITICAL
             ibuf=0
           END IF
         END DO
       END DO
!$OMP END DO  
!$OMP CRITICAL 
print *,omp_get_thread_num(), iBegin, iEnd, nWrites
!$OMP END CRITICAL

Jim Dempsey

 

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,276 Views

You may also need to add COLLAPSE(1) to the !$OMP DO in the event that the compiler collapses the loops....

OR add additional diagnostic variables to track jBegin and jEnd.

Jim Dempsey

 

0 Kudos
kay-diederichs
New Contributor I
2,255 Views

COLLAPSE does not work for my triangular loop (and more generally, loops where the inner loop index depends on the outer loop index). 

I added diagnostic variables, as you suggested. They confirm that the individual threads cover some iterations, and that some do write out their buffer; others (the ones with high thread indices) don't because the work is unevenly distributed. None of this is unexpected.

What is unexpected is that at line 36, ibuf=0 for all of them.

Kay Diederichs

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,228 Views

Kay,

Did you note the "1" in COLLAPSE(1)?

IOW, explicitly telling the compiler optimizer .NOT. to collapse the loops.

 

>>36, ibuf=0 for all of them.

Then this appears to be a bug. To verify, go a little deeper into checking.

integer :: my_ibuf(1024), iThread ! replace 1024 with an appropriate number
...
my_ibuf = 0
!$OMP PARALLEL PRIVATE(i,j,cc,cct,ibuf, iThread) SHARED(nrow,ncol,matrix, my_ibuf)
      ibuf=0
      iThread = omp_get_thread_num()
!$OMP DO 
      DO i=1,ncol-1
         DO j=i+1,ncol
           iBegin = min(iBegin,i)
           iEnd = i
           call get_stuff(matrix(1:nrow,i),matrix(1:nrow,j),nrow,cc)
           ibuf=ibuf+1
           my_ibuf(iThread) = ibuf
           cct(ibuf)=cc
           IF (ibuf==nbuf) THEN
!$OMP CRITICAL
             DO ibuf=1,nbuf
               WRITE(2,'(f0.3)') cct(ibuf) 
             END DO
!$OMP END CRITICAL
             ibuf=0
             my_ibuf(iThread) = ibuf
           END IF
         END DO
       END DO
!$OMP END DO  
!$OMP CRITICAL 
print *,iThread, my_ibuf(iThread), ibuf
!$OMP END CRITICAL

Should my_ibuf(iThread) differ from ibuf then this is definitely a bug.

Note, inserting the diagnostic code may make the bug disappear.

If the two values concur, then it would appear that ibuf is correctly 0.

If the my_ibuf gets what you need, then use that as a work around until the bug gets fixed.

If the results are both 0 and you are convinced they are incorrect, then attribure my_ibuf with volatile

   integer, volatile :: my_ibuf(1024)

The compiler is then forced to write into the variable as opposed to potentially optimizing it away.

(and you cannot use a volatile variable in the PRIVATE clause)

 

Jim Dempsey

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,227 Views

FWIW you will get ibuf==0 if nbuf==1 or nbuf== exact multiple of stuff processed by thread.

(nbuf was not specified as shared, it should default to shared, you could check its value inside the parallel region before the !$OMP DO)

 

Jim Dempsey

 

0 Kudos
kay-diederichs
New Contributor I
2,211 Views

Jim,

thanks for your input.

- concerning the COLLAPSE(1): I had been wondering what this could be good for. Are you assuming that ifort may wrongly collapse loops, and one should try to prevent it from doing so? That would be another bug.

- concerning the question if all IBUF values can simultaneously be 0: my code has the comment

! if MOD(nount,nbuf)/=0 then the ibuf values should not be all 0

which is a relic from previous work in reducing the reproducer. For the current program, MOD(ncol*(ncol-1)/2,nbuf)=MOD(65*32,100)>0 so not all IBUF values can be 0.

- NBUF is neither PRIVATE nor SHARED since it is declared as INTEGER, PARAMETER, and thus is treated by the compiler like a constant.

- your suggestion to declare a SHARED my_ibuf array for checking the value of IBUF shows a divergence of the two: whereas all IBUF values are 0, the my_ibuf values are sensible. This confirms the bug in ifort.

- I had posted this as a question in the comp.lang.fortran newsgroup. I just read the answer by Reinhold Bader: "I could reproduce this with the "classic" compiler ifort. It seems to be triggered by your use of ibuf as a loop index in the CRITICAL region (if a separately declared privatized variable is used, the bug evaporates)."

Since my use of ibuf as a loop index does not violate the OpenMP specs, this is a bug that should be fixed.

I wonder how to notify Intel about it since I do not currently have a paid version of the compiler.

Best wishes,

Kay

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,206 Views

>> Are you assuming that ifort may wrongly collapse loops, and one should try to prevent it from doing so? That would be another bug.

That is correct, we (you) are hunting for a bug, so we want to eliminate possibilities. When an elimination results in a change in effect, then that which was assumed to be true has been shown to be not true.

 

>>! if MOD(nount,nbuf)/=0 then the ibuf values should not be all 0

From my understanding, nount is the total number of outputs not output of each thread. the nount contribution per thread may (and likely is) unbalanced.

>>ibuf used as loop index

While this may be the cause of the bug surfacing as within the loop ibuf was likely regiserized, the out of loop ibuf should have been updated from the register version. If I were to guess, the registerization of ibuf persists outside the loop and the compiler inverted the position of the code preceeding the loop to following the loop (or something like that) such that it could re-use the register copy of ibuf.

Definitely a bug.

 

>>I wonder how to notify Intel about it since I do not currently have a paid version of the compiler.

Intel is very good at watching this forum.

If you can post your (a) simple reproducer, it will get attention.

 

Jim Dempsey

0 Kudos
kay-diederichs
New Contributor I
2,204 Views

Thanks! I guess the 51-line reproducer above should do.

0 Kudos
Ron_Green
Moderator
2,187 Views

It is an IFORT bug.  I can write it up.

I didn't like that get_stuff was an old school F77 External procedure, but it does not matter for the bug report.  I also created a version of the test with an INTERFACE for get_stuff - still wrong result.  Then I put get_stuff into a module and USE'd it - still wrong results.

 

Ifort and IFX handle OpenMP entirely differently.  

 

Is there any reason you cannot use IFX at this point?  We missed the code freeze for 2023 Update 1, which will be coming out in about a month.   Update 2 is many months out.  So no quick fix is possible and I cannot find a workaround.   

0 Kudos
kay-diederichs
New Contributor I
2,174 Views

Ron,

thanks for taking care of that.

BTW a workaround is to replace the variable IBUF in the first CRITICAL section, where it is used as loop index, with a new PRIVATE variable (say, K).

Best wishes, Kay

0 Kudos
Ron_Green
Moderator
2,054 Views

I forgot to add this: Bug ID is CMPLRIL0-35219


0 Kudos
Reply