Intel® Fortran Compiler

OMP ordered not ordered and hangs


With IFORT 2020.0.166, the omp ordered (threads) region does not execute in order, and the program hangs.


            ! create the test data file
            Open(newunit=unitBuildTestFile, FILE=FileName, ACTION='WRITE', ASYNCHRONOUS='NO', ACCESS=Acc, FORM=Frm, STATUS='REPLACE')
            Stride = 1000
            nObjectsWritten = 0
            do while(nObjectsWritten < nObjectsTotal)
                iFullBegin = 1
                iFullEnd = min(size(WS), nObjectsTotal-nObjectsWritten)
                !$omp parallel private(iThread, iBegin, iEnd, i, j)
                nThreads = omp_get_num_threads()
                iThread = omp_get_thread_num()
                do while(iFullBegin<=iFullEnd)
                    !$omp do ordered schedule(static,1)
                    do i=0,nThreads-1
                        iBegin = iFullBegin + Stride * iThread
                        iEnd = min(iBegin+Stride-1,iFullEnd)
                        if(iBegin <= iEnd) then
                            print *,iThread,"Object_t_FillBuffer",iBegin, min(iEnd, iBegin+nObjectsTotal-nObjectsWritten-1)
                            call Object_t_FillBuffer(WS(iBegin:min(iEnd, iBegin+nObjectsTotal-nObjectsWritten-1)))
                            !$omp atomic
                            nObjectsWritten = nObjectsWritten + iEnd-iBegin+1
                        end if
                        !$omp ordered
                        print *,iThread,"Object_t_FlushBuffer"
                        if(iBegin <= iEnd) call Object_t_FlushBuffer(unitBuildTestFile)
                        !print *,iThread,"Object_t_WaitBuffer"
                        !if(iBegin <= iEnd) call Object_t_WaitBuffer(unitBuildTestFile)
                        !$omp end ordered
                        !$omp ordered
                        !print *,iThread,"Object_t_WaitBuffer"
                        !if(iBegin <= iEnd) call Object_t_WaitBuffer(unitBuildTestFile)
                        !$omp end ordered
                    end do
                    !$omp single
                    iFullBegin = iFullBegin + Stride * nThreads
                    !$omp end single
                end do
                !$omp end parallel
                WS(:)%SequenceNumber = WS(:)%SequenceNumber + size(WS)
            end do
Writing \Temp\testFormatted.dat...FORMATTED
           2 Object_t_FillBuffer                  2001                  3000
           3 Object_t_FillBuffer                  3001                  4000
           0 Object_t_FillBuffer                     1                  1000
           1 Object_t_FillBuffer                  1001                  2000
           7 Object_t_FillBuffer                  7001                  8000
           5 Object_t_FillBuffer                  5001                  6000
           6 Object_t_FillBuffer                  6001                  7000
           4 Object_t_FillBuffer                  4001                  5000
           0 Object_t_FlushBuffer
           2 Object_t_FlushBuffer
           3 Object_t_FlushBuffer
           4 Object_t_FlushBuffer
           1 Object_t_FlushBuffer


From the above, you can see that the static schedule assigned the ranges to threads in the expected order:

0=1:1000, 1=1001:2000, ..., 7=7001:8000

However, the ordered section processed the threads in the order 0, 2, 3, 4, 1 and then hung.

With oneAPI 2022.0.0.161, it does not hang, but the output order is still not as expected.

Shouldn't the ordered section execute in thread order?
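
As a point of reference, here is a minimal sketch of the expected behaviour, separate from the code above. Everything in it (the program name, nextExpected, inOrder) is made up for illustration and is not taken from the reproducer. Per the OpenMP specification, ordered regions execute in loop-iteration order, and schedule(static,1) hands out chunks round-robin by thread number. With exactly nThreads iterations, iteration i should therefore run on thread i, so iteration order and thread order coincide:

```fortran
program ordered_sketch
    use omp_lib
    implicit none
    integer :: i, nThreads, nextExpected
    logical :: inOrder

    nextExpected = 0      ! iteration that should enter the ordered region next
    inOrder = .true.

    !$omp parallel private(i) shared(nThreads, nextExpected, inOrder)
    !$omp single
    nThreads = omp_get_num_threads()
    !$omp end single
    ! the implicit barrier at "end single" makes nThreads visible to all threads

    !$omp do ordered schedule(static,1)
    do i = 0, nThreads-1
        ! unordered (parallel) work would go here

        !$omp ordered
        ! exactly one ordered region per iteration
        if (i /= nextExpected) inOrder = .false.
        nextExpected = nextExpected + 1
        print *, omp_get_thread_num(), "ordered region, iteration", i
        !$omp end ordered
    end do
    !$omp end do
    !$omp end parallel

    if (inOrder) then
        print *, "ordered regions ran in iteration order"
    else
        print *, "ordered regions ran out of order"
    end if
end program ordered_sketch
```

Built with OpenMP enabled (for example ifort -qopenmp on Linux or /Qopenmp on Windows), a conforming runtime should print the iterations in ascending order, followed by the "in iteration order" message.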

(I will post the oneAPI output as an edit to this post; it is on a different system.)

Jim Dempsey

Edit: Output from oneAPI IFORT:

           0 Object_t_FillBuffer                 12001                 13000
           6 Object_t_FillBuffer                 18001                 19000
           9 Object_t_FillBuffer                 21001                 22000
          11 Object_t_FillBuffer                 23001                 24000
          10 Object_t_FillBuffer                 22001                 23000
           5 Object_t_FillBuffer                 17001                 18000
           1 Object_t_FillBuffer                 13001                 14000
           3 Object_t_FillBuffer                 15001                 16000
           7 Object_t_FillBuffer                 19001                 20000
           2 Object_t_FillBuffer                 14001                 15000
           4 Object_t_FillBuffer                 16001                 17000
           0 Object_t_FlushBuffer
           2 Object_t_FlushBuffer
           1 Object_t_FlushBuffer
           3 Object_t_FlushBuffer
           8 Object_t_FillBuffer                 20001                 21000
           5 Object_t_FlushBuffer
           6 Object_t_FlushBuffer
           4 Object_t_FlushBuffer
          10 Object_t_FlushBuffer
           7 Object_t_FlushBuffer
           9 Object_t_FlushBuffer
          11 Object_t_FlushBuffer
           8 Object_t_FlushBuffer

Notice that the static schedule assigns the ranges to threads correctly; however, the FlushBuffer thread sequence is not ordered.

Jim Dempsey



Dear Jim, it would help to have a self-contained test, please.


Here is a reproducer:

Jim Dempsey


FWIW, creating my own ordered section works just fine:

            ! create the test data file
            Open(newunit=unitBuildTestFile, FILE=FileName, ACTION='WRITE', ASYNCHRONOUS='NO', ACCESS=Acc, FORM=Frm, STATUS='REPLACE')
            Stride = 1000
            nObjectsWritten = 0
            do while(nObjectsWritten < nObjectsTotal)
                iFullBegin = 1
                iFullEnd = min(size(WS), nObjectsTotal-nObjectsWritten)
                !$omp parallel private(iThread, iBegin, iEnd, i, j)
                nThreads = omp_get_num_threads()
                iThread = omp_get_thread_num()
                do while(iFullBegin<=iFullEnd)
                    ordered_order_flush = 0
                    !$omp do ordered schedule(static,1)
                    do i=0,nThreads-1
                        iBegin = iFullBegin + Stride * iThread
                        iEnd = min(iBegin+Stride-1,iFullEnd)
                        if(iBegin <= iEnd) then
                            print *,iThread,"Object_t_FillBuffer",iBegin, min(iEnd, iBegin+nObjectsTotal-nObjectsWritten-1)
                            call Object_t_FillBuffer(WS(iBegin:min(iEnd, iBegin+nObjectsTotal-nObjectsWritten-1)))
                            !$omp atomic
                            nObjectsWritten = nObjectsWritten + iEnd-iBegin+1
                        end if
                        !dir$ if(.false.)
                            !$omp ordered
                            print *,iThread,"Object_t_FlushBuffer"
                            if(iBegin <= iEnd) call Object_t_FlushBuffer(unitBuildTestFile)
                            !$omp end ordered
                        !dir$ else
                            do while(mod(ordered_order_flush,nThreads) /= iThread)
                                !$omp flush (ordered_order_flush)
                            end do
                            print *,iThread,"Object_t_FlushBuffer"
                            if(iBegin <= iEnd) call Object_t_FlushBuffer(unitBuildTestFile)
                            !$omp atomic
                            ordered_order_flush = ordered_order_flush + 1
                        !dir$ endif
                        !print *,iThread,"Object_t_WaitBuffer"
                        !if(iBegin <= iEnd) call Object_t_WaitBuffer(unitBuildTestFile)
                    end do
                    !$omp single
                    iFullBegin = iFullBegin + Stride * nThreads
                    !$omp end single
                end do
                !$omp end parallel
                WS(:)%SequenceNumber = WS(:)%SequenceNumber + size(WS)
            end do
           0 Object_t_FillBuffer                     1                  1000
           2 Object_t_FillBuffer                  2001                  3000
           6 Object_t_FillBuffer                  6001                  7000
           4 Object_t_FillBuffer                  4001                  5000
           1 Object_t_FillBuffer                  1001                  2000
           3 Object_t_FillBuffer                  3001                  4000
           5 Object_t_FillBuffer                  5001                  6000
           0 Object_t_FlushBuffer
           1 Object_t_FlushBuffer
           2 Object_t_FlushBuffer
           3 Object_t_FlushBuffer
           4 Object_t_FlushBuffer
           7 Object_t_FillBuffer                  7001                  8000
           5 Object_t_FlushBuffer
           6 Object_t_FlushBuffer
           7 Object_t_FlushBuffer



I started looking at this because I saw the word "hang". I compiled with ifort 2021.5.0 and I don't get a hang on Linux or on Windows.

But then you know that... I just reread the thread. 



The use of the ORDERED clause caught my interest.

@jimdempseyatthecove If you are still working on this, you might want to check out the openmp-examples-4.5.0 document. See page 190. There are examples of the use of ORDERED including an example of using multiple ORDERED constructs in a loop region.
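
For context, here is a hedged sketch of the pattern behind that last example. OpenMP 4.5 restricts a thread to executing at most one ordered region (without a depend clause) per loop iteration. Several ordered constructs may appear in the loop body, as long as each iteration passes through only one of them. The loop in the original post runs two ordered regions in every iteration, which may or may not be related to the behaviour reported. The program below is illustrative only; its names come neither from this thread nor from the examples document:

```fortran
program one_ordered_per_iteration
    implicit none
    integer, parameter :: n = 8
    integer :: i, nextExpected
    logical :: inOrder

    nextExpected = 1      ! iteration that should enter an ordered region next
    inOrder = .true.

    !$omp parallel do ordered schedule(static,1) shared(nextExpected, inOrder)
    do i = 1, n
        ! Two ordered constructs appear in the loop body, but the branches
        ! are mutually exclusive, so each iteration executes only one.
        if (mod(i, 2) == 1) then
            !$omp ordered
            if (i /= nextExpected) inOrder = .false.
            nextExpected = nextExpected + 1
            print *, "odd  branch, iteration", i
            !$omp end ordered
        else
            !$omp ordered
            if (i /= nextExpected) inOrder = .false.
            nextExpected = nextExpected + 1
            print *, "even branch, iteration", i
            !$omp end ordered
        end if
    end do
    !$omp end parallel do

    if (inOrder) then
        print *, "ordered regions ran in iteration order"
    else
        print *, "ordered regions ran out of order"
    end if
end program one_ordered_per_iteration
```

Applied to the loop in the original post, this would mean merging the two consecutive ordered regions into a single one, so that each iteration enters an ordered region only once.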

