Intel® Fortran Compiler

OMP ordered not ordered and hangs

jimdempseyatthecove
Black Belt

Using IFORT 2020.0.166, the !$omp ordered region does not execute in thread order, and the program then hangs.

 

            ! create the test data file
            Open(newunit=unitBuildTestFile, FILE=FileName, ACTION='WRITE', ASYNCHRONOUS='NO', ACCESS=Acc, FORM=Frm, STATUS='REPLACE')
            Stride = 1000
            nObjectsWritten = 0
            do while(nObjectsWritten < nObjectsTotal)
                iFullBegin = 1
                iFullEnd = min(size(WS), nObjectsTotal-nObjectsWritten)
                !$omp parallel private(iThread, iBegin, iEnd, i, j)
                nThreads = omp_get_num_threads()
                iThread = omp_get_thread_num()
                do while(iFullBegin<=iFullEnd)
                    !$omp do ordered schedule(static,1)
                    do i=0,nThreads-1
                        iBegin = iFullBegin + Stride * iThread
                        iEnd = min(iBegin+Stride-1,iFullEnd)
                        if(iBegin <= iEnd) then
                            print *,iThread,"Object_t_FillBuffer",iBegin, min(iEnd, iBegin+nObjectsTotal-nObjectsWritten-1)
                            call Object_t_FillBuffer(WS(iBegin:min(iEnd, iBegin+nObjectsTotal-nObjectsWritten-1)))
                            !$omp atomic
                            nObjectsWritten = nObjectsWritten + iEnd-iBegin+1
                        end if
                        !$omp ordered
                        print *,iThread,"Object_t_FlushBuffer"
                        if(iBegin <= iEnd) call Object_t_FlushBuffer(unitBuildTestFile)
                        !print *,iThread,"Object_t_WaitBuffer"
                        !if(iBegin <= iEnd) call Object_t_WaitBuffer(unitBuildTestFile)
                        !$omp end ordered
                        !$omp ordered
                        !print *,iThread,"Object_t_WaitBuffer"
                        !if(iBegin <= iEnd) call Object_t_WaitBuffer(unitBuildTestFile)
                        !$omp end ordered
                    end do
                    !$omp single
                    iFullBegin = iFullBegin + Stride * nThreads
                    !$omp end single
                end do
                !$omp end parallel
                WS(:)%SequenceNumber = WS(:)%SequenceNumber + size(WS)
            end do
...
Writing \Temp\testFormatted.dat...FORMATTED
           2 Object_t_FillBuffer                  2001                  3000
           3 Object_t_FillBuffer                  3001                  4000
           0 Object_t_FillBuffer                     1                  1000
           1 Object_t_FillBuffer                  1001                  2000
           7 Object_t_FillBuffer                  7001                  8000
           5 Object_t_FillBuffer                  5001                  6000
           6 Object_t_FillBuffer                  6001                  7000
           4 Object_t_FillBuffer                  4001                  5000
           0 Object_t_FlushBuffer
           2 Object_t_FlushBuffer
           3 Object_t_FlushBuffer
           4 Object_t_FlushBuffer
           1 Object_t_FlushBuffer

 

From the above, you can see the static schedule assigned threads in the expected order:

0=1:1000, 1=1001:2000,...7=7001:8000

However, the ordered section processed the threads in the order 0, 2, 3, 4, 1 and then hung.

Running on oneAPI 2022.0.0.161, it does not hang, but the output order is still not as expected.

Shouldn't the ordered section execute in thread order?

(I will post the oneAPI output as an edit to this, it is on a different system)

Jim Dempsey

Edit: Output from oneAPI IFORT:

...
           0 Object_t_FillBuffer                 12001                 13000
           6 Object_t_FillBuffer                 18001                 19000
           9 Object_t_FillBuffer                 21001                 22000
          11 Object_t_FillBuffer                 23001                 24000
          10 Object_t_FillBuffer                 22001                 23000
           5 Object_t_FillBuffer                 17001                 18000
           1 Object_t_FillBuffer                 13001                 14000
           3 Object_t_FillBuffer                 15001                 16000
           7 Object_t_FillBuffer                 19001                 20000
           2 Object_t_FillBuffer                 14001                 15000
           4 Object_t_FillBuffer                 16001                 17000
           0 Object_t_FlushBuffer
           2 Object_t_FlushBuffer
           1 Object_t_FlushBuffer
           3 Object_t_FlushBuffer
           8 Object_t_FillBuffer                 20001                 21000
           5 Object_t_FlushBuffer
           6 Object_t_FlushBuffer
           4 Object_t_FlushBuffer
          10 Object_t_FlushBuffer
           7 Object_t_FlushBuffer
           9 Object_t_FlushBuffer
          11 Object_t_FlushBuffer
           8 Object_t_FlushBuffer

Notice that the thread pick order from the static schedule is correct; however, the FlushBuffer thread sequence is not ordered.

Jim Dempsey

 

5 Replies
jdelia
Novice

Dear Jim, please, it would help to have a self-contained test.
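
For instance, a minimal self-contained test of the ordered behaviour could look something like the sketch below. This is not Jim's code: the program name, the max_threads bound, and the log_thread array are made up for illustration. It assumes, as in the original post, one iteration per thread with schedule(static,1), so that iteration i should land on thread i and the ordered regions should run in the order 0, 1, 2, ...

```fortran
program ordered_selftest
    use omp_lib
    implicit none
    ! Assumption: 256 is simply a convenient upper bound for the log array.
    integer, parameter :: max_threads = 256
    integer :: log_thread(0:max_threads-1)
    integer :: i, nThreads, nLogged

    nLogged = 0
    nThreads = 1
    log_thread = -1

    !$omp parallel shared(log_thread, nLogged, nThreads) private(i)
    !$omp single
    nThreads = min(omp_get_num_threads(), max_threads)
    !$omp end single
    ! The implicit barrier at "end single" makes nThreads visible to the team.

    !$omp do ordered schedule(static,1)
    do i = 0, nThreads-1
        !$omp ordered
        ! Only one thread at a time is in here, in iteration order.
        log_thread(nLogged) = omp_get_thread_num()
        nLogged = nLogged + 1
        !$omp end ordered
    end do
    !$omp end do
    !$omp end parallel

    ! With schedule(static,1), iteration i is expected on thread i.
    do i = 0, nThreads-1
        print *, 'iteration', i, 'ran on thread', log_thread(i)
        if (log_thread(i) /= i) error stop 'ordered region ran out of order'
    end do
    print *, 'ordered regions ran in iteration order'
end program ordered_selftest
```

Built with OpenMP enabled (for example /Qopenmp on Windows or -qopenmp on Linux), a test of this shape either stops with the error message or reports that the order was as expected, which makes it easy to compare compiler versions.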

jimdempseyatthecove
Black Belt

Here is a reproducer:

Jim Dempsey

jimdempseyatthecove
Black Belt

FWIW, creating my own ordering mechanism works just fine:

            ! create the test data file
            Open(newunit=unitBuildTestFile, FILE=FileName, ACTION='WRITE', ASYNCHRONOUS='NO', ACCESS=Acc, FORM=Frm, STATUS='REPLACE')
            Stride = 1000
            nObjectsWritten = 0
            do while(nObjectsWritten < nObjectsTotal)
                iFullBegin = 1
                iFullEnd = min(size(WS), nObjectsTotal-nObjectsWritten)
                !$omp parallel private(iThread, iBegin, iEnd, i, j)
                nThreads = omp_get_num_threads()
                iThread = omp_get_thread_num()
                do while(iFullBegin<=iFullEnd)
                    ordered_order_flush = 0
                    !$omp do ordered schedule(static,1)
                    do i=0,nThreads-1
                        iBegin = iFullBegin + Stride * iThread
                        iEnd = min(iBegin+Stride-1,iFullEnd)
                        if(iBegin <= iEnd) then
                            print *,iThread,"Object_t_FillBuffer",iBegin, min(iEnd, iBegin+nObjectsTotal-nObjectsWritten-1)
                            call Object_t_FillBuffer(WS(iBegin:min(iEnd, iBegin+nObjectsTotal-nObjectsWritten-1)))
                            !$omp atomic
                            nObjectsWritten = nObjectsWritten + iEnd-iBegin+1
                        end if
                        !dir$ if(.false.)
                            !$omp ordered
                            print *,iThread,"Object_t_FlushBuffer"
                            if(iBegin <= iEnd) call Object_t_FlushBuffer(unitBuildTestFile)
                            !$omp end ordered
                        !dir$ else
                            do while(mod(ordered_order_flush,nThreads) /= iThread)
                                !$omp flush (ordered_order_flush)
                                continue
                            end do
                            print *,iThread,"Object_t_FlushBuffer"
                            if(iBegin <= iEnd) call Object_t_FlushBuffer(unitBuildTestFile)
                            !$omp atomic
                            ordered_order_flush = ordered_order_flush + 1
                        !dir$ endif
                        !print *,iThread,"Object_t_WaitBuffer"
                        !if(iBegin <= iEnd) call Object_t_WaitBuffer(unitBuildTestFile)
                    end do
                    !$omp single
                    iFullBegin = iFullBegin + Stride * nThreads
                    !$omp end single
                end do
                !$omp end parallel
                WS(:)%SequenceNumber = WS(:)%SequenceNumber + size(WS)
            end do
...
           0 Object_t_FillBuffer                     1                  1000
           2 Object_t_FillBuffer                  2001                  3000
           6 Object_t_FillBuffer                  6001                  7000
           4 Object_t_FillBuffer                  4001                  5000
           1 Object_t_FillBuffer                  1001                  2000
           3 Object_t_FillBuffer                  3001                  4000
           5 Object_t_FillBuffer                  5001                  6000
           0 Object_t_FlushBuffer
           1 Object_t_FlushBuffer
           2 Object_t_FlushBuffer
           3 Object_t_FlushBuffer
           4 Object_t_FlushBuffer
           7 Object_t_FillBuffer                  7001                  8000
           5 Object_t_FlushBuffer
           6 Object_t_FlushBuffer
           7 Object_t_FlushBuffer

 

Barbara_P_Intel
Moderator

I started looking at this because I saw the word "hang". I compiled with ifort 2021.5.0 and I don't get a hang on Linux or on Windows.

But then you know that... I just reread the thread. 

 

Barbara_P_Intel
Moderator

The use of the ORDERED clause caught my interest.

@jimdempseyatthecove If you are still working on this, you might want to check out the openmp-examples-4.5.0 document. See page 190. There are examples of the use of ORDERED including an example of using multiple ORDERED constructs in a loop region.
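
To illustrate the multiple-ORDERED case, here is a rough sketch written for this reply (it is not copied from the examples document, and the program and variable names are invented). If I read the OpenMP 4.5 restrictions correctly, a loop may contain several ORDERED constructs, but any one iteration must execute at most one ordered region. The sketch keeps to that by putting the two constructs in mutually exclusive branches; two ordered regions executed back to back in the same iteration, as in the first listing in this thread, would seem to fall outside that rule.

```fortran
program ordered_branches
    implicit none
    integer, parameter :: n = 20
    integer :: seen(n), nSeen, i

    nSeen = 0
    seen = 0

    !$omp parallel do ordered schedule(static,1) shared(seen, nSeen)
    do i = 1, n
        if (i <= n/2) then
            ! First ORDERED construct: only the lower half of the iterations.
            !$omp ordered
            nSeen = nSeen + 1
            seen(nSeen) = i
            !$omp end ordered
        else
            ! Second ORDERED construct: only the upper half of the iterations.
            !$omp ordered
            nSeen = nSeen + 1
            seen(nSeen) = -i     ! sign marks which construct ran
            !$omp end ordered
        end if
    end do
    !$omp end parallel do

    ! Each iteration entered exactly one ordered region, in iteration order.
    do i = 1, n
        if (abs(seen(i)) /= i) error stop 'ordered regions ran out of order'
    end do
    print *, 'both ordered constructs ran in iteration order'
end program ordered_branches
```

If both pieces of work are needed in every iteration, one option is to place them inside a single ordered region rather than two.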

 
