Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
29273 Discussions

Problems with the coarray event mechanism

Arjen_Markus
Honored Contributor II
2,130 Views

I wanted to experiment with coarray events (to see if I could get images to take tasks from one another in the right order - a pipeline architecture) but I ran into problems: the "event wait" command does not succeed. I may be misunderstanding the feature, but the sample program is very simple and posting does work.

I use Intel Fortran oneAPI 2025.0.0 on Windows, but I also tried it on Linux with the same result.

Here is the program:

program events
    use iso_fortran_env
    implicit none

    type(event_type) :: ev[*]
    integer          :: count, stat

    if ( this_image() == 1 ) then
        write(*,*) 'Image 1: Wait for one second'
        call sleep( 1 )

        write(*,*) 'Image 1: Wait for event from image 2'
        call event_query( ev, count, stat )
        write(*,*) 'Image 1: ', count, stat
        event wait (ev)
        call event_query( ev, count, stat )
        write(*,*) 'Image 1 (after): ', count, stat

        write(*,*) 'Image 1: Event received!'

    elseif ( this_image() == 2 ) then
        write(*,*) 'Image 2: Wait for five seconds'
        call sleep( 5 )

        write(*,*) 'Image 2: Send event'
        event post (ev)
        call event_query( ev, count, stat )
        write(*,*) 'Image 2 (after): ', count, stat
        write(*,*) 'Image 2: Event sent'

    else
        write(*,*) 'Image ', this_image(), ' simply waits (well, post an event)'
        call sleep( 2 )
        event post (ev)
    endif

    sync all
    stop
end program events

The symptoms are:

  • The second image posts the event.
  • The first image simply waits forever
  • If I click ctrl-C, then I have to do it again to stop the program, but doing so still does not kill off the process. For that I have to use the Task Manager.

I have added all manner of write statements to make sure that I understand what is and is not happening. Here is the output of one run on my machine:

 Image           22  simply waits (well, post an event)
 Image            6  simply waits (well, post an event)
 Image            7  simply waits (well, post an event)
 Image           23  simply waits (well, post an event)
 Image           19  simply waits (well, post an event)
 Image           13  simply waits (well, post an event)
 Image           18  simply waits (well, post an event)
 Image           21  simply waits (well, post an event)
 Image            5  simply waits (well, post an event)
 Image           20  simply waits (well, post an event)
 Image           24  simply waits (well, post an event)
 Image            8  simply waits (well, post an event)
 Image           17  simply waits (well, post an event)
 Image 2: Wait for five seconds
 Image           10  simply waits (well, post an event)
 Image           11  simply waits (well, post an event)
 Image           12  simply waits (well, post an event)
 Image            4  simply waits (well, post an event)
 Image 1: Wait for one second
 Image           16  simply waits (well, post an event)
 Image            3  simply waits (well, post an event)
 Image           15  simply waits (well, post an event)
 Image           14  simply waits (well, post an event)
 Image            9  simply waits (well, post an event)
 Image 1: Wait for event from image 2
 Image 1:            0           0
 Image 2: Send event
 Image 2 (after):            1           0
 Image 2: Event sent
[mpiexec@WSU-5CG2435LML] Sending Ctrl-C to processes as requested
[mpiexec@WSU-5CG2435LML] Press Ctrl-C again to force abort
^C

 

0 Kudos
1 Solution
jimdempseyatthecove
Honored Contributor III
2,050 Views

In image 2, try event_post(ev[1])

Note, event_post(ev) performs event_post(ev[this_image()]

Jim Dempsey

View solution in original post

12 Replies
jimdempseyatthecove
Honored Contributor III
2,051 Views

In image 2, try event_post(ev[1])

Note, event_post(ev) performs event_post(ev[this_image()]

Jim Dempsey

Arjen_Markus
Honored Contributor II
2,002 Views

Indeed, that makes sense and what's at least as important, it solves the problem :). Thanks, now I can work out how to do such a pipeline.

Arjen_Markus
Honored Contributor II
1,927 Views

Here is an update - based on Jim's correction I went on to create a program that uses events to send little "tasks" to its images - the pipeline I mentioned. It turns out to work fine on Linux, but on Windows I have to kill it via Control-C or the task manager. It is a silly program that by itself does nothing useful (at least I cannot think of a proper use), but it does illustrate the kind of architecture I wanted to experiment with: there are three tasks to be fulfilled and they have to be fulfilled for a series of inputs in the right order. See the attached source file.

On Linux (with the stated versions of the compiler) it works fine, but omn Windows it does not. Here is the last part of the output to screen:

 Image 3: waiting done
 Image 3: send event to image 2
 Image 2: waiting on image 3 done
 Image 2: wait for image 1
 Image 2: waiting done - image 1
 Image 1: waiting done - image 2
 Image 1: finished!
 End of program - synchronise on all - image           1
 Image 2: send event to image 3
 Image 3: wait for image 2
 Image 3: waiting done
 Image 2: wait for image 3
 Image 3: send event to image 2
 Image 2: waiting on image 3 done
 Image 2: finished!
 End of program - synchronise on all - image           2
 Image 3: finished!
 End of program - synchronise on all - image           3
 Result:         147         943
 [mpiexec@WSU-5CG2435LML] Sending Ctrl-C to processes as requested
[mpiexec@WSU-5CG2435LML] Press Ctrl-C again to force abort
[mpiexec@WSU-5CG2435LML] Error: Downstream from host WSU-5CG2435LML exited abnormally
[mpiexec@WSU-5CG2435LML] Trying to close other downstreams
[mpiexec@WSU-5CG2435LML] HYD_sock_write (..\windows\src\hydra_sock.c:387): write error (errno = 2)
[mpiexec@WSU-5CG2435LML] wmain (mpiexec.c:2023): assert (pg->intel.exitcodes != NULL) failed

As you can see, it prints the results but then does not finish properly. The print statement occurs AFTER the "sync all" statement.  I checked that all images produced the message "End of program", so that is not the problem :).

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,849 Views

A likely problem is:

Image 1, on last line of file, posts the event to image 2, image 2 replies to image 1, finishes task and posts to image 3, 3 replies to image 2,  completes task and ....

BEFORE image 1 read encounters the EOF, and sets done. This would cause a lock-up with image 2 waiting for an event that will not come.

 

What I suggest you try is to create a "signatured" EOF record on EOF and pass that through the pipeline. (or add an EOF flag to the buffer you pass between images).  When they see this they shut down (or enter a general wait state for the next file to be processed).

 

Jim Dempsey

0 Kudos
Arjen_Markus
Honored Contributor II
1,776 Views

Definitely worth an attempt :), but that would not explain why the program succeeds on Linux and passes the sync all barrier on Windows (as it definitely prints the result in both environments). Still, the proper ending is a bit tricky. I will have a closer look at this.

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,687 Views

>>but that would not explain why the program succeeds on Linux and passes the sync all barrier on Windows

Likely a race condition versus latency in messaging system between the two systems. You are lucky it was exposed on one of the systems. (iow it may intermittently fail on the Linux system too)

 

Jim Dempsey

0 Kudos
andrew_4619
Honored Contributor III
1,838 Views

All Kudos to you guys, I find coarrays a bit scary and haven't tried doing anything.  Serious question does anyone  do any serious programs with them? The few  posts I have seen on here are people experimenting.

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,810 Views

Coarrays are ony scary until you get your first few applications done.

Events (and lack of proper sequencing) are pretty much the same regardless of implementation. 

 

RE: pipelining

Consider wrapping the "load" (that which is to be processed) in a container that has control information. Minimally you would want something like:

 

Action: process data, END OF FILE, (? error code?)

Sequence: original sequence number (e.g. input record number)

Data: blob of data (which may include its own header information)

 

The input pipe is generally sequential.

The interior pipes may be sequential or parallel. (parallel can be done with OpenMP or as different images).

The output pipe may receive packets out of order, but may need to write them in order.

 

Jim Dempsey

0 Kudos
Arjen_Markus
Honored Contributor II
1,776 Views

I have found that people want to rely on MPI much more than on coarrays. Coarrays could do with better exposure, which is one reason I am experimenting with different set-ups.

Arjen_Markus
Honored Contributor II
1,592 Views

I have re-found an article by Brad Richardson and others on the subject: Scheduling and Performance of Asynchronous Tasks in Fortran 2018 with FEATS - https://link.springer.com/article/10.1007/s42979-024-02682-y. They refer to a number of real-world applications of coarrays and the article itself is about a framework for task-parallel applications where the coarray/event communication details are actually hidden.

0 Kudos
Arjen_Markus
Honored Contributor II
1,448 Views

FWIW, I have run the program several times on Linux and it succeeds every time. I have used different numbers of images as well. On Windows it gets stuck every time. I also built in WSL with gfortran+libcaf and there it works fine too. One run with 24 images did finish but gave some run-time error at the very end, when all the images were being finalised, but that is after the actual end of the program.

I will continue to explore why this is happening.

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,399 Views

Andrew,

The following is not how I would do a pipeline (I would use a collection of buffers, round-robin style to avoid unnecessary copying of the data). The following, follows the gist of how you coded it (untested code).

! pipeline.f90 --
!     Experiment with a pipeline:
!     Image 1 reads a line from a file and passes it on to image 2.
!     Image 2 converts the lines to upper case and passes it to image 3
!     Image 3 counts the number of vowels
!     The end result is printed as the number of lines and the number of vowels
!
!     Because the work per step is rather limited, the images wait for one or
!     more seconds to make it look better.
!
!     Question: could there be a race condition because of the check on done?
!
program pipeline
    use iso_fortran_env
    implicit none

    type(event_type)  :: ev_convert[*], ev_vowels[*]
    logical           :: done[*]
    character(len=80) :: line[*]
    integer           :: number_lines, number_vowels[*]
    integer           :: ierr

    number_lines  = 0
    number_vowels = 0
    done          = .false.
    select case ( this_image() )
    case( 1 )
        open( 10, file = 'pipeline.f90' )

        do
            read( 10, '(a)', iostat = ierr ) line
            if ( ierr == 0 ) then
                write(*,*) trim(line)
                number_lines = number_lines + 1
                call sleep( 1 )

                ! Tell image 2 to process the new line
                event post (ev_convert[2])

                ! Wait for it to acknowledge this, then proceed to the next line
                write(*,*) 'Image 1: wait for image 2'
                event wait (ev_convert)
                write(*,*) 'Image 1: waiting done - image 2'
            else
                ! End of file, so we are done here, tell image 2 about it
                write(*,*) 'Image 1: finished!'
                done = .true.

                ! Tell image 2 we are done
                event post (ev_convert[2])
                exit
            endif
        enddo

    case( 2 )
        do
            write(*,*) 'Image 2: wait for image 1'
            event wait (ev_convert)
            write(*,*) 'Image 2: waiting done - image 1'
            if(done[1]) then
                write(*,*) 'Image 2: finished!'
                done[2] = .true.
                ! Tell image 3 we are done
                write(*,*) 'Image 2: send event to image 3'
                event post (ev_vowels[3])
                exit
            else
                line = line[1]
                event post (ev_convert[1])  ! acknowledge

                ! Do the work - conversion to upper case
                call convert_line( line )
                call sleep( 2 )

                ! Tell image 3 to process the converted line
                write(*,*) 'Image 2: send event to image 3'
                event post (ev_vowels[3])

                ! Wait for it to acknowledge this, then proceed to the next line
                write(*,*) 'Image 2: wait for image 3'
                event wait (ev_vowels)
                write(*,*) 'Image 2: waiting on image 3 done'
            endif
        enddo

    case( 3 )
        do
            write(*,*) 'Image 3: wait for image 2'
            event wait (ev_vowels)
            write(*,*) 'Image 3: waiting done'
            if (done[2] ) then
                write(*,*) 'Image 3: finished!'
                done[3] = .true.
                exit
            else
               line = line[2]
                write(*,*) 'Image 3: send event to image 2'
                event post (ev_vowels[2])

               ! Do the work - count the vowels
                call count_vowels( line, number_vowels )
                call sleep( 2 )
            endif
        enddo

    case default
        ! Nothing to be done
    end select

    write(*,*) 'End of program - synchronise on all - image', this_image()
    sync all

    if ( this_image() == 1 ) then
        write(*,*) 'Result:', number_lines, number_vowels
    endif

    stop
contains

subroutine convert_line( line )
    character(len=*), intent(inout) :: line

    integer                         :: i, ascii

    integer, parameter              :: lower_a = iachar('a'), upper_a = iachar('A'), lower_z = iachar('z')

    do i = 1,len(line)
        ascii = iachar( line(i:i) )
        if ( ascii >= lower_a .and. ascii <= lower_z ) then
            ascii     = ascii + upper_a - lower_a
            line(i:i) = achar(ascii)
        endif
    enddo
end subroutine convert_line

subroutine count_vowels( line, number_vowels )
    character(len=*), intent(inout) :: line
    integer, intent(inout)          :: number_vowels

    integer                         :: i
    character(len=*), parameter     :: vowels = 'AEIOU'

    do i = 1,len(line)
        if ( index( vowels, line(i:i) ) > 0 ) then
            number_vowels = number_vowels + 1
        endif
    enddo
end subroutine count_vowels

end program pipeline

 

0 Kudos
Reply