- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The following example code divides a large work (NFC_tot) into chunks. The Images then pick up a piece and start working whenever available and also update a variable (which resides only in Image-1) to let other images know that piece is done. This update is done inside a critical section, as shown in the code. Image 1 does not work in this dummy example.
The problem with ifort coarray: only one image (randomly) picks up work and starts working. While other images never pick a work. As a result, only one image does all the pieces serially (As shown in the output).
gfortran (with OpenCoarrays) does not produce this bug.
I am using Ifort Classic (version-ifort (IFORT) 2021.7.0 20220726).
gfortran version( 9.4.0)
Code:
program IntelCoarrayBug
implicit none
integer :: NFC_tot
! -------------------------------------- Dynamic Schedule -------------------------------------- !
integer, allocatable, dimension(:,:), codimension[:] :: ChnkArr
integer :: MyChnkStrt[*]
integer :: Nchunk
integer, dimension(2) :: MyAssignedChunk
integer :: ch, strt, stp, &
& img, Chnklen, ChnkStride, ChnkStrt, &
& ChnkStop, Indxloop, PickFlag
! -------------------------------------- Dynamic Schedule -------------------------------------- !
NFC_tot = 1280
Nchunk = 22
Chnklen = NFC_tot / Nchunk
if ( this_image() == 1 ) write(*, 10) NFC_tot, Nchunk, Chnklen, num_images()
allocate( ChnkArr(3, Nchunk)[*] )
SYNC ALL
OnlyImg1: if ( this_image() == 1 ) then
! ---------========= Devide the FCs to be covered into chunks of size Nchunk =========--------- !
chnkLoop: do ch = 1, Nchunk
strt = ((ch - 1) * Chnklen) + 1
stp = strt + Chnklen - 1
if ( (ch == Nchunk) .and. (stp /= NFC_tot) ) stp = NFC_tot
ChnkArr(:, ch) = (/0, strt, stp/) !First element 0 => this chunk(strt->stop) not done
! 1 => this chunk(strt->stop) done
end do chnkLoop
! -------------================ Starting Position for each Image ================------------- !
ChnkStride = Nchunk / ( num_images() - 1 )
do img = 1, (num_images() - 1)
ChnkStrt = ((img - 1) * ChnkStride) + 1
ChnkStop = ChnkStrt + ChnkStride - 1
if ( (img == (num_images()-1) ) .and. (ChnkStop /= Nchunk) ) ChnkStop = Nchunk
! ** img+1 ** !
MyChnkStrt[img+1] = ChnkStrt
! ** img+1 ** !
end do
! -------------================ Starting Position for each Image ================------------- !
! ---------========== print the Chunk and Processor assignment information ==========---------- !
write(*, *)
write(*, 20) this_image()
writeLoop: do ch = 1, Nchunk
write(*, 30) ChnkArr(:, ch), (ChnkArr(3, ch)-ChnkArr(2, ch)+1)
end do writeLoop
write(*, *)
end if OnlyImg1
SYNC ALL
! * Broadcast to all images * !
call co_broadcast(ChnkArr, 1)
write(*, 40) this_image(), MyChnkStrt
call execute_command_line(' ')
SYNC ALL
! ---------========== print the Chunk and Processor assignment information ==========---------- !
if ( this_image() == 1 ) write(*, '(/)')
SYNC ALL
Indxloop = MyChnkStrt
! Image 1 will not work in this dummy test example, but all the other Images read value from ChnkArr
! which resides in image 1, and also update it when a particular chunk is done (with value 1) as
! implemented below
NotImage1: if ( this_image() /= 1 ) then
DynSchdl: do
AccessImg1: CRITICAL
PickFlag = ChnkArr(1, Indxloop) [1]
if ( PickFlag == 0 ) ChnkArr(1, Indxloop) [1] = 1
END CRITICAL AccessImg1
StartWork: if ( PickFlag == 0 ) then
MyAssignedChunk = ChnkArr(2:3, Indxloop)
write(*, 50) this_image(), ChnkArr(2, Indxloop), ChnkArr(3, Indxloop)
! Dummy work
call sleep( this_image() + &
& int( 0.01*(MyAssignedChunk(2)-MyAssignedChunk(1)) ) )
write(*, 60) this_image(), ChnkArr(2, Indxloop), ChnkArr(3, Indxloop)
end if StartWork
! Go to next chunk of ChnkArr in a circular manner and pickup work if that is not done
Indxloop = mod( Indxloop, Nchunk ) + 1
if ( Indxloop == MyChnkStrt ) EXIT DynSchdl
end do DynSchdl
end if NotImage1
SYNC ALL
10 FORMAT('NFC_tot = ', I6, ' , Nchunk = ', I6, ' , Chnklen = ', I6, ' Number of Images = ', I6)
20 FORMAT('Writing Chunks info from image ', I5)
30 FORMAT('[ ', 2(I6, ' , '), I6, ' ], len = ', I6)
40 FORMAT('Image No. = ', I5, ' Starting Point = ', I6)
50 FORMAT('Image No. = ', I5, ' : Started working on ', I6, ' => ', I6)
60 FORMAT('Image No. = ', I5, ' : Finished working on ', I6, ' => ', I6)
end program IntelCoarrayBug
ifort compilation: ifort -coarray=distributed -coarray-num-images=6 IntelCoarrayBug.f90
ifort result: ./a.out :
.
.
.
Image No. = 6 : Started working on 929 => 986
Image No. = 6 : Finished working on 929 => 986
Image No. = 6 : Started working on 987 => 1044
Image No. = 6 : Finished working on 987 => 1044
Image No. = 6 : Started working on 1045 => 1102
Image No. = 6 : Finished working on 1045 => 1102
Image No. = 6 : Started working on 1103 => 1160
Image No. = 6 : Finished working on 1103 => 1160
Image No. = 6 : Started working on 1161 => 1218
Image No. = 6 : Finished working on 1161 => 1218
.
.
.
gfortran compilation: gfortran -fcoarray=lib -L/opt/opencoarrays/lib IntelCoarrayBug.f90 -lcaf_mpi
gfortran result: mpirun -np 6 ./a.out :
.
.
.
Image No. = 5 : Started working on 697 => 754
Image No. = 4 : Started working on 465 => 522
Image No. = 6 : Started working on 929 => 986
Image No. = 3 : Started working on 233 => 290
Image No. = 2 : Started working on 1 => 58
Image No. = 2 : Finished working on 1 => 58
Image No. = 2 : Started working on 59 => 116
Image No. = 3 : Finished working on 233 => 290
Image No. = 3 : Started working on 291 => 348
Image No. = 4 : Finished working on 465 => 522
.
.
.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Try replacing:
AccessImg1: CRITICAL
PickFlag = ChnkArr(1, Indxloop) [1]
if ( PickFlag == 0 ) ChnkArr(1, Indxloop) [1] = 1
END CRITICAL AccessImg1
with:
call atomic_fetch_or(ChnkArr(1, indxloop)[1]), 1, PickFlag)
Jim Dempsey
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I just compiled and ran it successfully like this:
$ ifort --version
ifort (IFORT) 2021.7.0 20220726
Copyright (C) 1985-2022 Intel Corporation. All rights reserved.
$ ifort -coarray=shared -coarray-num-images=6 IntelCoarrayBug.f90
$ a.out
<snip>
Image No. = 6 : Started working on 929 => 986
Image No. = 3 : Started working on 233 => 290
Image No. = 5 : Started working on 697 => 754
Image No. = 3 : Finished working on 233 => 290
Image No. = 3 : Started working on 291 => 348
Image No. = 5 : Finished working on 697 => 754
Image No. = 5 : Started working on 755 => 812
Image No. = 6 : Finished working on 929 => 986
Image No. = 6 : Started working on 987 => 1044
Image No. = 2 : Started working on 1 => 58
Image No. = 3 : Finished working on 291 => 348
<snip>
Can you try "shared"?
Works with ifx using "shared", too!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Barbara_P_Intel ,
Thanks a lot for your response.
I will install the new version and try ifx also.
It also works for '-coarray=distributed' when the number of processors given is large enough.
For a small number of images, only one image does the work serially in my case.
Also, work distribution is not uniform for Intel Coarray case.
This output from my actual application:
Intel Coarray:
Image No. 3: No. of call :1
Image No. 7: No. of call :0
Image No. 9: No. of call :5
Image No. 11: No. of call :7
Image No. 15: No. of call :17
Image No. 2: No. of call :7
Image No. 4: No. of call :12
Image No. 6: No. of call :16
Image No. 8: No. of call :17
Image No. 10: No. of call :0
Image No. 12: No. of call :17
Image No. 14: No. of call :7
Image No. 16: No. of call :10
Image No. 5: No. of call :8
Image No. 13: No. of call :0
gfortran OpenCoarrays:
Image No. 2: No. of call :9
Image No. 3: No. of call :8
Image No. 4: No. of call :8
Image No. 5: No. of call :8
Image No. 6: No. of call :9
Image No. 7: No. of call :8
Image No. 8: No. of call :8
Image No. 9: No. of call :8
Image No. 10: No. of call :8
Image No. 11: No. of call :8
Image No. 12: No. of call :9
Image No. 13: No. of call :9
Image No. 14: No. of call :8
Image No. 15: No. of call :8
Image No. 16: No. of call :8
Image 7 and Image 13 never get any call in this particular run, for example. Only some Images get most of the call. Execution time is much higher as a result.
Can you give any insight about this?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Try replacing:
AccessImg1: CRITICAL
PickFlag = ChnkArr(1, Indxloop) [1]
if ( PickFlag == 0 ) ChnkArr(1, Indxloop) [1] = 1
END CRITICAL AccessImg1
with:
call atomic_fetch_or(ChnkArr(1, indxloop)[1]), 1, PickFlag)
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The Intel Fortran compilers implement CAF (coarray Fortran) using Intel MPI. For "distributed" some setup is involved. This article describes the setup, defining what servers you want to run on and how to compile.
hmmm... the article needs updating. I'll do that. In the meantime, skip Step 2.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I learned something new about the CAF implementation from the compiler developer that I thought I should share.
When you compile with "distributed" and with no configuration file like you are doing,
ifort -coarray=distributed -coarray-num-images=6 coarray.f90
at runtime the executable defaults to "shared" on the current machine.
I compiled with "distributed" (as above) and ran it. The output is fine and used all 6 images.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
Thank you all for your comment.
I think the critical section is still needed as atomic_fetch_or can not avoid the race condition between images. ( See the output of the program below). I see no improvement with calling atomic_fetch_or inside the critical section.
I am trying to avoid atomic operations as other implementations do not have this feature yet, so portability may be an issue.
Thanks for the link to the article on intel caf.
program atomic_fetch_test
implicit none
integer :: atom [*]
integer :: read_atom
atom = 0
SYNC ALL
if ( this_image() /= 1 ) then
call atomic_fetch_or( atom[1], 1, read_atom )
write(*, *) 'Image number ', this_image(), ' reads atom value: ', read_atom
call execute_command_line(' ')
end if
SYNC ALL
if ( this_image() == 1) write(*, *) 'atom value on image 1 after operation of ', num_images(), &
& ' images : ', atom[1] , ' correct result = ', (num_images() - 1)
! result is undefined between runs
end program atomic_fetch_test
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sorry,
It is bit wise or operation.
atomic_fetch_or serves the purpose correctly.
Thank you all.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>I think the critical section is still needed as atomic_fetch_or can not avoid the race condition between images.
Your test code is incorrect. There should be no race condition.
The atomic_fetch_or does not prevent a rank from or-ing a 1 with a 1. It is not intended to do this. The intention is for only one thread to receive the 0.
program atomic_fetch_test
implicit none
integer :: atom [*]
integer :: count [*]
integer :: read_atom
atom = 0
count = 0
SYNC ALL
if ( this_image() /= 1 ) then
call atomic_fetch_or( atom[1], 1, read_atom )
if(read_atom == 0) call atomic_add(count[1], 1)
write(*, *) 'Image number ', this_image(), ' reads atom value: ', read_atom
call execute_command_line(' ')
end if
SYNC ALL
if ( this_image() == 1) write(*, *) 'count value on image 1 after operation of ', num_images(), &
& ' images : ', count[1] , ' correct result = 1'
end program atomic_fetch_test
Expected results:
Only one rank reports 0 for read_atom, all others report 1
At end of program, count == 1
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Should this report a warning (or log to event file)?
Jim Dempsey

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page