With ifort 15.0.0 for Linux, I've encountered an issue where images >= 2 are
unable to access (read) a coarray with cosubscript 1 until image 1 encounters
a subsequent image control statement. Since I'm quite new to coarrays, I'd
like to know if I'm misunderstanding a subtlety regarding possible coarray
behavior, or if this behavior is unexpected:
The following program demonstrates the issue:
program test_coarray
   implicit none
   integer :: i, X(2,2)[*]
   double precision :: time, val = 1.0
   if (THIS_IMAGE() == 1) then
      X = 1
      SYNC IMAGES(*)
      call CPU_TIME(time)
      write (*,*) THIS_IMAGE(), ': after first sync', time
      do i = 1,2**30
         val = val + COS(DBLE(i))
      end do
      call CPU_TIME(time)
      write (*,*) THIS_IMAGE(), ': before second sync', time
      SYNC IMAGES(*)
      call CPU_TIME(time)
      write (*,*) THIS_IMAGE(), ': after second sync', time
   else
      SYNC IMAGES(1)
      call CPU_TIME(time)
      write (*,*) THIS_IMAGE(), ': after first sync', time
      X = X[1]
      call CPU_TIME(time)
      write (*,*) THIS_IMAGE(), ': before second sync', time
      SYNC IMAGES(1)
      call CPU_TIME(time)
      write (*,*) THIS_IMAGE(), ': after second sync', time
   end if
   call CPU_TIME(time)
   write (*,*) THIS_IMAGE(), 'finished:', time, 'result:', SUM(X)+val
end program test_coarray
Images 2-4 apparently don't complete the read of X[1] until after image 1 has
reached the second SYNC IMAGES:
$ ifort -coarray=shared -coarray-num-images=4 test_coarray.f90
$ ./a.out
 1 : after first sync 9.998000000000000E-003
 2 : after first sync 1.099700000000000E-002
 3 : after first sync 1.099700000000000E-002
 4 : after first sync 1.099700000000000E-002
 1 : before second sync 14.7887510000000
 1 : after second sync 14.7887510000000
 2 : before second sync 14.7897500000000
 2 : after second sync 14.7897500000000
 2 finished: 14.7897500000000 result: 5.00000000000000
 3 : before second sync 14.7907500000000
 3 : after second sync 14.7907500000000
 4 : before second sync 14.7897500000000
 4 : after second sync 14.7897500000000
 4 finished: 14.7897500000000 result: 5.00000000000000
 1 finished: 14.7887510000000 result: 4.32834934995579
 3 finished: 14.7907500000000 result: 5.00000000000000
I don't see any control synchronization issues with ifort 15.0.1. Perhaps what you observed was by chance? The order of syncing/finishing is indeterminate, but occasionally images 2-4 will make progress before image 1 reaches the second sync.
[U538012]$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.1.133 Build 20141023
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
[U538012]$ ifort -coarray=shared -coarray-num-images=4 U538012.f90 -o U538012.x
[U538012]$ ./U538012.x
3 : after first sync 8.997000000000000E-003
4 : after first sync 8.998000000000001E-003
1 : after first sync 1.199700000000000E-002
2 : after first sync 1.099800000000000E-002
1 : before second sync 13.1350020000000
2 : before second sync 13.1370010000000
2 : after second sync 13.1370010000000
3 : before second sync 13.1350020000000
3 : after second sync 13.1350020000000
4 : before second sync 13.1340020000000
4 : after second sync 13.1340020000000
4 finished: 13.1340020000000 result: 5.00000000000000
1 : after second sync 13.1350020000000
1 finished: 13.1350020000000 result: 4.32834934995579
2 finished: 13.1370010000000 result: 5.00000000000000
3 finished: 13.1350020000000 result: 5.00000000000000
Patrick
Hi Patrick,
While your output demonstrates that image 4 may have made progress slightly before image 1 executed SYNC IMAGES(*), it still shows the general problem in this code segment, executed by images 2-4:
else
   SYNC IMAGES(1)
   call CPU_TIME(time)
   write (*,*) THIS_IMAGE(), ': after first sync', time
   X = X[1]
   call CPU_TIME(time)
   write (*,*) THIS_IMAGE(), ': before second sync', time
It takes images 2-4 about 13 seconds to execute the two write statements and the assignment between the calls to CPU_TIME(); this is apparent when the output is reordered thus:
1 : after first sync 1.199700000000000E-002
1 : before second sync 13.1350020000000
2 : after first sync 1.099800000000000E-002
2 : before second sync 13.1370010000000
3 : after first sync 8.997000000000000E-003
3 : before second sync 13.1350020000000
4 : after first sync 8.998000000000001E-003
4 : before second sync 13.1340020000000
It is at least counterintuitive that such one-sided "gets" of relatively little data should take that long. I've observed similar behavior in a larger code where the "work" loop on image 1 took minutes, and images > 1 seemed to stall until the next synchronization with image 1 before completing the one-sided "gets" needed to retrieve the data to begin their work (if you wish, I could arrange for you to receive the code and data).
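If the stall comes from the runtime only making progress inside image control statements, one mitigation worth trying (purely a sketch; I have not confirmed it helps with this implementation, and the stride is arbitrary) would be to let the busy image execute an occasional image control statement inside its long loop:

```fortran
! Sketch only: image 1's long compute loop with an occasional image
! control statement, giving the runtime a chance to service other
! images' pending one-sided gets (implementation-dependent behavior;
! the stride 2**20 is arbitrary).
do i = 1, 2**30
   val = val + COS(DBLE(i))
   if (MOD(i, 2**20) == 0) SYNC MEMORY
end do
```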
Many thanks for looking into this!
Hi Nathan,
Thanks for the feedback, I understand what you're saying. I simplified the program by only doing one SYNC IMAGES() for each image:
[U538012]$ cat U538012-one-sync-img.f90
program test_coarray
   implicit none
   integer :: i, X(2,2)[*]
   double precision :: time, val = 1.0
   if (THIS_IMAGE() == 1) then
      X = 1
      SYNC IMAGES(*)
      call CPU_TIME(time)
      write (*,*) THIS_IMAGE(), ': before image 1 loop', time
      do i = 1,2**30
         val = val + COS(DBLE(i))
      end do
      call CPU_TIME(time)
      write (*,*) THIS_IMAGE(), ': after image 1 loop', time
   else
      SYNC IMAGES(1)
      call CPU_TIME(time)
      write (*,*) THIS_IMAGE(), ': before X = X[1]', time
      X = X(:,:)[1]
      call CPU_TIME(time)
      write (*,*) THIS_IMAGE(), ': after X = X[1]', time
   end if
   call CPU_TIME(time)
   write (*,*) THIS_IMAGE(), 'finished:', time, 'result:', SUM(X)+val
end program test_coarray
[U538012]$ ifort -coarray=shared -coarray-num-images=4 U538012-one-sync-img.f90 -o U538012-one-sync-img.x
[U538012]$ ./U538012-one-sync-img.x
1 : before image 1 loop 4.998000000000000E-003
2 : before X = X[1] 4.998000000000000E-003
3 : before X = X[1] 5.998000000000000E-003
4 : before X = X[1] 6.998000000000000E-003
1 : after image 1 loop 13.1639980000000
1 finished: 13.1639980000000 result: 4.32834934995579
2 : after X = X[1] 13.1629980000000
2 finished: 13.1629980000000 result: 5.00000000000000
3 : after X = X[1] 13.1639980000000
3 finished: 13.1639980000000 result: 5.00000000000000
4 : after X = X[1] 13.1649980000000
4 finished: 13.1649980000000 result: 5.00000000000000
[U538012]$
It's clear that images 2-4 all wait for image 1 to complete the 'work' loop and finish the 'SUM(X)+val' calculation before progressing. I'm not sure whether this is expected or not, but the nearly identical 'finished' times certainly show that images 2-4 made no progress until image 1's 'work' loop completed. I'll inquire with the developers.
Thanks,
Patrick
Apparently there have been some issues with SYNC IMAGES(*), in particular when it references the executing image itself, i.e., the image-1 code in the test case. I was encouraged to report this, which I have done under tracking ID DPD200365175. I'll keep this thread updated with any progress.
Patrick
Hi Nathan,
I noticed the same thing. The solution seems to be to enable asynchronous progress in the Intel MPI library. You can do this by setting the environment variable MPICH_ASYNC_PROGRESS:
export MPICH_ASYNC_PROGRESS=1
and your example code runs as expected:
$ MPICH_ASYNC_PROGRESS=1 ./forum
3 : before X = X[1] 7.998000000000000E-003
3 : after X = X[1] 7.998000000000000E-003
3 finished: 7.998000000000000E-003 result: 5.00000000000000
2 : before X = X[1] 6.998000000000000E-003
2 : after X = X[1] 7.998000000000000E-003
2 finished: 7.998000000000000E-003 result: 5.00000000000000
1 : before image 1 loop 7.998000000000000E-003
4 : before X = X[1] 5.998000000000000E-003
4 : after X = X[1] 5.998000000000000E-003
4 finished: 5.998000000000000E-003 result: 5.00000000000000
1 : after image 1 loop 28.5566580000000
1 finished: 28.5576570000000 result: 4.32834934995579
However, it does create another problem: the SYNC MEMORY statement (not in your code, but it is in mine) often hangs the application, apparently after references to coarray variables. I could work around the problem by using SYNC IMAGES(THIS_IMAGE()) in place of SYNC MEMORY; according to the standard, it should have at least the same effect as a SYNC MEMORY statement.
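To make that substitution concrete, here is a minimal sketch (the variable names are hypothetical, not from any code in this thread):

```fortran
! Sketch: replacing SYNC MEMORY with SYNC IMAGES(THIS_IMAGE()).
! SYNC IMAGES with only the executing image in its image set waits on
! no other image, but as an image control statement it still provides
! the memory-ordering effect that SYNC MEMORY would.
buf[partner] = local_value        ! one-sided put (hypothetical names)

! Instead of:  SYNC MEMORY
SYNC IMAGES(THIS_IMAGE())         ! stand-in that does not hang here
```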
And one more warning: asynchronous progress also changes the thread support of the MPI library in use from MPI_THREAD_FUNNELED to MPI_THREAD_MULTIPLE. This could lower MPI performance if you're combining MPI and coarray Fortran in one application.
Hi John,
Many thanks for sharing that info. Indeed, after setting the MPICH_ASYNC_PROGRESS environment variable, I was able to remove an extra set of SYNC ALL statements from our larger coarray program. While it's probably a separate issue, this larger program works only with I_MPI_FABRICS=shm:tcp, as "shm:dapl" or "shm:ofa" result in incorrect results or run-time segfaults.
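For later readers, the combination that worked for our larger program can be set per run like this (the binary name is illustrative):

```shell
# Enable asynchronous progress in Intel MPI so one-sided coarray "gets"
# can complete while another image is busy computing.
export MPICH_ASYNC_PROGRESS=1
# Fabric selection that avoided the segfaults and incorrect results.
export I_MPI_FABRICS=shm:tcp
./a.out
```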