Communication between asynchronous coarray images

David_DiLaura · 09-03-2012

Colleagues,

I'm attempting to upgrade a radiative transfer engineering application to take advantage of Coarray Fortran. (I'm also using Steve's recently posted code to determine the number of physical cores, since hyperthreading slows this application down.) The code processes several thousand surfaces, and performance improves considerably if I estimate the workload of each surface and produce separate subsets of surfaces, one subset for each image to process. But predicting workload is very difficult, and the ratio of the greatest to least work between images remains as high as 2:1. A better distribution of work would reduce the overall execution time.

Ideally, I would like the images to run completely asynchronously, checking a constantly updated shared logical vector that records which surfaces have been processed. If each image begins at an appropriate offset into the list and flags a surface as done, then workload would NOT have to be predicted in advance. The images would simply work their way through the list, each surface taking whatever time it requires, and each image updating (asynchronously) the logical vector. Something like the following:

AlreadyDone(1:10000)[1] = .false.
sync all

do i = NumCurrentImage,10000
   if( AlreadyDone(i)[1] ) cycle
   AlreadyDone(i)[1] = .true.
   ! ... (computational work done here) ...
end do

But the 1st image runs away with the process. The 2nd and subsequent images work on their own first surface, but when they check the vector, all surfaces have been flagged as done. I take it that communication between images is not fast enough to be used in this way. The difficulty can be demonstrated with a bit of code that generates a log file of the work done by each image (flux is just a throw-away to give the loop something to do):

AlreadyDone(1:10000)[1] = .false.
open(888,file = 'LogOfSurfaceWork['//char(48+NumCurrentImage)//'].dat' )
sync all
do i = this_image(),10000
   write(888,*) i, AlreadyDone(i)[1]
   if( AlreadyDone(i)[1] ) cycle
   AlreadyDone(i)[1] = .true.
   flux(i) = log(float(NumCurrentImage*i*j))*NumCurrentImage
end do
close(888)

If, instead of always reading/writing only the 1st instance of AlreadyDone, I have each image check its local instance and update all instances in the other images, then work is shared between images as I would expect. But the communication time completely swamps any advantage of having multiple images. Have I missed something? (Or done something stupid?) Is there a better way to do what I'm attempting?

I am running Win7, Studio 2010, with Intel® Inspector XE 2011 Update 10, (build 233717)

P.S. Formatting in the new Forum is a nightmare.

Steven_L_Intel1 · 09-05-2012

There is certainly a fair bit of "startup" time involved with coarrays, so if image 1 is allowed to continue doing work, it may indeed finish. You can add a SYNC ALL barrier to wait for all images to get to that point, but is it really worth it? If there is not enough work to do to make the general overhead of using coarrays worth it, then it's better to have the master image do it all. If there is more work, then the other images can chip in.

Formatting issues are being addressed.
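For reference, one common way to hand out surfaces without predicting workload is a shared counter that each image advances inside a CRITICAL construct. This is not what either post above proposes; it is a minimal sketch, assuming a Fortran 2008 compiler with coarray support. The names next_surface, my_surface, n_done and the program name are hypothetical, and the flux line is only a stand-in for the real work. The point it illustrates: in the loops above, testing AlreadyDone(i)[1] and then setting it are two separate remote accesses, so two images can both see .false. for the same surface. Here the read and the increment happen under mutual exclusion, so each surface is claimed by exactly one image.

```fortran
program dynamic_work_sketch
  implicit none
  integer, parameter :: n_surfaces = 10000  ! same loop bound as in the post
  integer :: next_surface[*]   ! hypothetical shared counter; only image 1's copy is used
  integer :: my_surface        ! surface claimed by this image
  integer :: n_done            ! number of surfaces this image processed
  real    :: flux(n_surfaces)  ! local to each image; stand-in result array

  next_surface = 1
  n_done = 0
  sync all   ! counter on image 1 is initialized before any image claims work

  do
     ! Only one image at a time executes this block, so reading the
     ! counter and advancing it cannot interleave with another claim.
     critical
        my_surface = next_surface[1]
        next_surface[1] = my_surface + 1
     end critical
     if (my_surface > n_surfaces) exit

     ! Stand-in for the real per-surface computation.
     flux(my_surface) = log(real(this_image()) * real(my_surface)) * this_image()
     n_done = n_done + 1
  end do

  sync all
  write(*,'(a,i0,a,i0,a)') 'image ', this_image(), ' processed ', n_done, ' surfaces'
end program dynamic_work_sketch
```

Each image pays one remote read and one remote write per surface, plus any waiting at the CRITICAL construct, so this only pays off when the work per surface is large compared with that cost. If that overhead turns out to matter, claiming a small block of surfaces per visit to the counter is one possible variation. Note that flux here is local to each image, so the results would still need to be gathered afterwards.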