In the trivial code below, all images perform some task whose output needs to be gathered on the first image (for further processing, for instance). Two strategies are implemented: in the first, image 1 goes hunting for the results on the other images; in the second, all images write to the local coarray of image 1.
My questions are the following:
- Is one of these two approaches more efficient than the other? Intuitively one might say (2) ... but on the other hand nothing is really intuitive or obvious with parallel codes...
- Is the second one even standard compliant? In particular, I have in mind this paragraph in the Intel Compiler Fortran 14 documentation: "If a variable is defined on an image in a segment, it must not be referenced, defined, or become undefined in a segment on another image unless the segments are ordered.". Line 36 of the code forms a segment that is not ordered when executed by the different images, and yet these images access the same array component of the same coarray variable... but not the same array location... In other words, if the programmer is careful to avoid overlap (that is, images writing by mistake to the same location), this *should* work... but maybe I am just lucky here (as it seems to work)?
Comments and suggestions would be very much appreciated.
Thanks!
PROGRAM P
  ! Declarations.
  IMPLICIT NONE
  TYPE T_DATA_CONTAINER
    INTEGER,ALLOCATABLE :: GLOBAL_RESULTS(:)
    INTEGER :: LOCAL_RESULTS
  END TYPE T_DATA_CONTAINER
  TYPE(T_DATA_CONTAINER) :: A
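The listing above is cut off, so the two strategies in the question can only be sketched here rather than reproduced. The following is a minimal illustration, not the original poster's code: it assumes plain integer coarrays instead of the derived type, and the names `local_result` and `gathered` are made up for the example. The `sync all` statements are what order the segments, so that each definition happens before the corresponding reference on another image.

```fortran
program gather_sketch
  implicit none
  integer :: local_result[*]              ! one scalar result per image (assumed layout)
  integer, allocatable :: gathered(:)[:]  ! same shape on every image; only image 1's copy is used
  integer :: i, me, n

  me = this_image()
  n  = num_images()
  allocate(gathered(n)[*])                ! allocating a coarray synchronizes all images
  gathered = 0

  local_result = 10 * me                  ! stand-in for the real per-image task

  ! Strategy 1: image 1 pulls each result from the image that owns it.
  sync all                                ! every local_result is defined before image 1 reads it
  if (me == 1) then
    do i = 1, n
      gathered(i) = local_result[i]
    end do
    print *, 'pulled:', gathered
  end if
  sync all                                ! image 1 is done with gathered before anyone writes to it

  ! Strategy 2: every image pushes into its own slot on image 1.
  gathered(me)[1] = local_result          ! distinct element per image, so no two images touch the same location
  sync all                                ! all remote writes complete before image 1 reads
  if (me == 1) print *, 'pushed:', gathered
end program gather_sketch
```

With the Intel compiler this would typically be built with the coarray option enabled; timing both halves on the target machine is the only reliable way to compare them.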
Generally I see coarray programs take the path of having image 1 collect all the results. My feeling is that this is better than having other images update image 1. Note that it is best if you can do fewer cross-image transactions - read or write array sections rather than individual elements if possible.
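As a rough illustration of the point about array sections, the sketch below has image 1 fetch a whole section from each image in one assignment, with the element-by-element alternative left as a comment. The array names and the size `m` are assumptions made for the example, and whether a section really becomes a single transfer depends on the compiler and runtime.

```fortran
program section_gather_sketch
  implicit none
  integer, parameter :: m = 4               ! hypothetical number of results per image
  integer :: results(m)[*]
  integer, allocatable :: all_results(:,:)  ! ordinary array, only allocated on image 1
  integer :: i, k

  results = [(100 * this_image() + k, k = 1, m)]
  sync all                                  ! results are defined everywhere before image 1 reads

  if (this_image() == 1) then
    allocate(all_results(m, num_images()))
    do i = 1, num_images()
      ! One cross-image reference covering m elements ...
      all_results(:, i) = results(:)[i]
      ! ... instead of m separate cross-image references:
      ! do k = 1, m
      !   all_results(k, i) = results(k)[i]
      ! end do
    end do
    print *, all_results
  end if
end program section_gather_sketch
```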
Thanks for sharing your insights on this, Steve. I will do timing tests to assess the better option of the two; I was more concerned about the possibility that (2) was illegal - from your answer I understand it's ok but maybe not the best in terms of efficiency.
Although I haven't tried to kick the tires with the latest beta yet (16), the Intel coarray implementation is slowly but surely gathering momentum (there are still bugs and performance issues of course) and this is great. Now we just need teams, events, and parallel I/O asap, ha ha :-) , as this would simplify considerably (considerably!) code architecture.
I have worked with and tested coarray programs that produce and need to gather up large amounts of computed results. I have found direct communication between coarray program images to be significantly slower than the (clumsy-at-first-glance) method of having each image write its data to a throw-away temporary binary file and then having image 1 open/read each temp file and process the accumulated data as required. Image 1 is sync'd with the others at the point just after they've written and closed their temp files.
The differences in execution times I have observed are significant -- though they probably depend on local machine particulars and the amount of inter-image data that needs to be shared. Nevertheless, you may want to try this to see if you get the same performance that I have observed.
David
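A minimal sketch of the temp-file approach described above might look like the following. It is an illustration under assumptions, not the poster's actual code: the file stem `gatherQQ`, the payload size `m`, and the use of unformatted stream access are all chosen for the example, and it only works if every image can see the files written by the others.

```fortran
program file_gather_sketch
  implicit none
  integer, parameter :: m = 4            ! hypothetical number of results per image
  integer :: results(m), buffer(m)
  integer :: i, k, u
  character(len=64) :: fname

  results = [(100 * this_image() + k, k = 1, m)]

  ! Each image writes its own uniquely named temporary binary file.
  write (fname, '(a,i0)') 'gatherQQ', this_image()
  open (newunit=u, file=trim(fname), form='unformatted', access='stream', &
        status='replace', action='write')
  write (u) results
  close (u)

  sync all                               ! every file is written and closed before image 1 reads

  if (this_image() == 1) then
    do i = 1, num_images()
      write (fname, '(a,i0)') 'gatherQQ', i
      open (newunit=u, file=trim(fname), form='unformatted', access='stream', &
            status='old')
      read (u) buffer
      close (u, status='delete')         ! throw the temporary file away once it has been read
      print *, 'image', i, ':', buffer
    end do
  end if
end program file_gather_sketch
```

Whether this beats direct coarray transfers would have to be measured on the machine in question.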
I think your second approach is legal, just not optimal.
David DiLaura wrote:
...I have found direct communication between coarray program images to be significantly slower than the (clumsy-at-first-glance) method of having each image write its data to a throw-away temporary binary file and then have image 1 open/read each temp file and process the accumulated data as required...
Note that "whether a named file on one image is the same as a file with the same name on another image is processor dependent". Consequently I don't think the approach described above is portable.
Ian,
Each file must be uniquely named. In my work I generate a name peculiar to the project and end it with 'QQ' (as is typically done to keep file names from coinciding with common words/phrases) and then the image number is turned into character(s) and appended to the end of the file name. Each image thus has its own file, helping (somewhat) the efficiency.
David
That fragment of standard text means that there is no guarantee that a file operated on by one image is accessible from another.
In terms of implementation, there is no requirement that the images all be executing on machines that all see the same file system.
Equally there is no requirement that all images be executing on machines with completely isolated file systems. Uniqueness of the names of files being written to is hence required for portability, but it is not sufficient for connecting to files across images.
Hi,
I believe the writing strategy (your second one) could be an advantage in other situations, when you would like to buffer the transferred values in the PGAS memory (coarrays) of a foreign image. (In your example above, image 1 would be that foreign image.) I mean, as far as my current understanding goes, if you go hunt for a value on a foreign image, your current image has to wait with further processing until it has received that value. On the other hand, if you write to a foreign image, no image necessarily has to wait until the transmission has completed. Actually, I use this, but don't use the values in PGAS memory (coarrays) directly in my program logic code. Rather, I copy the coarray values to completely local memory (non-coarray variables) before using them in my program logic code. Thus, that foreign image can do some further processing while the transmission takes place.
BTW, using derived type coarrays with static array components (instead of the allocatable one in your example above) makes the image-to-image writing much easier, because you don't have to care about the allocations on foreign images. Further, this should also improve performance because your coarray would become symmetric (see Aleksandar Donev's 'Rationale for Co-Arrays in Fortran 2008' for a good explanation).
best regards
michael
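The two suggestions in the post above can be sketched together: a derived-type coarray with a fixed-size component that other images write into, and a copy into an ordinary variable on image 1 before the data is used. This is only an illustration under assumptions; the type `t_buffer`, the names `inbox` and `local_copy`, and the bound `max_images` are invented for the example, and the simple `sync all` placement does not try to overlap communication with computation.

```fortran
program push_buffer_sketch
  implicit none
  integer, parameter :: max_images = 64  ! hypothetical upper bound on the number of images
  type :: t_buffer
    integer :: slots(max_images)         ! fixed-size component: nothing to allocate on any image
  end type t_buffer
  type(t_buffer) :: inbox[*]
  integer :: local_copy(max_images)      ! ordinary (non-coarray) variable
  integer :: me, n

  me = this_image()
  n  = num_images()
  if (n > max_images) error stop 'increase max_images'

  inbox%slots = 0
  sync all                               ! initialization is ordered before any remote write

  inbox[1]%slots(me) = 10 * me           ! push into this image's own slot on image 1

  sync all                               ! all pushes are ordered before the copy below
  if (me == 1) then
    local_copy = inbox%slots             ! copy out of the coarray before using the values
    print *, local_copy(1:n)
  end if
end program push_buffer_sketch
```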
