Dear,
Below is a Coarray Fortran program that is giving me trouble:
A large vector (x) is updated in two different ways. For large sizes of x, the update of x is wrong WHEN each image is on a different node. WHEN all images are on the same node, the results are always fine, whatever the size of x.
When size(x)=10^6, exchanging the full array across images on different nodes led to wrong results. However, exchanging small subsets of x led to correct results.
When size(x)>2*10^7, exchanging the full array across images on different nodes led to wrong results, AND exchanging subsets of x (size(subset) > 6*10^6) led to wrong results too.
My troubles seem to be linked to the size of the array that is exchanged across images on different nodes. So, am I doing something wrong? Could it be a bug?
I use ifort 17.0.0 with -coarray=distributed.
Here is the program that mimics the problem (it may look clumsy, with too many sync all statements and so on, but its only purpose is to replicate my issue):
program testcoarray
implicit none
integer(kind=4)::i,j,k,neq
integer(kind=4)::startrow
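The listing above stops after its declarations. As an independent illustration of the two exchange patterns described in this post (the whole array in one transfer versus small subsets), here is a minimal sketch. It is not the original test program: the name `exchange_sketch`, the chunk size, and the choice of reading from the next image are all assumptions made for the example.

```fortran
program exchange_sketch
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer, parameter :: neq   = 1000000   ! array size of the first test case
  integer, parameter :: chunk = 100000    ! hypothetical subset size (assumption)
  real(real64), allocatable :: x(:)[:]    ! coarray: one copy of x per image
  real(real64), allocatable :: full(:), pieces(:)
  integer :: src, lo, hi

  allocate(x(neq)[*], full(neq), pieces(neq))
  x = real(this_image(), real64)          ! each image fills x with its own index
  sync all                                ! x must be defined before any remote read

  ! Read from the next image, wrapping around to image 1 (assumption).
  src = merge(1, this_image() + 1, this_image() == num_images())

  full = x(:)[src]                        ! pattern 1: whole array in one transfer

  do lo = 1, neq, chunk                   ! pattern 2: same data in small subsets
     hi = min(lo + chunk - 1, neq)
     pieces(lo:hi) = x(lo:hi)[src]
  end do

  ! Both local copies should hold the value src in every element.
  if (any(full /= real(src, real64)) .or. any(pieces /= real(src, real64))) then
     write(*,'(a,i0)') 'Mismatch on image ', this_image()
  else
     write(*,'(a,i0)') 'OK on image ', this_image()
  end if
  sync all                                ! no image proceeds until all reads are done
end program exchange_sketch
```

If a compiler and runtime handle both patterns correctly, every image should report OK regardless of how the images are placed on nodes. Increasing `neq` and `chunk` toward the sizes quoted above may help narrow down where a transfer starts to fail.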
And here is the output for neq=1000000
*With all images on the same node:
Size of the array: 1000000
Number of images : 4
First update : 2500000.00000000
Second update: 2500000.00000000
Correct value: 2500000.00000000
*With each image on a different node:
Size of the array: 1000000
Number of images : 4
First update : 750000.000000000
Second update: 2500000.00000000
Correct value: 2500000.00000000
And here is the output for neq=25806732
*With all images on the same node:
Size of the array: 25806732
Number of images : 4
First update : 64516830.0000000
Second update: 64516830.0000000
Correct value: 64516830.0000000
*With each image on a different node:
Size of the array: 25806732
Number of images : 4
First update : 19355049.0000000
Second update: 6451727.00000000
Correct value: 64516830.0000000
Hi,
I tested your program successfully with gfortran 8.0.1 (experimental version) and OpenCoarrays 2.0.0 on a shared-memory laptop. The results are:
Size of the array: 1000000
Number of images : 4
First update : 2500000.0000000000
Second update: 2500000.0000000000
Correct value: 2500000.0000000000
and
Size of the array: 25806732
Number of images : 4
First update : 64516830.000000000
Second update: 64516830.000000000
Correct value: 64516830.000000000
From this, I would say your program seems to be correct, so this could be a compiler bug. I would also ask what values the this_image() and num_images() intrinsics return when your program runs with each image on a different compute node.
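A minimal sketch of such a check, using only the standard intrinsics (the program name `check_images` is made up for the example):

```fortran
program check_images
  implicit none
  integer :: me, n

  me = this_image()    ! index of this image, between 1 and num_images()
  n  = num_images()    ! total number of images in the run

  write(*,'(a,i0,a,i0)') 'Hello from image ', me, ' of ', n

  sync all             ! every image reaches this point before any continues
  if (me == 1) write(*,'(a)') 'All images reached the barrier.'
end program check_images
```

With 4 images, each index from 1 to 4 should appear exactly once, in no guaranteed order, and every line should report 4 as the total.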
Thank you Michael S. for your tests.
Regarding this_image() and num_images() on different compute nodes, both intrinsics give the expected values: num_images() returns 4 on all nodes, and this_image() returns the ID of the image (from 1 to 4). I checked this successfully.
I will install OpenCoarrays and run my test program on our HPC before reporting a potential bug...
If you want to install OpenCoarrays on a cluster (installing it is not strictly required), you may need to use a simple 'trick', which is described here:
https://groups.google.com/forum/#!topic/opencoarrays/sdUECeRNJo8
In case you need help with the installation, feel free to ask at the OpenCoarrays forum: https://groups.google.com/forum/#!forum/opencoarrays/join
cheers
I installed OpenCoarrays using gcc 7.1.0 with the trick from your link (Thank you Michael S. for the trick!), compiled my program, and tested it on the HPC.
I assigned one image per node, and got the correct result!
Size of the array: 25806732
Number of images : 4
First update : 64516830.000000000
Second update: 64516830.000000000
Correct value: 64516830.000000000
So, it really seems to be a bug in the Intel compiler 17.0.0!
Thank you Michael S. for your help!
Jeremie
