I have a question about the ordering of images when the -coarray=distributed compiler option is used and the program is run on a cluster with the Intel MPI libraries.
Assuming that the number of images is the same as the number of CPUs, are the images running on CPUs within the same node indexed by consecutive numbers?
I would like to exploit this structure (if there is one) to reduce the RAM used by a program while keeping performance fast. There are a few large matrices which all the images need access to, and which are accessed many times. Loading these matrices on one image only is unsatisfactory, because access from images on other nodes would be slow. Keeping a local copy of the data on every image is not great either, particularly if I also want to be able to run the program on PCs with limited RAM. What I would like is to have the data loaded by one image on every node, so that every image can read the data from an image on its own node. This should be (almost?) as fast as every image having its own copy of the data, but would require much less memory.
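To make the idea concrete, here is a minimal sketch of the index arithmetic I have in mind, written in Python purely for illustration. It assumes the very thing I am asking about: that images are numbered from 1 and placed on nodes in consecutive blocks of equal size. The names node_leader and images_per_node are hypothetical, and the actual placement depends on how the MPI launcher maps processes to nodes.

```python
def node_leader(image: int, images_per_node: int) -> int:
    """Return the lowest-numbered image on the same node as `image`.

    ASSUMPTION: images are 1-based (as in Fortran) and are placed on nodes
    in consecutive blocks of `images_per_node`. This is a hypothetical
    layout, not something guaranteed by the compiler or the MPI library.
    """
    if image < 1 or images_per_node < 1:
        raise ValueError("image and images_per_node must both be >= 1")
    # Zero-based block index of the node, then back to a 1-based image number.
    return ((image - 1) // images_per_node) * images_per_node + 1


if __name__ == "__main__":
    # Example: 8 images on 2 nodes with 4 images each.
    for image in range(1, 9):
        print(f"image {image} would read from image {node_leader(image, 4)}")
```

In the Fortran program, the same expression applied to this_image() would give the image index to use in the coindexed reference to the shared matrices, with only the leader images loading the data.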
I'm sorry if this is something trivial. I couldn't find anything about the ordering of images on clusters when CAF and Intel MPI are used. Maybe there is no structure at all...
Never mind, I sorted the problem out. If someone else needs to do something similar, a helpful place to start is this guide, which I hadn't found before: https://software.intel.com/en-us/articles/distributed-memory-coarray-fortran-with-the-intel-fortran-compiler-for-linux-essential