Intel® Fortran Compiler

Need Details/How To on using Studio 12.0 CoArrays (compile, run, etc.) ...

richard-walsh
Beginner
All,
I recently installed the latest Intel Compiler suite (version 12.0, released in
January 2011) on several of our cluster systems. The installations completed
without issue. Intel is our default compiler, but we use OpenMPI rather than
Intel's MPI as our default MPI.
My interest is in getting a detailed, complete discussion of how to use
CoArray Fortran (CAF) with this 12.0 release, which is the first to fully
support CAF, although it only supports Intel's MPI as the communications
conduit. I assume there is a How To somewhere on getting this to work, but
I have not found it.
Such a document would have to include:
1. The options to ifort that allow CAF constructs to be
interpreted by the compiler (my current guess is sketched below).
2. How to make sure that Intel's MPI, rather than our default,
is used for Intel CAF runs.
3. How to properly invoke the CAF-ready executable so that it
uses Intel's MPI. We would like to be able to do this via the
PBS Pro batch job scheduler.
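For item 1, my current reading (to be confirmed against the ifort documentation; corrections welcome) is that the compile step looks roughly like the following, with the file names as placeholders:

ifort -coarray=shared hello_image.f90 -o hello_image        # single node, images share memory
ifort -coarray=distributed hello_image.f90 -o hello_image   # multiple nodes, requires the Intel MPI runtime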
I believe that the person in Don Gunning's compiler group at
Intel who understands this is Ron Green, if that is any help.
Moreover, this question comes from me and the original author
of CoArray Fortran, Dr. Robert Numrich. We are teaching a class
on CoArray Fortran this week and intend to use both the Cray XE6
and our SGI IB cluster (with the Intel 12.0 compiler suite) to
complete exercises in the language.
Please respond to my email at the CUNY HPC Center at
Sincerely,
Richard Walsh
Parallel Applications and Systems Manager
CUNY HPC Center, Staten Island, NY
718-982-3319
612-382-4620
Ron_Green
Moderator
Richard,

I am traveling and working long hours with customers this week. I will be in touch shortly, or will have another person on my team get in touch with you.

ron
richard-walsh
Beginner
Hey Ron,
Sounds good ... we are eager to have more than one platform
from which to run and develop CAF code here at the CUNY HPC
Center. We have this for UPC with Berkeley UPC and Cray's UPC.
Intel's option currently seems to be the best choice for generic
cluster platforms.
By the way ...
Are you the Ron Green who took the UPC and CAF class from
me a few years back at the PGAS conference in Washington, DC?
Thanks,
rbw
richard-walsh
Beginner
Ron,
We have made some progress on this by manually starting
up the Intel MPI compute node daemons in the PBS job
script, although we are not sure how to shut them down after
startup.
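Roughly, what we have so far looks like the sketch below. The PBS resource request, node list, and image count are placeholders for our actual configuration, and mpdallexit at the end is only our guess at the proper shutdown step, not something we have verified:

#!/bin/bash
#PBS -N caf_test
#PBS -l select=4:ncpus=4
cd $PBS_O_WORKDIR
# hand-built mpd.hosts listing the four compute nodes, four images each
mpdboot -n 5 --file=./mpd.hosts    # one mpd per host entry plus one on the launching node
mpdtrace -l                        # sanity check that the ring is up
mpiexec -machinefile ./mpd.hosts -n 16 ./exe
mpdallexit                         # presumably tears the daemon ring down again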
Anyway, looking forward to your reply on:
1. The CAF compilation options and their meaning, and
2. A PBS script example that starts up the MPI daemons,
runs the CAF job, and then kills the daemons it started.
Thanks,
rbw
Steven_L_Intel1
Employee
The options are described in the documentation, though you should also read the compiler release notes, as one of them changed a bit since the manuals were frozen.

Ron may be able to better comment on your other questions, though as I understand it, starting and stopping the daemons is outside the control of the Fortran program.
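For quick reference, the coarray-related options I can recall offhand are roughly these; please treat the ifort documentation and release notes as authoritative for the exact spellings and defaults:

-coarray[=shared|distributed]   # enable coarray support; distributed requires the Intel MPI runtime
-coarray-num-images=N           # default number of images (FOR_COARRAY_NUM_IMAGES can override at run time)
-coarray-config-file=<file>     # file of mpiexec options used when a distributed executable launches itself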
Ron_Green
Moderator
Patrick Kennedy just wrote a good article on distributed memory CAF along with process pinning:

http://software.intel.com/en-us/articles/distributed-memory-coarray-programs-with-process-pinning/

We'd appreciate your comments on this article. Let us know if it at least provides enough 'getting started' information.

ron
richard-walsh
Beginner
Ron/All,
Wish it were as simple as "reading the description of the options" in 'man ifort',
as someone suggested. We are not running Intel MPI as the default MPI, and
therefore have to build our Intel mpdboot ring manually in the PBS script. When
we get it figured out we will post a solution, but in the meantime ...
We are still struggling with this. Below, one of my co-workers demonstrates that
an MPI code works (it runs on the expected nodes and cores, and each process
correctly knows its rank), but a similar CAF program does not ... it seems to
ignore the mpd ring and the mpd.hosts file and selects its core count on the
basis of the total number of cores per node.
Can you comment on this? We will look at the posting that was just made to
see if it offers anything ...
Intel MPI:

1) MPI C code.
code:

/* C Example */
#include <stdio.h>     /* printf */
#include <unistd.h>    /* gethostname */
#include <mpi.h>

int main (int argc, char *argv[])
{
    int rank, size;
    char hostbuf[256];

    gethostname(hostbuf, sizeof(hostbuf));

    MPI_Init (&argc, &argv);                /* starts MPI */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);  /* get current process id */
    MPI_Comm_size (MPI_COMM_WORLD, &size);  /* get number of processes */
    printf("Hello world from process %d of %d on %s\n", rank, size, hostbuf);
    MPI_Finalize();
    return 0;
}

compile:
/share/apps/intel/impi/4.0.1.007/intel64/bin/mpicc ./hello.c -o exe


create mpd.hosts:
r1i0n1:4
r1i0n2:4
r1i0n3:4
r1i0n4:4

start mpd daemons ring:
mpdboot -n 5 --file=./mpd.hosts (starts 5 daemons -- one for each entry in mpd.hosts plus one on the master node)

check:
mpdtrace -l
service0_50821 (10.148.0.1)
r1i0n4_57525 (10.148.0.13)
r1i0n3_51312 (10.148.0.12)
r1i0n1_58487 (10.148.0.10)
r1i0n2_42433 (10.148.0.11)

So the ring is there and it is functional.

run helloworld:
mpiexec -l -machinefile mpd.hosts -n 16 ./exe
10: Hello world from process 10 of 16 on r1i0n3
11: Hello world from process 11 of 16 on r1i0n3
9: Hello world from process 9 of 16 on r1i0n3
4: Hello world from process 4 of 16 on r1i0n2
8: Hello world from process 8 of 16 on r1i0n3
7: Hello world from process 7 of 16 on r1i0n2
6: Hello world from process 6 of 16 on r1i0n2
5: Hello world from process 5 of 16 on r1i0n2
1: Hello world from process 1 of 16 on r1i0n1
2: Hello world from process 2 of 16 on r1i0n1
0: Hello world from process 0 of 16 on r1i0n1
15: Hello world from process 15 of 16 on r1i0n4
14: Hello world from process 14 of 16 on r1i0n4
13: Hello world from process 13 of 16 on r1i0n4
3: Hello world from process 3 of 16 on r1i0n1
12: Hello world from process 12 of 16 on r1i0n4

There are exactly 4 lines for each entry in mpd.hosts, just as one would expect.

2) CAF code.
code:
program hello_image
   character(len=80) host
   integer status
   integer me
   integer N

   N  = num_images()
   me = this_image()

   ! hostnm() is an Intel Fortran portability routine; it returns 0 on success
   status = hostnm(host)
   if (status == 0) then
      print *, "Hello from image ", me, " out of ", N, " on host ", trim(host)
   end if
end program hello_image


compile it with:
ifort -coarray=distributed hello_image.f90 -o exe

set the FOR_COARRAY_NUM_IMAGES variable:
export FOR_COARRAY_NUM_IMAGES=16

create the same mpd.hosts as before:
r1i0n1:4
r1i0n2:4
r1i0n3:4
r1i0n4:4


check that the mpd daemon ring is still there:
service0_50821 (10.148.0.1)
r1i0n4_57525 (10.148.0.13)
r1i0n3_51312 (10.148.0.12)
r1i0n1_58487 (10.148.0.10)
r1i0n2_42433 (10.148.0.11)

start the CAF executable:
mpiexec -l -machinefile mpd.hosts ./test
0: Hello from image 12 out of 16 on host r1i0n2
0: Hello from image 14 out of 16 on host r1i0n2
0: Hello from image 15 out of 16 on host r1i0n2
0: Hello from image 9 out of 16 on host r1i0n2
0: Hello from image 16 out of 16 on host r1i0n2
0: Hello from image 10 out of 16 on host r1i0n2
0: Hello from image 11 out of 16 on host r1i0n2
0: Hello from image 13 out of 16 on host r1i0n2
0: Hello from image 2 out of 16 on host r1i0n1
0: Hello from image 6 out of 16 on host r1i0n1
0: Hello from image 7 out of 16 on host r1i0n1
0: Hello from image 1 out of 16 on host r1i0n1
0: Hello from image 4 out of 16 on host r1i0n1
0: Hello from image 3 out of 16 on host r1i0n1
0: Hello from image 5 out of 16 on host r1i0n1
0: Hello from image 8 out of 16 on host r1i0n1

It takes 8 cores from the first node listed in mpd.hosts and 8 from the second. The other nodes are ignored ...

This behavior does not seem consistent or correct.
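Based on a first read of the article Ron posted, the next thing we plan to try is compiling with -coarray-config-file and then running the distributed executable directly, rather than under mpiexec, so that it launches its own Intel MPI job from the options in the config file. Our guess at what that looks like (the config file name and contents below are our assumptions from the article, not yet verified):

compile, pointing at a config file:
ifort -coarray=distributed -coarray-config-file=./cafconfig.txt hello_image.f90 -o hello_image

where cafconfig.txt holds the mpiexec options, something like:
-machinefile ./mpd.hosts -n 16 ./hello_image

then run the executable directly (it should start the Intel MPI job itself):
./hello_image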

Eugene Dedits and Richard Walsh