All,

I recently installed the latest Intel Compiler suite (version 12.0, released in January of 2011) on our several cluster systems. The installations completed without issue. Intel is our default compiler, but we use OpenMPI as our default MPI rather than Intel's MPI.

My interest is in getting a detailed/complete discussion of how to use CoArray Fortran (CAF) from within this 12.0 release, which is the first to fully support CAF, although this release supports only Intel's MPI as the communications conduit. I assume there is a How To somewhere on getting this to work, but I have not found it. Such a document would have to include:

1. The options to ifort that allow CAF constructs to be interpreted by the compiler.
2. How to make sure that Intel's MPI, rather than our default, is used for Intel CAF runs.
3. How to properly invoke the CAF-ready executable so that it uses Intel's MPI. We would like to be able to do this via the PBS Pro batch job scheduler.

I believe the person in Don Gunning's compiler group at Intel who understands this is Ron Green, if that is any help.

Moreover, this question comes from me and from the original author of CoArray Fortran, Dr. Robert Numrich. We are teaching a class on CoArray Fortran this week and intend to use both the Cray XE6 and our SGI IB cluster (with the Intel 12.0 compiler suite) to complete exercises in the language.

Please respond to my email at the CUNY HPC Center.

Sincerely,

Richard Walsh
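P.S. For item 1, here is a rough sketch of what we expect the compile side to look like, pieced together from the ifort 12.0 documentation (the source file name is just a placeholder, and the exact option spellings are ours to confirm):

# single-node, shared-memory coarrays
ifort -coarray=shared -coarray-num-images=8 hello_caf.f90 -o hello_caf

# distributed-memory coarrays across cluster nodes (uses the Intel MPI runtime)
ifort -coarray=distributed hello_caf.f90 -o hello_caf

# the image count can also be set at run time via the environment
export FOR_COARRAY_NUM_IMAGES=16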
Parallel Applications and Systems Manager
CUNY HPC Center, Staten Island, NY
718-982-3319
612-382-4620
Richard,
I am traveling and working long hours with customers this week. I will be in touch shortly, or have another person on my team get in touch with you.
ron
Hey Ron,
Sounds good ... we are eager to have more than one platform from which to run
and develop CAF code here at the CUNY HPC Center. We already have this for UPC,
with Berkeley UPC and Cray's UPC compiler. Intel's option currently seems to be
the best choice for generic cluster platforms.
By the way ...
Are you the Ron Green who took the UPC and CAF class from me a few years
back at the PGAS conference in Washington, DC?
Thanks,
rbw
Ron,
We have made some progress on this by manually starting up the Intel MPI
compute node daemons in the PBS job script, although we are not sure how to
shut them down after startup; a rough sketch of the script is below.
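(The resource line and the 4-images-per-node layout below are specific to our test setup, and the mpdallexit call at the end is our guess at the proper shutdown, which we have not yet verified.)

#!/bin/bash
#PBS -l select=4:ncpus=4:mpiprocs=4
#PBS -l walltime=00:30:00

cd $PBS_O_WORKDIR

# build an mpd.hosts file (4 slots per node) from the nodes PBS assigned to this job
sort -u $PBS_NODEFILE | awk '{print $1":4"}' > mpd.hosts
NHOSTS=$(wc -l < mpd.hosts)

# start the Intel MPI mpd ring; since the script runs on a node that is itself
# in mpd.hosts, -n equal to the number of hosts should suffice (unverified)
mpdboot -n $NHOSTS --file=./mpd.hosts
mpdtrace -l

# run the CAF executable built with -coarray=distributed
export FOR_COARRAY_NUM_IMAGES=16
mpiexec -machinefile mpd.hosts -n 16 ./exe

# tear the ring down when the job is done (mpdallexit is our guess here)
mpdallexit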
Anyway, looking forward to your reply on:
1. the CAF compilation options and their meaning, and
2. a PBS script example that starts up the MPI daemons, runs the CAF job,
and then kills the daemons it started.
Thanks,
rbw
The options are described in the documentation, though you should also read the compiler release notes as one of them changed a bit since the manuals were frozen.
Ron may be able to better comment on your other questions, though as I understand it, starting and stopping the daemons is outside the control of the Fortran program.
Patrick Kennedy just wrote a good article on distributed memory CAF along with process pinning:
http://software.intel.com/en-us/articles/distributed-memory-coarray-programs-with-process-pinning/
We'd appreciate your comments on this article. Let us know if it at least provides enough 'getting started' information.
ron
Ron/All,
Wish it were as simple as "reading the description of the options" in 'man ifort',
as someone suggested. We are not running Intel MPI as our default MPI and therefore
have to build our Intel mpdboot ring manually in the PBS script. When we get
it figured out we will post a solution, but in the meantime ...
We are still struggling with this. Here one of my co-workers demonstrates that
an MPI code works (it runs on the expected nodes and cores and correctly
reports its rank), but a similar CAF program does not ... it seems to ignore
the mpd ring and the mpd.hosts file and instead places images based on the
total number of cores per node.
Can you comment on this ... we will look at the posting that was just made to
see if it offers anything ...
Intel MPI:
1) mpi c code.
code:
/* C Example */
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>
int main (int argc, char *argv[])
{
int rank, size;
char hostbuf[256];
gethostname(hostbuf,sizeof(hostbuf));
MPI_Init (&argc, &argv); /* starts MPI */
MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
printf( "Hello world from process %d of %d on %s\n", rank, size, hostbuf );
MPI_Finalize();
return 0;
}
compile:
/share/apps/intel/impi/4.0.1.007/intel64/bin/mpicc ./hello.c -o exe
create mpd.hosts:
r1i0n1:4
r1i0n2:4
r1i0n3:4
r1i0n4:4
start mpd daemons ring:
mpdboot -n 5 --file=./mpd.hosts (starts 5 daemons: one for each entry in mpd.hosts plus one on the master node)
check:
mpdtrace -l
service0_50821 (10.148.0.1)
r1i0n4_57525 (10.148.0.13)
r1i0n3_51312 (10.148.0.12)
r1i0n1_58487 (10.148.0.10)
r1i0n2_42433 (10.148.0.11)
So the ring is there and it is functional.
run helloworld:
mpiexec -l -machinefile mpd.hosts -n 16 ./exe
10: Hello world from process 10 of 16 on r1i0n3
11: Hello world from process 11 of 16 on r1i0n3
9: Hello world from process 9 of 16 on r1i0n3
4: Hello world from process 4 of 16 on r1i0n2
8: Hello world from process 8 of 16 on r1i0n3
7: Hello world from process 7 of 16 on r1i0n2
6: Hello world from process 6 of 16 on r1i0n2
5: Hello world from process 5 of 16 on r1i0n2
1: Hello world from process 1 of 16 on r1i0n1
2: Hello world from process 2 of 16 on r1i0n1
0: Hello world from process 0 of 16 on r1i0n1
15: Hello world from process 15 of 16 on r1i0n4
14: Hello world from process 14 of 16 on r1i0n4
13: Hello world from process 13 of 16 on r1i0n4
3: Hello world from process 3 of 16 on r1i0n1
12: Hello world from process 12 of 16 on r1i0n4
There are exactly 4 lines for each entry in mpd.hosts, just as one should expect.
2) CAF code.
code:
program hello_image
  character(len=80) host
  integer status
  integer me
  integer N

  N = num_images()
  me = this_image()
  status = hostnm(host)
  if (status == 0) then
    print *, "Hello from image ", me, " out of ", N, " on host ", trim(host)
  end if
end program hello_image
compile it with:
ifort -coarray=distributed hello_image.f90 -o exe
set the FOR_COARRAY_NUM_IMAGES variable:
export FOR_COARRAY_NUM_IMAGES=16
create the same mpd.hosts as before:
r1i0n1:4
r1i0n2:4
r1i0n3:4
r1i0n4:4
check that the mpd daemon ring is still there:
mpdtrace -l
service0_50821 (10.148.0.1)
r1i0n4_57525 (10.148.0.13)
r1i0n3_51312 (10.148.0.12)
r1i0n1_58487 (10.148.0.10)
r1i0n2_42433 (10.148.0.11)
start the CAF executable:
mpiexec -l -machinefile mpd.hosts ./exe
0: Hello from image 12 out of 16 on host r1i0n2
0: Hello from image 14 out of 16 on host r1i0n2
0: Hello from image 15 out of 16 on host r1i0n2
0: Hello from image 9 out of 16 on host r1i0n2
0: Hello from image 16 out of 16 on host r1i0n2
0: Hello from image 10 out of 16 on host r1i0n2
0: Hello from image 11 out of 16 on host r1i0n2
0: Hello from image 13 out of 16 on host r1i0n2
0: Hello from image 2 out of 16 on host r1i0n1
0: Hello from image 6 out of 16 on host r1i0n1
0: Hello from image 7 out of 16 on host r1i0n1
0: Hello from image 1 out of 16 on host r1i0n1
0: Hello from image 4 out of 16 on host r1i0n1
0: Hello from image 3 out of 16 on host r1i0n1
0: Hello from image 5 out of 16 on host r1i0n1
0: Hello from image 8 out of 16 on host r1i0n1
It takes 8 cores from the first node listed in mpd.hosts and 8 from the second; the others are ignored...
This behavior does not seem consistent or correct.
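One thing we have not tried yet: the ifort documentation also describes a -coarray-config-file option, which (if we read it correctly) lets the CAF runtime pick up its own launch options from a file instead of being started under mpiexec directly. Roughly what we have in mind (the file name is ours, and we are not certain of the exact contents the runtime expects, e.g. whether the executable name belongs in the file):

compile:
ifort -coarray=distributed -coarray-config-file=./cafconfig.txt hello_image.f90 -o exe

cafconfig.txt (mpiexec-style options, as we understand it):
-machinefile ./mpd.hosts -perhost 4 -n 16 ./exe

then run the executable directly and let it invoke mpiexec itself:
./exe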
Eugene Dedits and Richard Walsh