Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Where is the /Qcoarray:distributed option?

obmeninfor
Beginner
2,437 Views

Hi All,

I have installed Intel Cluster Studio XE 2012 for Windows (file "w_ics_2012.0.033.exe") using an evaluation license file received from Intel, but I can't evaluate the cluster features of Fortran: in the Properties of my new project (RClick --> Properties --> Configuration Properties --> Fortran --> Language --> Enable Coarrays) I don't see an option for Distributed Memory (/Qcoarray:distributed), only "No" and "For Shared Memory (/Qcoarray:shared)", for both the Win32 and x64 solution platforms.

My cluster system consists of 2 computers:
1) Head node: Windows Server 2008 R2 with SP1 + HPC Pack 2008 R2 with SP3 + Visual Studio 2010 with SP1;
2) Workstation node: Windows 7 (x64) with SP1 + HPC Pack 2008 R2 with SP3.

Intel Cluster Studio was installed on the head node, and it was automatically installed on the workstation node as well.

If I insert the /Qcoarray:distributed option manually (RClick --> Properties --> Configuration Properties --> Fortran --> Command Line --> Additional Options: /Qcoarray:distributed), a test program works on the head node only, even though the corresponding machines.Windows file (pointed to by the system environment variable FOR_COARRAY_MACHINEFILE) has two lines with the computer node names.
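For reference, the machinefile format is just one node name per line, and the environment variable simply points at that file. A minimal sketch of how it is wired up (the share path below is only an example from this setup):

[plain]rem machines.Windows contains one node name per line:
rem   WinSer2008R2
rem   Win7
rem point the system-wide environment variable at the file (run in an elevated prompt):
setx FOR_COARRAY_MACHINEFILE "\\Node0\CcpSpoolDir\machines.Windows" /M[/plain]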

The result of command "clusrun smpd status" is

----- Summary ----
2 Nodes succeeded
0 Nodes failed

What is wrong and what should I do to see the "/Qcoarray:distributed" option?

0 Kudos
31 Replies
Steven_L_Intel1
Employee
1,783 Views
Please ask Windows Fortran questions in the Windows Fortran forum - questions posted elsewhere may not get proper attention. For information on how to build and run a coarray program on a Windows cluster, please read this article (linked from the compiler release notes), though you may have already seen it. A critical aspect is that the full path to the executable must be valid on all nodes of the cluster. This means that the same drive letters must also be defined on the cluster nodes.
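For example (a sketch only, using the share name from this thread), that requirement can be met either by referring to the executable through a UNC path that is identical on every node, or by mapping the same drive letter on each node; the S: letter is just a placeholder:

[plain]rem option 1: use the same UNC path everywhere
\\Node0\CcpSpoolDir\Coar1\x64\Debug\Coar1.exe
rem option 2: map the same drive letter on every node
net use S: \\Node0\CcpSpoolDir /persistent:yes[/plain]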

If you still need help, please post in the Windows Fortran forum.
0 Kudos
obmeninfor
Beginner
1,783 Views

Thanks for your advice, but I think my question concerns the Cluster Studio environment/installer and/or the integration of Cluster Studio into Visual Studio rather than the Fortran compiler itself; the compiler is working well.

In addition, regarding the integration: if I specify a file name in the MPI Configuration File option (RClick --> Properties --> Configuration Properties --> Fortran --> Language --> MPI Configuration File), for example MPIConfigFile, the compiler looks for MPIConfigFile\\Node0\CcpSpoolDir\Coar1\x64\Debug\Coar1.exe, where \\Node0\CcpSpoolDir\ is a shared directory on the head node accessible to the workstation node, and \\Node0\CcpSpoolDir\Coar1\x64\Debug\ is the correct path to the executable file (Coar1.exe). The result of compilation: Can't open config file MPIConfigFile\\Node0\CcpSpoolDir\Coar1\x64\Debug\Coar1.exe: No such file or directory.
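As a side note for readers: this property presumably passes a coarray configuration file (/Qcoarray-config-file), which is expected to contain mpiexec-style arguments plus the executable rather than a path prefix. A sketch of what such a file might hold (placeholder values, following the -n / -machinefile form shown later in this thread):

[plain]-n 8 -machinefile \\Node0\CcpSpoolDir\machines.Windows \\Node0\CcpSpoolDir\Coar1\x64\Debug\Coar1.exe[/plain]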

0 Kudos
Steven_L_Intel1
Employee
1,783 Views
Intel Cluster Studio is a bundle of products, one of which is Intel Visual Fortran. It is the Fortran product that provides the VS integration and the coarray features.

I would urge you to try the experiments from the command line as described in the article I pointed to. I'm not sure how thoroughly we tested the VS integration for distributed coarray support. I will ask our developer who worked most with this support to read this thread and see what she can suggest.
0 Kudos
obmeninfor
Beginner
1,783 Views

Of course, I had read that article before asking my question here (the corresponding link to the article is in the documentation).

Below are the results of some experiments with the "coarray_samples" sample included in the software. WinSer2008R2 is the head node, Win7 is the workstation node. The "Additional Options" for the compiler: /Qcoarray:distributed /Qcoarray-num-images:8

1) Start from the Visual Studio: Debug --> Start Debugging
Result: task hangs.
Ctrl^C gives:

mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: Win7: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: WinSer2008R2.mynet.dom: 123
5: Win7: 123
6: WinSer2008R2.mynet.dom: 123
7: Win7: 123

2) Command: mpiexec -host WinSer2008R2 -n 3 -genv FOR_ICAF_STATUS launched -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\coarray_samples\x64\Debug\coarray_samples.exe
Result: OK:

[1#6880:5060@WinSer2008R2] MPI startup(): shm data transfer mode
[2#2040:6184@WinSer2008R2] MPI startup(): shm data transfer mode
[0#5100:7192@WinSer2008R2] MPI startup(): shm data transfer mode
[2#2040:6184@WinSer2008R2] MPI startup(): process is pinned to CPU02 on node WinSer2008R2
[0#5100:7192@WinSer2008R2] MPI startup(): process is pinned to CPU00 on node WinSer2008R2
[1#6880:5060@WinSer2008R2] MPI startup(): process is pinned to CPU01 on node WinSer2008R2

[0#5100:7192@WinSer2008R2] Rank Pid Node name Pin cpu
[0#5100:7192@WinSer2008R2] 0 5100 WinSer2008R2 0
[0#5100:7192@WinSer2008R2] 1 6880 WinSer2008R2 1
[0#5100:7192@WinSer2008R2] 2 2040 WinSer2008R2 2
[0#5100:7192@WinSer2008R2] MPI startup(): I_MPI_DEBUG=+5
[0#5100:7192@WinSer2008R2] MPI startup(): NUMBER_OF_PROCESSORS=4
[0#5100:7192@WinSer2008R2] MPI startup(): PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 23 Stepping 7, GenuineIntel

Hello from image 2 out of 3 total images
Hello from image 1 out of 3 total images
Hello from image 3 out of 3 total images

3) Command: mpiexec -host Win7 -n 3 -genv FOR_ICAF_STATUS launched -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\coarray_samples\x64\Debug\coarray_samples.exe
Result: OK:

[2#14200:10520@Win7] MPI startup(): shm data transfer mode
[0#14816:3836@Win7] MPI startup(): shm data transfer mode
[1#11572:4816@Win7] MPI startup(): shm data transfer mode
[2#14200:10520@Win7] MPI startup(): set domain to {4,5} on node Win7
[0#14816:3836@Win7] MPI startup(): set domain to {0,1} on node Win7
[1#11572:4816@Win7] MPI startup(): set domain to {2,3} on node Win7

[0#14816:3836@Win7] Rank Pid Node name Pin cpu
[0#14816:3836@Win7] 0 14816 Win7 {0,1}
[0#14816:3836@Win7] 1 11572 Win7 {2,3}
[0#14816:3836@Win7] 2 14200 Win7 {4,5}
[0#14816:3836@Win7] MPI startup(): I_MPI_DEBUG=+5
[0#14816:3836@Win7] MPI startup(): NUMBER_OF_PROCESSORS=8
[0#14816:3836@Win7] MPI startup(): PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 42 Stepping 7, GenuineIntel

Hello from image 2 out of 3 total images
Hello from image 3 out of 3 total images
Hello from image 1 out of 3 total images

4) Command: mpiexec -hosts 2 WinSer2008R2 3 Win7 3 -genv FOR_ICAF_STATUS launched -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\coarray_samples\x64\Debug\coarray_samples.exe
Result: task hangs.
Ctrl^C gives:

[0#7356:5672@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[1#7588:7328@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[2#7752:7672@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[3#15240:15828@Win7] MPI startup(): shm and tcp data transfer modes
[5#13376:14660@Win7] MPI startup(): shm and tcp data transfer modes
[4#13488:13232@Win7] MPI startup(): shm and tcp data transfer modes
mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: WinSer2008R2.mynet.dom: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: Win7: 123
5: Win7: 123


As item 4 suggests, I have some problems with tcp. What should I check and adjust?

Thanks

0 Kudos
Lorri_M_Intel
Employee
1,783 Views
Hi --

I just wanted to let you know that Steve pointed me at this thread; I was the lucky developer who hooked up DCAF on Windows.

I have reproduced the situation you've found, and am looking at how to resolve it. Note, I've reproduced it in a straight MPI program; no coarrays to be seen, so that complication is removed.

You did put this in the right forum; there are some really good people here, and actually, you might see a question I post too, looking for help to resolve this.

As an aside, I can use a machinefile if there is only one node in the file; it doesn't have to be the current node, so yeah, I have to agree with you that there is an interesting configuration issue.

By the way, this link (also in this forum) has some interesting info:
http://software.intel.com/en-us/forums/showthread.php?t=81922

I'll post more as I learn more ---

Thanks for using the Windows DCAF -

--Lorri


0 Kudos
obmeninfor
Beginner
1,783 Views

Hi Lorri,

Thank you for your time.
I have tried a straight MPI program too (although it is not the subject of this thread), test.f90, included in the software, but the result shown below is the same: the two nodes (item 3) do not work together.

1) Command: mpiexec -host WinSer2008R2 -n 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe
Result: OK:

[1#5168:2528@WinSer2008R2] MPI startup(): shm data transfer mode
[0#5612:2732@WinSer2008R2] MPI startup(): shm data transfer mode
[2#5744:3288@WinSer2008R2] MPI startup(): shm data transfer mode
[2#5744:3288@WinSer2008R2] MPI startup(): Internal info: pinning initialization was done
[0#5612:2732@WinSer2008R2] MPI startup(): Internal info: pinning initialization was done
[1#5168:2528@WinSer2008R2] MPI startup(): Internal info: pinning initialization was done

[0#5612:2732@WinSer2008R2] MPI startup(): Rank Pid Node name Pin cpu
[0#5612:2732@WinSer2008R2] MPI startup(): 0 5612 WinSer2008R2 0
[0#5612:2732@WinSer2008R2] MPI startup(): 1 5168 WinSer2008R2 1
[0#5612:2732@WinSer2008R2] MPI startup(): 2 5744 WinSer2008R2 2
[0#5612:2732@WinSer2008R2] MPI startup(): I_MPI_DEBUG=+5
[0#5612:2732@WinSer2008R2] MPI startup(): I_MPI_PIN_MAPPING=3:0 0,1 1,2 2
[0#5612:2732@WinSer2008R2] MPI startup(): PMI_RANK=0

Hello world: rank 0 of 3 running on
WinSer2008R2.mynet.dom

Hello world: rank 1 of 3 running on
WinSer2008R2.mynet.dom

Hello world: rank 2 of 3 running on
WinSer2008R2.mynet.dom


2) Command: mpiexec -host Win7 -n 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe
Result: OK:

[1#11724:10472@Win7] MPI startup(): shm data transfer mode
[0#11556:5968@Win7] MPI startup(): shm data transfer mode
[2#9576:1500@Win7] MPI startup(): shm data transfer mode
[1#11724:10472@Win7] MPI startup(): Internal info: pinning initialization was done
[0#11556:5968@Win7] MPI startup(): Internal info: pinning initialization was done
[2#9576:1500@Win7] MPI startup(): Internal info: pinning initialization was done
[0#11556:5968@Win7] MPI startup(): Rank Pid Node name Pin cpu
[0#11556:5968@Win7] MPI startup(): 0 11556 Win7 {0,1}
[0#11556:5968@Win7] MPI startup(): 1 11724 Win7 {2,3}
[0#11556:5968@Win7] MPI startup(): 2 9576 Win7 {4,5}
[0#11556:5968@Win7] MPI startup(): I_MPI_DEBUG=+5
[0#11556:5968@Win7] MPI startup(): I_MPI_PIN_MAPPING=3:0 0,1 2,2 4
[0#11556:5968@Win7] MPI startup(): PMI_RANK=0

Hello world: rank 0 of 3 running on
Win7.mynet.dom

Hello world: rank 1 of 3 running on
Win7.mynet.dom

Hello world: rank 2 of 3 running on
Win7.mynet.dom

3) Command: mpiexec -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe
Result: task hangs.
Ctrl^C gives:

[2#4792:4804@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[0#3356:3696@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[1#5956:6004@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[4#10340:5736@Win7] MPI startup(): shm and tcp data transfer modes
[5#9228:9816@Win7] MPI startup(): shm and tcp data transfer modes
[3#11112:8964@Win7] MPI startup(): shm and tcp data transfer modes
mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: WinSer2008R2.mynet.dom: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: Win7: 123
5: Win7: 123

In accordance with the advice in http://software.intel.com/en-us/forums/showthread.php?t=81922, which you referred to, I used -genv I_MPI_PLATFORM 0 and added the DNS suffix to the node names in the mpiexec command; it did not help.


Thanks

0 Kudos
James_T_Intel
Moderator
1,783 Views
Hi obmeninfor,

I've seen similar behavior while doing some testing for a different issue. Could you try running the following commands?

[plain]mpiexec -genvnone -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 hostname
mpiexec -genvnone -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe[/plain]
Adding -genvnone is a quick check to prevent copying the environment variables from one system to another. If the MPI installations are in different locations on each computer, the environment variables from one will prevent it from being located on the other. See the thread http://software.intel.com/en-us/forums/showthread.php?t=85990&o=a&s=lr for more detail on the mismatch.

The first command will just ensure that you can run across multiple hosts simultaneously. The second will ensure that the processes can communicate with each other. Please let me know what happens with these commands.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
0 Kudos
obmeninfor
Beginner
1,783 Views

Hi James,

Thank you for your advice.
The results of the commands:

1. mpiexec -genvnone -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 hostname
Result:

WinSer2008R2
WinSer2008R2
WinSer2008R2
Win7
Win7
Win7

2. mpiexec -genvnone -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe
Result: task hangs.
Ctrl^C gives:

[2#1052:4080@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[0#780:3824@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[1#4788:4988@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[3#12960:13116@Win7] MPI startup(): shm and tcp data transfer modes
[5#3604:9880@Win7] MPI startup(): shm and tcp data transfer modes
[4#12716:10456@Win7] MPI startup(): shm and tcp data transfer modes
mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: WinSer2008R2.mynet.dom: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: Win7: 123
5: Win7: 123


The MPI location is "c:\Program Files (x86)\Intel\MPI" on each node.


Thanks

0 Kudos
James_T_Intel
Moderator
1,783 Views
Hi obmeninfor,

I believe that the problem you are experiencing is due to your firewall. As one more check, please allow the program test.exe through your firewall on both computers, and try running the second command again. You can leave off the -genvnone option; it should have no effect here.
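If it helps, one way to add such a rule from an elevated command prompt is sketched below (the rule name is arbitrary, and the program path is a placeholder that must match the path the processes are actually launched from):

[plain]netsh advfirewall firewall add rule name="MPI test program" dir=in action=allow program="C:\path\to\test.exe" enable=yes[/plain]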

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
0 Kudos
obmeninfor
Beginner
1,783 Views

Hi James,

The firewall rule for the program was already enabled on WinSer2008R2. Adding a similar rule on Win7 changed the output but didn't change the result.

Command: mpiexec -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe
Result: task hangs.
Ctrl^C gives:

[0#5924:6008@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[2#288:1144@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[1#5596:3536@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[4#3768:5408@Win7] MPI startup(): shm and tcp data transfer modes
[5#4036:736@Win7] MPI startup(): shm and tcp data transfer modes
[3#1236:4204@Win7] MPI startup(): shm and tcp data transfer modes

[2#288:1144@WinSer2008R2] MPI startup(): Internal info: pinning initialization was done
[4#3768:5408@Win7] MPI startup(): Internal info: pinning initialization was done
[0#5924:6008@WinSer2008R2] MPI startup(): Internal info: pinning initialization was done
[1#5596:3536@WinSer2008R2] MPI startup(): Internal info: pinning initialization was done
[5#4036:736@Win7] MPI startup(): Internal info: pinning initialization was done
[3#1236:4204@Win7] MPI startup(): Internal info: pinning initialization was done
mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: WinSer2008R2.mynet.dom: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: Win7: 123
5: Win7: 123


Thanks

0 Kudos
James_T_Intel
Moderator
1,783 Views
Hi obmeninfor,

Do you also have your firewalls set to allow smpd and mpiexec? Are you using the native Windows* firewall, or a different one?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
0 Kudos
obmeninfor
Beginner
1,783 Views

Hi James,

I removed -genv I_MPI_DEBUG +5 from the command, and the straight MPI program (test.exe) began to work (without firewall rules for smpd and mpiexec)! Thank you very much for your previous advice about the firewall.

The behavior of the coarray_samples program (see above) changed as well, but one problem remains: the program does not terminate:

1) start from VS: Debug --> Start Debugging
Result: the program prints 8 "Hello" lines (it works) and hangs on both computers:

Hello from image 3 out of 8 total images
Hello from image 1 out of 8 total images
Hello from image 7 out of 8 total images
Hello from image 5 out of 8 total images
Hello from image 2 out of 8 total images
Hello from image 6 out of 8 total images
Hello from image 8 out of 8 total images
Hello from image 4 out of 8 total images

Ctrl^C gives:

mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: Win7: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: WinSer2008R2.mynet.dom: 123
5: Win7: 123
6: WinSer2008R2.mynet.dom: 123
7: Win7: 123

2) command: mpiexec -hosts 2 WinSer2008R2 4 Win7 4 -genv FOR_ICAF_STATUS launched \\WinSer2008R2\CcpSpoolDir\coarray_samples\x64\Debug\coarray_samples.exe
Result: the program does its main work (prints the 8 "Hello" lines) and hangs on both computers:

Hello from image 3 out of 8 total images
Hello from image 1 out of 8 total images
Hello from image 2 out of 8 total images
Hello from image 7 out of 8 total images
Hello from image 8 out of 8 total images
Hello from image 6 out of 8 total images
Hello from image 5 out of 8 total images
Hello from image 4 out of 8 total images

Ctrl^C gives:

mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: WinSer2008R2.mynet.dom: 123
2: WinSer2008R2.mynet.dom: 123
3: WinSer2008R2.mynet.dom: 123
4: Win7: 123
5: Win7: 123
6: Win7: 123
7: Win7: 123

3) command: mpiexec -hosts 2 WinSer2008R2 4 Win7 4 -genv FOR_ICAF_STATUS launched -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\coarray_samples\x64\Debug\coarray_samples.exe
Result: the program does its main work (prints the 8 "Hello" lines) and hangs on both computers:

[3#6788:6184@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[2#6500:4260@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[0#5300:6652@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[1#5448:5812@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[5#7796:7544@Win7] MPI startup(): shm and tcp data transfer modes
[7#7920:7668@Win7] MPI startup(): shm and tcp data transfer modes
[6#3568:6996@Win7] MPI startup(): shm and tcp data transfer modes
[4#7816:7808@Win7] MPI startup(): shm and tcp data transfer modes
[5#7796:7544@Win7] MPI startup(): set domain to {2,3} on node Win7
[6#3568:6996@Win7] MPI startup(): set domain to {4,5} on node Win7
[3#6788:6184@WinSer2008R2] MPI startup(): process is pinned to CPU03 on node WinSer2008R2
[1#5448:5812@WinSer2008R2] MPI startup(): process is pinned to CPU01 on node WinSer2008R2
[2#6500:4260@WinSer2008R2] MPI startup(): process is pinned to CPU02 on node WinSer2008R2
[0#5300:6652@WinSer2008R2] MPI startup(): process is pinned to CPU00 on node WinSer2008R2
[7#7920:7668@Win7] MPI startup(): set domain to {6,7} on node Win7
[4#7816:7808@Win7] MPI startup(): set domain to {0,1} on node Win7

[0#5300:6652@WinSer2008R2] Rank Pid Node name Pin cpu
[0#5300:6652@WinSer2008R2] 0 5300 WinSer2008R2 0
[0#5300:6652@WinSer2008R2] 1 5448 WinSer2008R2 1
[0#5300:6652@WinSer2008R2] 2 6500 WinSer2008R2 2
[0#5300:6652@WinSer2008R2] 3 6788 WinSer2008R2 3
[0#5300:6652@WinSer2008R2] 4 7816 Win7 {0,1}
[0#5300:6652@WinSer2008R2] 5 7796 Win7 {2,3}
[0#5300:6652@WinSer2008R2] 6 3568 Win7 {4,5}
[0#5300:6652@WinSer2008R2] 7 7920 Win7 {6,7}
[0#5300:6652@WinSer2008R2] MPI startup(): I_MPI_DEBUG=+5
[0#5300:6652@WinSer2008R2] MPI startup(): NUMBER_OF_PROCESSORS=4
[0#5300:6652@WinSer2008R2] MPI startup(): PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 23 Stepping 7, GenuineIntel

Hello from image 1 out of 8 total images
Hello from image 2 out of 8 total images
Hello from image 4 out of 8 total images
Hello from image 3 out of 8 total images
Hello from image 8 out of 8 total images
Hello from image 6 out of 8 total images
Hello from image 5 out of 8 total images
Hello from image 7 out of 8 total images

Ctrl^C gives:

mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: WinSer2008R2.mynet.dom: 123
2: WinSer2008R2.mynet.dom: 123
3: WinSer2008R2.mynet.dom: 123
4: Win7: 123
5: Win7: 123
6: Win7: 123
7: Win7: 123


The"Allow"firewall rules to smpd and mpiexec do not help. I am using the native Windows firewall.

Please let me know what else I should do.

Thanks

0 Kudos
James_T_Intel
Moderator
1,783 Views
Hi obmeninfor,

What happens if you run from the command line without mpiexec? I have not worked with coarrays before, but the sample does not run for me if I use mpiexec; it does run without it. This is only on a single computer; I will try it on multiple.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
0 Kudos
obmeninfor
Beginner
1,783 Views

Hi James,

I. Below is the result of running \\WinSer2008R2\CcpSpoolDir\coarray_samples\x64\Debug\coarray_samples.exe from the command line. The program does its main work (prints the 8 "Hello" lines) and hangs on both computers:

Hello from image 5 out of 8 total images
Hello from image 3 out of 8 total images
Hello from image 1 out of 8 total images
Hello from image 8 out of 8 total images
Hello from image 2 out of 8 total images
Hello from image 7 out of 8 total images
Hello from image 4 out of 8 total images
Hello from image 6 out of 8 total images

Ctrl^C gives:

mpiexec aborting job...
forrtl: error (200): program aborting due to control-C event
In coarray image 1
Image              PC                Routine            Line        Source
libifcoremdd.dll   00000000100E0407  Unknown            Unknown     Unknown
libifcoremdd.dll   00000000100DA252  Unknown            Unknown     Unknown
libifcoremdd.dll   00000000100C3261  Unknown            Unknown     Unknown
libifcoremdd.dll   0000000010028316  Unknown            Unknown     Unknown
libifcoremdd.dll   000000001003BC54  Unknown            Unknown     Unknown
kernel32.dll       0000000076AA47C3  Unknown            Unknown     Unknown
kernel32.dll       0000000076A6652D  Unknown            Unknown     Unknown
ntdll.dll          0000000076CFC521  Unknown            Unknown     Unknown

c:\Program Files\Microsoft HPC Pack 2008 R2\Data\SpoolDir\coarray_samples\x64\Debug>

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: Win7: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: WinSer2008R2.mynet.dom: 123
5: Win7: 123
6: WinSer2008R2.mynet.dom: 123
7: Win7: 123


II. About the -genv I_MPI_DEBUG +5 option in mpiexec -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe: why does it cause the program to hang on both computers?


Thanks

0 Kudos
James_T_Intel
Moderator
1,783 Views
Hi obmeninfor,

Setting I_MPI_DEBUG to 5 should not cause a hang. This is possibly indicative of a deeper problem. What are your environment variables (just run set in a command prompt)?
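If the full set output is too long to compare easily between the two nodes, a quick sketch for capturing just the MPI- and coarray-related variables:

[plain]set | findstr /i "I_MPI MPI_ FOR_"[/plain]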

As a side note, I am able to run the coarray sample program on a pair of Windows* 7 virtual machines with no problems. I did have to specifically tell the firewall to allow the coarray program, but with the firewall blocking it the program would hang at start, not at exit.

I have used two different methods for compiling and running the program. The first was

[plain]ifort /Qcoarray=distributed /Qcoarray-num-images=8 hello_image.f90 -o hello_image1.exe
ifort /Qcoarray=distributed /Qcoarray-config-file=cafconfig.txt hello_image.f90 -o hello_image2.exe[/plain]

The file cafconfig.txt contained the following:
[plain]-n 8 -machinefile mpd.hosts hello_image2.exe[/plain]

And mpd.hosts contained the names of the two computers, one per line. FOR_COARRAY_MACHINEFILE was set to point to the mpd.hosts file. Both of these forms ran with no problems. Could you try compiling from the command line (just to be certain there are no stray flags causing a problem from VS)? Either of these methods should lead to the same result.
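For completeness, a sketch of that setup (host names taken from this thread; the mpd.hosts path is a placeholder). With FOR_COARRAY_MACHINEFILE pointing at mpd.hosts, both executables are launched directly, without mpiexec:

[plain]rem mpd.hosts, one computer name per line:
rem   WinSer2008R2
rem   Win7
set FOR_COARRAY_MACHINEFILE=C:\path\to\mpd.hosts
hello_image1.exe
hello_image2.exe[/plain]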

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
0 Kudos
obmeninfor
Beginner
1,783 Views

Hi James,

The environment variables are set by c:\Program Files (x86)\Intel\icsxe\2012.0.033\bin\ictvars.bat. I have only added FOR_COARRAY_MACHINEFILE.

Unfortunately, I can't do without VS because "c:\Program Files (x86)\Intel\Composer XE 2011 SP1\bin\intel64\ifort" /Qcoarray=distributed /Qcoarray-num-images=8 hello_image.f90 -o hello_image.exe requires link, which is only in the VS directory. So I will continue my coarray experiments with VS.

Thank you very much for your help.

obmeninfor
0 Kudos
James_T_Intel
Moderator
1,783 Views
Hi obmeninfor,

Running ictvars.bat should automatically set the Path to include link. If it does not, try running

[bash]C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64\vcvars64.bat[/bash]

Or the equivalent for your desired target architecture. I need to do some more testing, but attempting to compile the coarray sample program in Visual Studio* 2010 with distributed coarrays does not allow me to run across multiple computers at all.
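As a quick sanity check (a sketch), after running ictvars.bat or vcvars64.bat you can confirm the linker is on the Path before invoking the compiler:

[plain]where link
ifort /Qcoarray=distributed /Qcoarray-num-images=8 hello_image.f90 -o hello_image.exe[/plain]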

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
0 Kudos
James_T_Intel
Moderator
1,783 Views
Hi obmeninfor,

My problem with running the executable from Visual Studio* was just that, my problem. The default for the sample is to compile 32-bit, and one of my test systems only had the 64-bit runtime libraries available. Once this was corrected (compiled 64-bit within VS), everything works as expected. So this is not likely to be the cause of what you are experiencing (though it would be prudent to verify that you do have the correct runtime libraries available for each system).

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
0 Kudos
James_T_Intel
Moderator
1,783 Views
Hi obmeninfor,

Let's take a look at the SMPD now. On each of the computers, run (as Administrator) the command

[plain]smpd -traceon <logfile>[/plain]

You can name the logfile whatever you want, just make it distinct for each computer. This will turn on SMPD logging. Run the coarray_samples program. When it hangs and you've killed it, run

[plain]smpd -traceoff[/plain]

to turn logging off. Attach the two files and I'll see if there's anything in there that could help diagnose what's happening.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Edit note: edited to correct code type in first code section
0 Kudos
obmeninfor
Beginner
1,735 Views
Hi James,

Please see the 3 attached files.

Thank you.
0 Kudos