Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

2 nodes on Windows 7 x64 assistance

jimdempseyatthecove
Black Belt

I installed Parallel Studio XE 2016 Update 2 Cluster Edition on two Windows 7 Pro x64 systems and am attempting to run the MPI test program.

Each system can run the test program on itself, but neither system can launch it on the other.

hydra_service.exe -remove and hydra_service.exe -install both complete correctly.

I can see and copy files from each system to the other (via a shared folder), and the test program is installed in the shared folder.

C:\Test\TestMPI_CPP\TestMPI_CPP>mpiexec.exe -n 8 -hosts thor \Downloads\testmpi_cpp.exe
Hello world: rank 0 of 8 running on Thor
Hello world: rank 1 of 8 running on Thor
Hello world: rank 2 of 8 running on Thor
Hello world: rank 3 of 8 running on Thor
Hello world: rank 4 of 8 running on Thor
Hello world: rank 5 of 8 running on Thor
Hello world: rank 6 of 8 running on Thor
Hello world: rank 7 of 8 running on Thor

C:\Test\TestMPI_CPP\TestMPI_CPP>mpiexec.exe -n 8 -hosts i72600k \Downloads\testmpi_cpp.exe
Error connecting to the Service
[mpiexec@Thor] ..\hydra\utils\sock\sock.c (270): unable to connect from "Thor" to "i72600k" (No error)

C:\Test\TestMPI_CPP\TestMPI_CPP>dir \\i72600k\Downloads\TestMPI_CPP.exe /b
TestMPI_CPP.exe

The results above are the same (with the host names swapped) when run from the other host.

I've run mpiexec -register on both systems.

Any hints would be helpful.

Jim Dempsey

James_T_Intel
Moderator

Do you have any firewalls between the two systems?  Can each ping the other by name?

jimdempseyatthecove
Black Belt

Yes. Both can ping each other by name.

That should have been evident from DIR \\name\folder working.

On both systems, Windows Firewall has the following entry checked:

Intel(R) MPI Library Process Manager, Intel

Does anything else need to be enabled? (added)

Jim Dempsey

 

James_T_Intel
Moderator

You will probably need to set an exception for the program you are trying to run as well.  See https://software.intel.com/en-us/articles/firewalls-and-mpi for general steps for working with a firewall.

jimdempseyatthecove
Black Belt

James,

After a bit of work, I found that I had to add a firewall exception to allow incoming TCP packets on port 8679.

At least the sample test program now runs.
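
For anyone hitting the same "unable to connect" error, here is a quick way to check whether the port is reachable from the other node before involving mpiexec at all. This is only a minimal sketch, assuming Python is available on the machines and that 8679 is the port the process manager listens on, as it was in this setup. The function name can_connect is made up for illustration.

```python
import socket
import sys


def can_connect(host: str, port: int = 8679, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Pass the other node's name on the command line, e.g. the remote host.
    for name in sys.argv[1:]:
        status = "reachable" if can_connect(name) else "NOT reachable"
        print(f"{name}:8679 {status}")
```

A "NOT reachable" result while ping and file sharing both work points at a blocked port rather than a name-resolution problem.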

Jim Dempsey

jimdempseyatthecove
Black Belt

Well, this is interesting.

From system A, I can use mpiexec to run on system A or on system B,
and
from system B, I can use mpiexec to run on system B or on system A.

From either system, however, I cannot run on both at once.

The documentation gives conflicting information as to the command line for multiple hosts.

Can you provide a correct command line? (run Foo.exe on systems A and B)

Jim Dempsey

James_T_Intel
Moderator

There are several different ways to do that. Here are a few examples; see https://software.intel.com/en-us/articles/controlling-process-placement-with-the-intel-mpi-library for more. The article is aimed at Linux*; to run on Windows*, simply substitute mpiexec.hydra for mpirun.

>mpiexec.hydra -n 2 -ppn 1 -hosts sysA,sysB Foo.exe
>type hosts.txt
sysA
sysB
>mpiexec.hydra -n 2 -ppn 1 -f hosts.txt Foo.exe
>type machines.txt
sysA:1
sysB:1
>mpiexec.hydra -n 2 -machinefile machines.txt Foo.exe
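
The three forms above differ only in where the host list lives. As a rough sketch (not part of the Intel tooling), a small Python helper can generate both the command line and the machine file from one host list so that the two stay consistent. The function names build_command and machinefile_text are made up for illustration; the flags -n, -ppn, -hosts and the host:count machine file format are the ones shown in the examples above.

```python
from typing import List


def build_command(exe: str, hosts: List[str], ppn: int) -> List[str]:
    """Build an mpiexec.hydra argument list using the -hosts form."""
    total = len(hosts) * ppn  # -n is the total rank count across all hosts
    return ["mpiexec.hydra", "-n", str(total), "-ppn", str(ppn),
            "-hosts", ",".join(hosts), exe]


def machinefile_text(hosts: List[str], ppn: int) -> str:
    """Return machine file contents, one 'host:count' entry per line."""
    return "".join(f"{host}:{ppn}\n" for host in hosts)


if __name__ == "__main__":
    hosts = ["sysA", "sysB"]  # placeholder host names from the examples
    print(" ".join(build_command("Foo.exe", hosts, 1)))
    print(machinefile_text(hosts, 1), end="")
```

The argument list can be passed directly to subprocess.run, which avoids quoting problems with paths that contain spaces.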

 

jimdempseyatthecove
Black Belt

The examples in the documentation show mpiexec, not mpiexec.hydra.

My results:

C:\Program Files (x86)\IntelSWTools>mpiexec.hydra -n 16 -ppn 8 -hosts Thor,i72600k \Downloads\TestMPI_CPP.exe
Error connecting to the Service
[mpiexec@Thor] ..\hydra\utils\sock\sock.c (270): unable to connect from "Thor" to "i72600k" (No error)


C:\Program Files (x86)\IntelSWTools>mpiexec.hydra -n 16 -ppn 8 -hosts Thor,i72600k \Downloads\TestMPI_CPP.exe
Hello world: rank 0 of 16 running on Thor
Hello world: rank 1 of 16 running on Thor
Hello world: rank 2 of 16 running on Thor
Hello world: rank 3 of 16 running on Thor
Hello world: rank 4 of 16 running on Thor
Hello world: rank 5 of 16 running on Thor
Hello world: rank 6 of 16 running on Thor
Hello world: rank 7 of 16 running on Thor
Hello world: rank 8 of 16 running on i72600K
Hello world: rank 9 of 16 running on i72600K
Hello world: rank 10 of 16 running on i72600K
Hello world: rank 11 of 16 running on i72600K
Hello world: rank 12 of 16 running on i72600K
Hello world: rank 13 of 16 running on i72600K
Hello world: rank 14 of 16 running on i72600K
Hello world: rank 15 of 16 running on i72600K

Note: for some reason the first attempt failed, while the second and third attempts succeeded.

Any thoughts?

Jim Dempsey

James_T_Intel
Moderator

Do you have anything in the Windows Event Viewer indicating problems at the time of the failed run?

jimdempseyatthecove
Black Belt

If that shows up again, I will look in the logs (both systems).

I think that may have occurred on the first run after a reboot of one of the systems. I haven't seen it since.

I may also have to re-enable firewall logging.
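
If the failure does turn out to happen only on the first launch after a reboot, a retry wrapper could work around it and record when it happens, which makes it easier to match against the event and firewall logs. This is a minimal sketch, not a fix for the underlying cause. It assumes that the launcher returns a non-zero exit code when it prints "Error connecting to the Service", which I have not verified. The function name run_with_retry is made up for illustration.

```python
import subprocess
import time
from datetime import datetime


def run_with_retry(cmd, attempts=3, delay=5.0):
    """Run cmd up to `attempts` times.

    Returns (exit_code, failure_timestamps), where exit_code is from the
    last run and failure_timestamps lists when each failed run ended.
    """
    failures = []
    code = 1
    for attempt in range(1, attempts + 1):
        # Assumption: a failed launch is signalled by a non-zero exit code.
        code = subprocess.run(cmd).returncode
        if code == 0:
            break
        failures.append(datetime.now().isoformat(timespec="seconds"))
        if attempt < attempts:
            time.sleep(delay)  # give the remote service time to come up
    return code, failures
```

The timestamps in the second return value are the times to look for in the Windows Event Viewer on both systems.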

Jim Dempsey
