Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI code runs on only one core, problem with hydra service

sebastian_d_

I am trying to run a simple hello world program in Fortran using the Intel MPI Library. However, every process reports the same rank, as if the program were not running on more than one core. I followed the troubleshooting procedure provided by Intel (point 2 at https://software.intel.com/en-us/mpi-developer-guide-windows-troubleshooting) and got this:

C:\Program Files (x86)\IntelSWTools>mpiexec -ppn 1 -n 2 -hosts node01,node02 hostname
[mpiexec@Sebastian-PC] HYD_sock_connect (..\windows\src\hydra_sock.c:216): getaddrinfo returned error 11001
[mpiexec@Sebastian-PC] HYD_connect_to_service (bstrap\service\service_launch.c:76): unable to connect to service at node01:8680
[mpiexec@Sebastian-PC] HYDI_bstrap_service_launch (bstrap\service\service_launch.c:416): unable to connect to hydra service
[mpiexec@Sebastian-PC] launch_bstrap_proxies (bstrap\src\intel\i_hydra_bstrap.c:525): error launching bstrap proxy
[mpiexec@Sebastian-PC] HYD_bstrap_setup (bstrap\src\intel\i_hydra_bstrap.c:714): unable to launch bstrap proxy
[mpiexec@Sebastian-PC] wmain (mpiexec.c:1919): error setting up the boostrap proxies

Any ideas how to fix it? Any help would be appreciated.

 

Anatoliy_R_Intel

Hello,

 

Did you install hydra_service? Could you find it in 'Services'?

--

Best regards, Anatoliy

Tco1

Hello,

We are facing the same symptoms (not sure the underlying problem is the same).

  • A simple case without -hosts works: "c:\path\to\mpiexec.exe -n 4 hostname".
  • The same case with -hosts does not work: "c:\path\to\mpiexec.exe -n 4 -hosts the_hostname hostname" (error 11001, same stack trace as described earlier). We tried different hostnames: "the_hostname", "the_hostname.domain.ext", "localhost"...

- Ping works with any of the valid hostnames ("the_hostname", "the_hostname.domain.ext", "localhost").
- The hydra service is running (it was installed with hydra_service -install).
- User credentials are registered, and mpiexec -validate returns "success".
- Ports are not blocked (no firewall).
- We tried adding a loopback entry to the hosts file, without success.

The stack trace seems to point to a name resolution issue: "getaddrinfo returned error 11001". Are there any specific network / DNS settings we should look at? Does anyone have a recommendation on how to solve this issue?
(We are using MPI 2019.0.4)
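
For anyone wanting to test name resolution outside of mpiexec: on Windows, error 11001 is WSAHOST_NOT_FOUND, meaning the resolver could not find the host name. Below is a minimal Python sketch, not anything from Intel, that calls getaddrinfo on each name passed to -hosts and reports the result. The function name check_hosts is hypothetical, the host names are the placeholders from the first post, and port 8680 is taken from the "node01:8680" line in the log above. Python's resolver call may not behave exactly like the one inside hydra, so treat the result as a hint only.

```python
import socket


def check_hosts(hostnames, port=8680):
    """Try to resolve each host name; return {name: (ok, detail)}.

    check_hosts is a hypothetical helper for illustration only.
    The default port 8680 is the one shown in the mpiexec error log.
    """
    results = {}
    for name in hostnames:
        try:
            infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the resolved IP address.
            addresses = sorted({info[4][0] for info in infos})
            results[name] = (True, addresses)
        except socket.gaierror as exc:
            # On Windows, exc.errno is the Winsock code (e.g. 11001).
            results[name] = (False, f"getaddrinfo error {exc.errno}: {exc.strerror}")
    return results


if __name__ == "__main__":
    # Replace these placeholder names with the ones you pass to -hosts.
    for name, (ok, detail) in check_hosts(["node01", "node02", "localhost"]).items():
        print(f"{name}: {'OK' if ok else 'FAILED'} {detail}")
```

If a name fails here as well, the problem is likely in DNS or the hosts file rather than in the hydra service itself. If every name resolves here but mpiexec still fails, the name mpiexec sees may differ from the one tested.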

Any hint would be helpful...
