Intel® MPI Library

Hydra bootstrap: ssh vs pbsdsh (21.4)

Viet-Duc
Novice

Versions:

- Intel oneAPI HPC Toolkit 2021.4

- PBS version: 2020.1.3

- OS: CentOS Linux release 7.7.1908 (Core)

I am seeing the same issue with multi-node jobs that other users have reported for oneAPI HPC Toolkit 2021.4.

The error is as follows:

[mpiexec@node8103] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on node8104 (pid 65308, exit code 256
[mpiexec@node8103] Possible reasons: 
[mpiexec@node8103] 1. Host is unavailable. Please check that all hosts are available. 
[mpiexec@node8103] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts. Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions. 
[mpiexec@node8103] 3. Firewall refused connection. Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable. 
[mpiexec@node8103] 4. pbs bootstrap cannot launch processes on remote host. You may try using -bootstrap option to select alternative launcher. 

With I_MPI_HYDRA_DEBUG=1:

/apps/compiler/intel/oneapi_21.4/mpi/2021.4.0/bin//hydra_bstrap_proxy --upstream-host node8103 --upstream-port 39812 --pgid 0 --launcher pbs --launcher-number 5 --base-path /apps/compiler/intel/oneapi_21.4/mpi/2021.4.0/bin/ --tree-width 2 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /apps/compiler/intel/oneapi_21.4/mpi/2021.4.0/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9 
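To confirm which launcher Hydra actually picked, the debug output can be filtered for the `--launcher` argument. The following is a minimal bash sketch; the function name `hydra_launcher` is hypothetical, and it assumes GNU grep (standard on CentOS 7).

```shell
#!/bin/bash
# Hypothetical helper: read I_MPI_HYDRA_DEBUG=1 output on stdin and print the
# launcher Hydra used. The pattern requires a space after "--launcher", so the
# "--launcher-number" argument is not matched.
hydra_launcher() {
  grep -o -- '--launcher [^ ]*' | head -n 1 | cut -d ' ' -f 2
}
```

For example, `I_MPI_HYDRA_DEBUG=1 mpiexec -n 2 hostname 2>&1 | hydra_launcher` would be expected to print `pbs` for a run like the failing 2021.4 job above.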

The '--launcher pbs' argument in the debug line above caused the bootstrap error. Setting I_MPI_HYDRA_BOOTSTRAP=ssh solves the issue. According to the documentation, ssh is already the default launcher, yet Hydra used pbs inside the PBS job.
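A minimal sketch of applying this workaround inside a bash PBS job script. The helper name `run_mpi_ssh` is hypothetical; setting `I_MPI_HYDRA_BOOTSTRAP=ssh` has the same effect as passing `-bootstrap ssh` to `mpiexec`. The ssh launcher assumes passwordless ssh between the compute nodes assigned to the job.

```shell
#!/bin/bash
# Hypothetical helper for a PBS job script: start an MPI program with the
# ssh bootstrap instead of the pbs launcher that Hydra selected in 2021.4.
run_mpi_ssh() {
  # Equivalent to: mpiexec -bootstrap ssh "$@"
  export I_MPI_HYDRA_BOOTSTRAP=ssh
  mpiexec "$@"
}
```

Usage: `run_mpi_ssh -n 4 ./my_app`, where `./my_app` stands in for the real binary.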

In summary:

- 2021.3: both pbsdsh and ssh work as the Hydra launcher.

- 2021.4: only ssh works as the launcher. The problem could lie in either PBS or Intel MPI.

My questions are:

- Is there a minimum version requirement for PBS?

- Will there be a performance degradation when forcing 'ssh' as the launcher?

Thanks.

SantoshY_Intel
Moderator

Hi,


Thanks for reaching out to us.


Could you please specify which job scheduler (Altair PBS Pro or OpenPBS) you are using?

Also, could you please provide the command you used to check the PBS version?


Thanks & Regards,

Santosh

Viet-Duc
Novice

Hi,

It is PBS Pro.

I checked the version using the following command:

$  qsub --version 
pbs_version = 2020.1.3.20210315160738
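To compare against a minimum PBS version once one is known, the release number can be extracted from this output. A minimal bash sketch, assuming GNU `sort -V` is available (it is on CentOS 7); the function names are hypothetical, and since no minimum version has been stated in this thread, the threshold is left as a parameter.

```shell
#!/bin/bash
# Hypothetical helper: read "qsub --version" output on stdin and print the
# release (e.g. 2020.1.3), dropping the trailing build timestamp.
pbs_release() {
  awk -F ' = ' '/^pbs_version/ { print $2 }' | cut -d . -f 1-3
}

# Hypothetical helper: succeed if version $1 >= version $2, using the
# version-aware ordering of GNU sort -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: `version_ge "$(qsub --version | pbs_release)" "$MIN_PBS"`, where `MIN_PBS` is a placeholder for whatever minimum Intel eventually documents.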

SantoshY_Intel
Moderator

Hi,

>>"Will there be a performance degradation when forcing 'ssh' as the launcher?"

Changing the launcher to ssh will not affect application performance: the bootstrap launcher is only used to start the Hydra proxy processes on each node at job startup.

>>"Is there a minimum version requirement for PBS?"

We are working on your issue internally and will get back to you soon.

Thanks & Regards,

Santosh

James_T_Intel
Moderator

Please confirm whether you encounter the same error with version 2021.5.

James_T_Intel
Moderator

Due to the lack of a reply, this case is closed for Intel support. Any further discussion on this thread will be considered community-only.
