Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Hydra bootstrap: ssh vs pbsdsh (21.4)

Viet-Duc
Novice

Versions:

- Intel oneAPI HPC Toolkit 21.4

- PBS version: 2020.1.3

- OS: CentOS Linux release 7.7.1908 (Core)

I would like to echo the issue that other users are having with multi-node jobs (oneAPI HPC v21.4).

The error is as follows:

[mpiexec@node8103] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on node8104 (pid 65308, exit code 256)
[mpiexec@node8103] Possible reasons: 
[mpiexec@node8103] 1. Host is unavailable. Please check that all hosts are available. 
[mpiexec@node8103] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts. Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions. 
[mpiexec@node8103] 3. Firewall refused connection. Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable. 
[mpiexec@node8103] 4. pbs bootstrap cannot launch processes on remote host. You may try using -bootstrap option to select alternative launcher. 

With I_MPI_HYDRA_DEBUG=1:

/apps/compiler/intel/oneapi_21.4/mpi/2021.4.0/bin//hydra_bstrap_proxy --upstream-host node8103 --upstream-port 39812 --pgid 0 --launcher pbs --launcher-number 5 --base-path /apps/compiler/intel/oneapi_21.4/mpi/2021.4.0/bin/ --tree-width 2 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /apps/compiler/intel/oneapi_21.4/mpi/2021.4.0/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9 

Here, '--launcher pbs' is what triggers the bootstrap error shown above. Setting I_MPI_HYDRA_BOOTSTRAP=ssh, which is the default according to the documentation, resolves the issue.

Thus:

- 2021.3: both pbsdsh and ssh work as the Hydra launcher

- 2021.4: only ssh works as the launcher. The problem could lie with either PBS or Intel MPI.
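
For anyone hitting the same error, the workaround above could be wrapped in a job script roughly as follows. This is only a sketch: `run_with_ssh_bootstrap` is a hypothetical helper name (not part of Intel MPI), and the `mpirun` arguments in the usage comment are placeholders. The only setting taken from this thread is I_MPI_HYDRA_BOOTSTRAP=ssh.

```shell
#!/bin/sh
# Hypothetical helper: run a command with the Hydra bootstrap launcher
# forced to ssh, without changing the environment of the calling shell.
run_with_ssh_bootstrap() {
    env I_MPI_HYDRA_BOOTSTRAP=ssh "$@"
}

# Example use inside a PBS job script (rank count and binary are placeholders):
#   run_with_ssh_bootstrap mpirun -n 4 ./my_app
```

Scoping the variable to a single command makes it easy to compare the pbs and ssh launchers within the same job, since nothing is exported globally. The error message itself also suggests the -bootstrap option of mpiexec as another way to select the launcher.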

 

My questions are:

- Is there a minimum version requirement for PBS?

- Will there be a performance degradation when forcing 'ssh' as the launcher?

 

Thanks.

 

 

SantoshY_Intel
Moderator

Hi,


Thanks for reaching out to us.


Could you please specify which job scheduler (Altair PBS Pro or OpenPBS) you are using?

Also, could you please provide the command you used for checking the PBS version?


Thanks & Regards,

Santosh


Viet-Duc
Novice

Hi,

 

It is PBS Pro.

I checked the version using the following command:

$ qsub --version
pbs_version = 2020.1.3.20210315160738
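
To compare PBS releases across nodes or clusters, the release number can be pulled out of that output with a small filter. This is a sketch that assumes the output has the `pbs_version = X.Y.Z...` form shown above; `pbs_release` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `qsub --version` output on stdin and print
# only the leading X.Y.Z release, dropping the trailing build stamp.
pbs_release() {
    sed -n 's/^pbs_version = \([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Example use (2>&1 in case the version is written to stderr):
#   qsub --version 2>&1 | pbs_release
```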

 

SantoshY_Intel
Moderator

Hi,

 

>>"Will there be a performance degradation when forcing 'ssh' as launcher?"

There will be no effect on performance if we change the launcher to ssh.

 

>>"Is there a minimal version requirement for PBS?"

We are working on your issue internally and will get back to you soon.

 

Thanks & Regards,

Santosh

 

James_T_Intel
Moderator

Please confirm if you encounter the same error with version 2021.5.


James_T_Intel
Moderator

Due to lack of reply, this case is closed for Intel support. Any further discussion on this thread will be considered community only.

