Versions:
- Intel oneAPI HPC Toolkits 21.4
- PBS version: 2020.1.3
- OS: CentOS Linux release 7.7.1908 (Core)
I would like to echo the issue that other users are having with multi-node jobs (oneAPI HPC v21.4).
The error is as follows:
[mpiexec@node8103] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on node8104 (pid 65308, exit code 256
[mpiexec@node8103] Possible reasons:
[mpiexec@node8103] 1. Host is unavailable. Please check that all hosts are available.
[mpiexec@node8103] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts. Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions.
[mpiexec@node8103] 3. Firewall refused connection. Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[mpiexec@node8103] 4. pbs bootstrap cannot launch processes on remote host. You may try using -bootstrap option to select alternative launcher.
With I_MPI_HYDRA_DEBUG=1:
/apps/compiler/intel/oneapi_21.4/mpi/2021.4.0/bin//hydra_bstrap_proxy --upstream-host node8103 --upstream-port 39812 --pgid 0 --launcher pbs --launcher-number 5 --base-path /apps/compiler/intel/oneapi_21.4/mpi/2021.4.0/bin/ --tree-width 2 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /apps/compiler/intel/oneapi_21.4/mpi/2021.4.0/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9
Here, '--launcher pbs' is what caused the bootstrap error above. The issue can be solved by setting I_MPI_HYDRA_BOOTSTRAP=ssh, which according to the documentation is the default.
Thus:
- 2021.3: both pbsdsh and ssh work as the Hydra launcher.
- 2021.4: only ssh works as the launcher. The problem could lie with either PBS or Intel MPI.
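A minimal sketch of the workaround as a PBS Pro job script, assuming passwordless SSH already works between the compute nodes (the ssh launcher needs it). The job name, resource request, setvars.sh location and ./my_app are hypothetical placeholders, not values taken from this thread.

```shell
#!/bin/bash
#PBS -N impi_ssh_bootstrap
#PBS -l select=2:ncpus=4:mpiprocs=4
# Assumptions: job name, select/ncpus/mpiprocs values, the setvars.sh path
# and ./my_app are hypothetical placeholders for illustration only.

cd "$PBS_O_WORKDIR"

# Load the oneAPI environment (path assumed from the install prefix in the debug log).
source /apps/compiler/intel/oneapi_21.4/setvars.sh

# Override the launcher Hydra picked under PBS ('--launcher pbs' in the debug output).
export I_MPI_HYDRA_BOOTSTRAP=ssh

# Optional: print Hydra's launch commands to confirm which launcher is actually used.
export I_MPI_HYDRA_DEBUG=1

# 2 chunks x 4 mpiprocs = 8 ranks; PBS_NODEFILE lists one line per rank slot.
mpirun -machinefile "$PBS_NODEFILE" -n 8 ./my_app
```

Submit it with qsub as usual. The same override can also be given per command with mpirun's -bootstrap ssh option instead of the environment variable.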
My questions are:
- Is there a minimum PBS version requirement?
- Will forcing 'ssh' as the launcher degrade performance?
Thanks.
Hi,
Thanks for reaching out to us.
Could you please specify which job scheduler (Altair PBS Pro or OpenPBS) you are using?
Also, could you please provide the command you used to check the PBS version?
Thanks & Regards,
Santosh
Hi,
It is PBS Pro.
I checked version using the following command:
$ qsub --version
pbs_version = 2020.1.3.20210315160738
Hi,
>>"Will there be a performance degradation when forcing 'ssh' as launcher ?"
There will be no effect on performance if we change the launcher to ssh.
>>"Is there a minimal version requirement for PBS ?"
We are working on your issue internally and will get back to you soon.
Thanks & Regards,
Santosh
Please confirm if you encounter the same error with version 2021.5.
Due to lack of reply, this case is closed for Intel support. Any further discussion on this thread will be considered community only.