Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI_HYDRA_BOOTSTRAP issue

youn__kihang
Novice
4,709 Views

Hello,

I am submitting a job through the LSF scheduler, and ssh access is blocked by a nologin setting.
I am trying to launch through the LSF blaunch command instead, and I found the Intel MPI options associated with LSF.
Changing the I_MPI_HYDRA_BOOTSTRAP option from ssh to lsf seems to solve the problem.
However, I tested the following four Intel MPI versions, and the setting does not seem to apply properly to the two marked X (O = works, X = fails).

2018.4.274: O
2019.2.187: X
2019.4.243: X
2019.5.281: O

Let me know if I'm missing something.

The options that I tried are below.

export I_MPI_HYDRA_BOOTSTRAP=lsf
export I_MPI_HYDRA_BOOTSTRAP_EXEC=lsf
export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS=lsf
export I_MPI_HYDRA_RMK=lsf

And the error messages are below.

check_exit_codes (../hydra_demux_poll.c): unable to run proxy on hostname
poll_for_event (): check exit codes error
HYD_dmx_poll_wait_for_proxy_event (): poll for event error
HYD_bstrap_setup (): error waiting for event
main (): error setting up the boostrap proxies
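
As a minimal sketch of the approach described above (switching the Hydra bootstrap from ssh to lsf inside an LSF job), a job script could select the bootstrap as follows. The function name configure_hydra_bootstrap is hypothetical. The sketch assumes that LSF sets LSB_JOBID inside a job and that blaunch is on the PATH there. Whether the setting is then honored depends on the Intel MPI version, as the version list above shows.

```shell
# Sketch only: pick the Hydra bootstrap mechanism for the current environment.
# configure_hydra_bootstrap is a hypothetical helper, not part of Intel MPI.
configure_hydra_bootstrap() {
    # Assumption: LSF exports LSB_JOBID inside a job, and blaunch is on PATH.
    if [ -n "${LSB_JOBID:-}" ] && command -v blaunch >/dev/null 2>&1; then
        # Ask Hydra to start its proxies through LSF instead of ssh.
        export I_MPI_HYDRA_BOOTSTRAP=lsf
    else
        # Outside an LSF job, keep the documented default.
        export I_MPI_HYDRA_BOOTSTRAP=ssh
    fi
}
```

Calling configure_hydra_bootstrap before mpirun leaves the other variables from the list above untouched; only I_MPI_HYDRA_BOOTSTRAP is set.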

Thanks

3 Replies
youn__kihang
Novice
4,709 Views

I found a Q&A thread for the same issue as ours in the forum.
This was a known issue that was fixed in 2019u5, so we have decided not to use 2019u2 and 2019u4.
Thanks all.

The original thread is linked below.

https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/814696
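
The decision above (avoid 2019u2 and 2019u4) could be encoded as a small guard in a job script. This is a sketch only: the function name is hypothetical, the version string is supplied by the caller (for example, taken from the module or install directory name), and the list contains only the two releases reported as failing in this thread, not a complete list of affected builds.

```shell
# Sketch only: returns 0 (true) if the given Intel MPI version string is one
# of the two releases reported in this thread as not honoring
# I_MPI_HYDRA_BOOTSTRAP=lsf. is_affected_impi_version is a hypothetical name.
is_affected_impi_version() {
    case "$1" in
        2019.2.187|2019.4.243) return 0 ;;  # marked X in the original post
        *) return 1 ;;                      # everything else: not listed
    esac
}

# Print a warning to stderr when an affected version is about to be used.
warn_if_affected() {
    if is_affected_impi_version "$1"; then
        echo "Intel MPI $1: I_MPI_HYDRA_BOOTSTRAP=lsf reported not to work" >&2
    fi
}
```

A job script could call warn_if_affected with the version it is about to load and switch to 2018.4.274 or 2019.5.281, the two versions reported as working.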

PrasanthD_intel
Moderator
4,709 Views

Hi Kihang,

Glad to know that you found the information you were looking for.
The issue has been fixed in the latest versions.
We are closing this thread.
Please contact us in case of any further queries.

Thanks 

Prasanth

Shaikh__Samir
Beginner
3,943 Views

Hi,

I'm getting a similar issue with intel-2020.4.304, but only with a large number of nodes (say 64+) on Cascade Lake.

[mpiexec@cn001] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:121): unable to run bstrap_proxy (pid 5695, exit code 256)
[mpiexec@cn001] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec@cn001] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec@cn001] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:772): error waiting for event
[mpiexec@cn001] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1938): error setting up the boostrap proxies
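
Since the log above only says that bstrap_proxy could not be started, one way to narrow a failure like this down is to try the launcher on each host separately. The following is a hedged sketch, not a confirmed diagnosis: find_failing_hosts and the LAUNCHER variable are hypothetical names, and the sketch assumes the bootstrap launcher accepts the form "launcher host command", as blaunch does inside an LSF job.

```shell
# Sketch only: run a trivial command on each host through the bootstrap
# launcher and print the hosts where that fails.
# LAUNCHER is a hypothetical override; it defaults to blaunch.
find_failing_hosts() {
    launcher="${LAUNCHER:-blaunch}"
    for host in "$@"; do
        # stdin is redirected so the launcher cannot consume the caller's input.
        if ! "$launcher" "$host" hostname </dev/null >/dev/null 2>&1; then
            echo "$host"
        fi
    done
}
```

Inside a job, the host list could be taken from the scheduler, for example: find_failing_hosts $(echo "$LSB_HOSTS" | tr ' ' '\n' | sort -u). LSF may not set LSB_HOSTS for very large jobs, so the host list may have to come from another source at 64+ nodes.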
