Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Problems running Intel MPI multinode jobs

Jenya
Beginner

Hello,

I have a four-node cluster connected by an InfiniBand switch. All nodes share a common NFS home directory, so the startup sequence on each node (bashrc, etc.) is identical. Let's call the nodes node1, node2, node3, and node4. I have disabled the firewall on each server (all running Rocky Linux 8.7).
We use the PBS Professional queuing system (pbs_version = 2022.1.4.20231010124201). Jobs on a single node run fine, but when we try to run jobs across multiple nodes we get the following errors:
check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on node1
poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1065): error waiting for event
HYD_print_bstrap_setup_error_message (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:1026): error setting up the bootstrap proxies
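
For reference, here is a minimal sketch of the kind of job script we submit; the job name, resource counts, and binary name (./hello_mpi) are illustrative, not our exact job:

#!/bin/bash
#PBS -N mpi_test
#PBS -l select=2:ncpus=4:mpiprocs=4
#PBS -l walltime=00:10:00

cd $PBS_O_WORKDIR

# Load the Intel oneAPI environment (default installation path)
source /opt/intel/oneapi/setvars.sh

# Hydra detects PBS and takes the node list from $PBS_NODEFILE
mpirun -n 8 ./hello_mpi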

The Intel® oneAPI HPC Toolkit 2023 is also installed on the cluster.

I would be happy to provide additional information if needed.

VeenaJ_Intel
Moderator

Hi,

Thanks for posting in the Intel communities!

Can you please try setting the following environment variable:

export I_MPI_HYDRA_IFACE="ib0"

This tells the Hydra process manager to use the ib0 (InfiniBand) interface for its bootstrap connections. After applying this change, please run the job again across multiple nodes, as in the example below, and share the results with us.
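
For example, in the job script before the mpirun line (./your_app is a placeholder for your application):

# Bind Hydra's bootstrap connections to the InfiniBand interface
export I_MPI_HYDRA_IFACE="ib0"

# Optional: verbose debug output showing the selected interface and fabric provider
export I_MPI_DEBUG=10

mpirun -n 8 ./your_app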

Additionally, we kindly request the following details for further investigation:

  • Reproducer code.
  • Recreation steps.
  • Interconnect hardware details.
  • FI_PROVIDER information.
  • Logs generated after running the Intel® MPI Benchmarks (IMB) with the same number of nodes as in the failing case (a sample invocation is sketched after this list).
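
For the IMB run, something along these lines is enough; "hosts" is a placeholder file listing one hostname per line, and the IMB-MPI1 binary is on the PATH once the oneAPI environment is loaded:

# One rank on each of two nodes; scale to match the failing job
mpirun -n 2 -ppn 1 -f hosts IMB-MPI1 PingPong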

We thank you in advance for your cooperation.

Regards,

Veena

Jenya
Beginner

Hi Veena,

We will try what you suggested and report back with the results.

BR,

Jenya.

TobiasK
Moderator

@Jenya

In case you still have problems, please open a new thread.
