Intel® MPI Library

Unable to run bstrap_proxy error with intel-oneapi-mpi on Ubuntu 22.04.1

RMK-lfmd
Beginner

We are developing a simple cluster of two nodes, PC8 (master node) and PC9 (slave node).
Passwordless login is enabled from PC8 to PC9.
We don't have any job scheduler at this stage.
Our OS is Ubuntu 22.04.1 LTS.
We have shared /nfstest with NFS and successfully mounted it on PC8.
We copied the Intel toolkit and the VASP software to the shared directory.
Then we initialized the setvars.sh environment on both nodes individually.
After that we tried to run VASP with the command mpirun -np 2 -host 10.0.0.8: -host 10.0.0.9: /nfstest/vasp/vasp.5.4.4/bin/vasp_std
But it failed with the following error. Can someone help us, please?

mpirun -np 2 -host 10.0.0.8: -host 10.0.0.9: /nfstest/vasp/vasp.5.4.4/bin/vasp_std
[mpiexec@PC8] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on 10.0.0.8: (pid 2853, exit code 65280)
[mpiexec@PC8] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec@PC8] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec@PC8] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1065): error waiting for event
[mpiexec@PC8] HYD_print_bstrap_setup_error_message (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:1026): error setting up the bootstrap proxies
[mpiexec@PC8] Possible reasons:
[mpiexec@PC8] 1. Host is unavailable. Please check that all hosts are available.
[mpiexec@PC8] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts. Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions.
[mpiexec@PC8] 3. Firewall refused connection. Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[mpiexec@PC8] 4. Ssh bootstrap cannot launch processes on remote host. Make sure that passwordless ssh connection is established across compute hosts.
[mpiexec@PC8]    You may try using -bootstrap option to select alternative launcher.


Our fstab file for NFS on PC9 is:
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/nvme0n1p3 during installation
UUID=57a614ce-bbd0-4190-9695-bf8bdcbe21c2 /               ext4    errors=remount-ro 0       1
# /boot/efi was on /dev/nvme0n1p1 during installation
UUID=2A53-01A6  /boot/efi       vfat    umask=0077      0       1
/swapfile                                 none            swap    sw              0       0
10.0.0.8:/nfsshare /nfsshare nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0

while in /etc/exports on PC8 we have:
# /etc/exports: the access control list for filesystems which may be exported
# to NFS clients.  See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes       hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4        gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes  gss/krb5i(rw,sync,no_subtree_check)
#
#/nfstest 10.0.0.9(rw,sync)

1 Reply
TobiasK
Moderator

@RMK-lfmd 
Please change your command line.
mpirun -np 2 -host 10.0.0.8: -host 10.0.0.9: /nfstest/vasp/vasp.5.4.4/bin/vasp_std
The ":" enables MPMD launch mode, where you need to specify a binary for each argument set:
https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-linux/2021-13/mpmd-launch-mode.html
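For illustration only, an MPMD launch with ":" separators would look something like this (the two executables here are just placeholders, not something you need for VASP):
mpirun -n 1 -host 10.0.0.8 ./solver_a : -n 1 -host 10.0.0.9 ./solver_b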

In your case you don't need that; just use a comma-separated list of hosts:
mpirun -np 2 -host 10.0.0.8,10.0.0.9 /nfstest/vasp/vasp.5.4.4/bin/vasp_std
https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-linux/2021-13/controlling-process-placement.html
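If you prefer, you can also put the hosts into a host file and pass it with -f; a minimal sketch for your two nodes (the file name hosts.txt and the one-rank-per-node layout are just assumptions):
# hosts.txt
10.0.0.8
10.0.0.9
mpirun -np 2 -ppn 1 -f ./hosts.txt /nfstest/vasp/vasp.5.4.4/bin/vasp_std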

Please make sure that you have set up correct FQDNs; IP addresses might not be enough. Also make sure that you can connect from .8 to .9 and from .9 to .8.
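
A quick sketch of how to check both directions, assuming the host names PC8 and PC9 resolve on both machines (e.g. via /etc/hosts entries such as the ones below, adjusted to your actual naming):
# /etc/hosts on both nodes (example entries)
10.0.0.8   PC8
10.0.0.9   PC9
# on PC8
ssh PC9 hostname
# on PC9
ssh PC8 hostname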


Initializing the environment on the remote node has no effect: the environment is local to your shell unless you put it into the shell's initialization files. mpirun will just propagate the local environment, so the paths have to be identical on both nodes.
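
For example, you could source setvars.sh from the shell startup file on both nodes; the path below is only an assumption based on your shared directory, adjust it to wherever the toolkit actually lives:
# add to ~/.bashrc on PC8 and PC9 (path assumed, adjust as needed)
source /nfstest/intel/oneapi/setvars.sh > /dev/null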

Please first test basic MPI functionality with something simple like
mpirun -np 2 -ppn 1 -hosts a,b IMB-MPI1
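With your hosts that would be something like the following; IMB-MPI1 ships with the Intel MPI Benchmarks and should be on PATH after sourcing setvars.sh:
mpirun -np 2 -ppn 1 -hosts 10.0.0.8,10.0.0.9 IMB-MPI1 PingPong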
