Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Re: Running Intel oneAPI MPI test.f90 example on multiple nodes

iirwan
Beginner

Hi,

 

I had the same issue and I've been following this thread.

 

My current setup is:
- 1 Windows Server PC (ADAS-DC1)
- 3 Windows Pro PCs (ADAS-MS1, ADAS-MS2, ADAS-MS3)
- Same oneAPI and MPI Library version (2021.5.0) on every PC
- All PCs connected by LAN under the same domain
- All firewalls are turned off
- hydra_service is running on all PCs
- All PCs can ping each other (quick checks shown below)
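
In case it is useful, this is roughly how the last two points were checked on each PC. The service name hydra_service is the one registered by the Intel MPI installer on these machines; adjust it if yours differs:

rem Confirm the Intel MPI Hydra service is installed and running on this PC
sc query hydra_service

rem Confirm the other nodes resolve and respond
ping ADAS-MS1
ping ADAS-MS2
ping ADAS-MS3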

 

I followed the instructions from this link.

I ran "mpiexec -validate" on all the PCs and it completed successfully on each one.
I have no problem running mpiexec with a single host (the commands below were run from ADAS-MS3).


C:\Program Files (x86)\Intel\oneAPI\mpi\2021.5.0\test>mpiexec -np 2 -ppn 2 -host ADAS-MS1 test.exe
Hello world: rank 0 of 2 running on ADAS-MS1
Hello world: rank 1 of 2 running on ADAS-MS1

C:\Program Files (x86)\Intel\oneAPI\mpi\2021.5.0\test>mpiexec -np 2 -ppn 2 -host ADAS-MS2 test.exe
Hello world: rank 0 of 2 running on ADAS-MS2
Hello world: rank 1 of 2 running on ADAS-MS2

C:\Program Files (x86)\Intel\oneAPI\mpi\2021.5.0\test>mpiexec -np 2 -ppn 2 -host ADAS-MS3 test.exe
Hello world: rank 0 of 2 running on ADAS-MS3
Hello world: rank 1 of 2 running on ADAS-MS3

 

But when I use more than one host, the command hangs for a very long time with no output.
So I set I_MPI_DEBUG=20 as mentioned above, and here is what I got:

 

C:\Program Files (x86)\Intel\oneAPI\mpi\2021.5.0\test>set I_MPI_DEBUG=20

C:\Program Files (x86)\Intel\oneAPI\mpi\2021.5.0\test>mpiexec -np 3 -ppn 1 -hosts ADAS-MS1,ADAS-MS2,ADAS-MS3 test.exe
[0] MPI startup(): Intel(R) MPI Library, Version 2021.5 Build 20211102
[0] MPI startup(): Copyright (C) 2003-2021 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.13.2-impi
libfabric:2668:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:2668:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:2668:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ZE not supported
libfabric:2668:netdir:core:ofi_nd_startup():602<info> ofi_nd_startup: starting initialization
libfabric:2668:core:core:ofi_register_provider():474<info> registering provider: netdir (113.20)
libfabric:2668:core:core:ofi_register_provider():474<info> registering provider: ofi_rxm (113.20)
libfabric:2668:core:core:ofi_register_provider():474<info> registering provider: sockets (113.20)
libfabric:2668:core:core:ofi_register_provider():474<info> registering provider: tcp (113.20)
libfabric:2668:core:core:ofi_register_provider():474<info> registering provider: ofi_hook_perf (113.20)
libfabric:2668:core:core:ofi_register_provider():474<info> registering provider: ofi_hook_noop (113.20)
libfabric:2668:core:core:fi_getinfo():1138<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2668:core:core:fi_getinfo():1138<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2668:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:2668:core:core:fi_getinfo():1161<info> Since tcp can be used, netdir has been skipped. To use netdir, please, set FI_PROVIDER=netdir
libfabric:2668:core:core:fi_getinfo():1161<info> Since tcp can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:2668:core:core:fi_getinfo():1138<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2668:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:2668:core:core:fi_getinfo():1161<info> Since tcp can be used, netdir has been skipped. To use netdir, please, set FI_PROVIDER=netdir
libfabric:2668:core:core:fi_getinfo():1161<info> Since tcp can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:2668:core:core:fi_getinfo():1138<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2668:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:2668:core:core:fi_getinfo():1161<info> Since tcp can be used, netdir has been skipped. To use netdir, please, set FI_PROVIDER=netdir
libfabric:2668:core:core:fi_getinfo():1161<info> Since tcp can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:2668:core:core:fi_getinfo():1138<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2668:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:2668:core:core:fi_getinfo():1161<info> Since tcp can be used, netdir has been skipped. To use netdir, please, set FI_PROVIDER=netdir
libfabric:2668:core:core:fi_getinfo():1161<info> Since tcp can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:2668:core:core:fi_getinfo():1161<info> Since tcp can be used, netdir has been skipped. To use netdir, please, set FI_PROVIDER=netdir
libfabric:2668:core:core:fi_getinfo():1161<info> Since tcp can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:2668:core:core:fi_getinfo():1138<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2668:core:core:fi_getinfo():1123<warn> Can't find provider with the highest priority
libfabric:2668:core:core:fi_getinfo():1138<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2668:core:core:fi_getinfo():1161<info> Since tcp can be used, netdir has been skipped. To use netdir, please, set FI_PROVIDER=netdir
libfabric:2668:core:core:fi_getinfo():1161<info> Since tcp can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:2668:core:core:fi_getinfo():1138<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfa[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[0] MPI startup(): max_ch4_vcis: 1, max_reg_eps 64, enable_sep 0, enable_shared_ctxs 0, do_av_insert 1
[0] MPI startup(): addrnamelen: zu
[0] MPI startup(): File "C:\Program Files (x86)\Intel\oneAPI\mpi\latest\env\../etc/tuning_skx_shm-ofi_tcp-ofi-rxm.dat" not found
[0] MPI startup(): Load tuning file: "C:\Program Files (x86)\Intel\oneAPI\mpi\latest\env\../etc/tuning_skx_shm-ofi.dat"

The program hangs at that step, and it behaves this way every time I run mpiexec across multiple PCs with the test program.

Then I tried running a simpler program (hostname) and it works across multiple PCs. Here is what I got:


C:\Program Files (x86)\Intel\oneAPI\mpi\2021.5.0\test>mpiexec -np 3 -ppn 1 -hosts ADAS-MS1,ADAS-MS2,ADAS-MS3 hostname
ADAS-MS1
ADAS-MS2
ADAS-MS3

In this case, why does my test.exe program not work across multiple nodes?

 

Any help will be much appreciated.

 

Thanks

HemanthCH_Intel
Moderator

Hi,

 

Thanks for reaching out to us.

 

Could you please try the commands below:

 

set FI_PROVIDER=sockets

mpiexec -np 3 -ppn 1 -hosts ADAS-MS1,ADAS-MS2,ADAS-MS3 test.exe
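
If the plain set does not reach the remote ranks, the same variable can also be passed on the mpiexec line with -genv so that it is propagated to all hosts (a sketch using the same hosts as above):

mpiexec -genv FI_PROVIDER sockets -np 3 -ppn 1 -hosts ADAS-MS1,ADAS-MS2,ADAS-MS3 test.exe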

If the issue still persists, please provide the debug log using the commands below:

set I_MPI_DEBUG=30
set FI_LOG_LEVEL=debug
mpiexec -np 3 -ppn 1 -hosts ADAS-MS1,ADAS-MS2,ADAS-MS3 test.exe
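
To make the log easier to attach here, the output of that command can be redirected to a file (debug_log.txt is just an example name):

mpiexec -np 3 -ppn 1 -hosts ADAS-MS1,ADAS-MS2,ADAS-MS3 test.exe > debug_log.txt 2>&1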

Thanks & Regards,

Hemanth.

 

HemanthCH_Intel
Moderator

Hi,


We have not heard back from you. Could you please confirm whether your issue is fixed? If you are still facing any issues, please provide the debug log.


Thanks & Regards,

Hemanth.


HemanthCH_Intel
Moderator

Hi,

 

We assume that your issue is resolved. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.

 

Thanks & Regards,

Hemanth.

 
