After upgrading to Update 1 of Intel 2019 we are not able to run even an MPI hello world example. This is new behavior; for example, a Spack-installed gcc 8.2.0 with OpenMPI has no trouble on this system. This is a single workstation, so only shm needs to work. For non-MPI use the compilers work correctly. Presumably dependencies have changed slightly in this new update?
$ cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 7.5 (Maipo)
$ source /opt/intel2019/bin/compilervars.sh intel64
$ mpiicc -v
mpiicc for the Intel(R) MPI Library 2019 Update 1 for Linux*
Copyright 2003-2018, Intel Corporation.
icc version 19.0.1.144 (gcc version 4.8.5 compatibility)
$ cat mpi_hello_world.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}
$ mpiicc ./mpi_hello_world.c
$ ./a.out
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
$ export I_MPI_FABRICS=shm:ofi
$ export I_MPI_DEBUG=666
$ ./a.out
[0] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
[0] MPI startup(): libfabric version: 1.7.0a1-impi
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
I have encountered the same problem. Have you got any solutions yet?
Hi Paul,
Could you send us the output of the "ifconfig" command, please?
Thank you!
$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 0.0.0.0
inet6 fe80::42:77ff:fed7:8a4c prefixlen 64 scopeid 0x20<link>
ether 02:42:77:d7:8a:4c txqueuelen 0 (Ethernet)
RX packets 126235 bytes 5187732 (4.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 174599 bytes 478222947 (456.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 128.219.166.53 netmask 255.255.252.0 broadcast 128.219.167.255
inet6 fe80::225:90ff:fee1:835a prefixlen 64 scopeid 0x20<link>
ether 00:25:90:e1:83:5a txqueuelen 1000 (Ethernet)
RX packets 58556671 bytes 16033320775 (14.9 GiB)
RX errors 0 dropped 33204 overruns 0 frame 0
TX packets 13740853 bytes 6787935989 (6.3 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0xe3920000-e393ffff
eth1: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 00:25:90:e1:83:5b txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0xe3900000-e391ffff
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 4183437 bytes 4132051465 (3.8 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 4183437 bytes 4132051465 (3.8 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
I have encountered the same problem (same error message) when running an application across nodes.
Configuration: Scientific Linux 7.5, latest Intel version
Installing Intel Parallel Studio XE Cluster Edition from scratch (a clean install) did not change this problem or error, i.e. it is unrelated to the update procedure from 2019.0 to 2019.1.
Hi Paul,
Did you see the same error with IMPI 2019 Gold?
I have two possible methods that could help you:
- set the FI_SOCKETS_IFACE=eth0 environment variable (or any IP interface that works correctly);
- set the FI_PROVIDER=tcp environment variable (only applicable to IMPI 2019 U1). This switches to another OFI provider (i.e. the way IMPI accesses the network); the tcp provider is currently available as a Technical Preview and will replace the OFI/sockets provider in future releases.
If you still see the same problem, please collect logs with FI_LOG_LEVEL=debug set. The logs are printed to standard error (stderr).
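For example, a minimal session applying these suggestions could look like the following (eth0 here is just a placeholder; use whichever interface carries your traffic):
# workaround 1: pin the OFI/sockets provider to a working IP interface
$ export FI_SOCKETS_IFACE=eth0
$ ./a.out
# workaround 2 (IMPI 2019 U1 only): switch to the tcp OFI provider
$ export FI_PROVIDER=tcp
$ ./a.out
# if it still fails, capture the libfabric debug log (written to stderr)
$ export FI_LOG_LEVEL=debug
$ ./a.out 2> libfabric_debug.log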
Thank you!
--
Dmitry
Thanks Dmitry. Setting FI_LOG_LEVEL=debug was very helpful:
$ ./a.out
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
$ export FI_LOG_LEVEL=debug
$ ./a.out
libfabric:core:core:fi_param_define_():223<info> registered var perf_cntr
libfabric:core:core:fi_param_get_():272<info> variable perf_cntr=<not set>
...libfabric:core:core:fi_getinfo_():899<warn> fi_getinfo: provider psm2 returned -61 (No data available)
libfabric:psm2:core:psmx2_getinfo():341<info>
libfabric:psm2:core:psmx2_init_prov_info():201<info> Unsupported endpoint type
libfabric:psm2:core:psmx2_init_prov_info():203<info> Supported: FI_EP_RDM
libfabric:psm2:core:psmx2_init_prov_info():205<info> Supported: FI_EP_DGRAM
libfabric:psm2:core:psmx2_init_prov_info():207<info> Requested: FI_EP_MSG
libfabric:core:core:fi_getinfo_():899<warn> fi_getinfo: provider psm2 returned -61 (No data available)
libfabric:core:core:ofi_layering_ok():776<info> Need core provider, skipping util ofi_rxm
libfabric:core:core:fi_getinfo_():877<info> Since psm2 can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:core:core:fi_getinfo_():877<info> Since psm2 can be used, tcp has been skipped. To use tcp, please, set FI_PROVIDER=tcp
libfabric:core:core:fi_getinfo_():899<warn> fi_getinfo: provider ofi_rxm returned -61 (No data available)
libfabric:core:core:fi_getinfo_():877<info> Since psm2 can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:core:core:fi_getinfo_():877<info> Since psm2 can be used, tcp has been skipped. To use tcp, please, set FI_PROVIDER=tcp
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
libfabric:psm2:core:psmx2_fini():476<info>
$ export FI_PROVIDER=tcp
$ ./a.out
...
Hello world from processor system.place.com, rank 0 out of 1 processors
Using export FI_PROVIDER=tcp solves the crash for us. Hopefully this has no performance impact on on-node messages. I also hope Update 2 can remove the need for this environment/configuration variable.
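For other readers on a single workstation: one way to make this workaround stick (a sketch, assuming a bash login shell and the install path used earlier in this thread) is to export it together with the compiler environment:
$ cat >> ~/.bashrc << 'EOF'
source /opt/intel2019/bin/compilervars.sh intel64
export FI_PROVIDER=tcp   # workaround for the OFI addrinfo() failure in IMPI 2019 Update 1
EOF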
Hi Paul,
Sorry for the inconvenience. Yes, we are going to identify the root cause of the problem.
FI_PROVIDER=tcp has better performance numbers than FI_PROVIDER=sockets (the current default OFI provider for Intel MPI 2019 Gold and U1), but FI_PROVIDER=tcp is a technical preview due to some stability issues.
You are right: the tcp provider doesn't affect performance for intra-node communication, because the SHM transport is used by default. You can ensure that shm is used for intra-node and OFI for inter-node communication by setting I_MPI_FABRICS=shm:ofi.
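For example, a quick way to confirm which transports were actually selected (a sketch; the exact debug lines can differ between updates) is:
$ export I_MPI_FABRICS=shm:ofi
$ export FI_PROVIDER=tcp
$ export I_MPI_DEBUG=5
$ mpirun -n 2 ./a.out   # the "[0] MPI startup()" lines report the libfabric version/provider chosen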
By the way, I have one more question that could help our team identify the root cause: did you try setting FI_SOCKETS_IFACE to any of your interfaces? If not, could you please try it (unset FI_PROVIDER or set FI_PROVIDER=sockets, so we are sure the OFI/sockets provider is used in your test)?
It would be great to check all of these values: docker0, eth0, and eth1.
Thank you in advance!
--
Dmitry
$ source /opt/intel2019/bin/compilervars.sh intel64
$ ./a.out
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
$ export FI_SOCKETS_IFACE=eth0
$ ./a.out
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
$ export FI_PROVIDER=sockets
$ ./a.out
Hello world from processor thing.machine.com, rank 0 out of 1 processors
$ export FI_SOCKETS_IFACE=eth1
[pk7@oxygen t]$ ./a.out
Hello world from processor thing.machine.com, rank 0 out of 1 processors
$ export FI_SOCKETS_IFACE=docker0
[pk7@oxygen t]$ ./a.out
Hello world from processor thing.machine.com, rank 0 out of 1 processors
$ export FI_PROVIDER=""
[pk7@oxygen t]$ ./a.out
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
$ export FI_PROVIDER=tcp
$ ./a.out
Hello world from processor thing.machine.com, rank 0 out of 1 processors
Looks like it "lost" the default sockets provider in the update.
Hi Paul,
Thank you! We appreciate your help.
It looks like setting either FI_PROVIDER=sockets or FI_PROVIDER=tcp solves your problem, doesn't it?
IMPI 2019 should use the sockets OFI provider (i.e. FI_PROVIDER=sockets) by default, but for some reason that is not the case here.
--
Dmitry
Hello, I am seeing what appears to be the same problem, but none of the suggested environment variable settings are working for me.
This is on CentOS Linux release 7.2.1511. The test program is /opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/test/test.c compiled with mpigcc. I am trying to launch on just a single host for now.
[user1@centos7-2 tmp]$ export LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/libfabric/lib/:$LD_LIBRARY_PATH
[user1@centos7-2 tmp]$ env | grep FI_
FI_SOCKETS_IFACE=enp0s3
FI_LOG_LEVEL=debug
FI_PROVIDER=tcp
[user1@centos7-2 tmp]$ mpiexec ./mpitest
libfabric:core:core:fi_param_define_():223<info> registered var perf_cntr
libfabric:core:core:fi_param_get_():272<info> variable perf_cntr=<not set>
libfabric:core:core:fi_param_define_():223<info> registered var hook
libfabric:core:core:fi_param_get_():272<info> variable hook=<not set>
libfabric:core:core:fi_param_define_():223<info> registered var provider
libfabric:core:core:fi_param_define_():223<info> registered var fork_unsafe
libfabric:core:core:fi_param_define_():223<info> registered var universe_size
libfabric:core:core:fi_param_get_():281<info> read string var provider=tcp
libfabric:core:core:ofi_create_filter():322<warn> unable to parse filter from: tcp
libfabric:core:core:fi_param_define_():223<info> registered var provider_path
libfabric:core:core:fi_param_get_():272<info> variable provider_path=<not set>
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:fi_param_define_():223<info> registered var rxd_enable
libfabric:core:core:fi_param_get_():272<info> variable rxd_enable=<not set>
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
Abort(1618831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
[user1@centos7-2 tmp]$ export FI_PROVIDER=sockets
[user1@centos7-2 tmp]$ env | grep FI_
FI_SOCKETS_IFACE=enp0s3
FI_LOG_LEVEL=debug
FI_PROVIDER=sockets
[user1@centos7-2 tmp]$ mpiexec ./mpitest
libfabric:core:core:fi_param_define_():223<info> registered var perf_cntr
libfabric:core:core:fi_param_get_():272<info> variable perf_cntr=<not set>
libfabric:core:core:fi_param_define_():223<info> registered var hook
libfabric:core:core:fi_param_get_():272<info> variable hook=<not set>
libfabric:core:core:fi_param_define_():223<info> registered var provider
libfabric:core:core:fi_param_define_():223<info> registered var fork_unsafe
libfabric:core:core:fi_param_define_():223<info> registered var universe_size
libfabric:core:core:fi_param_get_():281<info> read string var provider=sockets
libfabric:core:core:ofi_create_filter():322<warn> unable to parse filter from: sockets
libfabric:core:core:fi_param_define_():223<info> registered var provider_path
libfabric:core:core:fi_param_get_():272<info> variable provider_path=<not set>
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:fi_param_define_():223<info> registered var rxd_enable
libfabric:core:core:fi_param_get_():272<info> variable rxd_enable=<not set>
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
Abort(1618831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
[root@centos7-2 tmp]# ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.10.10.9 netmask 255.0.0.0 broadcast 10.255.255.255
inet6 fe80::a00:27ff:fe0b:6565 prefixlen 64 scopeid 0x20<link>
ether 08:00:27:0b:65:65 txqueuelen 1000 (Ethernet)
RX packets 405685 bytes 534299467 (509.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 441363 bytes 675447638 (644.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.3.15 netmask 255.255.255.0 broadcast 10.0.3.255
inet6 fe80::a00:27ff:fee4:5499 prefixlen 64 scopeid 0x20<link>
ether 08:00:27:e4:54:99 txqueuelen 1000 (Ethernet)
RX packets 102700 bytes 133004229 (126.8 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 51572 bytes 3477801 (3.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 0 (Local Loopback)
RX packets 48387 bytes 8166614 (7.7 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 48387 bytes 8166614 (7.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
ether 52:54:00:86:b9:f7 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
I get the same errors if I set FI_SOCKETS_IFACE to enp0s8 as well.
Any suggestions?
To answer Dmitry's question above: yes, setting either FI_PROVIDER=sockets or FI_PROVIDER=tcp solves the problem. Thanks.
I made a post here yesterday that never made it through moderation. I have since solved the problem with "source /opt/intel/bin/compilervars.sh intel64", so there is no need to push my original post through. Thanks.
You need FI_PROVIDER_PATH set, e.g.
export FI_PROVIDER_PATH=$MPI_HOME/compilers_and_libraries_2019.2.187/linux/mpi/intel64/libfabric/lib/prov
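A quick sanity check of that setting (the path and provider file names depend on your install, so treat these as examples):
$ export FI_PROVIDER_PATH=$MPI_HOME/compilers_and_libraries_2019.2.187/linux/mpi/intel64/libfabric/lib/prov
$ ls "$FI_PROVIDER_PATH"   # should list the dynamically loadable providers, e.g. libsockets-fi.so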
Finding the documentation for Intel MPI 2019 environment variables is not easy. I found that setting FI_PROVIDER to tcp or sockets works in my non-OFI setup. I would like to know more about this variable. What are the valid values for FI_PROVIDER?
Regards,
-Rashawn
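For what it's worth, the debug log earlier in this thread names the providers this particular build knows about: psm2, sockets, tcp, and the ofi_rxm utility layer. If your libfabric installation includes the fi_info utility (it ships with libfabric, though it may not be on your PATH by default), it can list what is actually discoverable on a given machine:
$ fi_info -l   # lists the libfabric providers available on this host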
I noticed that instead of setting FI_PROVIDER, setting the following environment variable also works:
I_MPI_FABRICS=shm
Setting I_MPI_FABRICS=shm:ofi, on the other hand, results in the same error as above.
I seem to have the same (or a similar) problem (2019 Update 5), except that it only occurs when run on an E5410. It runs fine on an E5-2660, Gold 6130, or Gold 6138, and also runs fine with comp2015/impi/5.0.2.044. The various environment variable options suggested don't work for me.
mpirun -np 8 -machinefile .machine0 ./Hello
forrtl: severe (168): Program Exception - illegal instruction
Image PC Routine Line Source
Hello 0000000000405EA4 Unknown Unknown Unknown
libpthread-2.17.s 00002AD66501F5D0 Unknown Unknown Unknown
libmpi.so.12.0.0 00002AD664705252 MPL_dbg_pre_init Unknown Unknown
libmpi.so.12.0.0 00002AD66421B0FE MPI_Init Unknown Unknown
libmpifort.so.12. 00002AD663937D2B MPI_INIT Unknown Unknown
Hello 0000000000404F40 Unknown Unknown Unknown
Hello 0000000000404EE2 Unknown Unknown Unknown
libc-2.17.so 00002AD6655503D5 __libc_start_main Unknown Unknown
Hello 0000000000404DE9 Unknown Unknown Unknown
forrtl: severe (168): Program Exception - illegal instruction
A similar thing happens to me. I just installed version 2019.5.281 of the MPI Library. I use Intel Parallel Studio XE 2019, the latest version, in Visual Studio. Running the MPI code results in the following message:
[mpiexec@Sebastian-PC] bstrap\service\service_launch.c (305): server rejected credentials
[mpiexec@Sebastian-PC] bstrap\src\hydra_bstrap.c (371): error launching bstrap proxy
[mpiexec@Sebastian-PC] mpiexec.c (1898): error setting up the boostrap proxies
With -localonly I can run the code, but every process executes the same thing as the master (everybody runs the same code) and reports the same id. Any ideas how to fix this?
I have exactly the same issue as described above by L.D. Marks. I am using an Intel(R) Xeon(R) CPU E5-2690 v4 and Red Hat Enterprise Linux 7.7.
If I downgrade from Intel MPI 2019.5 to Intel MPI 2019.2, then everything works just fine.
Compiling my "Hello World" test with 2019.5 and running it using the Intel MPI 2019.2 runtime environment actually works as well, but using the 2019.3, 2019.4, or 2019.5 runtime environment results in the error below.
Maybe Intel can suggest a solution?
forrtl: severe (168): Program Exception - illegal instruction
Image PC Routine Line Source
hello 0000000000404C34 Unknown Unknown Unknown
libpthread-2.17.s 00007F41683F3630 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F4169512252 MPL_dbg_pre_init Unknown Unknown
libmpi.so.12.0.0 00007F41690280FE MPI_Init Unknown Unknown
libmpifort.so.12. 00007F4169B07D2B MPI_INIT Unknown Unknown
hello 0000000000403D40 Unknown Unknown Unknown
hello 0000000000403CE2 Unknown Unknown Unknown
libc-2.17.so 00007F4167D36545 __libc_start_main Unknown Unknown
hello 0000000000403BE9 Unknown Unknown Unknown
