Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

New MPI error with Intel 2019.1, unable to run MPI hello world

Paul_K_2
Beginner

After upgrading to Update 1 of Intel 2019 we are not able to run even an MPI hello world example. This is new behavior; e.g., a Spack-installed gcc 8.2.0 and OpenMPI have no trouble on this system. This is a single workstation, and only shm needs to work. For non-MPI use the compilers work correctly. Presumably dependencies have changed slightly in this new update?

 

$ cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 7.5 (Maipo)
$ source /opt/intel2019/bin/compilervars.sh intel64
$ mpiicc -v
mpiicc for the Intel(R) MPI Library 2019 Update 1 for Linux*
Copyright 2003-2018, Intel Corporation.
icc version 19.0.1.144 (gcc version 4.8.5 compatibility)
$ cat mpi_hello_world.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  // Initialize the MPI environment
  MPI_Init(NULL, NULL);

  // Get the number of processes
  int world_size;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // Get the rank of the process
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  // Get the name of the processor
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);

  // Print off a hello world message
  printf("Hello world from processor %s, rank %d out of %d processors\n",
	 processor_name, world_rank, world_size);

  // Finalize the MPI environment.
  MPI_Finalize();
}
$ mpiicc ./mpi_hello_world.c
$ ./a.out
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
$ export I_MPI_FABRICS=shm:ofi
$ export I_MPI_DEBUG=666
$ ./a.out
[0] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
[0] MPI startup(): libfabric version: 1.7.0a1-impi
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)

 

Liang__C
Beginner

I have encountered the same problem. Have you got any solutions yet?

Dmitry_G_Intel
Employee

Hi Paul,

 

Could you send us the output of the "ifconfig" command, please?

Thank you!

Paul_K_2
Beginner

$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 0.0.0.0
        inet6 fe80::42:77ff:fed7:8a4c  prefixlen 64  scopeid 0x20<link>
        ether 02:42:77:d7:8a:4c  txqueuelen 0  (Ethernet)
        RX packets 126235  bytes 5187732 (4.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 174599  bytes 478222947 (456.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 128.219.166.53  netmask 255.255.252.0  broadcast 128.219.167.255
        inet6 fe80::225:90ff:fee1:835a  prefixlen 64  scopeid 0x20<link>
        ether 00:25:90:e1:83:5a  txqueuelen 1000  (Ethernet)
        RX packets 58556671  bytes 16033320775 (14.9 GiB)
        RX errors 0  dropped 33204  overruns 0  frame 0
        TX packets 13740853  bytes 6787935989 (6.3 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xe3920000-e393ffff

eth1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 00:25:90:e1:83:5b  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xe3900000-e391ffff

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 4183437  bytes 4132051465 (3.8 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4183437  bytes 4132051465 (3.8 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 

Rudolf_Berrendorf

I have encountered the same problem / same error message running an application between nodes.

Configuration: Scientific Linux 7.5, latest Intel version

Paul_K_2
Beginner

Installing Intel Parallel Studio XE Cluster Edition from scratch (a clean install) did not change this problem or error, i.e., it is unrelated to the update procedure 2019.0 -> 2019.1.

 

 

Dmitry_G_Intel
Employee

Hi Paul,

 

Did you see the same error with IMPI 2019 Gold?

I have two possible methods that could help you:

- Set the FI_SOCKETS_IFACE=eth0 environment variable (or any IP interface that works correctly).

- Set the FI_PROVIDER=tcp environment variable (only applicable to IMPI 2019 U1). This switches to another OFI provider (i.e., how IMPI accesses the network); this provider is available as a Technical Preview and will replace the OFI/sockets provider in future releases.

If you still have the same problems, please collect logs with FI_LOG_LEVEL=debug set. The logs are printed to standard error (stderr).
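For example, a minimal sketch of these settings (the interface name and the ./a.out binary follow your transcript; adjust as needed):

# Option A: keep the default OFI/sockets provider, but pin it to a working IP interface
export FI_SOCKETS_IFACE=eth0

# Option B: switch to the TCP OFI provider (Technical Preview in IMPI 2019 U1)
export FI_PROVIDER=tcp

# If it still fails, capture the libfabric debug log (written to stderr)
export FI_LOG_LEVEL=debug
./a.out 2> fi_debug.log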

 

Thank you!

--

Dmitry

Paul_K_2
Beginner

Thanks Dmitry. Setting FI_LOG_LEVEL=debug was very helpful:

$ ./a.out
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
$ export FI_LOG_LEVEL=debug
$ ./a.out
libfabric:core:core:fi_param_define_():223<info> registered var perf_cntr
libfabric:core:core:fi_param_get_():272<info> variable perf_cntr=<not set>
...

libfabric:core:core:fi_getinfo_():899<warn> fi_getinfo: provider psm2 returned -61 (No data available)
libfabric:psm2:core:psmx2_getinfo():341<info>
libfabric:psm2:core:psmx2_init_prov_info():201<info> Unsupported endpoint type
libfabric:psm2:core:psmx2_init_prov_info():203<info> Supported: FI_EP_RDM
libfabric:psm2:core:psmx2_init_prov_info():205<info> Supported: FI_EP_DGRAM
libfabric:psm2:core:psmx2_init_prov_info():207<info> Requested: FI_EP_MSG
libfabric:core:core:fi_getinfo_():899<warn> fi_getinfo: provider psm2 returned -61 (No data available)
libfabric:core:core:ofi_layering_ok():776<info> Need core provider, skipping util ofi_rxm
libfabric:core:core:fi_getinfo_():877<info> Since psm2 can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:core:core:fi_getinfo_():877<info> Since psm2 can be used, tcp has been skipped. To use tcp, please, set FI_PROVIDER=tcp
libfabric:core:core:fi_getinfo_():899<warn> fi_getinfo: provider ofi_rxm returned -61 (No data available)
libfabric:core:core:fi_getinfo_():877<info> Since psm2 can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:core:core:fi_getinfo_():877<info> Since psm2 can be used, tcp has been skipped. To use tcp, please, set FI_PROVIDER=tcp
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
libfabric:psm2:core:psmx2_fini():476<info>

$ export FI_PROVIDER=tcp
$ ./a.out

...

Hello world from processor system.place.com, rank 0 out of 1 processors

Using export FI_PROVIDER=tcp solves the crash for us. Hopefully this has no performance impact for on-node messages. I also hope Update 2 can avoid the need for this environment/configuration variable.

 

Dmitry_G_Intel
Employee

Hi Paul,

 

Sorry for the inconvenience. Yes, we are going to identify the root cause of the problem.

FI_PROVIDER=tcp has better performance numbers compared to FI_PROVIDER=sockets (the current default OFI provider for Intel MPI 2019 Gold and U1), but FI_PROVIDER=tcp is a Technical Preview due to some stability issues.

You are right: the TCP provider doesn't impact performance for intra-node communication, because the SHM transport is used by default. You can ensure that shm is used for intra-node and OFI for inter-node communication by setting I_MPI_FABRICS=shm:ofi.
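For example, a minimal sketch of that configuration (the exact I_MPI_DEBUG startup output can vary between updates):

# shm for intra-node traffic, OFI (here the tcp provider) for inter-node traffic;
# the startup output shows which transport and provider were actually selected
export I_MPI_FABRICS=shm:ofi
export FI_PROVIDER=tcp
export I_MPI_DEBUG=5
mpirun -n 2 ./a.out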

 

By the way, I have one more question that could help our team identify the root cause: did you try setting FI_SOCKETS_IFACE to any interface? If not, could you try it, please? (Please unset FI_PROVIDER, or set FI_PROVIDER=sockets; we should ensure that the OFI/sockets provider is used in your test.)

It would be great to check all values: docker0, eth0, and eth1.

 

Thank you in advance!

--

Dmitry

Paul_K_2
Beginner
$ source /opt/intel2019/bin/compilervars.sh intel64
$ ./a.out
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
$ export FI_SOCKETS_IFACE=eth0
$ ./a.out
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
$ export FI_PROVIDER=sockets
$ ./a.out
Hello world from processor thing.machine.com, rank 0 out of 1 processors
$ export FI_SOCKETS_IFACE=eth1
[pk7@oxygen t]$ ./a.out
Hello world from processor thing.machine.com, rank 0 out of 1 processors
$ export FI_SOCKETS_IFACE=docker0
[pk7@oxygen t]$ ./a.out
Hello world from processor thing.machine.com, rank 0 out of 1 processors
$ export FI_PROVIDER=""
[pk7@oxygen t]$ ./a.out
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)
$ export FI_PROVIDER=tcp
$ ./a.out
Hello world from processor thing.machine.com, rank 0 out of 1 processors

 

Looks like it "lost" the default sockets provider in the update.

Dmitry_G_Intel
Employee

Hi Paul,

Thank you! We appreciate your help.

It looks like setting either FI_PROVIDER=sockets or FI_PROVIDER=tcp solves your problem, doesn't it?

IMPI 2019 should use the sockets OFI provider (i.e., FI_PROVIDER=sockets) by default, but for some reason this is not the case.

--

Dmitry

campbell__scott
Beginner

Hello, I am seeing what appears to be the same problem, but none of the suggested environment variable settings are working for me.

 

This is on CentOS Linux release 7.2.1511. The test program is /opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/test/test.c compiled with mpigcc. I am trying to launch on just a single host for now.
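For reference, a minimal sketch of the build and run steps (paths taken from this thread; adjust for your installation):

# Set up the Intel 2019 environment, build the shipped MPI test with mpigcc, and run it on one host
source /opt/intel/bin/compilervars.sh intel64
mpigcc /opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/test/test.c -o mpitest
mpiexec -n 1 ./mpitest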

 

[user1@centos7-2 tmp]$ export LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/libfabric/lib/:$LD_LIBRARY_PATH
[user1@centos7-2 tmp]$ env | grep FI_
FI_SOCKETS_IFACE=enp0s3
FI_LOG_LEVEL=debug
FI_PROVIDER=tcp
[user1@centos7-2 tmp]$ mpiexec ./mpitest
libfabric:core:core:fi_param_define_():223<info> registered var perf_cntr
libfabric:core:core:fi_param_get_():272<info> variable perf_cntr=<not set>
libfabric:core:core:fi_param_define_():223<info> registered var hook
libfabric:core:core:fi_param_get_():272<info> variable hook=<not set>
libfabric:core:core:fi_param_define_():223<info> registered var provider
libfabric:core:core:fi_param_define_():223<info> registered var fork_unsafe
libfabric:core:core:fi_param_define_():223<info> registered var universe_size
libfabric:core:core:fi_param_get_():281<info> read string var provider=tcp
libfabric:core:core:ofi_create_filter():322<warn> unable to parse filter from: tcp
libfabric:core:core:fi_param_define_():223<info> registered var provider_path
libfabric:core:core:fi_param_get_():272<info> variable provider_path=<not set>
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:fi_param_define_():223<info> registered var rxd_enable
libfabric:core:core:fi_param_get_():272<info> variable rxd_enable=<not set>
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
Abort(1618831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)

 


[user1@centos7-2 tmp]$ export FI_PROVIDER=sockets
[user1@centos7-2 tmp]$ env | grep FI_
FI_SOCKETS_IFACE=enp0s3
FI_LOG_LEVEL=debug
FI_PROVIDER=sockets
[user1@centos7-2 tmp]$ mpiexec ./mpitest
libfabric:core:core:fi_param_define_():223<info> registered var perf_cntr
libfabric:core:core:fi_param_get_():272<info> variable perf_cntr=<not set>
libfabric:core:core:fi_param_define_():223<info> registered var hook
libfabric:core:core:fi_param_get_():272<info> variable hook=<not set>
libfabric:core:core:fi_param_define_():223<info> registered var provider
libfabric:core:core:fi_param_define_():223<info> registered var fork_unsafe
libfabric:core:core:fi_param_define_():223<info> registered var universe_size
libfabric:core:core:fi_param_get_():281<info> read string var provider=sockets
libfabric:core:core:ofi_create_filter():322<warn> unable to parse filter from: sockets
libfabric:core:core:fi_param_define_():223<info> registered var provider_path
libfabric:core:core:fi_param_get_():272<info> variable provider_path=<not set>
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:fi_param_define_():223<info> registered var rxd_enable
libfabric:core:core:fi_param_get_():272<info> variable rxd_enable=<not set>
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():194<warn> no provider structure or name
Abort(1618831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)

 

 

[root@centos7-2 tmp]# ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.10.10.9  netmask 255.0.0.0  broadcast 10.255.255.255
        inet6 fe80::a00:27ff:fe0b:6565  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:0b:65:65  txqueuelen 1000  (Ethernet)
        RX packets 405685  bytes 534299467 (509.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 441363  bytes 675447638 (644.1 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.3.15  netmask 255.255.255.0  broadcast 10.0.3.255
        inet6 fe80::a00:27ff:fee4:5499  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:e4:54:99  txqueuelen 1000  (Ethernet)
        RX packets 102700  bytes 133004229 (126.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 51572  bytes 3477801 (3.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 48387  bytes 8166614 (7.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 48387  bytes 8166614 (7.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:86:b9:f7  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 

 

I get the same errors if I set FI_SOCKETS_IFACE to enp0s8 as well. 

 

Any suggestions?

Paul_K_2
Beginner

Yes, setting either FI_PROVIDER=sockets or FI_PROVIDER=tcp solves the problem. Thanks.

campbell__scott
Beginner

I made a post here yesterday that never made it through moderation. I have since solved the problem with "source /opt/intel/bin/compilervars.sh intel64", so there is no need to push my original post through. Thanks.

Matt_H_3
Beginner

You need FI_PROVIDER_PATH set, e.g.:

 

export FI_PROVIDER_PATH=$MPI_HOME/compilers_and_libraries_2019.2.187/linux/mpi/intel64/libfabric/lib/prov
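A quick sanity check is to confirm that the provider libraries are actually there and that libfabric picks them up (a sketch; ./a.out is the hello-world binary from earlier in the thread):

# The dynamically loaded libfabric provider libraries should show up here
ls "$FI_PROVIDER_PATH"
# With debug logging on, the stderr output shows which providers get registered or skipped
FI_LOG_LEVEL=debug mpiexec -n 1 ./a.out 2>&1 | grep -i provider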

Rashawn_K_Intel1
Employee

Finding the documentation for the Intel MPI 2019 environment variables is not easy. I found that setting FI_PROVIDER to tcp or sockets works in my non-OFI setting. I would like to know more about this variable. What are the valid values for FI_PROVIDER?

Regards,

-Rashawn

subham_m_
Beginner

I noticed that instead of setting FI_PROVIDER, setting the following environment variable also works:

I_MPI_FABRICS=shm

 

Whereas setting I_MPI_FABRICS=shm:ofi results in the same error as above.

L__D__Marks
New Contributor II

I seem to have the same (or a similar) problem (2019 Update 5), except that it only occurs when run on an E5410. It runs fine with an E5-2660, Gold 6130, or Gold 6138, and also runs fine with comp2015/impi/5.0.2.044. The various environment options suggested don't work (for me).

 mpirun -np 8 -machinefile .machine0 ./Hello
forrtl: severe (168): Program Exception - illegal instruction
Image              PC                Routine            Line        Source             
Hello              0000000000405EA4  Unknown               Unknown  Unknown
libpthread-2.17.s  00002AD66501F5D0  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AD664705252  MPL_dbg_pre_init      Unknown  Unknown
libmpi.so.12.0.0   00002AD66421B0FE  MPI_Init              Unknown  Unknown
libmpifort.so.12.  00002AD663937D2B  MPI_INIT              Unknown  Unknown
Hello              0000000000404F40  Unknown               Unknown  Unknown
Hello              0000000000404EE2  Unknown               Unknown  Unknown
libc-2.17.so       00002AD6655503D5  __libc_start_main     Unknown  Unknown
Hello              0000000000404DE9  Unknown               Unknown  Unknown
forrtl: severe (168): Program Exception - illegal instruction
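One generic check (a sketch, not a confirmed diagnosis) is to compare the CPU instruction-set flags of the failing and working hosts:

# Run on the E5410 and on one of the working hosts, then diff the two files
grep -m1 flags /proc/cpuinfo | tr ' ' '\n' | sort > cpu_flags_$(hostname).txt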
 

sebastian_d_
Beginner

A similar thing happens to me. I just installed the 2019.5.281 version of the MPI Library. I use Intel Parallel Studio XE 2019, the latest version, in Visual Studio. Running the MPI code results in the following message:

[mpiexec@Sebastian-PC] bstrap\service\service_launch.c (305): server rejected credentials
[mpiexec@Sebastian-PC] bstrap\src\hydra_bstrap.c (371): error launching bstrap proxy
[mpiexec@Sebastian-PC] mpiexec.c (1898): error setting up the boostrap proxies

With -localonly I can run the code, but all cores execute the same thing as the master code (everybody runs the same) and have the same id. Any ideas how to fix this?

Sørensen__Lars_Steen

I have exactly the same issue as described above by L.D. Marks. I am using an Intel(R) Xeon(R) CPU E5-2690 v4 and Red Hat Enterprise Linux 7.7.

If I downgrade from Intel MPI 2019.5 to Intel MPI 2019.2, everything works just fine.

Compiling my "Hello World" test with 2019.5 and running it with the Intel MPI 2019.2 runtime environment actually works as well, but using the 2019.3, 2019.4, or 2019.5 runtime environment results in the error below.

Maybe Intel can suggest a solution?

forrtl: severe (168): Program Exception - illegal instruction
Image              PC                Routine            Line        Source      
hello              0000000000404C34  Unknown               Unknown  Unknown
libpthread-2.17.s  00007F41683F3630  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00007F4169512252  MPL_dbg_pre_init      Unknown  Unknown
libmpi.so.12.0.0   00007F41690280FE  MPI_Init              Unknown  Unknown
libmpifort.so.12.  00007F4169B07D2B  MPI_INIT              Unknown  Unknown
hello              0000000000403D40  Unknown               Unknown  Unknown
hello              0000000000403CE2  Unknown               Unknown  Unknown
libc-2.17.so       00007F4167D36545  __libc_start_main     Unknown  Unknown
hello              0000000000403BE9  Unknown               Unknown  Unknown
