Intel® MPI Library

What is the difference between the precompiled libfabric in Intel MPI and a libfabric I compiled myself?

oleotiger
Novice

I'm working with oneAPI 2021.2 on CentOS 7.6 (kernel 3.10.0-957.el7.x86_64).

The network card is a Huawei 1822 (driver 3.5.0.3) with RoCE support.

The libfabric version is:

fi_info: 1.11.0
libfabric: 1.11.0-impi
libfabric api: 1.11

I run IMB with this command:

/opt/x86_64/libs/compiler/intel/21.2.0/mpi/2021.2.0/bin/mpirun -hostfile /root/host2  -ppn 1 /opt/x86_64/libs/compiler/intel/21.2.0/mpi/2021.2.0/bin/IMB-MPI1

It fails with these errors:

libfabric:94654:udp:core:fi_param_get_():280<info> variable iface=<not set>
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: 150.1.68.77, iface name: enp61s0, speed: 25000
libfabric:94654:udp:core:ofi_get_list_of_addr():1407<info> Available addr: 150.1.68.79, iface name: enp61s0, speed: 25000
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: 16.16.1.77, iface name: enp177s0, speed: 25000
libfabric:94654:udp:core:ofi_get_list_of_addr():1407<info> Available addr: 16.16.1.79, iface name: enp177s0, speed: 25000
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::7e9c:ab7f:3930:d669, iface name: enp61s0, speed: 25000
libfabric:94654:udp:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::5236:14ee:1ea:9562, iface name: enp61s0, speed: 25000
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::e2e9:6e36:8dfb:a465, iface name: enp177s0, speed: 25000
libfabric:22194:sockets:core:ofi_insert_loopback_addr():1250<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:22194:sockets:core:ofi_insert_loopback_addr():1265<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:22194:sockets:core:util_getinfo_ifs():318<info> Chosen addr for using: 150.1.68.77, speed 25000
libfabric:22194:sockets:core:fi_param_get_():280<info> variable iface=<not set>
libfabric:94654:udp:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::e83b:4ed6:f5:fc1f, iface name: enp177s0, speed: 25000
libfabric:94654:udp:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::3b41:4688:d197:9e38, iface name: enp179s0, speed: 25000
libfabric:94654:udp:core:ofi_insert_loopback_addr():1250<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:94654:udp:core:ofi_insert_loopback_addr():1265<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:94654:udp:core:util_getinfo_ifs():318<info> Chosen addr for using: 150.1.68.79, speed 25000
libfabric:94654:core:core:fi_getinfo_():1033<debug> fi_getinfo: provider udp returned success
libfabric:94654:tcp:core:util_getinfo():147<debug> checking info
libfabric:94654:tcp:core:fi_param_get_():280<info> variable iface=<not set>
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: 150.1.68.77, iface name: enp61s0, speed: 25000
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: 16.16.1.77, iface name: enp177s0, speed: 25000
libfabric:94654:tcp:core:ofi_get_list_of_addr():1407<info> Available addr: 150.1.68.79, iface name: enp61s0, speed: 25000
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::7e9c:ab7f:3930:d669, iface name: enp61s0, speed: 25000
libfabric:94654:tcp:core:ofi_get_list_of_addr():1407<info> Available addr: 16.16.1.79, iface name: enp177s0, speed: 25000
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::e2e9:6e36:8dfb:a465, iface name: enp177s0, speed: 25000
libfabric:22194:sockets:core:ofi_insert_loopback_addr():1250<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:22194:sockets:core:ofi_insert_loopback_addr():1265<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:22194:sockets:core:util_getinfo_ifs():318<info> Chosen addr for using: 150.1.68.77, speed 25000
libfabric:22194:core:core:fi_getinfo_():1033<debug> fi_getinfo: provider sockets returned success
libfabric:22194:ofi_mrail:fabric:mrail_get_core_info():289<info> OFI_MRAIL_ADDR_STRC env variable not set!
libfabric:22194:core:core:fi_getinfo_():1021<info> fi_getinfo: provider ofi_mrail returned -61 (No data available)
Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(138)........:
MPID_Init(1139)..............:
MPIDI_OFI_mpi_init_hook(1238): OFI addrinfo() failed (ofi_init.c:1238:MPIDI_OFI_mpi_init_hook:No data available)

But when I use libfabric v1.12.0, which I compiled from source myself, it works.

The libfabric version:

/opt/x86_64/libs/libfabric-debug/bin/fi_info: 1.12.0
libfabric: 1.12.0
libfabric api: 1.12

Run command:

export LD_LIBRARY_PATH=/opt/x86_64/libs/libfabric-debug/lib:$LD_LIBRARY_PATH
export FI_PROVIDER_PATH=/opt/x86_64/libs/libfabric-debug/lib/libfabric
export FI_LOG_LEVEL=debug
/opt/x86_64/libs/compiler/intel/21.2.0/mpi/2021.2.0/bin/mpirun -hostfile /root/host2  -ppn 1 /opt/x86_64/libs/compiler/intel/21.2.0/mpi/2021.2.0/bin/IMB-MPI1
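
Before comparing the two builds, it can help to confirm which libfabric the exports above would put first in the search order. The sketch below is only an illustration: find_libfabric is a hypothetical helper, and it assumes the library is installed under its usual name libfabric.so.1. It only walks LD_LIBRARY_PATH, so it does not account for rpath entries or the ld.so cache.

```shell
#!/bin/sh
# find_libfabric PATHLIST  (hypothetical helper)
# Prints the first libfabric.so.1 found in a colon-separated directory list.
find_libfabric() {
    old_ifs=$IFS
    IFS=:
    # Unquoted on purpose: the list is split on ':' while IFS is set to ':'.
    for dir in $1; do
        if [ -e "$dir/libfabric.so.1" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/libfabric.so.1"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Report what the current environment would resolve first.
find_libfabric "${LD_LIBRARY_PATH:-}" \
    || echo "libfabric.so.1 not found on LD_LIBRARY_PATH"
```

If the printed path is not under /opt/x86_64/libs/libfabric-debug/lib, the self-compiled build is probably not the one being loaded.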

But when I set the provider through an environment variable, I get a weird result:

1. If I set export I_MPI_OFI_PROVIDER=verbs, the problem recurs and the error log is the same.
2. If I set export FI_PROVIDER=verbs, it works correctly.
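
To make the comparison between the two variables reproducible, each case can be run in a clean subshell so that a leftover export from an earlier attempt cannot influence the result. This is a minimal sketch: run_with_only is a hypothetical helper, not part of Intel MPI or libfabric, and the commented mpirun line reuses the command from this post with the paths shortened.

```shell
#!/bin/sh
# run_with_only VAR VALUE CMD...  (hypothetical helper)
# Clears both provider variables in a subshell, sets only VAR=VALUE, runs CMD.
run_with_only() {
    var=$1
    value=$2
    shift 2
    (
        unset I_MPI_OFI_PROVIDER FI_PROVIDER
        export "$var=$value"
        "$@"
    )
}

# Harmless demonstration: print what a child process would see in each case.
show='echo "I_MPI_OFI_PROVIDER=${I_MPI_OFI_PROVIDER:-<not set>} FI_PROVIDER=${FI_PROVIDER:-<not set>}"'
run_with_only I_MPI_OFI_PROVIDER verbs sh -c "$show"
run_with_only FI_PROVIDER verbs sh -c "$show"

# On the cluster, the same helper would wrap the real command, for example:
# run_with_only FI_PROVIDER verbs mpirun -hostfile /root/host2 -ppn 1 IMB-MPI1
```

Because the variables are changed only inside the subshell, the calling shell's environment is left untouched between the two runs.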

In summary, I have two questions:

  • With the libfabric I compiled myself, the command works. Why? What is the difference between the precompiled libfabric in oneAPI and the one I compiled?
  • According to Intel® MPI Library Over Libfabric, I_MPI_OFI_PROVIDER and FI_PROVIDER function identically, and I_MPI_OFI_PROVIDER is preferred over FI_PROVIDER. But as described above, the two environment variables have different effects. What is the difference between I_MPI_OFI_PROVIDER and FI_PROVIDER?
3 Replies
ShivaniK_Intel
Moderator

Hi,

Thanks for reaching out to us.

As a better practice, we recommend running the command as follows:

/opt/x86_64/libs/compiler/intel/21.2.0/mpi/2021.2.0/bin/mpirun -hostfile /root/host2 -np 1 -ppn 1 /opt/x86_64/libs/compiler/intel/21.2.0/mpi/2021.2.0/bin/IMB-MPI1

Could you please tell us which Intel libfabric you have been using, with the exact connect number?

Could you also re-run the above command with FI_PROVIDER set, using the Intel libfabric?
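
A small script along these lines could collect the libfabric details in one pass. It is only a sketch: report_version is a hypothetical helper, and it assumes that fi_info and mpirun on PATH are the binaries from the installation under test and that both accept --version.

```shell
#!/bin/sh
# report_version TOOL ARGS...  (hypothetical helper)
# Runs TOOL with ARGS if it is on PATH, otherwise says that it is missing.
report_version() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        "$@" 2>&1 || echo "$1 exited with an error"
    else
        echo "$1: not found on PATH"
    fi
}

report_version fi_info --version
report_version mpirun --version

# Record the provider settings that were active for this run.
echo "FI_PROVIDER=${FI_PROVIDER:-<not set>}"
echo "I_MPI_OFI_PROVIDER=${I_MPI_OFI_PROVIDER:-<not set>}"
```

Running it once with the Intel environment and once after the LD_LIBRARY_PATH and FI_PROVIDER_PATH exports gives two outputs that can be pasted into the thread side by side.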

>>> When I set libfabric compiled by myself, the command works. Why? What's the difference between precompiled libfabric in oneapi and libfabric compiled by myself?

Yes, there are differences between the two libfabric versions.

>>> According to the Intel® MPI Library Over Libfabric, both I_MPI_OFI_PROVIDER and FI_PROVIDER function identically and the I_MPI_OFI_PROVIDER is preferred over the FI_PROVIDER. But as described above, these two environment variables have different effects. What's the difference between I_MPI_OFI_PROVIDER and FI_PROVIDER?

I_MPI_OFI_PROVIDER and FI_PROVIDER both offer the same functionality. I_MPI_OFI_PROVIDER is an Intel MPI Library environment variable and is an alias for FI_PROVIDER, which comes from libfabric.

Thanks & Regards

Shivani
ShivaniK_Intel
Moderator

Hi,

As we haven't heard back from you, could you please provide the details requested in my previous post so that we can investigate your issue further?

Thanks & Regards

Shivani
ShivaniK_Intel
Moderator

Hi,

As we have not heard back from you, this thread will no longer be monitored by Intel. If you need further assistance, please post a new question.

Have a good day!

Thanks & Regards

Shivani