I'm working with Intel oneAPI 2021.2 on CentOS 7.6 (kernel 3.10.0-957.el7.x86_64).
The network card is a Huawei 1822 (driver 3.5.0.3) with RoCE support.
The libfabric version is:
fi_info: 1.11.0
libfabric: 1.11.0-impi
libfabric api: 1.11
When I run IMB with the following command:
/opt/x86_64/libs/compiler/intel/21.2.0/mpi/2021.2.0/bin/mpirun -hostfile /root/host2 -ppn 1 /opt/x86_64/libs/compiler/intel/21.2.0/mpi/2021.2.0/bin/IMB-MPI1
it fails with the following output:
libfabric:94654:udp:core:fi_param_get_():280<info> variable iface=<not set>
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: 150.1.68.77, iface name: enp61s0, speed: 25000
libfabric:94654:udp:core:ofi_get_list_of_addr():1407<info> Available addr: 150.1.68.79, iface name: enp61s0, speed: 25000
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: 16.16.1.77, iface name: enp177s0, speed: 25000
libfabric:94654:udp:core:ofi_get_list_of_addr():1407<info> Available addr: 16.16.1.79, iface name: enp177s0, speed: 25000
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::7e9c:ab7f:3930:d669, iface name: enp61s0, speed: 25000
libfabric:94654:udp:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::5236:14ee:1ea:9562, iface name: enp61s0, speed: 25000
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::e2e9:6e36:8dfb:a465, iface name: enp177s0, speed: 25000
libfabric:22194:sockets:core:ofi_insert_loopback_addr():1250<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:22194:sockets:core:ofi_insert_loopback_addr():1265<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:22194:sockets:core:util_getinfo_ifs():318<info> Chosen addr for using: 150.1.68.77, speed 25000
libfabric:22194:sockets:core:fi_param_get_():280<info> variable iface=<not set>
libfabric:94654:udp:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::e83b:4ed6:f5:fc1f, iface name: enp177s0, speed: 25000
libfabric:94654:udp:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::3b41:4688:d197:9e38, iface name: enp179s0, speed: 25000
libfabric:94654:udp:core:ofi_insert_loopback_addr():1250<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:94654:udp:core:ofi_insert_loopback_addr():1265<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:94654:udp:core:util_getinfo_ifs():318<info> Chosen addr for using: 150.1.68.79, speed 25000
libfabric:94654:core:core:fi_getinfo_():1033<debug> fi_getinfo: provider udp returned success
libfabric:94654:tcp:core:util_getinfo():147<debug> checking info
libfabric:94654:tcp:core:fi_param_get_():280<info> variable iface=<not set>
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: 150.1.68.77, iface name: enp61s0, speed: 25000
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: 16.16.1.77, iface name: enp177s0, speed: 25000
libfabric:94654:tcp:core:ofi_get_list_of_addr():1407<info> Available addr: 150.1.68.79, iface name: enp61s0, speed: 25000
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::7e9c:ab7f:3930:d669, iface name: enp61s0, speed: 25000
libfabric:94654:tcp:core:ofi_get_list_of_addr():1407<info> Available addr: 16.16.1.79, iface name: enp177s0, speed: 25000
libfabric:22194:sockets:core:ofi_get_list_of_addr():1407<info> Available addr: fe80::e2e9:6e36:8dfb:a465, iface name: enp177s0, speed: 25000
libfabric:22194:sockets:core:ofi_insert_loopback_addr():1250<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:22194:sockets:core:ofi_insert_loopback_addr():1265<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:22194:sockets:core:util_getinfo_ifs():318<info> Chosen addr for using: 150.1.68.77, speed 25000
libfabric:22194:core:core:fi_getinfo_():1033<debug> fi_getinfo: provider sockets returned success
libfabric:22194:ofi_mrail:fabric:mrail_get_core_info():289<info> OFI_MRAIL_ADDR_STRC env variable not set!
libfabric:22194:core:core:fi_getinfo_():1021<info> fi_getinfo: provider ofi_mrail returned -61 (No data available)
Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(138)........:
MPID_Init(1139)..............:
MPIDI_OFI_mpi_init_hook(1238): OFI addrinfo() failed (ofi_init.c:1238:MPIDI_OFI_mpi_init_hook:No data available)
However, when I use libfabric v1.12.0 that I compiled from source myself, it works.
The libfabric version:
/opt/x86_64/libs/libfabric-debug/bin/fi_info: 1.12.0
libfabric: 1.12.0
libfabric api: 1.12
Commands used:
export LD_LIBRARY_PATH=/opt/x86_64/libs/libfabric-debug/lib:$LD_LIBRARY_PATH
export FI_PROVIDER_PATH=/opt/x86_64/libs/libfabric-debug/lib/libfabric
export FI_LOG_LEVEL=debug
/opt/x86_64/libs/compiler/intel/21.2.0/mpi/2021.2.0/bin/mpirun -hostfile /root/host2 -ppn 1 /opt/x86_64/libs/compiler/intel/21.2.0/mpi/2021.2.0/bin/IMB-MPI1
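The exports above are what switch Intel MPI over to the self-built libfabric. A minimal bash sketch that wraps them in a reusable function, assuming the install prefix from this post; the function name use_custom_libfabric is hypothetical. It also avoids the trailing colon that ":$LD_LIBRARY_PATH" leaves behind when the variable is empty, since an empty entry makes the dynamic loader search the current directory.

```bash
# Hypothetical helper: point Intel MPI at a self-built libfabric.
# The default prefix is the one used in this post; pass another as $1.
use_custom_libfabric() {
    local prefix="${1:-/opt/x86_64/libs/libfabric-debug}"
    # Prepend the custom lib directory so the loader normally finds this
    # libfabric.so before the copy bundled with Intel MPI.
    export LD_LIBRARY_PATH="${prefix}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    # Search the custom build's directory for dynamically loaded provider plugins.
    export FI_PROVIDER_PATH="${prefix}/lib/libfabric"
    # Verbose libfabric logging, as in the original run.
    export FI_LOG_LEVEL=debug
}

use_custom_libfabric
# Then launch as before, e.g.:
# mpirun -hostfile /root/host2 -ppn 1 IMB-MPI1
```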
However, when I set the libfabric provider through environment variables, I got weird results:
1. If I set export I_MPI_OFI_PROVIDER=verbs:
The problem recurs and the error log is the same.
2. If I set export FI_PROVIDER=verbs:
It works correctly.
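When comparing the two cases above, a value exported earlier in the same shell (for example FI_PROVIDER left over from an earlier test) can silently leak into the other experiment. A minimal bash sketch, assuming GNU env(1) as shipped with CentOS; the helper name run_with_provider_var is hypothetical. It runs a command with exactly one of the two variables set and the other cleared.

```bash
# Hypothetical helper: run "$@" with only one provider variable set.
# Both I_MPI_OFI_PROVIDER and FI_PROVIDER are removed from the child's
# environment first, then the requested one is set.
run_with_provider_var() {
    local var="$1" value="$2"
    shift 2
    case "$var" in
        I_MPI_OFI_PROVIDER|FI_PROVIDER) ;;
        *) echo "unsupported variable: $var" >&2; return 2 ;;
    esac
    env -u I_MPI_OFI_PROVIDER -u FI_PROVIDER "${var}=${value}" "$@"
}

# Usage (paths shortened):
# run_with_provider_var I_MPI_OFI_PROVIDER verbs mpirun -hostfile /root/host2 -ppn 1 IMB-MPI1
# run_with_provider_var FI_PROVIDER        verbs mpirun -hostfile /root/host2 -ppn 1 IMB-MPI1
```

Adding a non-zero I_MPI_DEBUG level (for example I_MPI_DEBUG=5) to each run should make Intel MPI report the libfabric provider it selected at startup, which would show whether the two variables really end up choosing the same provider.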
In summary, I have two questions:
- When I use the libfabric I compiled myself, the command works. Why? What's the difference between the precompiled libfabric in oneAPI and the libfabric I compiled myself?
- According to the Intel® MPI Library Over Libfabric, both I_MPI_OFI_PROVIDER and FI_PROVIDER function identically, and I_MPI_OFI_PROVIDER is preferred over FI_PROVIDER. But, as described above, these two environment variables have different effects. What's the difference between I_MPI_OFI_PROVIDER and FI_PROVIDER?
Hi,
Thanks for reaching out to us.
As a better practice, we recommend using the command below:
/opt/x86_64/libs/compiler/intel/21.2.0/mpi/2021.2.0/bin/mpirun -hostfile /root/host2 -np 1 -ppn 1 /opt/x86_64/libs/compiler/intel/21.2.0/mpi/2021.2.0/bin/IMB-MPI1
Could you please provide the Intel libfabric you have been using with the exact connect number?
Could you also re-run the above command with FI_PROVIDER set, using the Intel libfabric?
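For that re-run, the self-built libfabric has to be taken back out of the environment first; the simplest way is a fresh shell with only the oneAPI environment sourced. If the same shell is reused, a minimal bash sketch is below, assuming the custom path from the original post; the helper name strip_path_entry is hypothetical.

```bash
# Hypothetical helper: print the colon-separated list $1 with every
# occurrence of the directory $2 removed.
strip_path_entry() {
    local list=":$1:" entry="$2"
    # Repeat because back-to-back duplicates share a colon and are not
    # all removed by a single global substitution.
    while [[ "$list" == *":${entry}:"* ]]; do
        list=${list//":${entry}:"/:}
    done
    list=${list#:}
    list=${list%:}
    printf '%s\n' "$list"
}

# Drop the self-built libfabric (path from the original post) from the search path.
CUSTOM_LIBFABRIC_LIB=/opt/x86_64/libs/libfabric-debug/lib
new_path=$(strip_path_entry "${LD_LIBRARY_PATH:-}" "$CUSTOM_LIBFABRIC_LIB")
if [[ -n "$new_path" ]]; then
    export LD_LIBRARY_PATH="$new_path"
else
    unset LD_LIBRARY_PATH
fi
# FI_PROVIDER_PATH may still point at the custom build; re-sourcing the
# oneAPI environment script resets it to Intel's provider directory.
export FI_PROVIDER=verbs
```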
>>> When I set libfabric compiled by myself, the command works. Why? What's the difference between precompiled libfabric in oneapi and libfabric compiled by myself?
Yes, there are differences between the two libfabric versions.
>>>According to the Intel® MPI Library Over Libfabric , both I_MPI_OFI_PROVIDER and FI_PROVIDER function identically and the I_MPI_OFI_PROVIDER is preferred over the FI_PROVIDER. But as described above, these two environment variables have different effects. What's the difference between I_MPI_OFI_PROVIDER and FI_PROVIDER?
I_MPI_OFI_PROVIDER and FI_PROVIDER both offer the same functionality. I_MPI_OFI_PROVIDER is an Intel MPI Library environment variable and is an alias for FI_PROVIDER, which comes from Libfabric.
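Since one variable is read by Intel MPI and the other by libfabric itself, a defensive option while debugging is to set both to the same value so neither layer can fall back to a different provider. A minimal bash sketch; the function name set_ofi_provider is hypothetical and not part of Intel MPI or libfabric.

```bash
# Hypothetical helper: set the provider for both layers at once.
set_ofi_provider() {
    local prov="${1:?usage: set_ofi_provider <provider>}"
    export I_MPI_OFI_PROVIDER="$prov"   # read by the Intel MPI Library
    export FI_PROVIDER="$prov"          # read by libfabric when it selects providers
}

set_ofi_provider verbs
```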
Thanks & Regards
Shivani
Hi,
As we didn't hear back from you, could you please provide the details requested in my previous post so that we can investigate your issue further?
Thanks & Regards
Shivani
Hi,
As we have not heard back from you, this thread will no longer be monitored by Intel. If you need further assistance, please post a new question.
Have a good day!
Thanks & Regards
Shivani