Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Intel MPI on Mellanox Infiniband

emoreno
Beginner
Hi everybody.

I have been trying for several months to run Intel MPI on our Itanium cluster with a Mellanox InfiniBand interconnect using IB Gold (it works perfectly over Ethernet).

Apparently, MPI can't find the DAPL provider. My /etc/dat.conf says:
ib0 u1.2 nonthreadsafe default /opt/ibgd/lib/libdapl.so ri.1.1 "InfiniHost0 1" ""
ib1 u1.2 nonthreadsafe default /opt/ibgd/lib/libdapl.so ri.1.1 "InfiniHost0 2" ""

But when I run an MPI code, I get:
mpiexec -genv I_MPI_DEVICE rdma -env I_MPI_DEBUG 4 -n 2 ./a.out
I_MPI: [0] my_dlopen(): dlopen failed: libmpi.def.so
I_MPI: [0] set_up_devices(): will use static-default device
couldn't open /dev/ts_ua_cm0: No such file or directory

With a higher debug level, I get something strange:
I_MPI: [0] try_one_device(): trying device: libmpi.rdma.so
I_MPI: [0] my_dlsym(): dlsym for dats_get_ia_handle failed: /usr/lib/libdat.so: undefined symbol: dats_get_ia_handle
I_MPI: [0] can_use_dapl_provider(): returning; DAPL provider not ok to use: ib0
I_MPI: [0] can_use_dapl_provider(): returning; DAPL provider not ok to use: ib1
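
For reference, this is roughly how I have been checking which DAT library the runtime picks up; the paths are taken from my setup, so they may differ elsewhere:

ls -l /usr/lib/libdat.so* /opt/ibgd/lib/libdapl.so
nm -D /usr/lib/libdat.so | grep dats_get_ia_handle    # does the DAT registry library export the symbol the debug output complains about?
ldd /opt/ibgd/lib/libdapl.so                          # which libraries does the IBGD DAPL provider pull in?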


Anybody have a hint?

Thanks.
TimP
Honored Contributor III
Unfortunately, this is a frequent problem with those DAPL drivers. Some have avoided it by switching to OpenIB gen2.
Intel_C_Intel
Employee
Which version of the Mellanox IBGD package are you using?
If it is 1.8.0 or later, you may have to enable DAPL before you can use Intel MPI.
Install the Mellanox package with everything selected; this ensures the DAPL software is installed.
The DAPL driver is not enabled by default in these versions. To enable it, you need to make a minor change to the following file:
/etc/infiniband/openib.conf
Change the answer for loading UDAPL to YES in the copy on the master node, and make the same change on every other node in the cluster. Once you have finished, I recommend shutting down all of the compute nodes and then rebooting the master node. This runs the openib init scripts correctly, and you should see the fabric come up as each node is turned on.
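
As a rough sketch, after the change the relevant lines in /etc/infiniband/openib.conf should look something like the following; the exact variable name (UDAPL_LOAD here) is an assumption and may differ between IBGD releases, so check the comments in your copy of the file:

# Load the user-level DAPL (uDAPL) module at startup -- variable name may vary by release
UDAPL_LOAD=yes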
Once all of the nodes are up, you should be able to use Intel MPI with the proper switch to select the RDMA driver.
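
For example, with the fabric up, the same command line from the original post should then find the provider:

mpiexec -genv I_MPI_DEVICE rdma -n 2 ./a.out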