Intel® MPI Library

DAPL error (DAT_INVALID_ADDRESS)

Kyungrae
Beginner

Hello, I am running some CFD applications with Intel MPI version 2018.5.24 over RDMA, using the DAPL fabric.

Some applications run fine for about 30 minutes and then fail with the error below.

node08:CMA:14822:c3697b40: 1954025166 us(18158 us): DAPL ERR create_qp Address family not supported by protocol
[1443:node08][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:502] error(0x120063): ofa-v2-cma-roe-enp65s0np0: could not create DAPL endpoint: DAT_INVALID_ADDRESS(DAT_INVALID_ADDRESS_MALFORMED)

My environment settings are:

I_MPI_DAPL_PROVIDER=ofa-v2-cma-roe-enp65s0np0
I_MPI_DAT_LIBRARY=/usr/lib64/libdat2.so.2.0.0
I_MPI_DEBUG=5
I_MPI_FABRICS=shm:dapl
I_MPI_FALLBACK=0

DAT_override=/etc/rdma/dat.conf

where /etc/rdma/dat.conf contains

ofa-v2-cma-roe-enp65s0np0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "enp65s0np0 1" ""
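
For anyone checking a similar setup, here is a minimal sketch of reading such a dat.conf entry. It assumes the usual eight-field layout of a DAT 2.0 static registry entry (adapter name, API version, thread safety, default flag, library path, provider version, adapter parameters, platform parameters). The function name `parse_dat_conf_line` and the field labels are illustrative only and are not part of any library.

```python
import shlex

# Illustrative labels for the eight columns of a dat.conf entry (an assumption,
# not names defined by the DAT library itself).
FIELDS = ("ia_name", "api_version", "thread_safety", "default",
          "lib_path", "provider_version", "ia_params", "platform_params")

def parse_dat_conf_line(line):
    """Split one dat.conf entry into named fields; return None for blanks/comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    tokens = shlex.split(stripped)  # keeps the double-quoted parameter fields intact
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(tokens)}")
    return dict(zip(FIELDS, tokens))

entry = parse_dat_conf_line(
    'ofa-v2-cma-roe-enp65s0np0 u2.0 nonthreadsafe default '
    'libdaplofa.so.2 dapl.2.0 "enp65s0np0 1" ""'
)
# For this provider the adapter parameters hold the network device and port.
netdev, port = entry["ia_params"].split()
print(entry["ia_name"], netdev, port)
```

The adapter name printed here should match the value of I_MPI_DAPL_PROVIDER, and the network device should be the one the RDMA port maps to.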

The output from ibdev2netdev is

mlx5_0 port 1 ==> enp65s0np0 (Up)
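
As a rough sketch, a mapping line like the one above could be checked from a script, for example to log the link state periodically while a long job runs. The line format is taken from the single output line shown here; the function names `parse_ibdev2netdev` and `netdev_is_up` are hypothetical, and this only reports the state at the moment the output was captured.

```python
import re

# Pattern based on the one line shown above: "<ibdev> port <n> ==> <netdev> (<state>)".
LINE_RE = re.compile(r"^(\S+) port (\d+) ==> (\S+) \((\w+)\)$")

def parse_ibdev2netdev(output):
    """Return one dict per recognised mapping line; unrecognised lines are skipped."""
    rows = []
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append({
                "ibdev": match.group(1),
                "port": int(match.group(2)),
                "netdev": match.group(3),
                "state": match.group(4),
            })
    return rows

def netdev_is_up(rows, netdev):
    """True if the given network device appears in the mapping with state 'Up'."""
    return any(r["netdev"] == netdev and r["state"] == "Up" for r in rows)

sample = "mlx5_0 port 1 ==> enp65s0np0 (Up)"
rows = parse_ibdev2netdev(sample)
print(rows)
print(netdev_is_up(rows, "enp65s0np0"))
```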

I have no idea why only some applications hit this error, or why they run fine for 30 minutes to an hour before it appears. I would appreciate any hint on this issue.

Thank you very much in advance.

TobiasK
Moderator

@Kyungrae I am sorry to say that Intel MPI 2018.5.24 is very old and no longer supported.
