Intel® oneAPI HPC Toolkit

Intel MPI update 6 with MLX provider cannot be used without mpiexec

SoftWeb_V_
Beginner

Intel has introduced a new libfabric provider called MLX in Intel MPI v2019 update 6. This provider is selected by default on our servers with Mellanox ConnectX-4 adapters, which is expected. However, no executable linked against the MPI libraries works unless it is launched with `mpiexec`. This is a change of behaviour compared to Intel MPI v2019 update 5 and to the other libfabric providers in Intel MPI v2019 update 6, such as tcp or verbs.

As a result, our users cannot run any MPI-enabled tool directly (i.e. without `mpiexec`), even in a single-node environment.

 

Characteristics of our system:

- CPU: 2x Intel(R) Xeon(R) Gold 6126

- Adapter: Mellanox Technologies MT27700 Family [ConnectX-4]

- Operating System: CentOS 7.7

- Related libraries: Intel MPI v2019.6, UCX v1.5.1, OFED v4.7

 

Steps to reproduce:

1) Check that the libfabric provider is listed as mlx. This can be done with the `fi_info` tool shipped with Intel MPI v2019.6:

$ fi_info
provider: mlx
    fabric: mlx
    domain: mlx
    version: 1.5
    type: FI_EP_UNSPEC
    protocol: FI_PROTO_MLX
provider: mlx;ofi_rxm
    fabric: mlx
    domain: mlx
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM

2) Compile and execute the minimal test program shipped with Intel MPI v2019.6 without `mpiexec` (a minimal equivalent of the test program is sketched after the commands below):

$ mpicc /path/to/impi-2019.6/test/test.c -o test
$ ./test
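
For readers without the Intel MPI installation at hand, the bundled test.c is not reproduced here, but a minimal equivalent (our own sketch, not the exact Intel source) looks like this. Note that the abort shown under "Result:" happens inside `MPI_Init`, before any of the later calls run.

/* hello.c - minimal MPI "hello world", approximating impi-2019.6/test/test.c */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);   /* with the mlx provider and no mpiexec, the abort occurs here */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);

    printf("Hello world: rank %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}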

 

Result:

Abort(2140047) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(703)........:  
MPID_Init(958)...............:  
MPIDI_OFI_mpi_init_hook(1334):  
MPIDU_bc_table_create(444)...:

 

Expected result:

Hello world: rank 0 of 1 running on hostname.domainname

 

The expected result is what we obtain when executing the test program with `mpiexec` under Intel MPI v2019 update 6, or when switching the libfabric provider to TCP with `FI_PROVIDER=tcp` and executing the test program without `mpiexec`.
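
For completeness: if setting `FI_PROVIDER=tcp` in the launch environment is impractical (for example, for tools that users start directly), a speculative stopgap (not something suggested by Intel in this thread) is to set the variable from inside the program before `MPI_Init`, since libfabric reads `FI_PROVIDER` during initialization. A sketch:

/* tcp_workaround.c - speculative sketch: fall back to the tcp libfabric
 * provider before MPI_Init when the launch environment cannot be changed. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* overwrite=0: respect an FI_PROVIDER already set by the user */
    setenv("FI_PROVIDER", "tcp", 0);

    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d initialized using the tcp provider fallback\n", rank);
    MPI_Finalize();
    return 0;
}

This trades the high-performance mlx path for tcp, so it is only a temporary measure until the provider issue is fixed.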

6 Replies
GouthamK_Intel
Moderator

Hi,

Thanks for reaching out to us!

We tried to reproduce your issue in our environment and verified that it works fine with the TCP libfabric provider in both v2019 update 5 and v2019 update 6.

We are still working on the mlx libfabric provider and will get back to you.

 

Regards

Goutham

GouthamK_Intel
Moderator

Hi,

We are able to reproduce your issue with MLX in Intel MPI v2019 update 6.

Thanks for reporting this issue; we will escalate it to the concerned team.

 

Regards

Goutham

SoftWeb_V_
Beginner

Thank you for the feedback and for taking action on this issue.

 

Kind regards,

 

Alex

James_T_Intel
Moderator

Our engineering team is planning to have this resolved in 2019 Update 8.

SoftWeb_V_
Beginner

Thanks for the feedback. Would it be possible to get an approximate ETA for Update 8? We are holding back the release of new Intel toolchains in EasyBuild (https://github.com/easybuilders/easybuild) because of this issue.

James_T_Intel
Moderator

Intel® MPI Library 2019 Update 8 is now available and includes the fix for this issue. I am marking this case as resolved for Intel support. Any further discussion on this thread will be considered community only. If you need additional Intel support for this issue, please start a new thread.

