Beginner

Intel MPI update 6 with MLX provider cannot be used without mpiexec


Intel has introduced a new libfabric provider called MLX in Intel MPI v2019 update 6. This provider is selected by default on our servers with Mellanox ConnectX-4 adapters, which is expected. However, no executable linking to the MPI libraries works unless it is launched with `mpiexec`. This is a change of behaviour compared to Intel MPI v2019 update 5, and compared to the other libfabric providers in Intel MPI v2019 update 6, such as tcp or verbs.

As a result, our users cannot use any tool supporting MPI in a single-node environment.

 

Characteristics of our system:

- CPU: 2x Intel(R) Xeon(R) Gold 6126

- Adapter: Mellanox Technologies MT27700 Family [ConnectX-4]

- Operating System: CentOS 7.7

- Related libraries: Intel MPI v2019.6, UCX v1.5.1, OFED v4.7

 

Steps to reproduce:

1) Check that the libfabric provider is listed as mlx. This can be done with the `fi_info` tool from Intel MPI v2019.6:

```
$ fi_info
provider: mlx
    fabric: mlx
    domain: mlx
    version: 1.5
    type: FI_EP_UNSPEC
    protocol: FI_PROTO_MLX
provider: mlx;ofi_rxm
    fabric: mlx
    domain: mlx
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
```

2) Compile and execute the minimal test program shipped with Intel MPI v2019.6 without `mpiexec`:

```
$ mpicc /path/to/impi-2019.6/test/test.c -o test
$ ./test
```

 

Result:

```
Abort(2140047) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(703)........:
MPID_Init(958)...............:
MPIDI_OFI_mpi_init_hook(1334):
MPIDU_bc_table_create(444)...:
```

 

Expected result:

```
Hello world: rank 0 of 1 running on hostname.domainname
```

 

The expected result is what we obtain by executing the test program with `mpiexec` in Intel MPI update 6, or by changing the libfabric provider to TCP with `FI_PROVIDER=tcp` and executing the test program without `mpiexec`.
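Until a fix is available, the provider override mentioned above can serve as a per-shell workaround; a minimal sketch (the `test` binary name is taken from the reproduction steps, and the exact provider behaviour is as described in this report):

```shell
# Override the autodetected mlx provider with tcp for this shell;
# libfabric reads FI_PROVIDER at startup, so MPI binaries launched
# from here can then run without mpiexec.
export FI_PROVIDER=tcp
echo "libfabric provider forced to: $FI_PROVIDER"
```

With this set, running `./test` directly prints the expected hello-world line; alternatively, keeping the default mlx provider and launching via `mpiexec -n 1 ./test` also works.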


Accepted Solutions
Moderator

Intel® MPI Library 2019 Update 8 is now available and includes the fix for this issue. I am marking this case as resolved for Intel support. Any further discussion on this thread will be considered community only. If you need additional Intel support for this issue, please start a new thread.



6 Replies
Moderator

Hi,

Thanks for reaching out to us!

We tried to reproduce your issue in our environment and verified that it works fine with the TCP libfabric provider in both v2019 update 5 and v2019 update 6.

We are still investigating the mlx libfabric provider and will get back to you.

 

Regards

Goutham

Moderator

Hi,

We are able to reproduce your issue with MLX in Intel MPI v2019 update 6.

Thanks for reporting this issue; we will escalate it to the relevant team.

 

Regards

Goutham

Beginner

Thank you for the feedback and for taking action on this issue.

 

Kind regards,

 

Alex

Moderator

Our engineering team is planning to have this resolved in 2019 Update 8.

Beginner

Thanks for the feedback. Would it be possible to have an approximate ETA for update 8? We are holding back the release of new Intel toolchains in Easybuild (https://github.com/easybuilders/easybuild) due to this issue.
