Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI runtime error: Unable to create send CQ of size 5080 on mlx5_0: Cannot allocate memory

whao
Beginner

I was trying to run benchmarks/imb/src_c/IMB-MPI1 with Intel MPI 2021.3.0 (as well as 2021.4.0) and got the following error:

 

[wxhao@dec0100pl5app src_c]$ mpirun -np 4 ./IMB-MPI1
dec0100pl5app:rank1.IMB-MPI1: Unable to create send CQ of size 5080 on mlx5_0: Cannot allocate memory
dec0100pl5app:rank0.IMB-MPI1: Unable to create send CQ of size 5080 on mlx5_0: Cannot allocate memory
dec0100pl5app:rank1.IMB-MPI1: Unable to initialize verbs
dec0100pl5app:rank1: PSM3 can't open nic unit: 0 (err=23)
Abort(1615503) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(138)........:
MPID_Init(1169)..............:
MPIDI_OFI_mpi_init_hook(1807):
create_endpoint(2473)........: OFI endpoint open failed (ofi_init.c:2473:create_endpoint:Invalid argument)

 

The code ran fine with Intel MPI 2021.2.0.

 

I am wondering what I need to do to make it run. Thanks.

1 Solution
SantoshY_Intel
Moderator

 

Hi,

 

Thanks for reaching out to us.

 

Could you please confirm the libfabric provider (mlx/psm3/verbs) you are using?

 

Could you please try the below steps: 

 

ibv_devinfo -v

 

You can find the value of max_cq in the output of the above command, as highlighted in the attached screenshot.

If the value of max_cq is less than 5080, then try setting:

 

export UCX_RC_TX_CQ_LEN=<value-of-max_cq>

 

Now, try to run the IMB-MPI1 benchmark. 
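Put together, the check above can be sketched as a small shell snippet. The ibv_devinfo excerpt below is a hypothetical sample for illustration; on a real node, pipe the actual `ibv_devinfo -v` output instead:

```shell
# Hypothetical excerpt of "ibv_devinfo -v" output (sample values, not from
# a real system); on a real node use: ibv_devinfo -v | grep 'max_cq'
sample_output='
hca_id: mlx5_0
        max_cq:                 16777216
        max_cqe:                4194303
'

# Extract the max_cq value (the field after the "max_cq:" label)
max_cq=$(printf '%s\n' "$sample_output" | awk '/max_cq:/ {print $2}')
echo "max_cq=$max_cq"

# Only cap the UCX send-CQ length if the device limit is below 5080
if [ "$max_cq" -lt 5080 ]; then
    export UCX_RC_TX_CQ_LEN="$max_cq"
fi
```

On a real system the extracted value would come from the device, and the export only happens when the limit is actually smaller than the CQ size the error message reports.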

 

Could you please try the above steps and let us know whether it works as expected?

 

If you still face any issues, could you please let us know whether you are able to run a "sample mpi hello world" program?
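For reference, a minimal MPI "hello world" of the kind mentioned above could look like this. This is a sketch assuming mpicc and mpirun from the toolkit are on PATH; the compile and run lines are commented out since they need the cluster:

```shell
# Write a minimal MPI hello-world program to a file
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# Compile and run (requires the Intel MPI environment to be sourced):
# mpicc hello_mpi.c -o hello_mpi
# mpirun -np 2 ./hello_mpi
```

If even this fails with the same CQ error, the problem is in the fabric/provider setup rather than in the benchmark itself.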

 

Could you please provide us with the OS details along with the results for the below command? Also, please let us know how many nodes you are using for running the MPI benchmark.

 

I_MPI_DEBUG=30 mpirun -v -n <total-no-of-processes> -ppn <no-of-processes-per-node> IMB-MPI1

 

 

Thanks & Regards,

Santosh

 


whao
Beginner

Hi Santosh,

 

     Thanks for the help.

 

     My issue was the incorrect libfabric provider. I did not select a provider and just used the default. We are using IPoIB on a Mellanox card. With Intel MPI 2021.2.0, the default was TCP and I was able to run my code. With Intel MPI 2021.3.0 and 2021.4.0, the default is PSM3, and I got the error reported earlier. By setting I_MPI_OFI_PROVIDER to TCP, the code ran fine.
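For anyone hitting the same error, the workaround described here amounts to the following sketch (the mpirun line is commented out since it needs the cluster, and the debug level is optional):

```shell
# Override the default libfabric provider (PSM3 in 2021.3/2021.4) with TCP
export I_MPI_OFI_PROVIDER=tcp

# Optional: a higher debug level prints the selected provider at startup
export I_MPI_DEBUG=5

# Run the benchmark as before (needs the cluster, so shown as a comment):
# mpirun -np 4 ./IMB-MPI1
echo "provider=$I_MPI_OFI_PROVIDER"
```

TCP over IPoIB avoids the verbs/PSM3 CQ allocation path entirely, at the cost of some performance compared to a working RDMA provider.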

 

    Winston

SantoshY_Intel
Moderator

Hi,


Thanks for accepting our solution. Glad to know that your issue is resolved. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.


Thanks & Regards,

Santosh

