Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI runtime error: Unable to create send CQ of size 5080 on mlx5_0:

Doma-Tacolar
Beginner

Hi,

I have recently been learning MPI with Fortran, but I ran into a problem when running a hello-world program.

 

hello.f90 file:

PROGRAM hello_world_mpi
! Legacy MPI interface; brings in MPI_COMM_WORLD and related constants
include 'mpif.h'

integer process_Rank, size_Of_Cluster, ierror, tag   ! tag is unused in this example

! Initialize MPI, then query the communicator size and this process's rank
call MPI_INIT(ierror)
call MPI_COMM_SIZE(MPI_COMM_WORLD, size_Of_Cluster, ierror)
call MPI_COMM_RANK(MPI_COMM_WORLD, process_Rank, ierror)

print *, 'Hello World from process: ', process_Rank, 'of ', size_Of_Cluster

call MPI_FINALIZE(ierror)
END PROGRAM hello_world_mpi

I then run the following commands in the terminal:

 

[tjk@master hello_world]$ mpiifort hello.f90 -o hello
[tjk@master hello_world]$ mpirun -n 2 ./hello
MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found
MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found
master:rank0.hello: Unable to create send CQ of size 5080 on mlx5_0: Cannot allocate memory
master:rank0.hello: Unable to initialize verbs NIC /sys/class/infiniband/mlx5_0 (unit 0:0)
master:rank0: PSM3 can't open nic unit: 0 (err=23)
Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(176)........:
MPID_Init(1546)..............:
MPIDI_OFI_mpi_init_hook(1558):
create_vni_context(2135).....: OFI endpoint open failed (ofi_init.c:2135:create_vni_context:Invalid argument)

 

And I find it strange that the same code runs fine over a remote SSH connection.
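
(A note for readers hitting the same failure: since the program works in one login context but not another, comparing the two environments can narrow down the cause. Below is a minimal diagnostic sketch, assuming Intel MPI's standard I_MPI_DEBUG variable; the locked-memory limit is one setting that often differs between login contexts and can produce "Cannot allocate memory" from mlx5:)

$ ulimit -l                           # locked-memory limit; verbs NICs typically want "unlimited"
$ env | grep -E 'I_MPI|FI_'           # Intel MPI / libfabric variables already set in this session
$ I_MPI_DEBUG=10 mpirun -n 2 ./hello  # verbose startup output, including the selected fabric/provider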

 

 

 

 

RabiyaSK_Intel
Employee

Hi,

 

Thanks for posting in Intel Communities.

 

We weren't able to reproduce the error with the Fortran MPI hello-world sample that you shared.

 

Here are the details of our system and the Intel MPI Library for your reference:

mpirun --version
Intel(R) MPI Library for Linux* OS, Version 2021.10 Build 20230619 (id: c2e19c2f3e)
Copyright 2003-2023, Intel Corporation.

CPU details: 

lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              144
On-line CPU(s) list: 0-143
Thread(s) per core:  2
Core(s) per socket:  36
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz
Stepping:            6
CPU MHz:             2401.000
CPU max MHz:         2401.0000
CPU min MHz:         800.0000
BogoMIPS:            4800.00
L1d cache:           48K
L1i cache:           32K
L2 cache:            1280K
L3 cache:            55296K
NUMA node0 CPU(s):   0-35,72-107
NUMA node1 CPU(s):   36-71,108-143
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid fsrm md_clear pconfig flush_l1d arch_capabilities

 

Operating system details:

cat /etc/os-release
NAME="Rocky Linux"
VERSION="8.8 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.8 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2029-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.8"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.8"

 

Here is also a screenshot of the program and its execution for your reference:

[screenshot: RabiyaSK_Intel_0-1699954627788.png]

 

Could you please provide the following details so that we can reproduce your issue on our end? (Example commands for gathering them are shown after this list.)

1. The Intel MPI Library version or Intel oneAPI HPC Toolkit version

2. CPU, operating system, and hardware details

3. The job scheduler and the MPI program launcher being used
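
For reference, commands along these lines typically surface that information (the sinfo line assumes Slurm is your scheduler; adjust for your site):

$ mpirun --version      # Intel MPI Library version
$ lscpu                 # CPU details
$ cat /etc/os-release   # operating system details
$ sinfo --version       # job scheduler version (Slurm example)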

 

Thanks & Regards,

Shaik Rabiya

 

Doma-Tacolar
Beginner
Sorry for keeping you waiting so long. I believe the following might be useful.
1. I'm not sure of the oneAPI version, but I checked the compiler version:
[tjk@master intel]$ icc --version 1
icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.
icc: error #10236: File not found: '1'
icc (ICC) 2021.9.0 20230302
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

[tjk@master intel]$ ifort --version 1
ifort: error #10236: File not found: '1'
ifort (IFORT) 2021.9.0 20230302
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

2. I'm not sure about the hardware, but here is part of the CPU (from /proc/cpuinfo) and operating system information:
[root@master ~]# cat /etc/os-release
NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"

processor : 127
vendor_id : GenuineIntel
cpu family : 6
model : 106
model name : Intel(R) Xeon(R) Platinum 8358P CPU @ 2.60GHz
stepping : 6
microcode : 0xd000389
cpu MHz : 3400.000
cache size : 49152 KB
physical id : 1
siblings : 64
core id : 31
cpu cores : 32
apicid : 191
initial apicid : 191
fpu : yes
fpu_exception : yes
cpuid level : 27
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect wbnoinvd dtherm ida arat pln pts avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid fsrm md_clear pconfig flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs mmio_stale_data eibrs_pbrsb
bogomips : 5228.91
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 57 bits virtual
power management:

3. I'm not sure what an MPI launcher is; this machine currently has Slurm installed for job management.

Sorry again for the incomplete information. If more is needed, could you please tell me which terminal commands I should run to obtain it?
RabiyaSK_Intel
Employee

Hi,


We regret to inform you that the Intel MPI Library isn't supported on CentOS. Please see the link below for the Intel MPI Library system requirements:

https://www.intel.com/content/www/us/en/developer/articles/system-requirements/mpi-library-system-requirements.html


Please try on a supported operating system, and if the problem still persists, you can reach out to us.


Thanks & Regards,

Shaik Rabiya


RabiyaSK_Intel
Employee

Hi,

 

We haven't heard back from you. Could you please confirm whether you are facing the same problem on a supported operating system?

 

Thanks & Regards,

Shaik Rabiya

 

RabiyaSK_Intel
Employee

Hi,


We haven't heard back from you, but we have some suggestions for you and other users who encounter the same error.

This error may be caused by an improper MLX installation.

If you are using a single node, you can try setting:

     $ export I_MPI_FABRICS=shm

or:

     $ export I_MPI_FABRICS=ofi
     $ export FI_PROVIDER=tcp
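
For example, a complete single-node run with the TCP provider could look like the sketch below; setting I_MPI_DEBUG=5 makes Intel MPI print the libfabric provider it selected, so you can confirm the settings took effect:

     $ export I_MPI_FABRICS=ofi
     $ export FI_PROVIDER=tcp
     $ I_MPI_DEBUG=5 mpirun -n 2 ./hello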

I hope this resolves your issue once you return to check the thread. Unfortunately, we have to close the thread; if you need any additional information, please raise a new question, as this thread will no longer be monitored by Intel.


Thanks & Regards,

Shaik Rabiya

