Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Rocky Linux 8: Intel MPI fails for normal users but works as root.

haoyahao3
Beginner

Here is a strange error when using Intel MPI. After successfully installing the Intel MPI runtime and compiler, I compiled the following MPI code in hello.f90:

 

PROGRAM hello_world_mpi
include 'mpif.h'

integer process_Rank, size_Of_Cluster, ierror, tag

call MPI_INIT(ierror)
call MPI_COMM_SIZE(MPI_COMM_WORLD, size_Of_Cluster, ierror)
call MPI_COMM_RANK(MPI_COMM_WORLD, process_Rank, ierror)

print *, 'Hello World from process: ', process_Rank, 'of ', size_Of_Cluster

call MPI_FINALIZE(ierror)
END PROGRAM

 

I compiled it with the Intel Fortran compiler (via the mpiifort wrapper):

 

 mpiifort hello.f90 -o hello
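(For context, the oneAPI environment is loaded beforehand in the usual way; the exact prefix below is an assumption based on the /opt/intel/oneapi paths that appear in the logs later in this thread.)

source /opt/intel/oneapi/setvars.sh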

 

Then, when I run the program as the root user:

 

> mpirun -n 2 ./hello

 Hello World from process:            1 of            2
 Hello World from process:            0 of            2

 

Everything works as root. When I run it as any other user, I get:

 

> mpirun -n 2 ./hello

[1733578910.431461221] RFRLServer7:rank89.hello: Unable to create send CQ of size 5080 on mlx5_bond_0: Cannot allocate memory
[1733578910.433054057] RFRLServer7:rank89.hello: Unable to initialize verbs NIC /sys/class/infiniband/mlx5_bond_0 (unit 0:0)
RFRLServer7:rank89: PSM3 can't open nic unit: 0 (err=23)
Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Unknown error class, error stack:
MPIR_Init_thread(193)........: 
MPID_Init(1715)..............: 
MPIDI_OFI_mpi_init_hook(1673): 
create_vni_context(2242).....: OFI endpoint open failed (ofi_init.c:2242:create_vni_context:Invalid argument)

 

Something seems to be wrong here.
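One difference between root and an ordinary account that could explain the verbs "Cannot allocate memory" message is the locked-memory limit (RLIMIT_MEMLOCK): root can lock memory freely, while regular users on RHEL-family systems often default to a small limit. A quick comparison between the two accounts, using only standard shell commands (this is a guess at the cause, not a confirmed diagnosis), would be:

# show the max locked memory for the current account; verbs NICs usually need "unlimited"
ulimit -l
# check any persistent per-user memlock settings
grep -i memlock /etc/security/limits.conf /etc/security/limits.d/*.conf 2>/dev/null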

The ifort version: ifort (IFORT) 2021.13.1 20240703.

The compiler release is 2024.2.

The Intel MPI library is Intel(R) MPI Library for Linux* OS, Version 2021.14 Build 20241121 (id: e7829d6).

My CPU is an Intel(R) Xeon(R) Platinum 8360H CPU @ 3.00GHz.

The system is Rocky Linux 8.10.

Could anyone help me with this problem?

The same problem occurs when I use the latest ifx compiler.

2 Replies
TobiasK
Moderator

@haoyahao3 
Please show the full output of

I_MPI_DEBUG=10 I_MPI_HYDRA_DEBUG=1 mpirun -np 2 IMB-MPI1

haoyahao3
Beginner

Here is the output for your information:

[user01@RFRLServer7 ~]$ mpirun -np 2 IMB-MPI1
[mpiexec@RFRLServer7] Launch arguments: /opt/intel/oneapi/mpi/2021.14/bin//hydra_bstrap_proxy --upstream-host RFRLServer7 --upstream-port 43083 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/intel/oneapi/mpi/2021.14/bin/ --tree-width 16 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /opt/intel/oneapi/mpi/2021.14/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9 
[proxy:0:0@RFRLServer7] pmi cmd from fd 9: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:0@RFRLServer7] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@RFRLServer7] pmi cmd from fd 9: cmd=get_maxes
[proxy:0:0@RFRLServer7] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:0@RFRLServer7] pmi cmd from fd 9: cmd=get_appnum
[proxy:0:0@RFRLServer7] PMI response: cmd=appnum appnum=0
[proxy:0:0@RFRLServer7] pmi cmd from fd 9: cmd=get_my_kvsname
[proxy:0:0@RFRLServer7] PMI response: cmd=my_kvsname kvsname=kvs_3502253_0
[proxy:0:0@RFRLServer7] pmi cmd from fd 9: cmd=get kvsname=kvs_3502253_0 key=PMI_process_mapping
[proxy:0:0@RFRLServer7] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,1,2))
[proxy:0:0@RFRLServer7] pmi cmd from fd 6: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:0@RFRLServer7] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@RFRLServer7] pmi cmd from fd 6: cmd=get_maxes
[proxy:0:0@RFRLServer7] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:0@RFRLServer7] pmi cmd from fd 6: cmd=get_appnum
[proxy:0:0@RFRLServer7] PMI response: cmd=appnum appnum=0
[proxy:0:0@RFRLServer7] pmi cmd from fd 6: cmd=get_my_kvsname
[proxy:0:0@RFRLServer7] PMI response: cmd=my_kvsname kvsname=kvs_3502253_0
[proxy:0:0@RFRLServer7] pmi cmd from fd 6: cmd=get kvsname=kvs_3502253_0 key=PMI_process_mapping
[proxy:0:0@RFRLServer7] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,1,2))
[0] MPI startup(): Intel(R) MPI Library, Version 2021.14  Build 20241121 (id: e7829d6)
[0] MPI startup(): Copyright (C) 2003-2024 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[proxy:0:0@RFRLServer7] pmi cmd from fd 9: cmd=barrier_in
[proxy:0:0@RFRLServer7] pmi cmd from fd 6: cmd=put kvsname=kvs_3502253_0 key=-bcast-1-0 value=2F6465762F73686D2F496E74656C5F4D50495F6C4E676D597A
[proxy:0:0@RFRLServer7] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:0@RFRLServer7] pmi cmd from fd 6: cmd=barrier_in
[proxy:0:0@RFRLServer7] PMI response: cmd=barrier_out
[proxy:0:0@RFRLServer7] PMI response: cmd=barrier_out
[proxy:0:0@RFRLServer7] pmi cmd from fd 9: cmd=get kvsname=kvs_3502253_0 key=-bcast-1-0
[proxy:0:0@RFRLServer7] PMI response: cmd=get_result rc=0 msg=success value=2F6465762F73686D2F496E74656C5F4D50495F6C4E676D597A
[0] MPI startup(): libfabric loaded: libfabric.so.1 
[0] MPI startup(): libfabric version: 1.21.0-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: psm3
[1733885461.826111329] RFRLServer7:rank0.IMB-MPI1: Unable to create send CQ of size 5080 on mlx5_bond_0: Cannot allocate memory
[1733885461.826473879] RFRLServer7:rank0.IMB-MPI1: Unable to initialize verbs NIC /sys/class/infiniband/mlx5_bond_0 (unit 0:0)
RFRLServer7:rank0: PSM3 can't open nic unit: 0 (err=23)
Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
MPIR_Init_thread(193)........: 
MPID_Init(1715)..............: 
MPIDI_OFI_mpi_init_hook(1673): 
create_vni_context(2242).....: OFI endpoint open failed (ofi_init.c:2242:create_vni_context:Invalid argument)
[proxy:0:0@RFRLServer7] pmi cmd from fd 6: cmd=abort exitcode=1615247
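Since the startup log shows the psm3 libfabric provider being selected, one way to narrow things down (a suggestion only, not something tried here) would be to force a different provider and see whether only the PSM3/verbs path is affected:

# quick functional test over the TCP provider instead of PSM3
FI_PROVIDER=tcp mpirun -n 2 IMB-MPI1
# or, for a single-node run, restrict Intel MPI to shared memory only
I_MPI_FABRICS=shm mpirun -n 2 IMB-MPI1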

 

Does this output help in solving the problem?
