Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Hello World program cannot run on cluster

WangWJ
Novice

I'm having trouble running a hello world program on my cluster. Here is the program:

#include"mpi.h"
#include<iostream>
int main(int argc, char *argv[])
{
    int myid,numprocs;
    MPI_Status status;
    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    std::cout<<"process: "<<myid<<" of "<<numprocs<<" hello world"<<std::endl;
    MPI_Finalize();
    return 0;
}

I compiled it with the GCC C++ wrapper:

mpigxx main.cpp

It runs fine on both host1 and host2 individually when I use

mpirun -n 4 ./a.out

But when I try to run it across the cluster:

mpirun -n 4 -ppn 2 -hosts host1,host2 ./a.out

it fails with:

Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Unknown error class, error stack:
MPIR_Init_thread(193)........:
MPID_Init(1715)..............:
MPIDI_OFI_mpi_init_hook(1724):
MPIDU_bc_table_create(340)...: Missing hostname or invalid host/port description in business card
Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Unknown error class, error stack:
MPIR_Init_thread(193)............:
MPID_Init(1715)..................:
MPIDI_OFI_mpi_init_hook(1739)....:
insert_addr_table_roots_only(492): OFI get address vector map failed

Here is some more information with I_MPI_DEBUG=10:

[0] MPI startup(): Intel(R) MPI Library, Version 2021.14  Build 20241121 (id: e7829d6)
[0] MPI startup(): Copyright (C) 2003-2024 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric loaded: libfabric.so.1
[0] MPI startup(): libfabric version: 1.21.0-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: shm
Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Unknown error class, error stack:
MPIR_Init_thread(193)........:
MPID_Init(1715)..............:
MPIDI_OFI_mpi_init_hook(1724):
MPIDU_bc_table_create(340)...: Missing hostname or invalid host/port description in business card

Could anyone help me with this problem?

7 Replies
TobiasK
Moderator

@WangWJ 

[0] MPI startup(): libfabric provider: shm

 

Do you know what set this? Can you please post the output of "export"?

WangWJ
Novice

The libfabric provider was selected automatically by Intel MPI. I did not set any MPI-related environment variables other than I_MPI_DEBUG.

TobiasK
Moderator

@WangWJ 
Can you provide more details about your environment (OS/hardware/software)?

WangWJ
Novice

host1:
OS: CentOS Linux release 7.5.1804
CPU: Intel Xeon Gold 6248R @ 3.00 GHz, 32 cores
GCC: 4.8.5
Intel MPI: Version 2021.14 Build 20241121

host2:
OS: Ubuntu 20.04.6 LTS
CPU: Intel Xeon Gold 6248R @ 3.00 GHz, 32 cores
GCC: 9.4.0
Intel MPI: Version 2021.14 Build 20241121

Both of them are Huawei Cloud servers.

TobiasK
Moderator

@WangWJ CentOS 7.5 is not supported anymore. Additionally, please use the same OS/software stack on all nodes.

WangWJ
Novice

Thanks, this helped me a lot. I changed the OS of all nodes to Ubuntu 20.04 and set FI_PROVIDER=tcp. It now runs well.
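
For anyone hitting the same error, here is a minimal sketch of the working invocation, using the hostnames and the FI_PROVIDER setting from this thread (adjust names to your own cluster):

export FI_PROVIDER=tcp       # force the TCP libfabric provider instead of shm
export I_MPI_DEBUG=10        # optional: the startup log should then report "libfabric provider: tcp"
mpirun -n 4 -ppn 2 -hosts host1,host2 ./a.out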

dusktilldawn
New Contributor I

Make sure that the hostnames host1 and host2 are correctly configured in your /etc/hosts file or DNS system, and that they can communicate with each other.

Verify that you can SSH from one node to the other (host1 to host2, and vice versa) without requiring a password. If passwordless SSH isn't set up, MPI won't be able to launch processes across nodes.

The error mentions "OFI" (which is part of the network fabric layer). Ensure that your cluster nodes have proper network configuration and are able to communicate via the correct interfaces.

The error could also be related to a mismatch in MPI versions or configuration. Ensure both nodes are using the same MPI library and version.

Your mpirun command looks fine, but you can try simplifying it with a host file to make sure it's not a syntax issue (with Intel MPI, pass the file with -f or -hostfile and control placement with -ppn):

mpirun -n 4 -ppn 2 -f hosts.txt ./a.out

In hosts.txt, list one host per line:

host1
host2


Double-check your environment variables (I_MPI_DEBUG, etc.) for any misconfiguration, as they can cause initialization errors.

Try these steps, and if it still doesn't work, providing more details about your network setup or MPI installation might help narrow down the issue.
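
A minimal sketch of those checks, run from host1 (assuming standard Linux tools and the passwordless SSH setup described above):

ping -c 1 host2          # the hostname resolves and the node is reachable
ssh host2 hostname       # passwordless SSH works from host1 to host2 (repeat in the other direction)
ssh host2 'mpirun -V'    # same Intel MPI version on the remote node; may need the full path or sourcing setvars.sh in a non-interactive shell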
