Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI Hello World fails

kmccall882
Beginner
3,882 Views

I'm using Intel MPI 2021.5.1 on Red Hat 8.5 with an NFS file system. My simple Hello World program fails with many error messages. Any help would be appreciated.

 

Here is the program:

#include <mpi.h>
#include <unistd.h>    // for gethostname()
#include <iostream>

int main(int argc, char *argv[])
{
    int rank, world_size, error_codes[1];   // error_codes, intercom, info are unused in this reproducer
    char hostname[128];
    MPI_Comm intercom;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    gethostname(hostname, 127);

    std::cout << "Hello from process on " << hostname << std::endl;

    MPI_Finalize();
    return 0;
}
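
A representative build command for this reproducer, assuming the source file is named parent_simple.cpp (a hypothetical name chosen to match the binary) and the Intel C++ wrapper is used, would be:

mpiicpc -o parent_simple parent_simple.cpp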

 

I sourced /opt/intel/oneapi/setvars.sh before building the executable, and here is the run command:

$ export I_MPI_PIN_RESPECT_CPUSET=0; mpirun ./parent_simple

 

Here are the abridged error messages (I eliminated many repetitions):

 

[1646934040.094426] [rocci:306332:0] ib_verbs.h:84 UCX ERROR ibv_exp_query_device(mlx5_0) returned 95: Operation not supported

 

[1646934040.113276] [rocci:306320:0] select.c:434 UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable, rdmacm/sockaddr - no am bcopy

Abort(1090703) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(143)........:
MPID_Init(1310)..............:
MPIDI_OFI_mpi_init_hook(1974): OFI get address vector map failed
[1646934040.113302] [rocci:306315:0] select.c:434 UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable, rdmacm/sockaddr - no am bcopy

 

 

Tags (1)
0 Kudos
3 Replies
VarshaS_Intel
Moderator
3,841 Views

Hi,

 

Thanks for posting in Intel Communities.

 

We are unable to reproduce the issue on our end. We tried your sample reproducer code and got the expected results.

 

We followed the steps below using the latest Intel MPI 2021.5 on a Linux machine:

 

1. Please find the commands below.

To compile:
mpiicc -o hello hello.cpp

To run the MPI program:
export I_MPI_PIN_RESPECT_CPUSET=0; mpirun -bootstrap ssh -n 1 -ppn 1 ./hello
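
Since the reproducer is C++ (it uses std::cout), the C++ compiler wrapper can also be used; a minimal alternative, assuming the same source file name:

mpiicpc -o hello hello.cpp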

 

 

Could you please provide us with the OS details and FI provider you are using?
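
For example, the output of the commands below would be helpful (these assume the oneAPI environment has already been sourced):

cat /etc/os-release
fi_info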

 

Please run the Intel Cluster Checker command below and share the complete log file with us.

 

clck -f ./<nodefile> -F mpi_prereq_user
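
Here <nodefile> is a plain-text file listing the cluster nodes, one hostname per line; the names below are placeholders:

node01
node02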

 

Please find attached a screenshot of the expected results.

 

Thanks & Regards,

Varsha

 

 

0 Kudos
Xiao_Z_Intel
Employee
3,728 Views

Hi Kurt,

 

Have you run the cluster checker as Varsha suggested? Could you please also work through the following items and share the detailed results with us, including the complete log files?

 

  1. Share the output of ucx_info -d and fi_info -v.
  2. Run the code with debug options enabled: I_MPI_DEBUG=10 and FI_LOG_LEVEL=debug.
  3. Run the code with tcp as your OFI* provider (FI_PROVIDER=tcp) together with the same debug options. (A sketch of the command lines for all three items follows below.)
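
A minimal sketch of these runs, with the rank count as a placeholder and output redirected to log files:

ucx_info -d > ucx_info.log
fi_info -v > fi_info.log
I_MPI_DEBUG=10 FI_LOG_LEVEL=debug mpirun -n 2 ./parent_simple > run_default.log 2>&1
I_MPI_DEBUG=10 FI_LOG_LEVEL=debug FI_PROVIDER=tcp mpirun -n 2 ./parent_simple > run_tcp.log 2>&1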

 

Thanks,

Xiao

 

0 Kudos
Xiao_Z_Intel
Employee
3,652 Views

Hi Kurt,


We have not heard back from you with the additional information, so we will close this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community support only.


Best,

Xiao


0 Kudos
Reply