I'm using Intel MPI 2021.5.1 on Red Hat 8.5 with an NFS file system. My simple Hello World program fails with many error messages. Any help would be appreciated.
Here is the program:
#include <mpi.h>
#include <unistd.h>   // gethostname
#include <iostream>

int main(int argc, char *argv[])
{
    int rank, world_size, error_codes[1];
    char hostname[128];
    MPI_Comm intercom;
    MPI_Info info;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    gethostname(hostname, sizeof(hostname));
    std::cout << "Hello from process on " << hostname << std::endl;
    MPI_Finalize();
    return 0;
}
I sourced /opt/intel/oneapi/setvars.sh before building the executable, and
here is the run command:
$ export I_MPI_PIN_RESPECT_CPUSET=0; mpirun ./parent_simple
Here are the abridged error messages. I eliminated many repetitions:
[1646934040.094426] [rocci:306332:0] ib_verbs.h:84 UCX ERROR ibv_exp_query_device(mlx5_0) returned 95: Operation not supported
[1646934040.113276] [rocci:306320:0] select.c:434 UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable, rdmacm/sockaddr - no am bcopy
Abort(1090703) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(143)........:
MPID_Init(1310)..............:
MPIDI_OFI_mpi_init_hook(1974): OFI get address vector map failed
[1646934040.113302] [rocci:306315:0] select.c:434 UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable, rdmacm/sockaddr - no am bcopy
Hi,
Thanks for posting in Intel Communities.
We are unable to reproduce your issue on our end. We tried your sample reproducer code and got the expected results.
We followed the below steps using the latest Intel MPI 2021.5 on a Linux machine:
1. To compile:
mpiicc -o hello hello.cpp
2. To run the MPI program:
export I_MPI_PIN_RESPECT_CPUSET=0; mpirun -bootstrap ssh -n 1 -ppn 1 ./hello
Could you please provide us with the OS details and FI provider you are using?
Please run the below command for cluster checker and share with us the complete log file.
clck -f ./<nodefile> -F mpi_prereq_user
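For reference, the nodefile that clck expects is just a plain text file with one hostname per line. A minimal sketch (the hostname "rocci" is taken from the error log above and is only an example; the clck invocation is guarded so the snippet is harmless on machines without Intel Cluster Checker installed):

```shell
# A hypothetical one-node nodefile; replace "rocci" with your own hostname(s)
cat > nodefile <<'EOF'
rocci
EOF

# Run Intel Cluster Checker's MPI prerequisite check only if clck is on PATH
if command -v clck >/dev/null 2>&1; then
    clck -f ./nodefile -F mpi_prereq_user
fi
```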
Please find the attached screenshot for the expected results.
Thanks & Regards,
Varsha
Hi Kurt,
Have you run the cluster checker as Varsha suggested? Could you please also run the following and share the detailed results with us, including the complete log files?
- share the output of ucx_info -d and fi_info -v
- run the code with debug options enabled, using I_MPI_DEBUG=10 and FI_LOG_LEVEL=debug
- run the code with tcp as your OFI provider (FI_PROVIDER=tcp) and debug options enabled
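Taken together, those three items can be scripted as one reproducible run. This is a sketch only: `parent_simple` is the executable name from Kurt's original post, the log filenames are arbitrary, and the tool invocations are guarded so the script degrades gracefully if a tool is missing:

```shell
# Collect fabric information for support (skipped if the tools are absent)
command -v ucx_info >/dev/null 2>&1 && ucx_info -d > ucx_info.log 2>&1
command -v fi_info  >/dev/null 2>&1 && fi_info  -v > fi_info.log  2>&1

# Enable verbose MPI and libfabric debugging, and force the tcp OFI provider
export I_MPI_DEBUG=10
export FI_LOG_LEVEL=debug
export FI_PROVIDER=tcp

# Re-run the failing program under the tcp provider, capturing all output
command -v mpirun >/dev/null 2>&1 && \
    mpirun -bootstrap ssh -n 1 -ppn 1 ./parent_simple > run_tcp_debug.log 2>&1
```

If the tcp run succeeds where the default (mlx/verbs) run fails, that points at the UCX/InfiniBand layer shown in the original error messages rather than at the program itself.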
Thanks,
Xiao
Hi Kurt,
We have not heard back from you with the additional information, so we will close this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
Best,
Xiao
