Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

intel mpi error - address not mapped to object at address/MPIDIG_context_id_to_comm

psing51
New Contributor I

Hi,
I am attempting an MPMD run of an application with Intel PSXE 2020u4 (Intel MPI 2019u12). The environment and launch command are:

export OMP_NUM_THREADS=1

export UCX_TLS=self,sm,dc
export I_MPI_DEBUG=30
export FI_LOG_LEVEL=debug
time mpirun -v -np 4716 $PWD/app1 : -n 54 $PWD/app2
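(Editor's note: the same MPMD launch can also be expressed with Intel MPI's -configfile option, which can make long colon-separated command lines easier to manage. A minimal sketch, assuming the same executables and rank counts as in the command above:)

```shell
# Sketch: equivalent MPMD launch via a configuration file.
# Each line of the file describes one executable group.
cat > mpmd.conf <<'EOF'
-n 4716 ./app1
-n 54 ./app2
EOF

# The launch itself (commented out here; requires the cluster environment):
# mpirun -configfile mpmd.conf

cat mpmd.conf
```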

I get the following error after ~20 minutes of runtime:

....
==== backtrace (tid: 441318) ====
 0 0x0000000000012b20 .annobin_sigaction.c()  sigaction.c:0
 1 0x00000000004716c1 MPIDIG_context_id_to_comm()  /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-
linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_impl.h:67
 2 0x0000000000472036 MPIDI_POSIX_mpi_iprobe()  /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-lin
ux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/../posix/posix_probe.h:46
 3 0x0000000000472036 MPIDI_SHM_mpi_iprobe()  /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux
-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/shm/src/../src/shm_p2p.h:407
 4 0x0000000000472036 MPIDI_iprobe_unsafe()  /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-
release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_probe.h:94
 5 0x0000000000472036 MPIDI_iprobe_safe()  /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_probe.h:223
 6 0x0000000000472036 MPID_Iprobe()  /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpid/ch4/src/ch4_probe.h:373
 7 0x0000000000472036 PMPI_Iprobe()  /localdisk/jenkins/workspace/workspace/ch4-build-linux-2019/impi-ch4-build-linux_build/CONF/impi-ch4-build-linux-release/label/impi-ch4-build-linux-intel64/_buildspace/release/../../src/mpi/pt2pt/iprobe.c:110
 8 0x00000000006a2226 xios::CContextServer::eventLoop()  ???:0
 9 0x00000000006893ea xios::CContext::checkBuffersAndListen()  ???:0
10 0x0000000000ab3857 xios::CServer::eventLoop()  ???:0
11 0x00000000006a71a2 xios::CXios::initServerSide()  ???:0
12 0x000000000040ee50 MAIN__()  ???:0
13 0x0000000000c539a6 main()  ???:0
14 0x00000000000237b3 __libc_start_main()  ???:0
15 0x000000000040ed6e _start()  ???:0
....

The log size is ~5 GB. I plan to share the logs via Google Drive and will update the link soon.

4 Replies
VarshaS_Intel
Moderator

Hi,

Thanks for reaching out to us.

Could you please let us know which application you are using so that we can try reproducing your issue at our end?
If gathering the logs takes time, could you please provide the following details in the meantime so that we can investigate further?

1. The FI_PROVIDER (mlx/psm2/psm3) you are using; the following command reports the details:

fi_info

2. A sample reproducer code, along with the steps to run it.
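(Editor's note: if fi_info is not available on the cluster, the provider that Intel MPI actually selects can usually be read from the I_MPI_DEBUG output instead. A minimal sketch, assuming the usual "libfabric provider:" startup line in the debug log:)

```shell
# With I_MPI_DEBUG set to 4 or higher, Intel MPI prints the selected
# libfabric provider at startup. A hypothetical run, filtered for that line:
# I_MPI_DEBUG=4 mpirun -np 2 ./a.out 2>&1 | grep "libfabric provider"

# The same filter applied to a sample of the expected debug line:
echo "[0] MPI startup(): libfabric provider: mlx" | grep -o "libfabric provider: .*"
```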

 

Thanks & Regards,

Varsha

 

psing51
New Contributor I

Hi,
The code is NEMO 3.6 compiled with GCOM 6.1, XIOS 2.0, and OASIS3-MCT 4.0.
The fi_info command is not available to normal users on our clusters.

I am in the process of uploading the logs.
Meanwhile, I recompiled the code with Open MPI (with icc), and, as expected, the code works fine!

VarshaS_Intel
Moderator

Hi,


Could you please let us know if you are now able to run the application using Intel MPI without any issues? If not, could you please provide the entire system configuration along with the steps to reproduce the issue?

Also, could you please let us know whether the application runs without issues using other versions of Intel MPI?


Thanks & Regards,

Varsha


VarshaS_Intel
Moderator

Hi,


We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks & Regards,

Varsha

