Dear All,
I am using Intel MPI + ifort + MKL to compile Quantum-Espresso 6.1. Everything works fine except when ScaLAPACK routines are invoked: calls to PDPOTRF may exit with a non-zero error code under certain circumstances. In one example, the program works with 2 nodes * 8 processes per node, but fails with 4 nodes * 4 processes per node. With I_MPI_DEBUG enabled, the failing case prints the following messages just before the call exits with code 970, while the working case prints no such messages:
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676900, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675640, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x26742b8, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676b58, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x26769c8, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676c20, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675fa0, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676068, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676a90, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676e78, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2678778, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675898, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675a28, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675bb8, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2674f38, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676ce8, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676130, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2674768, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2674448, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2674b50, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675e10, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675708, operation 2, size 2300, lkey 1879682311
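For reference, the call pattern involved is essentially the following. This is only a minimal sketch, not the actual Quantum-Espresso code; the matrix size n, block size nb, and the square grid choice are placeholder values:

```fortran
! Minimal sketch (not the QE code path): set up a BLACS process grid,
! distribute a trivially SPD matrix, and check INFO from PDPOTRF.
program pdpotrf_check
  implicit none
  integer, parameter :: n = 1024, nb = 64   ! placeholder problem/block sizes
  integer            :: iam, nprocs, ictxt, nprow, npcol
  integer            :: myrow, mycol, locr, locc, info
  integer            :: desca(9)
  double precision, allocatable :: a(:,:)
  integer, external  :: numroc

  call blacs_pinfo( iam, nprocs )
  ! choose an (almost) square process grid, e.g. 4x4 for 16 ranks
  nprow = int( sqrt( dble( nprocs ) ) )
  npcol = nprocs / nprow
  call blacs_get( -1, 0, ictxt )
  call blacs_gridinit( ictxt, 'Row-major', nprow, npcol )
  call blacs_gridinfo( ictxt, nprow, npcol, myrow, mycol )

  if ( myrow >= 0 .and. mycol >= 0 ) then
     ! local storage for the block-cyclically distributed n x n matrix
     locr = numroc( n, nb, myrow, 0, nprow )
     locc = numroc( n, nb, mycol, 0, npcol )
     allocate( a( max(1,locr), max(1,locc) ) )
     call descinit( desca, n, n, nb, nb, 0, 0, ictxt, max(1,locr), info )

     ! A = 2*I is symmetric positive definite, so INFO should come back 0
     call pdlaset( 'Full', n, n, 0.0d0, 2.0d0, a, 1, 1, desca )
     call pdpotrf( 'L', n, a, 1, 1, desca, info )
     if ( iam == 0 ) print *, 'PDPOTRF returned INFO = ', info

     call blacs_gridexit( ictxt )
  end if
  call blacs_exit( 0 )
end program pdpotrf_check
```

Built with mpiifort and linked against MKL's ScaLAPACK and BLACS-for-Intel-MPI libraries (e.g. -lmkl_scalapack_lp64 and -lmkl_blacs_intelmpi_lp64), this exercises the same ScaLAPACK call that fails inside QE; a non-zero INFO from PDPOTRF is the kind of error code I am seeing.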
Could you suggest what the possible cause might be here? Thanks very much.
Feng
Hi,
Judging from your debug messages, the problem is probably caused by MPI. I tried to find the meaning of MRAILI_Ext_sendq_send; it seems to be an internal routine from the MPICH code base that Intel MPI is also related to. I will transfer your problem to the cluster forum. Thanks.
Best regards,
Fiona
Hi Feng,
could you please tell us the version of the Intel MPI Library and your MPI environment ("env | grep I_MPI"), and, if possible, could you also share your debug file? You can send the logs via private message if you would rather not share them here, or you can submit the issue at the Online Service Center (supporttickets.intel.com/).
Thanks.
Zhuowei
Si, Zhuowei (Intel) wrote:
Hi Feng,
could you please tell us the version of the Intel MPI Library and your MPI environment ("env | grep I_MPI"), and, if possible, could you also share your debug file? You can send the logs via private message if you would rather not share them here, or you can submit the issue at the Online Service Center (supporttickets.intel.com/).
Thanks.
Zhuowei
Hi Zhuowei,
This is Parallel Studio XE 2017sp1, with ifort version 17.0.0.098. There is nothing in the environment except I_MPI_ROOT. The debug file is very large, so I have sent it by PM. At first I could not open the Online Service Center: after logging in it kept redirecting me to a page ending in "null" all day, but I have just managed to log in and submit the issue. Thanks very much.
Feng
