Intel® MPI Library

MPI program aborts with an "Assertion failed in file ch4_shm_coll.c" message

ombrophile
Beginner

Hi,

I have written a Fortran code that solves some differential equations and is parallelized with MPI. The code worked fine while it used a simple 1D decomposition of the input arrays. Recently, to enable computation on larger arrays, I modified it to use a 2D decomposition based on the MPI topology feature. However, the code now sometimes exits at run time with the following error:

 

Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2266: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2266: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2266: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2266: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2266: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2266: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2266: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2266: comm->shm_numa_layout[my_numa_node].base_addr
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7f595a5c7bcc]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7f5959fa1df1]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b1eb9) [0x7f5959c70eb9]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1a8c18) [0x7f5959b67c18]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1717ec) [0x7f5959b307ec]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b4387) [0x7f5959c73387]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(PMPI_Allreduce+0x561) [0x7f5959ace6e1]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(+0xdd95a) [0x7f595b2d195a]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(mpi_allreduce_f08ts_+0x208) [0x7f595b23c758]
./cav_2d.exe() [0x45a512]
./cav_2d.exe() [0x408612]
./cav_2d.exe() [0x404ae2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f59596640b3]
./cav_2d.exe() [0x4049ee]
Abort(1) on node 6: Internal error
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7feb1f8a5bcc]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7feb1f27fdf1]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b1eb9) [0x7feb1ef4eeb9]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1a8c18) [0x7feb1ee45c18]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1717ec) [0x7feb1ee0e7ec]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b4387) [0x7feb1ef51387]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(PMPI_Allreduce+0x561) [0x7feb1edac6e1]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(+0xdd95a) [0x7feb205af95a]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(mpi_allreduce_f08ts_+0x208) [0x7feb2051a758]
./cav_2d.exe() [0x45a512]
./cav_2d.exe() [0x408612]
./cav_2d.exe() [0x404ae2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7feb1e9420b3]
./cav_2d.exe() [0x4049ee]
Abort(1) on node 12: Internal error
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7ff95a227bcc]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7ff959c01df1]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b1eb9) [0x7ff9598d0eb9]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1a8c18) [0x7ff9597c7c18]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1717ec) [0x7ff9597907ec]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b4387) [0x7ff9598d3387]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(PMPI_Allreduce+0x561) [0x7ff95972e6e1]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(+0xdd95a) [0x7ff95af3195a]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(mpi_allreduce_f08ts_+0x208) [0x7ff95ae9c758]
./cav_2d.exe() [0x45a512]
./cav_2d.exe() [0x408612]
./cav_2d.exe() [0x404ae2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7ff9592c40b3]
./cav_2d.exe() [0x4049ee]
Abort(1) on node 18: Internal error
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7f8f90d31bcc]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7f8f9070bdf1]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b1eb9) [0x7f8f903daeb9]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1a8c18) [0x7f8f902d1c18]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1717ec) [0x7f8f9029a7ec]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b4387) [0x7f8f903dd387]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(PMPI_Allreduce+0x561) [0x7f8f902386e1]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(+0xdd95a) [0x7f8f91a3b95a]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(mpi_allreduce_f08ts_+0x208) [0x7f8f919a6758]
./cav_2d.exe() [0x45a512]
./cav_2d.exe() [0x408612]
./cav_2d.exe() [0x404ae2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f8f8fdce0b3]
./cav_2d.exe() [0x4049ee]
Abort(1) on node 24: Internal error
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7f0ae71b4bcc]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7ff9725e3bcc]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7ff971fbddf1]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b1eb9) [0x7ff971c8ceb9]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1a8c18) [0x7ff971b83c18]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1717ec) [0x7ff971b4c7ec]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b4387) [0x7ff971c8f387]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(PMPI_Allreduce+0x561) [0x7ff971aea6e1]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(+0xdd95a) [0x7ff9732ed95a]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(mpi_allreduce_f08ts_+0x208) [0x7ff973258758]
./cav_2d.exe() [0x45a512]
./cav_2d.exe() [0x408612]
./cav_2d.exe() [0x404ae2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7ff9716800b3]
./cav_2d.exe() [0x4049ee]
Abort(1) on node 36: Internal error
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7f86f181cbcc]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7f86f11f6df1]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b1eb9) [0x7f86f0ec5eb9]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1a8c18) [0x7f86f0dbcc18]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1717ec) [0x7f86f0d857ec]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b4387) [0x7f86f0ec8387]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(PMPI_Allreduce+0x561) [0x7f86f0d236e1]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(+0xdd95a) [0x7f86f252695a]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(mpi_allreduce_f08ts_+0x208) [0x7f86f2491758]
./cav_2d.exe() [0x45a512]
./cav_2d.exe() [0x408612]
./cav_2d.exe() [0x404ae2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f86f08b90b3]
./cav_2d.exe() [0x4049ee]
Abort(1) on node 42: Internal error
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7f3e968dfbcc]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7f3e962b9df1]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b1eb9) [0x7f3e95f88eb9]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1a8c18) [0x7f3e95e7fc18]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1717ec) [0x7f3e95e487ec]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b4387) [0x7f3e95f8b387]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(PMPI_Allreduce+0x561) [0x7f3e95de66e1]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(+0xdd95a) [0x7f3e975e995a]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(mpi_allreduce_f08ts_+0x208) [0x7f3e97554758]
./cav_2d.exe() [0x45a512]
./cav_2d.exe() [0x408612]
./cav_2d.exe() [0x404ae2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f3e9597c0b3]
./cav_2d.exe() [0x4049ee]
Abort(1) on node 0: Internal error
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7f0ae6b8edf1]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b1eb9) [0x7f0ae685deb9]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1a8c18) [0x7f0ae6754c18]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x1717ec) [0x7f0ae671d7ec]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(+0x2b4387) [0x7f0ae6860387]
/opt/intel/oneapi/mpi/2021.5.1//lib/release/libmpi.so.12(PMPI_Allreduce+0x561) [0x7f0ae66bb6e1]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(+0xdd95a) [0x7f0ae7ebe95a]
/opt/intel/oneapi/mpi/2021.5.1//lib/libmpifort.so.12(mpi_allreduce_f08ts_+0x208) [0x7f0ae7e29758]
./cav_2d.exe() [0x45a512]
./cav_2d.exe() [0x408612]
./cav_2d.exe() [0x404ae2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f0ae62510b3]
./cav_2d.exe() [0x4049ee]
Abort(1) on node 30: Internal error

 

Please note that the above error is encountered only occasionally, and particularly on hardware with newer 2nd generation Xeon processors.

 

I understand that the above information may not be sufficient to know what might be wrong. Please let me know what additional information I must provide in order to help me debug this.

 

Thanks in advance.
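
For readers unfamiliar with the setup, the kind of 2D decomposition described above can be sketched roughly as follows. This is not the actual Fortran code from this post, and it makes no MPI calls; it is a pure-Python illustration of how ranks on a 2D process grid could map to blocks of a global array. The 6 x 8 grid, the 100 x 100 array, and the helper names cart_coords, block_range, and local_block are all assumptions made for the example. The row-major rank ordering follows what the MPI standard specifies for Cartesian topologies.

```python
# Illustrative sketch only: NOT the code from this thread, and no MPI is used.
# All names and sizes below are hypothetical.


def cart_coords(rank, dims):
    """Map a rank to (row, col) on a 2D process grid in row-major order."""
    return divmod(rank, dims[1])


def block_range(n, parts, index):
    """Half-open [start, stop) range of block `index` when n points are
    split into `parts` nearly equal contiguous blocks.

    The first (n % parts) blocks receive one extra point.
    """
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def local_block(rank, dims, shape):
    """Return ((row_start, row_stop), (col_start, col_stop)) owned by rank."""
    row, col = cart_coords(rank, dims)
    return (
        block_range(shape[0], dims[0], row),
        block_range(shape[1], dims[1], col),
    )


if __name__ == "__main__":
    dims = (6, 8)       # hypothetical 6 x 8 process grid (48 ranks)
    shape = (100, 100)  # hypothetical global array size
    for rank in (0, 6, 47):
        print(rank, cart_coords(rank, dims), local_block(rank, dims, shape))
```

In a real MPI code, the grid would typically come from MPI_Dims_create and MPI_Cart_create, and a global reduction such as the MPI_Allreduce call seen in the backtrace would run over the resulting communicator.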

24 Replies
ombrophile
Beginner

Hi,

 

Assuming that this thread is still active, has the cause of the above-mentioned issue been identified? Please let me know if you need further information about the code.

ombrophile
Beginner

Hi,

 

I just wanted to let you know that this issue still exists with the latest version of the Intel compiler, so it seems that it has not been fixed yet. Kindly let me know of any workarounds for the time being.

ShivaniK_Intel
Moderator

Hi,


Thank you for your patience. The issue you raised has been fixed in version 2023.2. If the issue persists with the new release, please start a new discussion thread in the community forum and we will investigate it further.


Thanks & Regards

Shivani


ombrophile
Beginner

Based on some preliminary tests, this issue does not seem to arise when running with version 2023.2. I will let you know if I see anything otherwise. Thanks a lot.
