Assertion failed in ch4_shm_coll.c at line 2147

Hi, I am using the Intel Compiler (icc 19.1.0.166 20191121) with Intel MPI Version 2019 Update 6 Build 20191024. I get the following error reproducibly when running GPAW on 16 to 56 MPI tasks:

Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2147: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2147: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2147: comm->shm_numa_layout[my_numa_node].base_addr
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPL_backtrace_show+0x34) [0x2ae4e9f811d4]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x2ae4e9709031]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x285c1f) [0x2ae4e9883c1f]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x153487) [0x2ae4e9751487]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x199bda) [0x2ae4e9797bda]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x180069) [0x2ae4e977e069]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x16bd8e) [0x2ae4e9769d8e]
 

I would be grateful for any advice.

Best regards,

Michal Krompiec

4 Replies

More details, from a run with I_MPI_DEBUG=5:

[0] MPI startup(): libfabric version: 1.9.0a1-impi
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[0] MPI startup(): Rank    Pid      Node name   Pin cpu
[0] MPI startup(): 0       52103    deda1x1717  0
[0] MPI startup(): 1       52104    deda1x1717  1
[0] MPI startup(): 2       52105    deda1x1717  2
[0] MPI startup(): 3       52106    deda1x1717  3
[0] MPI startup(): 4       52107    deda1x1717  4
[0] MPI startup(): 5       52108    deda1x1717  5
[0] MPI startup(): 6       52109    deda1x1717  6
[0] MPI startup(): 7       52110    deda1x1717  7
[0] MPI startup(): 8       52111    deda1x1717  8
[0] MPI startup(): 9       52112    deda1x1717  9
[0] MPI startup(): 10      52113    deda1x1717  28
[0] MPI startup(): 11      52114    deda1x1717  29
[0] MPI startup(): 12      52115    deda1x1717  30
[0] MPI startup(): 13      52116    deda1x1717  31
[0] MPI startup(): 14      52117    deda1x1717  32
[0] MPI startup(): 15      52118    deda1x1717  33
[0] MPI startup(): 16      52119    deda1x1717  34
[0] MPI startup(): 17      52120    deda1x1717  35
[0] MPI startup(): 18      52121    deda1x1717  52
[0] MPI startup(): 19      52122    deda1x1717  53
[0] MPI startup(): I_MPI_ROOT=/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=ipl
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=5
 

[GPAW output]

Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2147: comm->shm_numa_layout[my_numa_node].base_addr
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPL_backtrace_show+0x34) [0x2b3bd59861d4]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x2b3bd510e031]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x285c1f) [0x2b3bd5288c1f]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x153487) [0x2b3bd5156487]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x199bda) [0x2b3bd519cbda]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x180069) [0x2b3bd5183069]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x16bd8e) [0x2b3bd516ed8e]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_
Abort(1) on node 14: Internal error
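
For anyone comparing runs, the pin table printed by I_MPI_DEBUG=5 can be pulled out of the log with a few lines of Python. This is only a sketch: it assumes the row layout is exactly the one shown in the output above (rank, pid, node name, pinned cpu), and the helper names `parse_pinning` and `cpu_ranges` are made up for illustration. It reports where ranks were pinned and nothing more; the thread does not say how these CPU ids map to NUMA nodes.

```python
import re

# Matches rows such as "[0] MPI startup(): 10      52113    deda1x1717  28".
# The header row and the "I_MPI_..." lines do not match because the first
# field after "MPI startup():" must be an integer rank.
PIN_ROW = re.compile(
    r"^\[\d+\] MPI startup\(\):\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s*$"
)


def parse_pinning(log_text):
    """Return {rank: (pid, node_name, cpu)} for every pin-table row found."""
    pinning = {}
    for line in log_text.splitlines():
        match = PIN_ROW.match(line.strip())
        if match:
            rank, pid, node, cpu = match.groups()
            pinning[int(rank)] = (int(pid), node, int(cpu))
    return pinning


def cpu_ranges(cpus):
    """Collapse CPU ids into sorted, inclusive (start, end) runs."""
    runs = []
    for cpu in sorted(set(cpus)):
        if runs and cpu == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], cpu)  # extend the current run
        else:
            runs.append((cpu, cpu))  # start a new run
    return runs


if __name__ == "__main__":
    sample = (
        "[0] MPI startup(): Rank    Pid      Node name   Pin cpu\n"
        "[0] MPI startup(): 9       52112    deda1x1717  9\n"
        "[0] MPI startup(): 10      52113    deda1x1717  28\n"
        "[0] MPI startup(): I_MPI_DEBUG=5\n"
    )
    table = parse_pinning(sample)
    print(cpu_ranges(cpu for _, _, cpu in table.values()))
```

Fed the full 20-rank table from the log above, `cpu_ranges` collapses the pinned CPUs into three runs: 0-9, 28-35 and 52-53.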
 


Updating Intel MPI to 2019 Update 7 solved the problem.
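
In case it helps others guard a job script against the affected release, here is a rough sketch that parses a version string of the form quoted in the first post ("Intel MPI Version 2019 Update 6 Build 20191024") and checks whether it is older than 2019 Update 7. The string format is an assumption based only on that quote, and `parse_impi_version` and `predates_fix` are hypothetical helper names, not part of Intel MPI.

```python
import re

# Assumed format, based on the version string quoted in the first post:
#   "Intel MPI Version 2019 Update 6 Build 20191024"
VERSION_RE = re.compile(r"Version\s+(\d{4})(?:\s+Update\s+(\d+))?")

# Per this thread, 2019 Update 7 no longer triggered the assertion.
FIXED_IN = (2019, 7)


def parse_impi_version(banner):
    """Return (year, update) parsed from the banner, or None if not found."""
    match = VERSION_RE.search(banner)
    if match is None:
        return None
    year = int(match.group(1))
    update = int(match.group(2)) if match.group(2) else 0  # no "Update N" -> 0
    return (year, update)


def predates_fix(banner):
    """True if the banner names a release older than 2019 Update 7."""
    version = parse_impi_version(banner)
    if version is None:
        raise ValueError("unrecognised version string: %r" % banner)
    return version < FIXED_IN


if __name__ == "__main__":
    print(predates_fix("Intel MPI Version 2019 Update 6 Build 20191024"))
```

The tuple comparison treats any later year as newer than the fix. That is a convenience of the sketch; the thread only confirms the behaviour for 2019 Update 6 and Update 7.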

Moderator

Hi Michal,

Thanks for reaching out to us.

Glad to know that your issue is resolved!

Could you please let us know if we can close this thread?

Have a good day!!

 

Regards

Goutham

 

Moderator

Hi Michal,

As your issue is resolved, we are closing this thread.

Feel free to open a new thread if you run into any further issues.

 

 

Thanks & Regards

Goutham
