Hi, I am using Intel Compiler 2019 (icc 19.1.0.166 20191121) with Intel MPI Version 2019 Update 6 Build 20191024. I get the following error reproducibly when running GPAW on 16 to 56 MPI tasks:
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2147: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2147: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2147: comm->shm_numa_layout[my_numa_node].base_addr
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPL_backtrace_show+0x34) [0x2ae4e9f811d4]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x2ae4e9709031]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x285c1f) [0x2ae4e9883c1f]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x153487) [0x2ae4e9751487]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x199bda) [0x2ae4e9797bda]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x180069) [0x2ae4e977e069]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x16bd8e) [0x2ae4e9769d8e]
I would be grateful for any advice.
Best regards,
Michal Krompiec
- Tags:
- Cluster Computing
- General Support
- Intel® Cluster Ready
- Message Passing Interface (MPI)
- Parallel Computing
More details, from a run with I_MPI_DEBUG=5:
[0] MPI startup(): libfabric version: 1.9.0a1-impi
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 52103 deda1x1717 0
[0] MPI startup(): 1 52104 deda1x1717 1
[0] MPI startup(): 2 52105 deda1x1717 2
[0] MPI startup(): 3 52106 deda1x1717 3
[0] MPI startup(): 4 52107 deda1x1717 4
[0] MPI startup(): 5 52108 deda1x1717 5
[0] MPI startup(): 6 52109 deda1x1717 6
[0] MPI startup(): 7 52110 deda1x1717 7
[0] MPI startup(): 8 52111 deda1x1717 8
[0] MPI startup(): 9 52112 deda1x1717 9
[0] MPI startup(): 10 52113 deda1x1717 28
[0] MPI startup(): 11 52114 deda1x1717 29
[0] MPI startup(): 12 52115 deda1x1717 30
[0] MPI startup(): 13 52116 deda1x1717 31
[0] MPI startup(): 14 52117 deda1x1717 32
[0] MPI startup(): 15 52118 deda1x1717 33
[0] MPI startup(): 16 52119 deda1x1717 34
[0] MPI startup(): 17 52120 deda1x1717 35
[0] MPI startup(): 18 52121 deda1x1717 52
[0] MPI startup(): 19 52122 deda1x1717 53
[0] MPI startup(): I_MPI_ROOT=/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=ipl
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=5
[GPAW output]
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2147: comm->shm_numa_layout[my_numa_node].base_addr
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPL_backtrace_show+0x34) [0x2b3bd59861d4]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x2b3bd510e031]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x285c1f) [0x2b3bd5288c1f]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x153487) [0x2b3bd5156487]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x199bda) [0x2b3bd519cbda]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x180069) [0x2b3bd5183069]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x16bd8e) [0x2b3bd516ed8e]
/sw-pmpv/sdk/intel/intel_2020/compilers_and_
Abort(1) on node 14: Internal error
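To inspect the pinning, the table printed by I_MPI_DEBUG can be turned into a rank-to-CPU map. The following is only a minimal sketch: it assumes lines of the form `[0] MPI startup(): <rank> <pid> <node> <cpu>` with a single CPU number per rank, as in the output above, and `parse_pinning` is a hypothetical helper name, not part of Intel MPI. The log does not say which CPUs belong to which NUMA node, so the sketch does not try to infer that.

```python
import re

# Assumed line shape, taken from the debug output in this thread:
#   [0] MPI startup(): <rank> <pid> <node name> <pinned cpu>
_PIN_RE = re.compile(
    r"^\[\d+\] MPI startup\(\): (\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s*$"
)


def parse_pinning(log_text):
    """Return {rank: (node_name, cpu)} for every pinning row in log_text.

    Header rows and other 'MPI startup()' lines do not match the
    pattern and are skipped.
    """
    pins = {}
    for line in log_text.splitlines():
        match = _PIN_RE.match(line.strip())
        if match:
            rank, _pid, node, cpu = match.groups()
            pins[int(rank)] = (node, int(cpu))
    return pins


if __name__ == "__main__":
    sample = """\
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 9 52112 deda1x1717 9
[0] MPI startup(): 10 52113 deda1x1717 28
[0] MPI startup(): I_MPI_DEBUG=5
"""
    for rank, (node, cpu) in sorted(parse_pinning(sample).items()):
        print(rank, node, cpu)
```

Combined with the CPU-to-NUMA mapping of the node (not shown in this thread), such a map shows how the ranks are spread across NUMA nodes.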
Updating Intel MPI to 2019 Update 7 solves the problem.
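For anyone who wants to check whether an installation is already at that level, here is a minimal sketch. It assumes the version banner has the same "Version 2019 Update 6 Build 20191024" shape quoted in the first post; `parse_impi_version` and `has_fix` are hypothetical names, and the threshold reflects only what was observed in this thread.

```python
import re

# Pattern assumed from the version string quoted in the first post:
#   "Intel MPI Version 2019 Update 6 Build 20191024"
_VERSION_RE = re.compile(r"Version\s+(\d{4})(?:\s+Update\s+(\d+))?")

# Per this thread, the assertion no longer occurs with 2019 Update 7.
FIXED_IN = (2019, 7)


def parse_impi_version(text):
    """Return (year, update) parsed from a version banner, or None."""
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    year = int(match.group(1))
    update = int(match.group(2) or 0)  # no "Update N" part is treated as 0
    return (year, update)


def has_fix(text):
    """True if the banner reports 2019 Update 7 or later."""
    version = parse_impi_version(text)
    return version is not None and version >= FIXED_IN


if __name__ == "__main__":
    banner = "Intel MPI Version 2019 Update 6 Build 20191024"
    print(parse_impi_version(banner), has_fix(banner))
```

The tuple comparison orders versions by year first and update number second, so a banner without an "Update" part counts as update 0 of its year.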
Hi Michal,
Thanks for reaching out to us.
Glad to know that your issue is resolved!
Since your issue is resolved, could you please let us know if we can close this thread?
Have a good day!!
Regards
Goutham
Hi Michal,
As your issue is resolved, we are closing this thread.
Feel free to raise a new thread in case of any further issues.
Thanks & Regards
Goutham
