Hi, I recently installed intel-hpckit, and it has worked well with several codes. However, I now hit an assertion failure when I try to run BerkeleyGW.
I tried different numbers of MPI processes (4, 9, 16, 19) and got the same error each time.
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2255: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2255: comm->shm_numa_layout[my_numa_node].base_addr
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7f2aab0e306c]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7f2aaaa8cf01]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x38e694) [0x7f2aaa7c7694]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x221c66) [0x7f2aaa65ac66]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x256d8c) [0x7f2aaa68fd8c]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x26d930) [0x7f2aaa6a6930]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x23e7a1) [0x7f2aaa6777a1]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7fddb4e2406c]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7fddb47cdf01]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x38e694) [0x7fddb4508694]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x221c66) [0x7fddb439bc66]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x256d8c) [0x7fddb43d0d8c]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x26d930) [0x7fddb43e7930]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x23e7a1) [0x7fddb43b87a1]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x21cce3) [0x7fddb4396ce3]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x392daa) [0x7fddb450cdaa]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(MPI_Bcast+0x417) [0x7fddb42fd917]
epsilon.cplx.x() [0x2f13112]
epsilon.cplx.x() [0x6550c7]
epsilon.cplx.x() [0x5e1cc6]
epsilon.cplx.x() [0x5ad5a8]
epsilon.cplx.x() [0x436ae4]
epsilon.cplx.x() [0x504e66]
epsilon.cplx.x() [0x4f6a04]
epsilon.cplx.x() [0x409922]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x21cce3) [0x7f2aaa655ce3]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(+0x392daa) [0x7f2aaa7cbdaa]
/opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12(MPI_Bcast+0x417) [0x7f2aaa5bc917]
epsilon.cplx.x() [0x2f13112]
epsilon.cplx.x() [0x6550c7]
epsilon.cplx.x() [0x5e1cc6]
epsilon.cplx.x() [0x5ad5a8]
epsilon.cplx.x() [0x436ae4]
epsilon.cplx.x() [0x504e66]
epsilon.cplx.x() [0x4f6a04]
epsilon.cplx.x() [0x409922]
/lib64/libc.so.6(__libc_start_main+0xe5) [0x7f2aa985bd85]
epsilon.cplx.x() [0x409829]
/lib64/libc.so.6(__libc_start_main+0xe5) [0x7fddb359cd85]
epsilon.cplx.x() [0x409829]
Abort(1) on node 2: Internal error
Abort(1) on node 0: Internal error
Below is the CPU information and OS details:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 1
Core(s) per socket: 12
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
CPU family: 6
Model: 151
Model name: 12th Gen Intel(R) Core(TM) i7-12700
BIOS Model name: 12th Gen Intel(R) Core(TM) i7-12700
Stepping: 2
CPU MHz: 2100.000
CPU max MHz: 4900.0000
CPU min MHz: 800.0000
BogoMIPS: 4224.00
Virtualization: VT-x
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 25600K
NUMA node0 CPU(s): 0-19
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr flush_l1d arch_capabilities
NAME="Rocky Linux"
VERSION="8.8 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.8 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2029-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.8"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.8"
I saw reports of similar problems and tried some of the suggested solutions, such as updating the toolkit (sudo dnf upgrade intel-hpckit) and setting the environment variable I_MPI_SHM_HEAP_VSIZE=0, but I still get the same error.
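For readers trying the same workaround, here is a minimal sketch of one way to pass an Intel MPI environment variable such as I_MPI_SHM_HEAP_VSIZE to a run. The guard around mpirun and epsilon.inp is an illustrative assumption, added only so the snippet does nothing harmful on a machine where they are missing.

```shell
#!/bin/sh
# Export the variable so every process started from this shell inherits it.
export I_MPI_SHM_HEAP_VSIZE=0

# Confirm that child processes actually see the value.
sh -c 'echo "I_MPI_SHM_HEAP_VSIZE=$I_MPI_SHM_HEAP_VSIZE"'

# Launch only if mpirun and the input file are present
# (this guard is illustrative, not part of the original report).
if command -v mpirun >/dev/null 2>&1 && [ -f epsilon.inp ]; then
    mpirun -np 16 epsilon.cplx.x < epsilon.inp > epsilon.out
fi
```

The variable can also be set for a single command by prefixing it (I_MPI_SHM_HEAP_VSIZE=0 mpirun ...) or, with the Hydra launcher, through mpirun -genv I_MPI_SHM_HEAP_VSIZE 0. Which form was used in this report is not stated.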
Thanks, and regards,
Ragab
Hi,
Thanks for posting in Intel communities!
Could you please provide detailed steps to reproduce this issue?
Regards,
Veena
Hi Veena,
Thank you for your reply.
I am trying to do GW calculations using the Quantum ESPRESSO and BerkeleyGW codes. To do so, I ran the following commands:
1- mpirun -np 16 pw.x < scf.in > scf.out
2- mpirun -np 16 pw.x < WFN.in > WFN.out
3- mpirun -np 16 pw.x < WFNq.in > WFNq.out
4- mpirun -np 16 pw.x < WFN_co.in > WFN_co.out
5- mpirun -np 16 epsilon.cplx.x < epsilon.inp > epsilon.out
6- mpirun -np 16 sigma.cplx.x < sigma.inp > sigma.out
All steps went correctly except the last two, which gave the Assertion failure mentioned above after a few iterations. I have attached the epsilon.out file.
Note:
- The epsilon and sigma output files contain the following warning:
WARNING: The number of cpus does not divide evenly in the optimal number of pools.
1cpus are doing no work
- The epsilon and sigma calculations completed as expected when I ran them with a single MPI process:
mpirun -np 1 epsilon.cplx.x < epsilon.inp > epsilon.out
The calculation crashes after quite some time, so please make sure that you are not simply running out of memory. You can look at the kernel logs and check for OOM killer entries.
Please also reach out to the developers of the codes, who may be able to help triage your issue. With the information you have provided, we cannot do much here.
Best
Tobias
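The kernel-log check suggested above can be sketched roughly as follows. The helper name find_oom_entries and the search patterns are assumptions based on common Linux kernel wording, not an exhaustive list, and reading the kernel log may require root privileges.

```shell
#!/bin/sh
# Filter kernel-log text on stdin for typical OOM-killer messages.
# find_oom_entries is a hypothetical helper; the patterns are assumptions.
find_oom_entries() {
    grep -i -E 'out of memory|oom-killer|oom_reaper|killed process'
}

# Scan the kernel ring buffer if dmesg is available and readable.
if command -v dmesg >/dev/null 2>&1; then
    dmesg 2>/dev/null | find_oom_entries \
        || echo "no OOM entries found (or kernel log not readable)"
fi
```

On systemd-based systems such as Rocky Linux 8, the same filter could be fed from journalctl -k --no-pager to include entries that have already rotated out of the ring buffer.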