I'm getting an assertion failure from the Intel MPI library (2021.6.0), as follows:
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2266: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2266: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2266: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2266: comm->shm_numa_layout[my_numa_node].base_addr
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x151d52c6abcc]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x151d52644df1]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x14e4f01f8bcc]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x14cb51d87bcc]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x2b1eb9) [0x151d52313eb9]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x14e4efbd2df1]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x176602) [0x151d521d8602]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x2b1eb9) [0x14e4ef8a1eb9]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x14cb51761df1]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x1ab82d) [0x151d5220d82d]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x176602) [0x14e4ef766602]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x19d1cc) [0x151d521ff1cc]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x1ab82d) [0x14e4ef79b82d]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x2b1eb9) [0x14cb51430eb9]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x1717ec) [0x151d521d37ec]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x19d1cc) [0x14e4ef78d1cc]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x176602) [0x14cb512f5602]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x2b389f) [0x151d5231589f]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x1717ec) [0x14e4ef7617ec]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x1ab82d) [0x14cb5132a82d]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x6d8895) [0x151d5273a895]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x2b389f) [0x14e4ef8a389f]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x6d7c10) [0x151d52739c10]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x6d8895) [0x14e4efcc8895]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x15145f730bcc]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x19d1cc) [0x14cb5131c1cc]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x1717ec) [0x14cb512f07ec]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x6d7c10) [0x14e4efcc7c10]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x29457d) [0x14e4ef88457d]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x2b389f) [0x14cb5143289f]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x15145f10adf1]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x2b2b52) [0x14e4ef8a2b52]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x6d8895) [0x14cb51857895]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(MPI_Win_create+0x3dc) [0x14e4efe969dc]
/apps/intel-mpi/2021.5.1/lib/release/libmpi.so.12(+0x6d7c10) [0x14cb51856c10]
This seems to originate from the MPI_Win_create call. I'm not exactly sure what triggers it, but it seems to be related to creating windows that do not expose any memory: there doesn't seem to be any limit on creating windows with distinct pointers and sizes, but creating a few hundred windows that expose no memory triggers the assertion above.
So, firstly, are the following calls valid in Intel MPI (they are with OpenMPI)?
1. MPI_Win_create(NULL, 0, sizeof(int), ...)
2. MPI_Win_create(&dummy, 0, sizeof(int), ...)
3. MPI_Win_create(&dummy, 1*sizeof(int), sizeof(int), ...)
where dummy is defined as a global int.
For processes that do not expose any memory, I was using 1 (I also tried the constant MPI_BOTTOM, with the same result), which causes the assertion failure. I then tried 2 to trick it, but that also trips the assertion, so I am now using 3, which avoids the assertion but doesn't seem right.
Thanks for any advice
Hi,
Thank you for posting in Intel Communities.
Could you please provide the following details so that we can investigate your issue further?
1. OS and CPU details.
2. Complete reproducer code and the steps to reproduce the issue.
3. The MPI library version (2021.6 or 2021.5.1). You can find it using the command below:
mpirun --version
4. The complete debug log, obtained using the command below:
I_MPI_DEBUG=10 mpirun -n <number of processes> -ppn <processes per node> ./a.out
Thanks & Regards,
Hemanth
Hi Hemanth,
This will do it:
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

#define NVARS 600   /* number of windows to create */

int main(int argc, char **argv)
{
    int i;
    MPI_Win *mpi_wins;

    MPI_Init(&argc, &argv);
    mpi_wins = (MPI_Win *)malloc(NVARS * sizeof(MPI_Win));

    /* Create NVARS windows that expose no memory (NULL base, size 0). */
    for (i = 0; i < NVARS; i++)
        MPI_Win_create(NULL, 0, sizeof(int), MPI_INFO_NULL,
                       MPI_COMM_WORLD, &mpi_wins[i]);

    MPI_Finalize();
    return 0;
}
[ffr599@gadi-login-08 ems-sim-w2w]$ uname -a
Linux gadi-login-08.gadi.nci.org.au 4.18.0-348.20.1.el8.nci.x86_64 #1 SMP Wed Mar 16 11:37:35 AEDT 2022 x86_64 x86_64 x86_64 GNU/Linux
Running on this HPC cluster: https://nci.org.au/our-systems/hpc-systems
OS: Rocky Linux 8
CPU: Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
[ffr599@gadi-login-08 ems-sim-w2w]$ icc --version
icc (ICC) 2021.6.0 20220226
Copyright (C) 1985-2022 Intel Corporation. All rights reserved.
[ffr599@gadi-login-08 ems-sim-w2w]$ mpirun --version
Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)
Copyright 2003-2022, Intel Corporation.
Output of the following command:
I_MPI_DEBUG=10 mpirun -np 4 ./test_win
[0] MPI startup(): Intel(R) MPI Library, Version 2021.6 Build 20220227 (id: 28877f3f32)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): shm segment size (1452 MB per rank) * (4 local ranks) = 5811 MB total
[0] MPI startup(): libfabric version: 1.13.2rc1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): File "/apps/intel-mpi/2021.6.0/etc/tuning_clx-ap_shm-ofi_mlx_100.dat" not found
[0] MPI startup(): Load tuning file: "/apps/intel-mpi/2021.6.0/etc/tuning_clx-ap_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 20 (TAG_UB value: 1048575)
[0] MPI startup(): source bits available: 21 (Maximal number of rank: 2097151)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 2184546 gadi-login-08.gadi.nci.org.au {0,1,2,3,7,8,12,13,14,18,19,20}
[0] MPI startup(): 1 2184547 gadi-login-08.gadi.nci.org.au {4,5,6,9,10,11,15,16,17,21,22,23}
[0] MPI startup(): 2 2184548 gadi-login-08.gadi.nci.org.au {24,25,26,27,31,32,36,37,38,42,43,44}
[0] MPI startup(): 3 2184549 gadi-login-08.gadi.nci.org.au {28,29,30,33,34,35,39,40,41,45,46,47}
[0] MPI startup(): I_MPI_LIBRARY_KIND=release
[0] MPI startup(): I_MPI_CC=icc
[0] MPI startup(): I_MPI_CXX=icpc
[0] MPI startup(): I_MPI_F90=ifort
[0] MPI startup(): I_MPI_F77=ifort
[0] MPI startup(): I_MPI_ROOT=/apps/intel-mpi/2021.6.0
[0] MPI startup(): I_MPI_LINK=opt
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP_EXEC=/opt/pbs/default/bin/pbs_tmrsh
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_HYDRA_BRANCH_COUNT=0
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=rsh
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=10
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2279: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2279: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2279: comm->shm_numa_layout[my_numa_node].base_addr
Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2279: comm->shm_numa_layout[my_numa_node].base_addr
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7f938647952c]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7f9385dfdc91]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x264cc6) [0x7f9385b38cc6]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x16a7a2) [0x7f9385a3e7a2]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x19e9cd) [0x7f9385a729cd]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x190598) [0x7f9385a64598]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x165780) [0x7f9385a39780]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x26672f) [0x7f9385b3a72f]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x632df8) [0x7f9385f06df8]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x247467) [0x7f9385b1b467]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x2659e2) [0x7f9385b399e2]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(MPI_Win_create+0x3dc) [0x7f9386070a0c]
./test_win() [0x400f14]
/lib64/libc.so.6(__libc_start_main+0xf3) [0x7f93847ef493]
./test_win() [0x400dde]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7fb745c4e52c]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7fb7455d2c91]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x264cc6) [0x7fb74530dcc6]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x16a7a2) [0x7fb7452137a2]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x19e9cd) [0x7fb7452479cd]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x190598) [0x7fb745239598]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x165780) [0x7fb74520e780]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x26672f) [0x7fb74530f72f]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x632df8) [0x7fb7456dbdf8]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x247467) [0x7fb7452f0467]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x2659e2) [0x7fb74530e9e2]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(MPI_Win_create+0x3dc) [0x7fb745845a0c]
./test_win() [0x400f14]
/lib64/libc.so.6(__libc_start_main+0xf3) [0x7fb743fc4493]
./test_win() [0x400dde]
Abort(1) on node 3: Internal error
Abort(1) on node 0: Internal error
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7f9d4dcda52c]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7f9d4d65ec91]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x264cc6) [0x7f9d4d399cc6]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x16a7a2) [0x7f9d4d29f7a2]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x19e9cd) [0x7f9d4d2d39cd]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x190598) [0x7f9d4d2c5598]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x165780) [0x7f9d4d29a780]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x26672f) [0x7f9d4d39b72f]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x632df8) [0x7f9d4d767df8]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x247467) [0x7f9d4d37c467]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x2659e2) [0x7f9d4d39a9e2]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(MPI_Win_create+0x3dc) [0x7f9d4d8d1a0c]
./test_win() [0x400f14]
/lib64/libc.so.6(__libc_start_main+0xf3) [0x7f9d4c050493]
./test_win() [0x400dde]
Abort(1) on node 1: Internal error
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(MPL_backtrace_show+0x1c) [0x7fc5eacc752c]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7fc5ea64bc91]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x264cc6) [0x7fc5ea386cc6]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x16a7a2) [0x7fc5ea28c7a2]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x19e9cd) [0x7fc5ea2c09cd]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x190598) [0x7fc5ea2b2598]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x165780) [0x7fc5ea287780]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x26672f) [0x7fc5ea38872f]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x632df8) [0x7fc5ea754df8]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x247467) [0x7fc5ea369467]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(+0x2659e2) [0x7fc5ea3879e2]
/apps/intel-mpi/2021.6.0/lib/release/libmpi.so.12(MPI_Win_create+0x3dc) [0x7fc5ea8bea0c]
./test_win() [0x400f14]
/lib64/libc.so.6(__libc_start_main+0xf3) [0x7fc5e903d493]
./test_win() [0x400dde]
Abort(1) on node 2: Internal error
[ffr599@gadi-login-08 ems-sim-w2w]$
Hi,
We tried to run the code on our end with the configuration below, but we couldn't reproduce your issue.
OS: Rocky Linux 8
MPI version: 2021.6
Job scheduler: Slurm
FI_PROVIDER: mlx
Could you please confirm whether you are using the PBS job scheduler?
Thanks & Regards,
Hemanth
Hi Hemanth,
I'm one of the admins for the cluster in question. I can reproduce this issue in a plain SSH session; no batch job is required. It also only seems to occur with -np 3 or -np 4; with more or fewer ranks, it runs fine.
It's also not the first call to MPI_Win_create that generates the assertion failure; it only fails when i=454. And if I change the test code to actually expose something, rather than passing a NULL address and a size of 0, then it works for all rank counts.
Given the particular assertion that's failing, I wonder whether it's hardware dependent (e.g. on how the ranks are associated with cores and NUMA domains), which would explain why you can't reproduce it on your end.
Let us know if there's any other information that would help debug this, or whether it would be better submitted via IPS.
Thanks,
Ben
Hi,
Thanks for the information.
Could you please provide the CPU information using the command below:
lscpu
Thanks & Regards,
Hemanth
Hi Hemanth,
Here it is:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 1
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
Stepping: 7
CPU MHz: 2900.000
CPU max MHz: 3900.0000
CPU min MHz: 1200.0000
BogoMIPS: 5800.00
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-3,7,8,12-14,18-20
NUMA node1 CPU(s): 4-6,9-11,15-17,21-23
NUMA node2 CPU(s): 24-27,31-33,37-39,43,44
NUMA node3 CPU(s): 28-30,34-36,40-42,45-47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Hi,
We are working on your issue and will get back to you soon.
Thanks & Regards,
Hemanth
Hi,
Could you please provide the OS details (including the minor version) using the command below:
$ cat /etc/os-release
Thanks & Regards,
Hemanth
Hi Hemanth.
Here it is.
Cheers,
-Farhan
[ffr599@gadi-login-07 ~]$ cat /etc/os-release
NAME="Rocky Linux"
VERSION="8.6 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.6 (Green Obsidian)"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
ROCKY_SUPPORT_PRODUCT="Rocky Linux"
ROCKY_SUPPORT_PRODUCT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8"
Hi,
We are working on your issue internally and will get back to you soon.
Thanks & Regards,
Hemanth
Hi frizwi/Ben,
Can you please rerun your application with the environment variable below set, and share your findings?
$ export I_MPI_SHM_HEAP_VSIZE=0
With I_MPI_SHM_HEAP_VSIZE=0, my test program runs with no errors.
Next I'll try my full application to see how that goes. What are the consequences of setting this environment variable?
Hello frizwi.
With I_MPI_SHM_HEAP_VSIZE=0, the shared memory allocator is disabled. Some implementations of collective operations rely on the SHM heap, so this setting also disables those algorithms, which might result in a performance hit. The size of the hit, if any, will depend on the nature of your application.
Hi frizwi,
Just wanted to check if there is anything else we could help you with before closing this thread?
In light of the workaround provided (& confirmed) and subsequent inactivity on this thread, this issue is assumed to be resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.