Intel® oneAPI HPC Toolkit

MLX provider not working with oneAPI 2022.2/MPI 2021.6

Antonio_D
Beginner

Hello,

I am seeing an MLX provider failure with Intel MPI 2021.6; all code is built with oneAPI 2022.2. My launch script:

export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export MKL_DYNAMIC=FALSE
export UCX_TLS=sm,rc_mlx5,dc_mlx5,ud_mlx5,self
export LD_PRELOAD=$I_MPI_ROOT/lib/libmpi_shm_heap_proxy.so
export I_MPI_HYDRA_BOOTSTRAP=lsf
export I_MPI_HYDRA_RMK=lsf
export I_MPI_HYDRA_TOPOLIB=hwloc
export I_MPI_HYDRA_IFACE=ib0
export I_MPI_PLATFORM=clx-ap
export I_MPI_EXTRA_FILESYSTEM=1
export I_MPI_EXTRA_FILESYSTEM_FORCE=gpfs
export I_MPI_FABRICS=shm:ofi
export I_MPI_SHM=clx-ap
export I_MPI_SHM_HEAP=1
export I_MPI_OFI_PROVIDER=mlx
export I_MPI_PIN_CELL=core
export I_MPI_DEBUG=6
mpirun -n 96 ./executable

 The output:

[0] MPI startup(): Intel(R) MPI Library, Version 2021.6 Build 20220227 (id: 28877f3f32)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
impi_shm_mbind_local(): mbind(p=0x14ad3ea72000, size=4294967296) error=1 "Operation not permitted"

//SNIP//

impi_shm_mbind_local(): mbind(p=0x1458ca7f7000, size=4294967296) error=1 "Operation not permitted"

[0] MPI startup(): libfabric version: 1.13.2rc1-impi
Abort(2139535) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(178)........:
MPID_Init(1532)..............:
MPIDI_OFI_mpi_init_hook(1512):
open_fabric(2566)............:
find_provider(2684)..........: OFI fi_getinfo() failed (ofi_init.c:2684:find_provider:No data available)
Abort(2139535) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(178)........:
MPID_Init(1532)..............:
MPIDI_OFI_mpi_init_hook(1512):
open_fabric(2566)............:
find_provider(2684)..........: OFI fi_getinfo() failed (ofi_init.c:2684:find_provider:No data available)
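For what it's worth, the repeated mbind() "Operation not permitted" lines made me add a pre-flight check to the job script. This is a guess at the cause, not a diagnosis: mbind() with MPOL_MF_MOVE_ALL needs CAP_SYS_NICE, and the IB transports care about the locked-memory limit.

```shell
#!/bin/bash
# Pre-flight environment checks (speculative; the mbind EPERM cause is
# not confirmed). Prints the limits/settings that NUMA binding and the
# SHM heap could plausibly trip over.
echo "memlock limit : $(ulimit -l)"   # "unlimited" is typical on IB nodes
echo "kernel        : $(uname -r)"
# Was NUMA policy support compiled into this kernel?
if grep -q '^CONFIG_NUMA=y' "/boot/config-$(uname -r)" 2>/dev/null; then
    echo "CONFIG_NUMA   : y"
else
    echo "CONFIG_NUMA   : unknown"
fi
```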

 

I do have Mellanox UCX Framework v1.8 installed and it is recognized:

[dipasqua@ec-hub1-sc1 ~]$ ucx_info -v
# UCT version=1.8.0 revision
# configured with: --prefix=/apps/rocs/2020.08/cascadelake/software/UCX/1.8.0-GCCcore-9.3.0 --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --with-rdmacm=/apps/rocs/2020.08/prefix/usr --with-verbs=/apps/rocs/2020.08/prefix/usr --with-knem=/apps/rocs/2020.08/prefix/usr --enable-optimizations --enable-cma --enable-mt --without-java --disable-doxygen-doc
[dipasqua@ec-hub1-sc1 ~]$ fi_info -l
psm2:
version: 113.20
mlx:
version: 1.4
psm3:
version: 1102.0
ofi_rxm:
version: 113.20
verbs:
version: 113.20
tcp:
version: 113.20
sockets:
version: 113.20
shm:
version: 114.0
ofi_hook_noop:
version: 113.20
[dipasqua@ec-hub1-sc1 ~]$ ucx_info -d | grep Transport
# Transport: posix
# Transport: sysv
# Transport: self
# Transport: tcp
# Transport: tcp
# Transport: rc_verbs
# Transport: rc_mlx5
# Transport: dc_mlx5
# Transport: ud_verbs
# Transport: ud_mlx5
# Transport: rc_verbs
# Transport: rc_mlx5
# Transport: ud_verbs
# Transport: ud_mlx5
# Transport: rc_verbs
# Transport: rc_mlx5
# Transport: dc_mlx5
# Transport: ud_verbs
# Transport: ud_mlx5
# Transport: cma
# Transport: knem 

 

However, everything works fine with oneAPI 2022.1 (Intel MPI 2021.5) with exactly the same settings. Any ideas, or have we hit a bug?
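In case it helps reproduce, these are the extra diagnostics I plan to try next. This is a sketch, not something that fixed it, and it assumes libfabric's standard FI_* runtime knobs are honored by the bundled libfabric:

```shell
# Speculative extra diagnostics for the fi_getinfo() failure:
# FI_LOG_LEVEL and FI_PROVIDER are libfabric's own knobs and should make
# it log why the mlx provider is filtered out during provider selection.
export FI_LOG_LEVEL=debug   # verbose libfabric provider-selection log
export FI_PROVIDER=mlx      # restrict libfabric to the mlx provider
export I_MPI_DEBUG=6
```

Running `fi_info -p mlx` under the same environment as the job (rather than on the login node) should also confirm whether the provider is visible where mpirun actually lands.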

 

Regards,

Antonio

13 Replies
ShivaniK_Intel
Moderator

Hi,


Thanks for reaching out to us.


Could you please let us know the details of the OS you have been using?


Also, please provide the output of the lscpu command.


Thanks & Regards

Shivani


Antonio_D
Beginner

Hello,

 

O/S version is:

[dipasqua@ec-hub1-sc1 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.2 (Ootpa)
[dipasqua@ec-hub1-sc1 ~]$ uname -r
4.18.0-193.71.1.el8_2.x86_64

We are in a heterogeneous environment. The head node and some of the compute nodes are:

[dipasqua@ec-hub1-sc1 ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz
Stepping: 7
CPU MHz: 2755.047
CPU max MHz: 3800.0000
CPU min MHz: 1000.0000
BogoMIPS: 4600.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-23
NUMA node1 CPU(s): 24-47
NUMA node2 CPU(s): 48-71
NUMA node3 CPU(s): 72-95
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities 

The newer compute nodes are:

[dipasqua@sc1nc080is10 ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-63
Off-line CPU(s) list: 64-127
Thread(s) per core: 1
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
Stepping: 6
CPU MHz: 2891.275
BogoMIPS: 5200.00
Virtualization: VT-x
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 49152K
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local wbnoinvd dtherm ida arat pln pts avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid md_clear pconfig flush_l1d arch_capabilities 

Please let me know if you need any additional information to troubleshoot.


Regards,
Antonio 

ShivaniK_Intel
Moderator

Hi,


Could you please let us know whether you want to use the Intel MPI SHM custom allocator that is enabled by I_MPI_SHM_HEAP=1?


For more details regarding I_MPI_SHM_HEAP please refer to the below link.


https://www.intel.com/content/www/us/en/develop/documentation/mpi-developer-reference-linux/top/envi...

other-environment-variables.html


Could you also please let us know whether you saw any benefit from I_MPI_SHM_HEAP with IMPI 2021.5, since it should be ignored if the kernel version is too low?


Thanks & Regards

Shivani


Antonio_D
Beginner

Hello,

 

We would like to use the Intel MPI SHM custom allocator; in benchmarking we have done, it showed some performance improvements. We are going off of the following presentation:

 

https://indico.cern.ch/event/813377/contributions/3525116/attachments/1913847/3163350/16.15-17.00_Du...

 

We are running kernel 4.18.0-193.71.1.el8_2.x86_64, which is newer than the 4.7 minimum noted in the Intel MPI documentation, so it should work.
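As a sanity check, a quick version compare confirms this (sort -V understands dotted version numbers; the 4.7 threshold is the one from the documentation):

```shell
# Compare the running kernel against the documented 4.7 minimum for
# I_MPI_SHM_HEAP; sort -V puts the older of the two versions first.
min=4.7
cur=$(uname -r | cut -d- -f1)   # e.g. 4.18.0
if [ "$(printf '%s\n%s\n' "$min" "$cur" | sort -V | head -n1)" = "$min" ]; then
    echo "kernel $cur meets the $min minimum"
else
    echo "kernel $cur is below the $min minimum"
fi
```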

 

Regards,

Antonio

ShivaniK_Intel
Moderator

Hi,


Could you please share the output of the following three experiments?


1. Does the code work over the TCP network when you export I_MPI_OFI_PROVIDER=tcp and remove the IB-related variables?


2. Run your code sample on a homogeneous selection of nodes with the I_MPI_PLATFORM variable removed.


3. Run your code or the IMB benchmark on a single node with the same environment variables:


command: $ mpirun -n 36 -ppn 36 IMB-MPI1 -npmin 36 alltoall -iter 1000,800 -time 4800


Thanks & Regards

Shivani


Antonio_D
Beginner

Hello,

 

1. The code will run using the TCP network but is slow:

 

[0] MPI startup(): Intel(R) MPI Library, Version 2021.6 Build 20220227 (id: 28877f3f32)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
impi_shm_mbind_local(): mbind(p=0x145aa2dcd000, size=4294967296) error=1 "Operation not permitted"

//SNIP//

mbind_interleave(): mbind(p=0x148650c8f000, size=110047232) error=1 "Operation not permitted"

[0] MPI startup(): libfabric version: 1.13.2rc1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[0] MPI startup(): File "/projects/site/gred/smpg/software/oneAPI/2022.2/mpi/2021.6.0/etc/tuning_clx-ap_shm-ofi_tcp-ofi-rxm_100.dat" not found
[0] MPI startup(): Load tuning file: "/projects/site/gred/smpg/software/oneAPI/2022.2/mpi/2021.6.0/etc/tuning_clx-ap_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 19 (TAG_UB value: 524287)
[0] MPI startup(): source bits available: 20 (Maximal number of rank: 1048575)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 2587800 sc1nc049is09 {0}
[0] MPI startup(): 1 2587801 sc1nc049is09 {1}
[0] MPI startup(): 2 2587802 sc1nc049is09 {2}
[0] MPI startup(): 3 2587803 sc1nc049is09 {3}
[0] MPI startup(): 4 2587804 sc1nc049is09 {4}
[0] MPI startup(): 5 2587805 sc1nc049is09 {5}
[0] MPI startup(): 6 2587806 sc1nc049is09 {6}
[0] MPI startup(): 7 2587807 sc1nc049is09 {7}
[0] MPI startup(): 8 2587808 sc1nc049is09 {8}
[0] MPI startup(): 9 2587809 sc1nc049is09 {9}
[0] MPI startup(): 10 2587810 sc1nc049is09 {10}
[0] MPI startup(): 11 2587811 sc1nc049is09 {11}
[0] MPI startup(): 12 2587812 sc1nc049is09 {12}
[0] MPI startup(): 13 2587813 sc1nc049is09 {13}
[0] MPI startup(): 14 2587814 sc1nc049is09 {14}
[0] MPI startup(): 15 2587815 sc1nc049is09 {15}
[0] MPI startup(): 16 2587816 sc1nc049is09 {16}
[0] MPI startup(): 17 2587817 sc1nc049is09 {17}
[0] MPI startup(): 18 2587818 sc1nc049is09 {18}
[0] MPI startup(): 19 2587819 sc1nc049is09 {19}
[0] MPI startup(): 20 2587820 sc1nc049is09 {20}
[0] MPI startup(): 21 2587821 sc1nc049is09 {21}
[0] MPI startup(): 22 2587822 sc1nc049is09 {22}
[0] MPI startup(): 23 2587823 sc1nc049is09 {23}
[0] MPI startup(): 24 2587824 sc1nc049is09 {24}
[0] MPI startup(): 25 2587825 sc1nc049is09 {25}
[0] MPI startup(): 26 2587826 sc1nc049is09 {26}
[0] MPI startup(): 27 2587827 sc1nc049is09 {27}
[0] MPI startup(): 28 2587828 sc1nc049is09 {28}
[0] MPI startup(): 29 2587829 sc1nc049is09 {29}
[0] MPI startup(): 30 2587830 sc1nc049is09 {30}
[0] MPI startup(): 31 2587831 sc1nc049is09 {31}
[0] MPI startup(): 32 764471 sc1nc077is10 {0}
[0] MPI startup(): 33 764472 sc1nc077is10 {1}
[0] MPI startup(): 34 764473 sc1nc077is10 {2}
[0] MPI startup(): 35 764474 sc1nc077is10 {3}
[0] MPI startup(): 36 764475 sc1nc077is10 {4}
[0] MPI startup(): 37 764476 sc1nc077is10 {5}
[0] MPI startup(): 38 764477 sc1nc077is10 {6}
[0] MPI startup(): 39 764478 sc1nc077is10 {7}
[0] MPI startup(): 40 764479 sc1nc077is10 {8}
[0] MPI startup(): 41 764480 sc1nc077is10 {9}
[0] MPI startup(): 42 764481 sc1nc077is10 {10}
[0] MPI startup(): 43 764482 sc1nc077is10 {11}
[0] MPI startup(): 44 764483 sc1nc077is10 {12}
[0] MPI startup(): 45 764484 sc1nc077is10 {13}
[0] MPI startup(): 46 764485 sc1nc077is10 {14}
[0] MPI startup(): 47 764486 sc1nc077is10 {15}
[0] MPI startup(): 48 764487 sc1nc077is10 {16}
[0] MPI startup(): 49 764488 sc1nc077is10 {17}
[0] MPI startup(): 50 764489 sc1nc077is10 {18}
[0] MPI startup(): 51 764490 sc1nc077is10 {19}
[0] MPI startup(): 52 764491 sc1nc077is10 {20}
[0] MPI startup(): 53 764492 sc1nc077is10 {21}
[0] MPI startup(): 54 764493 sc1nc077is10 {22}
[0] MPI startup(): 55 764494 sc1nc077is10 {23}
[0] MPI startup(): 56 764495 sc1nc077is10 {24}
[0] MPI startup(): 57 764496 sc1nc077is10 {25}
[0] MPI startup(): 58 764497 sc1nc077is10 {26}
[0] MPI startup(): 59 764498 sc1nc077is10 {27}
[0] MPI startup(): 60 764499 sc1nc077is10 {28}
[0] MPI startup(): 61 764500 sc1nc077is10 {29}
[0] MPI startup(): 62 764501 sc1nc077is10 {31}
[0] MPI startup(): 63 764502 sc1nc077is10 {32}
[0] MPI startup(): 64 1842352 sc1nc037is08 {0}
[0] MPI startup(): 65 1842353 sc1nc037is08 {1}
[0] MPI startup(): 66 1842354 sc1nc037is08 {2}
[0] MPI startup(): 67 1842355 sc1nc037is08 {3}
[0] MPI startup(): 68 1842356 sc1nc037is08 {4}
[0] MPI startup(): 69 1842357 sc1nc037is08 {5}
[0] MPI startup(): 70 1842358 sc1nc037is08 {6}
[0] MPI startup(): 71 1842359 sc1nc037is08 {7}
[0] MPI startup(): 72 1842360 sc1nc037is08 {8}
[0] MPI startup(): 73 1842361 sc1nc037is08 {9}
[0] MPI startup(): 74 1842362 sc1nc037is08 {15}
[0] MPI startup(): 75 1842363 sc1nc037is08 {16}
[0] MPI startup(): 76 1842364 sc1nc037is08 {17}
[0] MPI startup(): 77 1842365 sc1nc037is08 {18}
[0] MPI startup(): 78 1842366 sc1nc037is08 {19}
[0] MPI startup(): 79 1842367 sc1nc037is08 {20}
[0] MPI startup(): 80 1842368 sc1nc037is08 {21}
[0] MPI startup(): 81 1842369 sc1nc037is08 {22}
[0] MPI startup(): 82 1842370 sc1nc037is08 {23}
[0] MPI startup(): 83 1842371 sc1nc037is08 {24}
[0] MPI startup(): 84 1842372 sc1nc037is08 {25}
[0] MPI startup(): 85 1842373 sc1nc037is08 {26}
[0] MPI startup(): 86 1842374 sc1nc037is08 {27}
[0] MPI startup(): 87 1842375 sc1nc037is08 {28}
[0] MPI startup(): 88 1842376 sc1nc037is08 {29}
[0] MPI startup(): 89 1842377 sc1nc037is08 {32}
[0] MPI startup(): 90 1842378 sc1nc037is08 {33}
[0] MPI startup(): 91 1842379 sc1nc037is08 {34}
[0] MPI startup(): 92 1842380 sc1nc037is08 {35}
[0] MPI startup(): 93 1842381 sc1nc037is08 {36}
[0] MPI startup(): 94 1842382 sc1nc037is08 {37}
[0] MPI startup(): 95 1842383 sc1nc037is08 {38}
[0] MPI startup(): I_MPI_ROOT=/projects/site/gred/smpg/software/oneAPI/2022.2/mpi/2021.6.0
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_RMK=lsf
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=lsf
[0] MPI startup(): I_MPI_PIN_CELL=core
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_FABRICS=shm:ofi
[0] MPI startup(): I_MPI_SHM_HEAP=1
[0] MPI startup(): I_MPI_SHM=clx-ap
[0] MPI startup(): I_MPI_OFI_PROVIDER=tcp
[0] MPI startup(): I_MPI_PLATFORM=clx-ap
[0] MPI startup(): I_MPI_DEBUG=6
------------------------------------------------------------
Invoking FHI-aims ...

2. Removing I_MPI_PLATFORM and using homogeneous nodes gives the same problem:

[0] MPI startup(): Intel(R) MPI Library, Version 2021.6 Build 20220227 (id: 28877f3f32)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
impi_shm_mbind_local(): mbind(p=0x1505014f8000, size=4294967296) error=1 "Operation not permitted"

//SNIP//

mbind_interleave(): mbind(p=0x14e0f5c9c000, size=80408576) error=1 "Operation not permitted"

Abort(2139535) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(178)........:
MPID_Init(1532)..............:
MPIDI_OFI_mpi_init_hook(1512):
open_fabric(2566)............:
find_provider(2684)..........: OFI fi_getinfo() failed (ofi_init.c:2684:find_provider:No data available)

------------------------------------------------------------
Sender: LSF System <lsfadmin@sc1nc064is12>
Subject: Job 14098526: <fhibench> in cluster <sc1> Done

Job <fhibench> was submitted from host <sc1nc001is01> by user <dipasqua> in cluster <sc1> at Thu Sep 22 10:57:21 2022
Job was executed on host(s) <32*sc1nc064is12>, in queue <preempt>, as user <dipasqua> in cluster <sc1> at Thu Sep 22 10:57:21 2022
<32*sc1nc040is08>
<32*sc1nc053is09>
</home/dipasqua> was used as the home directory.
</projects/site/gred/smpg/test/fhiaims/test2> was used as the working directory.
Started at Thu Sep 22 10:57:21 2022
Terminated at Thu Sep 22 10:57:25 2022
Results reported at Thu Sep 22 10:57:25 2022

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#!/bin/bash
#BSUB -J fhibench
#BSUB -n 96
#BSUB -q preempt
#BSUB -R rusage[mem=4G]
#BSUB -R span[block=32]
#BSUB -R "model == HPE_APOLLO2000_64"
#BSUB -R affinity[core(1):cpubind=core:membind=localonly:distribute=pack]
#BSUB -R select[hname!=sc1nc069is01]
#BSUB -R select[hname!=sc1nc037is08]
#BSUB -R select[hname!=sc1nc077is10]
#BSUB -R select[hname!=sc1nc049is09]
#BSUB -o fhibench.o%J
#BSUB -e fhibench.e%J
#. /projects/global/smpg/software/oneAPI/setvars.sh
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export MKL_DYNAMIC=FALSE
export UCX_TLS=sm,rc_mlx5,dc_mlx5,ud_mlx5,self
export LD_PRELOAD=$I_MPI_ROOT/lib/libmpi_shm_heap_proxy.so
export I_MPI_HYDRA_BOOTSTRAP=lsf
export I_MPI_HYDRA_RMK=lsf
export I_MPI_HYDRA_TOPOLIB=hwloc
export I_MPI_HYDRA_IFACE=ib0
export I_MPI_FABRICS=shm:ofi
export I_MPI_SHM_HEAP=1
export I_MPI_OFI_PROVIDER=mlx
export I_MPI_PIN_CELL=core
export I_MPI_DEBUG=6
mpirun -n 96 /projects/site/gred/smpg/software/FHI-aims/bin/aims.220117.scalapack.mpi.x 2>&1 | tee FHIaims.out

3. IMB Benchmark works:

[0] MPI startup(): Intel(R) MPI Library, Version 2021.6 Build 20220227 (id: 28877f3f32)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
impi_shm_mbind_local(): mbind(p=0x1531255be000, size=4294967296) error=-1 "Unknown error -1"

impi_shm_mbind_local(): mbind(p=0x14c3d8bb1000, size=4294967296) error=-1 "Unknown error -1"

impi_shm_mbind_local(): mbind(p=0x14f9e2ec1000, size=4294967296) error=-1 "Unknown error -1"

impi_shm_mbind_local(): mbind(p=0x1485669ee000, size=4294967296) error=-1 "Unknown error -1"

[0] MPI startup(): libfabric version: 1.13.2rc1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): File "/projects/site/gred/smpg/software/oneAPI/2022.2/mpi/2021.6.0/etc/tuning_clx-ap_shm-ofi_mlx_10.dat" not found
[0] MPI startup(): Load tuning file: "/projects/site/gred/smpg/software/oneAPI/2022.2/mpi/2021.6.0/etc/tuning_clx-ap_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 20 (TAG_UB value: 1048575)
[0] MPI startup(): source bits available: 21 (Maximal number of rank: 2097151)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 538357 sc1nc051is09 {0}
[0] MPI startup(): 1 538358 sc1nc051is09 {1}
[0] MPI startup(): 2 538359 sc1nc051is09 {2}
[0] MPI startup(): 3 538360 sc1nc051is09 {3}
[0] MPI startup(): 4 538361 sc1nc051is09 {4}
[0] MPI startup(): 5 538362 sc1nc051is09 {5}
[0] MPI startup(): 6 538363 sc1nc051is09 {6}
[0] MPI startup(): 7 538364 sc1nc051is09 {7}
[0] MPI startup(): 8 538365 sc1nc051is09 {8}
[0] MPI startup(): 9 538366 sc1nc051is09 {9}
[0] MPI startup(): 10 538367 sc1nc051is09 {10}
[0] MPI startup(): 11 538368 sc1nc051is09 {11}
[0] MPI startup(): 12 538369 sc1nc051is09 {12}
[0] MPI startup(): 13 538370 sc1nc051is09 {13}
[0] MPI startup(): 14 538371 sc1nc051is09 {14}
[0] MPI startup(): 15 538372 sc1nc051is09 {15}
[0] MPI startup(): 16 538373 sc1nc051is09 {16}
[0] MPI startup(): 17 538374 sc1nc051is09 {17}
[0] MPI startup(): 18 538375 sc1nc051is09 {18}
[0] MPI startup(): 19 538376 sc1nc051is09 {19}
[0] MPI startup(): 20 538377 sc1nc051is09 {20}
[0] MPI startup(): 21 538378 sc1nc051is09 {21}
[0] MPI startup(): 22 538379 sc1nc051is09 {22}
[0] MPI startup(): 23 538380 sc1nc051is09 {23}
[0] MPI startup(): 24 538381 sc1nc051is09 {24}
[0] MPI startup(): 25 538382 sc1nc051is09 {25}
[0] MPI startup(): 26 538383 sc1nc051is09 {26}
[0] MPI startup(): 27 538384 sc1nc051is09 {27}
[0] MPI startup(): 28 538385 sc1nc051is09 {28}
[0] MPI startup(): 29 538386 sc1nc051is09 {29}
[0] MPI startup(): 30 538387 sc1nc051is09 {30}
[0] MPI startup(): 31 538388 sc1nc051is09 {31}
[0] MPI startup(): 32 538389 sc1nc051is09 {56}
[0] MPI startup(): 33 538390 sc1nc051is09 {57}
[0] MPI startup(): 34 538391 sc1nc051is09 {58}
[0] MPI startup(): 35 538392 sc1nc051is09 {59}
[0] MPI startup(): I_MPI_ROOT=/projects/site/gred/smpg/software/oneAPI/2022.2/mpi/2021.6.0
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_RMK=lsf
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_HYDRA_IFACE=ib0
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=lsf
[0] MPI startup(): I_MPI_PIN_CELL=core
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_FABRICS=shm:ofi
[0] MPI startup(): I_MPI_SHM_HEAP=1
[0] MPI startup(): I_MPI_SHM=clx-ap
[0] MPI startup(): I_MPI_OFI_PROVIDER=mlx
[0] MPI startup(): I_MPI_PLATFORM=clx-ap
[0] MPI startup(): I_MPI_DEBUG=6
#----------------------------------------------------------------
# Intel(R) MPI Benchmarks 2021.4, MPI-1 part
#----------------------------------------------------------------
# Date : Thu Sep 22 10:57:15 2022
# Machine : x86_64
# System : Linux
# Release : 4.18.0-193.71.1.el8_2.x86_64
# Version : #1 SMP Mon Dec 6 10:02:41 EST 2021
# MPI Version : 3.1
# MPI Thread Environment:


# Calling sequence was:

# IMB-MPI1 -npmin 36 alltoall -iter 1000,800 -time 4800

# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#

# List of Benchmarks to run:

# Alltoall

#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 36
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.08 1.71 0.14
1 1000 8.02 10.61 9.49
2 1000 7.80 10.42 9.31
4 1000 8.24 11.05 9.82
8 1000 8.51 11.40 10.18
16 1000 8.79 12.38 10.89
32 1000 10.29 14.35 12.66
64 1000 11.39 16.68 14.35
128 1000 11.99 17.41 15.01
256 1000 15.29 22.50 19.19
512 1000 19.94 50.44 36.31
1024 1000 25.24 71.50 50.77
2048 1000 67.02 73.84 69.65
4096 1000 103.65 115.28 108.63
8192 1000 51.63 77.03 67.32
16384 1000 106.61 115.76 110.12
32768 1000 344.58 361.05 351.96
65536 1000 1181.96 1220.20 1200.34
131072 1000 2720.16 2792.54 2754.35
262144 1000 6358.54 6653.75 6499.16

 

3a. IMB Benchmark also works when run over 2 nodes:

 

[0] MPI startup(): Intel(R) MPI Library, Version 2021.6 Build 20220227 (id: 28877f3f32)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
impi_shm_mbind_local(): mbind(p=0x146300dc7000, size=4294967296) error=-1 "Unknown error -1"

//SNIP//

impi_shm_mbind_local(): mbind(p=0x14fef7216000, size=4294967296) error=-1 "Unknown error -1"

[0] MPI startup(): libfabric version: 1.13.2rc1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): File "/projects/site/gred/smpg/software/oneAPI/2022.2/mpi/2021.6.0/etc/tuning_clx-ap_shm-ofi_mlx_100.dat" not found
[0] MPI startup(): Load tuning file: "/projects/site/gred/smpg/software/oneAPI/2022.2/mpi/2021.6.0/etc/tuning_clx-ap_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 20 (TAG_UB value: 1048575)
[0] MPI startup(): source bits available: 21 (Maximal number of rank: 2097151)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 231723 sc1nc040is08 {32}
[0] MPI startup(): 1 231724 sc1nc040is08 {33}
[0] MPI startup(): 2 231725 sc1nc040is08 {34}
[0] MPI startup(): 3 231726 sc1nc040is08 {35}
[0] MPI startup(): 4 231727 sc1nc040is08 {36}
[0] MPI startup(): 5 231728 sc1nc040is08 {37}
[0] MPI startup(): 6 231729 sc1nc040is08 {38}
[0] MPI startup(): 7 231730 sc1nc040is08 {39}
[0] MPI startup(): 8 231731 sc1nc040is08 {40}
[0] MPI startup(): 9 231732 sc1nc040is08 {41}
[0] MPI startup(): 10 231733 sc1nc040is08 {42}
[0] MPI startup(): 11 231734 sc1nc040is08 {43}
[0] MPI startup(): 12 231735 sc1nc040is08 {44}
[0] MPI startup(): 13 231736 sc1nc040is08 {45}
[0] MPI startup(): 14 231737 sc1nc040is08 {46}
[0] MPI startup(): 15 231738 sc1nc040is08 {47}
[0] MPI startup(): 16 231739 sc1nc040is08 {48}
[0] MPI startup(): 17 231740 sc1nc040is08 {49}
[0] MPI startup(): 18 542229 sc1nc051is09 {0}
[0] MPI startup(): 19 542230 sc1nc051is09 {1}
[0] MPI startup(): 20 542231 sc1nc051is09 {2}
[0] MPI startup(): 21 542232 sc1nc051is09 {3}
[0] MPI startup(): 22 542233 sc1nc051is09 {4}
[0] MPI startup(): 23 542234 sc1nc051is09 {5}
[0] MPI startup(): 24 542235 sc1nc051is09 {6}
[0] MPI startup(): 25 542236 sc1nc051is09 {7}
[0] MPI startup(): 26 542237 sc1nc051is09 {8}
[0] MPI startup(): 27 542238 sc1nc051is09 {9}
[0] MPI startup(): 28 542239 sc1nc051is09 {10}
[0] MPI startup(): 29 542240 sc1nc051is09 {11}
[0] MPI startup(): 30 542241 sc1nc051is09 {12}
[0] MPI startup(): 31 542242 sc1nc051is09 {13}
[0] MPI startup(): 32 542243 sc1nc051is09 {14}
[0] MPI startup(): 33 542244 sc1nc051is09 {15}
[0] MPI startup(): 34 542245 sc1nc051is09 {16}
[0] MPI startup(): 35 542246 sc1nc051is09 {17}
[0] MPI startup(): I_MPI_ROOT=/projects/site/gred/smpg/software/oneAPI/2022.2/mpi/2021.6.0
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_RMK=lsf
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_HYDRA_IFACE=ib0
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=lsf
[0] MPI startup(): I_MPI_PIN_CELL=core
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_FABRICS=shm:ofi
[0] MPI startup(): I_MPI_SHM_HEAP=1
[0] MPI startup(): I_MPI_SHM=clx-ap
[0] MPI startup(): I_MPI_OFI_PROVIDER=mlx
[0] MPI startup(): I_MPI_PLATFORM=clx-ap
[0] MPI startup(): I_MPI_DEBUG=6
#----------------------------------------------------------------
# Intel(R) MPI Benchmarks 2021.4, MPI-1 part
#----------------------------------------------------------------
# Date : Thu Sep 22 11:24:01 2022
# Machine : x86_64
# System : Linux
# Release : 4.18.0-193.71.1.el8_2.x86_64
# Version : #1 SMP Mon Dec 6 10:02:41 EST 2021
# MPI Version : 3.1
# MPI Thread Environment:


# Calling sequence was:

# IMB-MPI1 -npmin 36 alltoall -iter 1000,800 -time 4800

# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#

# List of Benchmarks to run:

# Alltoall

#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 36
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.07 0.67 0.09
1 1000 8.04 11.39 9.50
2 1000 6.73 11.19 9.26
4 1000 6.96 11.82 9.79
8 1000 7.05 12.17 9.99
16 1000 8.40 13.74 11.27
32 1000 10.08 10.82 10.40
64 1000 9.45 10.02 9.69
128 1000 11.47 12.01 11.71
256 1000 16.86 17.31 17.15
512 1000 27.10 28.39 27.84
1024 1000 46.05 46.47 46.28
2048 1000 53.47 66.50 62.08
4096 1000 95.11 124.16 115.93
8192 1000 157.63 240.59 223.38
16384 1000 342.41 481.91 443.78
32768 1000 596.04 1110.06 982.47
65536 1000 1693.21 2531.72 2307.89
131072 1000 3525.39 5584.15 4980.08
262144 1000 7188.74 7357.82 7282.80
524288 1000 14442.73 14942.80 14711.39
1048576 800 29806.12 30867.70 30439.01
2097152 400 59923.80 60215.06 60092.24
4194304 200 120764.27 121485.78 121170.79


# All processes entering MPI_Finalize

 3b. My program does not work on a single node:

[0] MPI startup(): Intel(R) MPI Library, Version 2021.6 Build 20220227 (id: 28877f3f32)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
impi_shm_mbind_local(): mbind(p=0x149fb6157000, size=4294967296) error=1 "Operation not permitted"

//SNIP//

impi_shm_mbind_local(): mbind(p=0x14f4f4bdf000, size=4294967296) error=1 "Operation not permitted"

[0] MPI startup(): libfabric version: 1.13.2rc1-impi
Abort(2139535) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(178)........:
MPID_Init(1532)..............:
MPIDI_OFI_mpi_init_hook(1512):
open_fabric(2566)............:
find_provider(2684)..........: OFI fi_getinfo() failed (ofi_init.c:2684:find_provider:No data available)

------------------------------------------------------------
Sender: LSF System <lsfadmin@sc1nc049is09>
Subject: Job 14098787: <fhibench> in cluster <sc1> Done

Job <fhibench> was submitted from host <sc1nc001is01> by user <dipasqua> in cluster <sc1> at Thu Sep 22 11:34:13 2022
Job was executed on host(s) <32*sc1nc049is09>, in queue <preempt>, as user <dipasqua> in cluster <sc1> at Thu Sep 22 11:34:13 2022
</home/dipasqua> was used as the home directory.
</projects/site/gred/smpg/test/fhiaims/test4> was used as the working directory.
Started at Thu Sep 22 11:34:13 2022
Terminated at Thu Sep 22 11:34:16 2022
Results reported at Thu Sep 22 11:34:16 2022

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#!/bin/bash
#BSUB -J fhibench
#BSUB -n 32
#BSUB -q preempt
#BSUB -R rusage[mem=4G]
#BSUB -R span[block=32]
#BSUB -R "model == HPE_APOLLO2000_64"
#BSUB -R affinity[core(1):cpubind=core:membind=localonly:distribute=pack]
#BSUB -o fhibench.o%J
#BSUB -e fhibench.e%J
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export MKL_DYNAMIC=FALSE
export UCX_TLS=sm,rc_mlx5,dc_mlx5,ud_mlx5,self
export LD_PRELOAD=$I_MPI_ROOT/lib/libmpi_shm_heap_proxy.so
export I_MPI_HYDRA_BOOTSTRAP=lsf
export I_MPI_HYDRA_RMK=lsf
export I_MPI_HYDRA_TOPOLIB=hwloc
export I_MPI_HYDRA_IFACE=ib0
export I_MPI_PLATFORM=clx-ap
export I_MPI_FABRICS=shm:ofi
export I_MPI_SHM=clx-ap
export I_MPI_SHM_HEAP=1
export I_MPI_OFI_PROVIDER=mlx
export I_MPI_PIN_CELL=core
export I_MPI_DEBUG=6
mpirun -n 32 /projects/site/gred/smpg/software/FHI-aims/bin/aims.220117.scalapack.mpi.x 2>&1 | tee FHIaims.out
------------------------------------------------------------

Successfully completed.

Resource usage summary:

CPU time : 6.00 sec.
Max Memory : 441 MB
Average Memory : 441.00 MB
Total Requested Memory : 131072.00 MB
Delta Memory : 130631.00 MB
Max Swap : -
Max Processes : 41
Max Threads : 43
Run time : 0 sec.
Turnaround time : 3 sec.

The output (if any) is above this job summary.




PS:

Read file <fhibench.e14098787> for stderr output of this job.

Regards,

Antonio

Antonio_D
Beginner

Hello,

 

1. Yes, the code runs fine over the TCP network when exporting I_MPI_OFI_PROVIDER=tcp and removing the IB-related variables.

2. The code still fails on a homogeneous selection of nodes, even after removing the I_MPI_PLATFORM variable.

3. The IMB benchmark appears to work correctly:

[0] MPI startup(): Intel(R) MPI Library, Version 2021.6 Build 20220227 (id: 28877f3f32)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
impi_shm_mbind_local(): mbind(p=0x146300dc7000, size=4294967296) error=-1 "Unknown error -1"

//SNIP//

impi_shm_mbind_local(): mbind(p=0x14fef7216000, size=4294967296) error=-1 "Unknown error -1"

[0] MPI startup(): libfabric version: 1.13.2rc1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): File "/projects/site/gred/smpg/software/oneAPI/2022.2/mpi/2021.6.0/etc/tuning_clx-ap_shm-ofi_mlx_100.dat" not found
[0] MPI startup(): Load tuning file: "/projects/site/gred/smpg/software/oneAPI/2022.2/mpi/2021.6.0/etc/tuning_clx-ap_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 20 (TAG_UB value: 1048575)
[0] MPI startup(): source bits available: 21 (Maximal number of rank: 2097151)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 231723 sc1nc040is08 {32}
[0] MPI startup(): 1 231724 sc1nc040is08 {33}
[0] MPI startup(): 2 231725 sc1nc040is08 {34}
[0] MPI startup(): 3 231726 sc1nc040is08 {35}
[0] MPI startup(): 4 231727 sc1nc040is08 {36}
[0] MPI startup(): 5 231728 sc1nc040is08 {37}
[0] MPI startup(): 6 231729 sc1nc040is08 {38}
[0] MPI startup(): 7 231730 sc1nc040is08 {39}
[0] MPI startup(): 8 231731 sc1nc040is08 {40}
[0] MPI startup(): 9 231732 sc1nc040is08 {41}
[0] MPI startup(): 10 231733 sc1nc040is08 {42}
[0] MPI startup(): 11 231734 sc1nc040is08 {43}
[0] MPI startup(): 12 231735 sc1nc040is08 {44}
[0] MPI startup(): 13 231736 sc1nc040is08 {45}
[0] MPI startup(): 14 231737 sc1nc040is08 {46}
[0] MPI startup(): 15 231738 sc1nc040is08 {47}
[0] MPI startup(): 16 231739 sc1nc040is08 {48}
[0] MPI startup(): 17 231740 sc1nc040is08 {49}
[0] MPI startup(): 18 542229 sc1nc051is09 {0}
[0] MPI startup(): 19 542230 sc1nc051is09 {1}
[0] MPI startup(): 20 542231 sc1nc051is09 {2}
[0] MPI startup(): 21 542232 sc1nc051is09 {3}
[0] MPI startup(): 22 542233 sc1nc051is09 {4}
[0] MPI startup(): 23 542234 sc1nc051is09 {5}
[0] MPI startup(): 24 542235 sc1nc051is09 {6}
[0] MPI startup(): 25 542236 sc1nc051is09 {7}
[0] MPI startup(): 26 542237 sc1nc051is09 {8}
[0] MPI startup(): 27 542238 sc1nc051is09 {9}
[0] MPI startup(): 28 542239 sc1nc051is09 {10}
[0] MPI startup(): 29 542240 sc1nc051is09 {11}
[0] MPI startup(): 30 542241 sc1nc051is09 {12}
[0] MPI startup(): 31 542242 sc1nc051is09 {13}
[0] MPI startup(): 32 542243 sc1nc051is09 {14}
[0] MPI startup(): 33 542244 sc1nc051is09 {15}
[0] MPI startup(): 34 542245 sc1nc051is09 {16}
[0] MPI startup(): 35 542246 sc1nc051is09 {17}
[0] MPI startup(): I_MPI_ROOT=/projects/site/gred/smpg/software/oneAPI/2022.2/mpi/2021.6.0
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_RMK=lsf
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_HYDRA_IFACE=ib0
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=lsf
[0] MPI startup(): I_MPI_PIN_CELL=core
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_FABRICS=shm:ofi
[0] MPI startup(): I_MPI_SHM_HEAP=1
[0] MPI startup(): I_MPI_SHM=clx-ap
[0] MPI startup(): I_MPI_OFI_PROVIDER=mlx
[0] MPI startup(): I_MPI_PLATFORM=clx-ap
[0] MPI startup(): I_MPI_DEBUG=6
#----------------------------------------------------------------
# Intel(R) MPI Benchmarks 2021.4, MPI-1 part
#----------------------------------------------------------------
# Date : Thu Sep 22 11:24:01 2022
# Machine : x86_64
# System : Linux
# Release : 4.18.0-193.71.1.el8_2.x86_64
# Version : #1 SMP Mon Dec 6 10:02:41 EST 2021
# MPI Version : 3.1
# MPI Thread Environment:


# Calling sequence was:

# IMB-MPI1 -npmin 36 alltoall -iter 1000,800 -time 4800

# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#

# List of Benchmarks to run:

# Alltoall

#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 36
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.07 0.67 0.09
1 1000 8.04 11.39 9.50
2 1000 6.73 11.19 9.26
4 1000 6.96 11.82 9.79
8 1000 7.05 12.17 9.99
16 1000 8.40 13.74 11.27
32 1000 10.08 10.82 10.40
64 1000 9.45 10.02 9.69
128 1000 11.47 12.01 11.71
256 1000 16.86 17.31 17.15
512 1000 27.10 28.39 27.84
1024 1000 46.05 46.47 46.28
2048 1000 53.47 66.50 62.08
4096 1000 95.11 124.16 115.93
8192 1000 157.63 240.59 223.38
16384 1000 342.41 481.91 443.78
32768 1000 596.04 1110.06 982.47
65536 1000 1693.21 2531.72 2307.89
131072 1000 3525.39 5584.15 4980.08
262144 1000 7188.74 7357.82 7282.80
524288 1000 14442.73 14942.80 14711.39
1048576 800 29806.12 30867.70 30439.01
2097152 400 59923.80 60215.06 60092.24
4194304 200 120764.27 121485.78 121170.79


# All processes entering MPI_Finalize

 

However, my code still fails even when running on a single node.

 

Regards,

Antonio 
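Given that IMB runs but the application fails even on one node, one way to narrow this down is to drop the OFI/mlx layer entirely for a single-node run and test the shared-memory path on its own. The settings below are a sketch of that isolation run (my suggestion, not a confirmed fix); I_MPI_FABRICS=shm is valid for a single-node job.

```shell
# Sketch: single-node isolation run.
unset I_MPI_OFI_PROVIDER
unset LD_PRELOAD            # removes the shm-heap proxy preload
export I_MPI_SHM_HEAP=0     # avoids the impi_shm_mbind_local() warnings
export I_MPI_FABRICS=shm    # shared memory only, no OFI provider needed
export I_MPI_DEBUG=6
# mpirun -n 36 ./aims.220117.scalapack.mpi.x
```

If the application initializes cleanly this way, the failure is isolated to the OFI/mlx side rather than the shm heap.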

ShivaniK_Intel
Moderator

Hi,


Could you please let us know whether you face a similar issue with Intel oneAPI 2022.3 (MPI 2021.7)?


Thanks & Regards

Shivani


Antonio_D
Beginner

I see the same error with the latest release, Intel oneAPI 2022.3 (MPI 2021.7).

ShivaniK_Intel
Moderator

Hi,


We are working on it and will get back to you.


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator

Hi,


Could you please provide us with a sample reproducer code to investigate more?


Thanks & Regards

Shivani



Antonio_D
Beginner

Hello,

 

Yes, I can, but since the code is third-party, I would rather not post it here. Is there another way I can get you the code?

 

Regards,

Antonio
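In the meantime, a minimal MPI_Init-only reproducer may sidestep the third-party-code problem: if a two-line MPI program fails the same way under the same environment, that program can be shared instead of the application. The script below is a sketch (hello_mpi.c is a hypothetical file name, and it assumes the oneAPI compiler wrappers are on PATH).

```shell
#!/bin/sh
# Sketch: generate and build a minimal MPI reproducer. If this also
# aborts in MPI_Init, the issue is in the environment/provider setup,
# not in the third-party application.
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);   /* the call that aborts in the report */
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d initialized fine\n", rank);
    MPI_Finalize();
    return 0;
}
EOF

if command -v mpiicc >/dev/null 2>&1; then
    mpiicc hello_mpi.c -o hello_mpi
    # mpirun -n 2 ./hello_mpi
else
    echo "mpiicc not found; load the oneAPI environment first"
fi
```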

ShivaniK_Intel
Moderator

Hi,


As we did not hear back from you, could you please provide us with the sample reproducer?


Thanks & Regards

Shivani

