Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

3rd Gen Xeon shows slower performance with Intel MPI Library

Kuni
New Contributor I
25,789 Views

We are studying the network traffic of HPC workloads. For this, we are using the Intel MPI Library (the latest Intel HPC Toolkit as of 12/10/2022) and the NAS Parallel Benchmarks (3.4.2). Before measuring network traffic, I measured performance on a single machine, with no network traffic involved. We used the following platforms:

 

machine 1. Xeon Silver 4310 server, 8ch 64GB RAM, Hyper-Threading on, CentOS 7.9, Turbo on

machine 2. Xeon Silver 4214 server, 6ch 96GB RAM, Hyper-Threading on, CentOS 7.9, no Turbo

machine 3. 4-core, 8GB RAM virtual machine on machine 1, CentOS 7.9

machine 4. 4-core, 8GB RAM virtual machine on machine 2, CentOS 7.9

 

Results: 

Test 1.  mpirun -n 4 ./bin/bt.B.x (4 processes, smaller array: 102 x 102 x 102)

machine 1.  49.87 sec

machine 2. 62.02 sec

machine 3. 43.92 sec

machine 4. 63.11 sec

 

Test 2. mpirun -n 4 ./bin/bt.C.x  (4 processes, larger array: 162 x 162 x 162)

machine 1. 388.57 sec

machine 2. 253.40 sec

machine 3. 201.79 sec

machine 4. 256.78 sec

 

In test 1, the results were as expected and the performance differences between the machines were reasonable.

 

However, the results of test 2 were very strange. There are two unexpected things:

1. On the real machines, the newer (3rd) generation Xeon was much slower than the older (2nd) generation Xeon.

2. The same newer (3rd) generation Xeon was much faster when the benchmark was executed in the virtual machine.
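To put rough numbers on these two observations, the ratios can be computed from the test 2 timings listed above. This is only a small sketch; the helper name `ratio` and the variable names are made up for illustration:

```shell
#!/usr/bin/env bash
# Ratio of two wall-clock times, rounded to two decimals.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Test 2 (bt.C.x with Intel MPI) timings in seconds, copied from the results above.
m1=388.57   # machine 1: Xeon Silver 4310, real machine
m2=253.40   # machine 2: Xeon Silver 4214, real machine
m3=201.79   # machine 3: virtual machine on machine 1

echo "machine 1 / machine 2: $(ratio "$m1" "$m2")"   # newer CPU vs older CPU
echo "machine 1 / machine 3: $(ratio "$m1" "$m3")"   # real machine vs VM on the same host
```

With these timings, machine 1 takes about 1.5 times as long as machine 2 and nearly twice as long as its own virtual machine.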

 

As for memory, machine 2 has 1.5 times as much RAM as machine 1 (96GB vs. 64GB). However, test 2 (bt.C.x) consumes only about 4GB (according to the free command), so the difference in memory size should not have such a big effect on the execution results.

 

I also ran the tests with Open MPI 4.1. The results are as follows:

Test 1.  mpirun -np 4 ./bin/bt.B.x (4 processes, smaller array)

machine 1.  52.31 sec

machine 2.  61.73 sec

 

Test 2. mpirun -np 4 ./bin/bt.C.x  (4 processes, larger array)

machine 1. 198.70 sec

machine 2. 252.31 sec
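
One way to read the two sets of results side by side is to divide each Intel MPI time by the Open MPI time for the same test and machine. A minimal sketch using only the numbers posted above (the function name `compare_mpi` and the row labels are illustrative):

```shell
#!/usr/bin/env bash
# Read rows of "test machine intel_seconds openmpi_seconds" from stdin and
# print the Intel MPI / Open MPI time ratio for each row.
compare_mpi() {
  awk '{ printf "%s %s intel/openmpi = %.2f\n", $1, $2, $3 / $4 }'
}

# Timings in seconds for machines 1 and 2, copied from the results above.
compare_mpi <<'EOF'
bt.B machine1 49.87 52.31
bt.B machine2 62.02 61.73
bt.C machine1 388.57 198.70
bt.C machine2 253.40 252.31
EOF
```

A ratio close to 1.00 means the two libraries perform about the same; with these numbers only bt.C on machine 1 stands out, at roughly 1.96.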

 

So it seems that the combination of Intel MPI, 3rd Gen Xeon, and large arrays may cause the performance drop, and that I cannot use Intel MPI with 3rd Gen Xeon. However, Intel MPI makes it much easier to specify the fabric, so I would like to use it for our network traffic evaluation if possible. To use the Intel MPI Library, I would like to know the following:

 

1. Why was the 3rd Gen Xeon slow? Why did the slowdown not appear in my virtual machine, even though it also runs on the 3rd Gen Xeon?

2. Why does the performance drop appear with the Intel MPI Library?

3. Is there any way to improve performance with Intel MPI on 3rd Gen Xeon?

 

Please help!

 

K. Kunita

0 Kudos
38 Replies
SantoshY_Intel
Moderator
15,272 Views

Hi,

 

Thanks for posting in the Intel forums.

 

Could you please provide the following details, which would help us investigate your issue further?

  1. Which job scheduler are you using?
  2. Which FI_PROVIDER (mlx, psm2, verbs, etc.) are you using?
  3. Which interconnect hardware (InfiniBand, Intel Omni-Path, etc.) are you using?
  4. Which Intel MPI version are you using?
  5. Also, please provide a sample reproducer so that we can reproduce the issue on our end.

 

Thanks & Regards,

Santosh

 


0 Kudos
Kuni
New Contributor I
15,262 Views

I did not use a job scheduler. I just ran the command "mpirun -np 4 ./bin/bt.C.x" or "mpirun -n 4 ./bin/bt.B.x".

This time, the issue happened without any node-to-node communication. I used just one server, so communication would go through loopback sockets or shared memory, and FI_PROVIDER should have no effect. For reference, I also ran "mpirun -n 4 -genv FI_PROVIDER tcp ./bin/bt.X.x" (X is C or B). The result was the same.
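
Since everything runs on one server, one experiment that might help separate shared-memory effects from provider effects is to set the Intel MPI `I_MPI_FABRICS` variable explicitly. The sketch below only prints the command lines for a shared-memory-only run (`shm`) and for the combined `shm:ofi` setting, so they can be reviewed before running. The helper name `bt_cmd` is hypothetical, and whether this setting changes the timings here is untested:

```shell
#!/usr/bin/env bash
# Print (do not run) the mpirun command for one BT class and one fabric setting.
bt_cmd() {
  local fabrics="$1" class="$2"
  echo "mpirun -n 4 -genv I_MPI_FABRICS ${fabrics} ./bin/bt.${class}.x"
}

# Show the two variants for the class C benchmark.
for fabrics in shm shm:ofi; do
  bt_cmd "$fabrics" C
done
```

Each printed line can be pasted into the shell from the NPB3.4-MPI directory once it looks right.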

 

The mpirun version is the one from the latest Intel HPC Toolkit:

$ mpirun --version
Intel(R) MPI Library for Linux* OS, Version 2021.7 Build 20221022 (id: f7b29a2495)
Copyright 2003-2022, Intel Corporation.

 

To reproduce this on your side, the following procedure can be used:

 

On CentOS 7.9,

# su -

# yum update

- Install intel-basekit and intel-hpckit following the Intel instructions. Basically, set up the oneAPI repository and then:

# yum install intel-basekit

# yum install intel-hpckit

# exit

- Download the NPB 3.4.2 software

$ .  /opt/intel/oneapi/setvars.sh

$ wget https://www.nas.nasa.gov/assets/npb/NPB3.4.2.tar.gz

$ sudo yum install centos-release-scl

$ sudo yum install devtoolset-9

$ scl enable devtoolset-9 bash

$ cd npb/NPB3.4.2/NPB3.4-MPI

$ cp config/make.def.template config/make.def

$ cp config/suite.def.template config/suite.def

$ vim config/make.def

Change the following settings:

MPIFC = /opt/intel/oneapi/mpi/latest/bin/mpif90
FMPI_LIB = -L/opt/intel/oneapi/mpi/latest/lib -lmpi
FMPI_INC = -I/opt/intel/oneapi/mpi/latest/include
MPICC = /opt/intel/oneapi/mpi/latest/bin/mpicc
CMPI_LIB = -L/opt/intel/oneapi/mpi/latest/lib -lmpi
CMPI_INC = -I/opt/intel/oneapi/mpi/latest/include

$ vim config/suite.def

Delete all non-comment lines and add the following:

bt<tab>B

bt<tab>C

$ make suite

$ mpirun -n 4 ./bin/bt.B.x

$ mpirun -n 4 ./bin/bt.C.x
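
The `config/suite.def` edit in the steps above can also be scripted instead of done in vim. A small sketch, assuming the two-column, tab-separated layout described above; the function name `write_suite_def` is illustrative:

```shell
#!/usr/bin/env bash
# Write a suite.def that requests only BT class B and BT class C.
# Each line is "<benchmark><tab><class>", matching the entries listed above.
write_suite_def() {
  printf 'bt\tB\nbt\tC\n' > "$1"
}

# Example: generate the file in a scratch location and show it.
out="$(mktemp)"
write_suite_def "$out"
cat "$out"
rm -f "$out"
```

In the NPB tree the target would be `config/suite.def`, for example `write_suite_def config/suite.def` before `make suite`.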

 

For the virtual machines, you can create a virtual machine in the standard way for the OS.

 

 

 

0 Kudos
SantoshY_Intel
Moderator
15,241 Views

Hi,

 

Thanks for providing all the requested details.

 

Could you please provide the outputs for the below commands after initializing the Intel oneAPI environment:

 

fi_info -l
ibv_devinfo
lspci | grep Mellanox
lspci | grep Omni-Path

 

 

Also, please provide the complete debug log for the command below:

 

I_MPI_DEBUG=30 mpirun -n 4 ./bin/bt.B.x
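
The two `lspci` checks above can be folded into one case-insensitive filter that accepts a hyphen, underscore, space, or nothing between "Omni" and "Path". A sketch under the assumption that the adapter names contain "Mellanox" or "Omni-Path"; the function name `find_hpc_nics` and the sample lines are made up:

```shell
#!/usr/bin/env bash
# Filter lspci-style lines for Mellanox or Omni-Path adapters, ignoring case.
# Reads from stdin so it can be fed with `lspci` output or a saved listing.
find_hpc_nics() {
  grep -i -E 'mellanox|omni[-_ ]?path' || true   # no match is not an error
}

# Made-up sample listing; on a real system use: lspci | find_hpc_nics
printf '%s\n' \
  '00:1f.0 ISA bridge: Example Corp. bridge' \
  '18:00.0 Fabric controller: Example Omni-Path adapter' | find_hpc_nics
```

Empty output means that neither kind of adapter was found in the listing.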

 

 

Thanks & Regards,

Santosh

 

 

 


 

0 Kudos
Kuni
New Contributor I
15,198 Views

Hi Santosh,

 

Thank you for your quick response. 

The following is what you requested:

 

[kkunita@svr4 NPB3.4-MPI]$ fi_info -l
psm2:
version: 113.20
mlx:
version: 1.4
psm3:
version: 1103.0
psm3:
version: 1102.0
ofi_rxm:
version: 113.20
verbs:
version: 113.20
verbs:
version: 113.20
tcp:
version: 113.20
sockets:
version: 113.20
shm:
version: 114.0
ofi_hook_noop:
version: 113.20
[kkunita@svr4 NPB3.4-MPI]$ ibv_devinfo
hca_id: rdmap24s0f0
transport: InfiniBand (0)
fw_ver: 1.60
node_guid: 669d:99ff:feff:ff5e
sys_image_guid: 649d:99ff:ff5e:0000
vendor_id: 0x8086
vendor_part_id: 5522
hw_ver: 0x2
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 1
port_lmc: 0x00
link_layer: Ethernet

hca_id: irdma1
transport: InfiniBand (0)
fw_ver: 1.60
node_guid: 669d:99ff:feff:ff5f
sys_image_guid: 649d:99ff:ff5f:0000
vendor_id: 0x8086
vendor_part_id: 5522
hw_ver: 0x2
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 1
port_lmc: 0x00
link_layer: Ethernet

[kkunita@svr4 NPB3.4-MPI]$ lspci |grep Mellanox
[kkunita@svr4 NPB3.4-MPI]$ lspci |grep Omni_Path
[kkunita@svr4 NPB3.4-MPI]$ I_MPI_DEBUG=30 mpirun -n 4 ./bin/bt.B.x
[0] MPI startup(): Intel(R) MPI Library, Version 2021.7 Build 20221022 (id: f7b29a2495)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): shm segment size (342 MB per rank) * (4 local ranks) = 1368 MB total
[0] MPI startup(): libfabric version: 1.13.2rc1-impi
libfabric:24286:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:24286:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:24286:core:core:ze_hmem_dl_init():422<warn> Failed to dlopen libze_loader.so
libfabric:24286:core:core:ofi_hmem_init():214<warn> Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:24286:core:core:ofi_register_provider():474<info> registering provider: verbs (113.20)
libfabric:24286:core:core:ofi_register_provider():474<info> registering provider: verbs (113.20)
libfabric:24286:core:core:ofi_register_provider():474<info> registering provider: tcp (113.20)
libfabric:24286:core:core:ofi_register_provider():474<info> registering provider: sockets (113.20)
libfabric:24286:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:24286:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:24286:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_ZE not supported
libfabric:24286:core:core:ofi_register_provider():474<info> registering provider: shm (114.0)
libfabric:24286:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:24286:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:24286:core:core:ze_hmem_dl_init():422<warn> Failed to dlopen libze_loader.so
libfabric:24286:core:core:ofi_hmem_init():214<warn> Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:24286:core:core:ofi_register_provider():474<info> registering provider: ofi_rxm (113.20)
libfabric:24286:core:core:ofi_register_provider():474<info> registering provider: psm2 (113.20)
libfabric:24286:psm3:core:fi_prov_ini():752<info> build options: VERSION=1102.0=11.2.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0
libfabric:24286:core:core:ofi_register_provider():474<info> registering provider: psm3 (1102.0)
libfabric:24286:core:core:ofi_register_provider():474<info> registering provider: mlx (1.4)
libfabric:24286:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:24286:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:24286:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_ZE not supported
libfabric:24286:psm3:core:fi_prov_ini():785<info> build options: VERSION=1103.0=11.3.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0
libfabric:24286:core:core:ofi_register_provider():474<info> registering provider: psm3 (1103.0)
libfabric:24286:core:core:ofi_register_provider():474<info> registering provider: ofi_hook_noop (113.20)
libfabric:24286:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:24286:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:24286:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:24286:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:24286:core:core:ofi_layering_ok():1007<info> Skipping util;psm3 layering
libfabric:24286:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:24286:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:24286:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:24286:core:core:ze_hmem_dl_init():422<warn> Failed to dlopen libze_loader.so
libfabric:24286:core:core:ofi_hmem_init():214<warn> Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:24286:core:core:ofi_layering_ok():1007<info> Skipping util;sockets layering
libfabric:24286:core:core:ofi_layering_ok():1007<info> Skipping util;shm layering
libfabric:24286:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:24286:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:24286:core:core:ofi_layering_ok():1007<info> Skipping util;psm3 layering
libfabric:24286:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:24286:core:core:ofi_layering_ok():1007<info> Skipping util;sockets layering
libfabric:24286:core:core:ofi_layering_ok():1007<info> Skipping util;shm layering
libfabric:24286:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:24286:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:24286:core:core:ofi_layering_ok():1007<info> Skipping util;psm3 layering
libfabric:24286:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:24286:core:core:ofi_layering_ok():1007<info> Skipping util;sockets layering
libfabric:24286:core:core:ofi_layering_ok():1007<info> Skipping util;shm layering
libfabric:24286:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:24286:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:24286:core:core:ofi_layering_ok():1007<info> Skipping util;psm3 layering
libfabric:24286:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:24286:core:core:ofi_layering_ok():1007<info> Skipping util;sockets layering
libfabric:24286:core:core:ofi_layering_ok():1007<info> Skipping util;shm layering
[0] MPI startup(): max_ch4_vnis: 1, max_reg_eps 64, enable_sep 0, enable_shared_ctxs 0, do_av_insert 0
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
libfabric:24286:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:24286:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
[0] MPI startup(): libfabric provider: psm3
[0] MPI startup(): detected psm3 provider, set device name to "psm3"
libfabric:24286:core:core:fi_fabric_():1423<info> Opened fabric: RoCE-192.168.17.0/24
libfabric:24286:core:core:ofi_shm_map():171<warn> shm_open failed
libfabric:24286:core:core:ofi_ns_add_local_name():370<warn> Cannot add local name - name server uninitialized
[0] MPI startup(): addrnamelen: 32
[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3_100.dat" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): threading: num_pools: 1
[0] MPI startup(): threading: enable_sep: 0
[0] MPI startup(): threading: direct_recv: 1
[0] MPI startup(): threading: zero_op_flags: 0
[0] MPI startup(): threading: num_am_buffers: 1
[0] MPI startup(): tag bits available: 30 (TAG_UB value: 1073741823)
[0] MPI startup(): source bits available: 30 (Maximal number of rank: 1073741823)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 24286 svr4 {0,1,2,12,13,14}
[0] MPI startup(): 1 24287 svr4 {3,4,5,15,16,17}
[0] MPI startup(): 2 24288 svr4 {6,7,8,18,19,20}
[0] MPI startup(): 3 24289 svr4 {9,10,11,21,22,23}
[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.7.1
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=30
[0] allocate handle (kind=1, size=744, direct_size=8, indirect_size=1) ptr=0x7f2002efe740
[0] allocate handle (kind=2, size=40, direct_size=8, indirect_size=1) ptr=0x7f100004c440


NAS Parallel Benchmarks 3.4 -- BT Benchmark

No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102 (class B)
Iterations: 200 dt: 0.0003000
Total number of processes: 4

Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Verification being performed for class B
accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
1 0.1423359722929E+04 0.1423359722929E+04 0.1070287152945E-13
2 0.9933052259015E+02 0.9933052259015E+02 0.7153317200312E-15
3 0.3564602564454E+03 0.3564602564454E+03 0.5900255245348E-14
4 0.3248544795908E+03 0.3248544795908E+03 0.9798945854817E-14
5 0.3270754125466E+04 0.3270754125466E+04 0.1223502756335E-13
Comparison of RMS-norms of solution error
1 0.5296984714094E+02 0.5296984714094E+02 0.9389868800427E-15
2 0.4463289611567E+01 0.4463289611567E+01 0.1293476388601E-13
3 0.1312257334221E+02 0.1312257334221E+02 0.1258908460682E-13
4 0.1200692532356E+02 0.1200692532356E+02 0.6805440394643E-14
5 0.1245957615104E+03 0.1245957615104E+03 0.1003690013030E-13
Verification Successful


BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 52.59
Total processes = 4
Active processes= 4
Mop/s total = 13350.83
Mop/s/process = 3337.71
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.4.2
Compile date = 22 Sep 2022

Compile options:
MPIFC = /opt/intel/oneapi/mpi/latest/bin/mpif90
FLINK = $(MPIFC)
FMPI_LIB = -L/opt/intel/oneapi/mpi/latest/lib -lmpi
FMPI_INC = -I/opt/intel/oneapi/mpi/latest/include
FFLAGS = -O3
FLINKFLAGS = $(FFLAGS)
RAND = (none)


Please send feedbacks and/or the results of this run to:

NPB Development Team
Internet: npb@nas.nasa.gov

 

Regards, K. Kunita
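
The pinning table in the `I_MPI_DEBUG` output above is easy to miss among the libfabric messages. The following sketch pulls out the rank lines and counts the logical CPUs in each pin set. It assumes the `Rank Pid Node name Pin cpu` row layout shown in this log, and the function name `pin_summary` is illustrative:

```shell
#!/usr/bin/env bash
# Summarize the "Rank Pid Node name Pin cpu" rows of an I_MPI_DEBUG log.
# Reads the log from stdin and prints: rank <n> <count> cpus <list>
pin_summary() {
  awk '/MPI startup\(\): [0-9]+ [0-9]+ / {
         list = $NF
         gsub(/[{}]/, "", list)          # drop the surrounding braces
         n = split(list, cpus, ",")      # number of logical CPUs in the set
         print "rank " $4, n " cpus", list
       }'
}

# Header and two rows copied from the log above.
pin_summary <<'EOF'
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 24286 svr4 {0,1,2,12,13,14}
[0] MPI startup(): 1 24287 svr4 {3,4,5,15,16,17}
EOF
```

In this run each rank is pinned to a set of six logical CPUs rather than to a single core.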

0 Kudos
Kuni
New Contributor I
15,198 Views

Hi Santosh,

 

Thank you for your quick response.

 

The following is the screen output of the requested commands. They were executed on machine 1 (3rd Gen Xeon Scalable processor).

If you would like the same output from the other machines, please let me know.

 

$ fi_info -l
psm2:
version: 113.20
mlx:
version: 1.4
psm3:
version: 1103.0
psm3:
version: 1102.0
ofi_rxm:
version: 113.20
verbs:
version: 113.20
verbs:
version: 113.20
tcp:
version: 113.20
sockets:
version: 113.20
shm:
version: 114.0
ofi_hook_noop:
version: 113.20

$ ibv_devinfo
hca_id: rdmap24s0f0
transport: InfiniBand (0)
fw_ver: 1.60
node_guid: 669d:99ff:feff:ff5e
sys_image_guid: 649d:99ff:ff5e:0000
vendor_id: 0x8086
vendor_part_id: 5522
hw_ver: 0x2
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 1
port_lmc: 0x00
link_layer: Ethernet

hca_id: irdma1
transport: InfiniBand (0)
fw_ver: 1.60
node_guid: 669d:99ff:feff:ff5f
sys_image_guid: 649d:99ff:ff5f:0000
vendor_id: 0x8086
vendor_part_id: 5522
hw_ver: 0x2
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 1
port_lmc: 0x00
link_layer: Ethernet

 

[kkunita@svr4 NPB3.4-MPI]$ lspci | grep Mellanox


[kkunita@svr4 NPB3.4-MPI]$ lspci | grep Omini-Path

 

$ I_MPI_DEBUG=30 mpirun -n 4 ./bin/bt.B.x
[0] MPI startup(): Intel(R) MPI Library, Version 2021.7 Build 20221022 (id: f7b29a2495)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): shm segment size (342 MB per rank) * (4 local ranks) = 1368 MB total
[0] MPI startup(): libfabric version: 1.13.2rc1-impi
libfabric:29820:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:29820:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:29820:core:core:ze_hmem_dl_init():422<warn> Failed to dlopen libze_loader.so
libfabric:29820:core:core:ofi_hmem_init():214<warn> Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:29820:core:core:ofi_register_provider():474<info> registering provider: verbs (113.20)
libfabric:29820:core:core:ofi_register_provider():474<info> registering provider: verbs (113.20)
libfabric:29820:core:core:ofi_register_provider():474<info> registering provider: tcp (113.20)
libfabric:29820:core:core:ofi_register_provider():474<info> registering provider: sockets (113.20)
libfabric:29820:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:29820:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:29820:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_ZE not supported
libfabric:29820:core:core:ofi_register_provider():474<info> registering provider: shm (114.0)
libfabric:29820:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:29820:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:29820:core:core:ze_hmem_dl_init():422<warn> Failed to dlopen libze_loader.so
libfabric:29820:core:core:ofi_hmem_init():214<warn> Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:29820:core:core:ofi_register_provider():474<info> registering provider: ofi_rxm (113.20)
libfabric:29820:core:core:ofi_register_provider():474<info> registering provider: psm2 (113.20)
libfabric:29820:psm3:core:fi_prov_ini():752<info> build options: VERSION=1102.0=11.2.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0
libfabric:29820:core:core:ofi_register_provider():474<info> registering provider: psm3 (1102.0)
libfabric:29820:core:core:ofi_register_provider():474<info> registering provider: mlx (1.4)
libfabric:29820:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:29820:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:29820:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_ZE not supported
libfabric:29820:psm3:core:fi_prov_ini():785<info> build options: VERSION=1103.0=11.3.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0
libfabric:29820:core:core:ofi_register_provider():474<info> registering provider: psm3 (1103.0)
libfabric:29820:core:core:ofi_register_provider():474<info> registering provider: ofi_hook_noop (113.20)
libfabric:29820:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:29820:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:29820:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:29820:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;psm3 layering
libfabric:29820:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:29820:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:29820:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:29820:core:core:ze_hmem_dl_init():422<warn> Failed to dlopen libze_loader.so
libfabric:29820:core:core:ofi_hmem_init():214<warn> Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;sockets layering
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;shm layering
libfabric:29820:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:29820:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;psm3 layering
libfabric:29820:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;sockets layering
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;shm layering
libfabric:29820:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:29820:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;psm3 layering
libfabric:29820:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;sockets layering
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;shm layering
libfabric:29820:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:29820:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;psm3 layering
libfabric:29820:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;sockets layering
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;shm layering
[0] MPI startup(): max_ch4_vnis: 1, max_reg_eps 64, enable_sep 0, enable_shared_ctxs 0, do_av_insert 0
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
libfabric:29820:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:29820:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
[0] MPI startup(): libfabric provider: psm3
[0] MPI startup(): detected psm3 provider, set device name to "psm3"
libfabric:29820:core:core:fi_fabric_():1423<info> Opened fabric: RoCE-192.168.17.0/24
libfabric:29820:core:core:ofi_shm_map():171<warn> shm_open failed
[0] MPI startup(): addrnamelen: 32
libfabric:29820:core:core:ofi_ns_add_local_name():370<warn> Cannot add local name - name server uninitialized
[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3_100.dat" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): threading: num_pools: 1
[0] MPI startup(): threading: enable_sep: 0
[0] MPI startup(): threading: direct_recv: 1
[0] MPI startup(): threading: zero_op_flags: 0
[0] MPI startup(): threading: num_am_buffers: 1
[0] MPI startup(): tag bits available: 30 (TAG_UB value: 1073741823)
[0] MPI startup(): source bits available: 30 (Maximal number of rank: 1073741823)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 29820 svr4 {0,1,2,12,13,14}
[0] MPI startup(): 1 29821 svr4 {3,4,5,15,16,17}
[0] MPI startup(): 2 29822 svr4 {6,7,8,18,19,20}
[0] MPI startup(): 3 29823 svr4 {9,10,11,21,22,23}
[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.7.1
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=30
[0] allocate handle (kind=1, size=744, direct_size=8, indirect_size=1) ptr=0x7f2002f57d80
[0] allocate handle (kind=2, size=40, direct_size=8, indirect_size=1) ptr=0x7f10000d5900


NAS Parallel Benchmarks 3.4 -- BT Benchmark

No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102 (class B)
Iterations: 200 dt: 0.0003000
Total number of processes: 4

Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Verification being performed for class B
accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
1 0.1423359722929E+04 0.1423359722929E+04 0.1070287152945E-13
2 0.9933052259015E+02 0.9933052259015E+02 0.7153317200312E-15
3 0.3564602564454E+03 0.3564602564454E+03 0.5900255245348E-14
4 0.3248544795908E+03 0.3248544795908E+03 0.9798945854817E-14
5 0.3270754125466E+04 0.3270754125466E+04 0.1223502756335E-13
Comparison of RMS-norms of solution error
1 0.5296984714094E+02 0.5296984714094E+02 0.9389868800427E-15
2 0.4463289611567E+01 0.4463289611567E+01 0.1293476388601E-13
3 0.1312257334221E+02 0.1312257334221E+02 0.1258908460682E-13
4 0.1200692532356E+02 0.1200692532356E+02 0.6805440394643E-14
5 0.1245957615104E+03 0.1245957615104E+03 0.1003690013030E-13
Verification Successful


BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 52.97
Total processes = 4
Active processes= 4
Mop/s total = 13256.00
Mop/s/process = 3314.00
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.4.2
Compile date = 22 Sep 2022

Compile options:
MPIFC = /opt/intel/oneapi/mpi/latest/bin/mpif90
FLINK = $(MPIFC)
FMPI_LIB = -L/opt/intel/oneapi/mpi/latest/lib -lmpi
FMPI_INC = -I/opt/intel/oneapi/mpi/latest/include
FFLAGS = -O3
FLINKFLAGS = $(FFLAGS)
RAND = (none)


Please send feedbacks and/or the results of this run to:

NPB Development Team
Internet: npb@nas.nasa.gov

 

Regards, K. Kunita
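
When comparing many runs, the headline figures can be extracted from the NPB summary block instead of being read by eye. A minimal sketch that assumes the `Time in seconds =` and `Mop/s total =` lines appear exactly as in the output above; the function name `npb_summary` is illustrative:

```shell
#!/usr/bin/env bash
# Extract run time and total Mop/s from NPB output read on stdin.
npb_summary() {
  awk -F'=' '
    /Time in seconds/ { gsub(/[ \t]/, "", $2); time = $2 }
    /Mop\/s total/    { gsub(/[ \t]/, "", $2); mops = $2 }
    END { print "time_s=" time, "mops_total=" mops }'
}

# Lines copied from the class B run above.
npb_summary <<'EOF'
Time in seconds = 52.97
Mop/s total = 13256.00
Mop/s/process = 3314.00
EOF
```

For a saved log, `npb_summary < bt.B.log` (the file name is hypothetical) gives the same one-line summary.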

0 Kudos
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;psm3 layering
libfabric:29820:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:29820:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:29820:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:29820:core:core:ze_hmem_dl_init():422<warn> Failed to dlopen libze_loader.so
libfabric:29820:core:core:ofi_hmem_init():214<warn> Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;sockets layering
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;shm layering
libfabric:29820:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:29820:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;psm3 layering
libfabric:29820:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;sockets layering
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;shm layering
libfabric:29820:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:29820:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;psm3 layering
libfabric:29820:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;sockets layering
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;shm layering
libfabric:29820:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:29820:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;psm3 layering
libfabric:29820:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;sockets layering
libfabric:29820:core:core:ofi_layering_ok():1007<info> Skipping util;shm layering
[0] MPI startup(): max_ch4_vnis: 1, max_reg_eps 64, enable_sep 0, enable_shared_ctxs 0, do_av_insert 0
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
libfabric:29820:core:core:fi_getinfo_():1138<info> Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:29820:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority psm2 can not be initialized
[0] MPI startup(): libfabric provider: psm3
[0] MPI startup(): detected psm3 provider, set device name to "psm3"
libfabric:29820:core:core:fi_fabric_():1423<info> Opened fabric: RoCE-192.168.17.0/24
libfabric:29820:core:core:ofi_shm_map():171<warn> shm_open failed
[0] MPI startup(): addrnamelen: 32
libfabric:29820:core:core:ofi_ns_add_local_name():370<warn> Cannot add local name - name server uninitialized
[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3_100.dat" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): threading: num_pools: 1
[0] MPI startup(): threading: enable_sep: 0
[0] MPI startup(): threading: direct_recv: 1
[0] MPI startup(): threading: zero_op_flags: 0
[0] MPI startup(): threading: num_am_buffers: 1
[0] MPI startup(): tag bits available: 30 (TAG_UB value: 1073741823)
[0] MPI startup(): source bits available: 30 (Maximal number of rank: 1073741823)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 29820 svr4 {0,1,2,12,13,14}
[0] MPI startup(): 1 29821 svr4 {3,4,5,15,16,17}
[0] MPI startup(): 2 29822 svr4 {6,7,8,18,19,20}
[0] MPI startup(): 3 29823 svr4 {9,10,11,21,22,23}
[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.7.1
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=30
[0] allocate handle (kind=1, size=744, direct_size=8, indirect_size=1) ptr=0x7f2002f57d80
[0] allocate handle (kind=2, size=40, direct_size=8, indirect_size=1) ptr=0x7f10000d5900


NAS Parallel Benchmarks 3.4 -- BT Benchmark

No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102 (class B)
Iterations: 200 dt: 0.0003000
Total number of processes: 4

Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Verification being performed for class B
accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
1 0.1423359722929E+04 0.1423359722929E+04 0.1070287152945E-13
2 0.9933052259015E+02 0.9933052259015E+02 0.7153317200312E-15
3 0.3564602564454E+03 0.3564602564454E+03 0.5900255245348E-14
4 0.3248544795908E+03 0.3248544795908E+03 0.9798945854817E-14
5 0.3270754125466E+04 0.3270754125466E+04 0.1223502756335E-13
Comparison of RMS-norms of solution error
1 0.5296984714094E+02 0.5296984714094E+02 0.9389868800427E-15
2 0.4463289611567E+01 0.4463289611567E+01 0.1293476388601E-13
3 0.1312257334221E+02 0.1312257334221E+02 0.1258908460682E-13
4 0.1200692532356E+02 0.1200692532356E+02 0.6805440394643E-14
5 0.1245957615104E+03 0.1245957615104E+03 0.1003690013030E-13
Verification Successful


BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 52.97
Total processes = 4
Active processes= 4
Mop/s total = 13256.00
Mop/s/process = 3314.00
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.4.2
Compile date = 22 Sep 2022

Compile options:
MPIFC = /opt/intel/oneapi/mpi/latest/bin/mpif90
FLINK = $(MPIFC)
FMPI_LIB = -L/opt/intel/oneapi/mpi/latest/lib -lmpi
FMPI_INC = -I/opt/intel/oneapi/mpi/latest/include
FFLAGS = -O3
FLINKFLAGS = $(FFLAGS)
RAND = (none)


Please send feedbacks and/or the results of this run to:

NPB Development Team
Internet: npb@nas.nasa.gov

 

Regards, K. Kunita

Kuni
New Contributor I
15,209 Views

Hi Santosh,

 

Thank you for your quick response.

 

It's strange. I tried to paste the console log here to answer your question, but it was not shown after I clicked "Post Reply". Is there a length limit?

 

Anyway, I have attached the text log file, which answers your question. Please take a look.

Kuni
New Contributor I
15,150 Views

Hi,  Santosh,

 

Oh, now I can see the replies that I posted earlier and could not see, so three (almost) identical replies are now shown. Please ignore those and see the attached file for the answer to your question.

 

Regards, K. Kunita

SantoshY_Intel
Moderator
15,119 Views

Hi,


Thanks for providing all the requested details.


We are working on your issue & will get back to you soon.


Thanks & regards,

Santosh


Kuni
New Contributor I
14,955 Views

Do you have any update? Could you tell me whether you can reproduce the symptom? If you need additional information from me, please let me know.

 

Regards, K. Kunita

SantoshY_Intel
Moderator
14,946 Views

Hi,

 

Sorry for the delay.

 

Could you please let us know if you could run your application without MPI with a single process using the command below?

./bin/bt.B.x 

 

Thanks & Regards,

Santosh

 

Kuni
New Contributor I
14,932 Views

Yes, I can run it without mpi. 

 

Regards, K. Kunita

SantoshY_Intel
Moderator
14,920 Views

Hi,

 

Thanks for the confirmation.

 

We couldn't reproduce your issue as we don't have access to the exact infrastructure.

 

We suggest you access the Intel Devcloud & do experiments there. Please get back to us if you still face the issue.

 

Thanks & Regards,

Santosh

 

 

 

Kuni
New Contributor I
14,904 Views

Is Intel Devcloud a virtual machine environment? If so, it is meaningless to try it. As I showed, the symptom does not happen in a virtual machine environment; it happens only on a non-virtual (bare-metal) machine. Could you tell me how you tried to reproduce the case (environment information, processor, OS, memory size, NIC and driver, version of Intel MPI, NPB version, etc.)? If you tried the same things as I did and could not see the issue, that may be a solution for me, or may help to find the cause of the issue.

 

Regards, K. Kunita

 

SantoshY_Intel
Moderator
14,887 Views

Hi,


>>>"Is Intel Devcloud virtual machine environment?"

No, you can try experimenting on Intel Devcloud & get back to us if you face the same issue.


Thanks & Regards,

Santosh


Kuni
New Contributor I
14,866 Views

I tried to log in to Intel Devcloud and found that the CPU is Skylake. The problem does not occur with Skylake-based Xeon; I saw it only with 3rd Gen Xeon (Ice Lake). So I think it is meaningless to try Devcloud with 2nd Gen Xeon. Did you try to reproduce my problem with a 3rd Gen Xeon Scalable processor? I saw the issue only with the Intel Xeon Silver 4310 and Xeon Silver 4309Y processors. I could not see the issue with the Intel Xeon Silver 4214R.

 

Regards, K. Kunita

Kuni
New Contributor I
14,740 Views

Santosh,

 

Do you have any comment on my reply? Devcloud may not be helpful for reproducing the issue, because Devcloud uses 2nd Gen Xeon Scalable processors and the issue happens only with 3rd Gen Xeon. If you could not reproduce the issue with 3rd Gen Xeon, could you tell me the details of your environment? It may help to find a way to solve our problem.

 

Regards, K. Kunita

SantoshY_Intel
Moderator
14,732 Views

Hi,

 

Thanks for your patience.

 

I tried on Intel Devcloud, where we could access both Intel Xeon Scalable processors & 3rd Gen Intel Xeon Scalable processors.

Command to see the list of available nodes having 3rd Gen Intel Xeon scalable processors:
pbsnodes | grep gold6348 -B 4

Command to launch a node with 3rd Gen Intel Xeon scalable processor:

$ qsub -I -l nodes=s002-n001:ppn=2 -d .

 

I can see that you are using mpif90 & mpicc compilers while building the application. I tried using Intel mpiifort & mpiicc compilers and followed the steps mentioned by you.

In the case of using mpiifort & mpiicc, I changed the config/make.def file as shown below:

 

MPIFC = /glob/development-tools/versions/oneapi/2023.0.1/oneapi/mpi/2021.8.0/bin/mpiifort
FMPI_LIB = -L /glob/development-tools/versions/oneapi/2023.0.1/oneapi/mpi/2021.8.0/lib -lmpifort 
FMPI_INC = -I /glob/development-tools/versions/oneapi/2023.0.1/oneapi/mpi/2021.8.0/include
MPICC = /glob/development-tools/versions/oneapi/2023.0.1/oneapi/mpi/2021.8.0/bin/mpiicc
CMPI_LIB = -L /glob/development-tools/versions/oneapi/2023.0.1/oneapi/mpi/2021.8.0/lib -lmpi 
CMPI_INC = -I /glob/development-tools/versions/oneapi/2023.0.1/oneapi/mpi/2021.8.0/include

 

 

Given below are my observations:

Product collection                  Intel Xeon Scalable Processor         3rd Gen Intel Xeon Scalable Processor
Model name                          Intel(R) Xeon(R) Gold 6128 CPU        Intel(R) Xeon(R) Platinum 8358 CPU
Compilers used                      mpif90/mpicc     mpiifort/mpiicc      mpif90/mpicc     mpiifort/mpiicc
Test 1 (mpirun -n 4 ./bin/bt.B.x)   51.51 sec        50.78 sec            49.33 sec        45.23 sec
Test 2 (mpirun -n 4 ./bin/bt.C.x)   214.58 sec       214.11 sec           289.47 sec       233.82 sec
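
To put the Test 2 times above in relative terms, here is a minimal sketch that divides the 3rd Gen time by the Gold 6128 time for each compiler pair. The function name `slowdown` is made up for this illustration; the four input values are the measured times listed above, and a result greater than 1 means the 3rd Gen run took longer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ratio of a measured run time to a baseline run time,
# printed with two decimals. A value above 1 means the measured run was slower.
# Usage: slowdown <baseline_seconds> <measured_seconds>
slowdown() {
    awk -v base="$1" -v meas="$2" 'BEGIN { printf "%.2f\n", meas / base }'
}

# Test 2 (bt.C.x), mpif90/mpicc build: Gold 6128 vs. Platinum 8358
slowdown 214.58 289.47

# Test 2 (bt.C.x), mpiifort/mpiicc build: Gold 6128 vs. Platinum 8358
slowdown 214.11 233.82
```

The same function can be applied to any pair of "Time in seconds" values from the NPB summary, for example the bare-metal and virtual-machine runs reported at the start of this thread.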

 

Thanks & Regards,

Santosh

 

 

 

 

Kuni
New Contributor I
14,712 Views

Santosh,

 

Thanks for your reply.

 

I tried to build with mpiifort. However, I cannot build the NPB executable because of the following errors.

CentOS 7.9 case: 

  ifort: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found.

AlmaLinux 8.6 case:

  Many undefined references (some of the error lines, and the command that causes them, are shown below):

 /opt/intel/oneapi/mpi/latest/bin/mpiifort -O3 -o ../bin/bt.C.x bt.o bt_data.o make_set.o initialize.o exact_solution.o exact_rhs.o set_constants.o adi.o define.o copy_faces.o rhs.o solve_subs.o x_solve.o y_solve.o z_solve.o add.o error.o verify.o setup_mpi.o mpinpb.o ../common/get_active_nprocs.o ../common/print_results.o ../common/timers.o btio.o -L/opt/intel/oneapi/mpi/latest/lib -lmpifort
ld: ../common/get_active_nprocs.o: in function `get_active_nprocs_':
get_active_nprocs.f90:(.text+0x286): undefined reference to `_gfortran_get_environment_variable_i4'
ld: get_active_nprocs.f90:(.text+0x2b9): undefined reference to `_gfortran_compare_string'
ld: get_active_nprocs.f90:(.text+0x2e2): undefined reference to `_gfortran_compare_string'
ld: get_active_nprocs.f90:(.text+0x2ff): undefined reference to `_gfortran_compare_string'

In this case I used the following settings in config/make.def:

MPIFC = /opt/intel/oneapi/mpi/latest/bin/mpiifort

FMPI_LIB = -L /opt/intel/oneapi/mpi/latest/lib -lmpifort

FMPI_INC = -I /opt/intel/oneapi/mpi/latest/include

MPICC = /opt/intel/oneapi/mpi/latest/mpicc

CMPI_LIB = -L /opt/intel/oneapi/mpi/latest/lib -lmpi

CMPI_INC = -I  /opt/intel/oneapi/oneapi/mpi/latest/include

 

I also tried changing "latest" to "2021.8.0", and the same errors were shown.

 

Could you tell me how to resolve these errors?

 

Regards, K. Kunita

SantoshY_Intel
Moderator
14,708 Views

Hi,

 

>>>"MPICC = /opt/intel/oneapi/mpi/latest/mpicc"

I assume the path is incorrect: the "bin/" directory is missing from the path above.

 

To use the Intel mpiicc compiler, modify MPICC in your config/make.def as shown below:

MPICC = /opt/intel/oneapi/mpi/latest/bin/mpiicc -cc=icx

Note: Use the mpiicc compiler instead of the mpicc compiler.

 

Now, try to build the application again. Since we can build & run the application on Devcloud using a 3rd Gen Intel Xeon scalable processor, you can try experimenting on Intel Devcloud.
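
One possible explanation for the `_gfortran_*` undefined references reported above, offered as an assumption and not a confirmed diagnosis: those symbols belong to the GNU Fortran runtime, so object files left over from the earlier mpif90 build may be getting linked by mpiifort. The sketch below shows one way to check for that. The function name `gfortran_refs` is made up for this illustration, and the sample input is two of the linker messages quoted earlier in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NPB or Intel MPI): list the GNU Fortran
# runtime symbols mentioned in text read from stdin, such as `nm -u file.o`
# output or linker error messages. Prints nothing when none are found.
gfortran_refs() {
    grep -o '_gfortran_[A-Za-z0-9_]*' | sort -u
}

# Demonstration on two of the linker messages quoted in this thread.
gfortran_refs <<'EOF'
get_active_nprocs.f90:(.text+0x286): undefined reference to `_gfortran_get_environment_variable_i4'
ld: get_active_nprocs.f90:(.text+0x2b9): undefined reference to `_gfortran_compare_string'
EOF

# On the build machine one might run, from the NPB3.4-MPI directory:
#   nm -u common/get_active_nprocs.o | gfortran_refs
# If any symbols are listed, that object was probably compiled before the
# switch to mpiifort; running `make clean` and rebuilding would recompile it.
```

If the check prints nothing for every object file, stale objects are unlikely to be the cause and the make.def settings are the next thing to review.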

 

 

 

Thanks & Regards,

Santosh

 

 

 

 
