[kkunita@svr4 NPB3.4-MPI]$ fi_info -l
psm2:
    version: 113.20
mlx:
    version: 1.4
psm3:
    version: 1103.0
psm3:
    version: 1102.0
ofi_rxm:
    version: 113.20
verbs:
    version: 113.20
verbs:
    version: 113.20
tcp:
    version: 113.20
sockets:
    version: 113.20
shm:
    version: 114.0
ofi_hook_noop:
    version: 113.20
[kkunita@svr4 NPB3.4-MPI]$ ibv_devinfo
hca_id: rdmap24s0f0
        transport:                      InfiniBand (0)
        fw_ver:                         1.60
        node_guid:                      669d:99ff:feff:ff5e
        sys_image_guid:                 649d:99ff:ff5e:0000
        vendor_id:                      0x8086
        vendor_part_id:                 5522
        hw_ver:                         0x2
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             Ethernet

hca_id: irdma1
        transport:                      InfiniBand (0)
        fw_ver:                         1.60
        node_guid:                      669d:99ff:feff:ff5f
        sys_image_guid:                 649d:99ff:ff5f:0000
        vendor_id:                      0x8086
        vendor_part_id:                 5522
        hw_ver:                         0x2
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             Ethernet

[kkunita@svr4 NPB3.4-MPI]$ lspci |grep Mellanox
[kkunita@svr4 NPB3.4-MPI]$ lspci |grep Omni_Path
[kkunita@svr4 NPB3.4-MPI]$ I_MPI_DEBUG=30 mpirun -n 4 ./bin/bt.B.x
[0] MPI startup(): Intel(R) MPI Library, Version 2021.7 Build 20221022 (id: f7b29a2495)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): shm segment size (342 MB per rank) * (4 local ranks) = 1368 MB total
[0] MPI startup(): libfabric version: 1.13.2rc1-impi
libfabric:24286:core:core:ofi_hmem_init():209 Hmem iface FI_HMEM_CUDA not supported
libfabric:24286:core:core:ofi_hmem_init():209 Hmem iface FI_HMEM_ROCR not supported
libfabric:24286:core:core:ze_hmem_dl_init():422 Failed to dlopen libze_loader.so
libfabric:24286:core:core:ofi_hmem_init():214 Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:24286:core:core:ofi_register_provider():474 registering provider: verbs (113.20)
libfabric:24286:core:core:ofi_register_provider():474 registering provider: verbs (113.20)
libfabric:24286:core:core:ofi_register_provider():474 registering provider: tcp (113.20)
libfabric:24286:core:core:ofi_register_provider():474 registering provider: sockets (113.20)
libfabric:24286:core:core:ofi_hmem_init():222 Hmem iface FI_HMEM_CUDA not supported
libfabric:24286:core:core:ofi_hmem_init():222 Hmem iface FI_HMEM_ROCR not supported
libfabric:24286:core:core:ofi_hmem_init():222 Hmem iface FI_HMEM_ZE not supported
libfabric:24286:core:core:ofi_register_provider():474 registering provider: shm (114.0)
libfabric:24286:core:core:ofi_hmem_init():209 Hmem iface FI_HMEM_CUDA not supported
libfabric:24286:core:core:ofi_hmem_init():209 Hmem iface FI_HMEM_ROCR not supported
libfabric:24286:core:core:ze_hmem_dl_init():422 Failed to dlopen libze_loader.so
libfabric:24286:core:core:ofi_hmem_init():214 Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:24286:core:core:ofi_register_provider():474 registering provider: ofi_rxm (113.20)
libfabric:24286:core:core:ofi_register_provider():474 registering provider: psm2 (113.20)
libfabric:24286:psm3:core:fi_prov_ini():752 build options: VERSION=1102.0=11.2.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0
libfabric:24286:core:core:ofi_register_provider():474 registering provider: psm3 (1102.0)
libfabric:24286:core:core:ofi_register_provider():474 registering provider: mlx (1.4)
libfabric:24286:core:core:ofi_hmem_init():222 Hmem iface FI_HMEM_CUDA not supported
libfabric:24286:core:core:ofi_hmem_init():222 Hmem iface FI_HMEM_ROCR not supported
libfabric:24286:core:core:ofi_hmem_init():222 Hmem iface FI_HMEM_ZE not supported
libfabric:24286:psm3:core:fi_prov_ini():785 build options: VERSION=1103.0=11.3.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0
libfabric:24286:core:core:ofi_register_provider():474 registering provider: psm3 (1103.0)
libfabric:24286:core:core:ofi_register_provider():474 registering provider: ofi_hook_noop (113.20)
libfabric:24286:core:core:fi_getinfo_():1138 Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:24286:core:core:fi_getinfo_():1201 Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:24286:core:core:fi_getinfo_():1138 Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:24286:core:core:fi_getinfo_():1201 Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:24286:core:core:ofi_layering_ok():1007 Skipping util;psm3 layering
libfabric:24286:core:core:ofi_layering_ok():1001 Need core provider, skipping ofi_rxm
libfabric:24286:core:core:ofi_hmem_init():209 Hmem iface FI_HMEM_CUDA not supported
libfabric:24286:core:core:ofi_hmem_init():209 Hmem iface FI_HMEM_ROCR not supported
libfabric:24286:core:core:ze_hmem_dl_init():422 Failed to dlopen libze_loader.so
libfabric:24286:core:core:ofi_hmem_init():214 Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:24286:core:core:ofi_layering_ok():1007 Skipping util;sockets layering
libfabric:24286:core:core:ofi_layering_ok():1007 Skipping util;shm layering
libfabric:24286:core:core:fi_getinfo_():1138 Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:24286:core:core:fi_getinfo_():1201 Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:24286:core:core:ofi_layering_ok():1007 Skipping util;psm3 layering
libfabric:24286:core:core:ofi_layering_ok():1001 Need core provider, skipping ofi_rxm
libfabric:24286:core:core:ofi_layering_ok():1007 Skipping util;sockets layering
libfabric:24286:core:core:ofi_layering_ok():1007 Skipping util;shm layering
libfabric:24286:core:core:fi_getinfo_():1138 Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:24286:core:core:fi_getinfo_():1201 Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:24286:core:core:ofi_layering_ok():1007 Skipping util;psm3 layering
libfabric:24286:core:core:ofi_layering_ok():1001 Need core provider, skipping ofi_rxm
libfabric:24286:core:core:ofi_layering_ok():1007 Skipping util;sockets layering
libfabric:24286:core:core:ofi_layering_ok():1007 Skipping util;shm layering
libfabric:24286:core:core:fi_getinfo_():1138 Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:24286:core:core:fi_getinfo_():1201 Start regular provider search because provider with the highest priority psm2 can not be initialized
libfabric:24286:core:core:ofi_layering_ok():1007 Skipping util;psm3 layering
libfabric:24286:core:core:ofi_layering_ok():1001 Need core provider, skipping ofi_rxm
libfabric:24286:core:core:ofi_layering_ok():1007 Skipping util;sockets layering
libfabric:24286:core:core:ofi_layering_ok():1007 Skipping util;shm layering
[0] MPI startup(): max_ch4_vnis: 1, max_reg_eps 64, enable_sep 0, enable_shared_ctxs 0, do_av_insert 0
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
libfabric:24286:core:core:fi_getinfo_():1138 Found provider with the highest priority psm2, must_use_util_prov = 0
libfabric:24286:core:core:fi_getinfo_():1201 Start regular provider search because provider with the highest priority psm2 can not be initialized
[0] MPI startup(): libfabric provider: psm3
[0] MPI startup(): detected psm3 provider, set device name to "psm3"
libfabric:24286:core:core:fi_fabric_():1423 Opened fabric: RoCE-192.168.17.0/24
libfabric:24286:core:core:ofi_shm_map():171 shm_open failed
libfabric:24286:core:core:ofi_ns_add_local_name():370 Cannot add local name - name server uninitialized
[0] MPI startup(): addrnamelen: 32
[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3_100.dat" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): threading: num_pools: 1
[0] MPI startup(): threading: enable_sep: 0
[0] MPI startup(): threading: direct_recv: 1
[0] MPI startup(): threading: zero_op_flags: 0
[0] MPI startup(): threading: num_am_buffers: 1
[0] MPI startup(): tag bits available: 30 (TAG_UB value: 1073741823)
[0] MPI startup(): source bits available: 30 (Maximal number of rank: 1073741823)
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       24286    svr4       {0,1,2,12,13,14}
[0] MPI startup(): 1       24287    svr4       {3,4,5,15,16,17}
[0] MPI startup(): 2       24288    svr4       {6,7,8,18,19,20}
[0] MPI startup(): 3       24289    svr4       {9,10,11,21,22,23}
[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.7.1
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=30
[0] allocate handle (kind=1, size=744, direct_size=8, indirect_size=1) ptr=0x7f2002efe740
[0] allocate handle (kind=2, size=40, direct_size=8, indirect_size=1) ptr=0x7f100004c440


 NAS Parallel Benchmarks 3.4 -- BT Benchmark

 No input file inputbt.data. Using compiled defaults
 Size:  102x 102x 102  (class B)
 Iterations:  200    dt:   0.0003000
 Total number of processes:      4

 Time step    1
 Time step   20
 Time step   40
 Time step   60
 Time step   80
 Time step  100
 Time step  120
 Time step  140
 Time step  160
 Time step  180
 Time step  200
 Verification being performed for class B
 accuracy setting for epsilon =  0.1000000000000E-07
 Comparison of RMS-norms of residual
           1 0.1423359722929E+04 0.1423359722929E+04 0.1070287152945E-13
           2 0.9933052259015E+02 0.9933052259015E+02 0.7153317200312E-15
           3 0.3564602564454E+03 0.3564602564454E+03 0.5900255245348E-14
           4 0.3248544795908E+03 0.3248544795908E+03 0.9798945854817E-14
           5 0.3270754125466E+04 0.3270754125466E+04 0.1223502756335E-13
 Comparison of RMS-norms of solution error
           1 0.5296984714094E+02 0.5296984714094E+02 0.9389868800427E-15
           2 0.4463289611567E+01 0.4463289611567E+01 0.1293476388601E-13
           3 0.1312257334221E+02 0.1312257334221E+02 0.1258908460682E-13
           4 0.1200692532356E+02 0.1200692532356E+02 0.6805440394643E-14
           5 0.1245957615104E+03 0.1245957615104E+03 0.1003690013030E-13
 Verification Successful


 BT Benchmark Completed.
 Class           =                        B
 Size            =            102x 102x 102
 Iterations      =                      200
 Time in seconds =                    52.59
 Total processes =                        4
 Active processes=                        4
 Mop/s total     =                 13350.83
 Mop/s/process   =                  3337.71
 Operation type  =           floating point
 Verification    =               SUCCESSFUL
 Version         =                    3.4.2
 Compile date    =              22 Sep 2022

 Compile options:
    MPIFC        = /opt/intel/oneapi/mpi/latest/bin/mpif90
    FLINK        = $(MPIFC)
    FMPI_LIB     = -L/opt/intel/oneapi/mpi/latest/lib -lmpi
    FMPI_INC     = -I/opt/intel/oneapi/mpi/latest/include
    FFLAGS       = -O3
    FLINKFLAGS   = $(FFLAGS)
    RAND         = (none)

 Please send feedbacks and/or the results of this run to:

 NPB Development Team
 Internet: npb@nas.nasa.gov
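Note: in the debug output above, libfabric first tries the highest-priority psm2 provider, which cannot initialize because no Omni-Path hardware is present (both lspci greps came back empty), and then falls back to psm3 over the irdma/RoCE interface. To skip that search and pin the provider explicitly, the standard libfabric and Intel MPI environment variables can be used; a minimal sketch, assuming the same binary and rank count as the run above:

    # Pin the OFI provider at the libfabric level:
    FI_PROVIDER=psm3 I_MPI_DEBUG=30 mpirun -n 4 ./bin/bt.B.x
    # Equivalent Intel MPI-specific variable:
    I_MPI_OFI_PROVIDER=psm3 mpirun -n 4 ./bin/bt.B.x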
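Also worth noting: ibv_devinfo reports active_mtu 1024 against max_mtu 4096. With link_layer Ethernet (RoCE), the active RDMA MTU is derived from the MTU of the underlying Ethernet interface, so raising the netdev MTU should let the port negotiate the full 4096-byte RDMA MTU; a sketch, where <ifname> is a placeholder for the Ethernet interface backing rdmap24s0f0:

    # Raise the Ethernet MTU so a 4096-byte RDMA MTU (plus headers) fits:
    ip link set dev <ifname> mtu 4200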