2020
Inactive Modules:
  1) impi
2020
Currently Loaded Modules:
  1) autotools   3) ohpc                    5) ccs/lcc-user (S)
  2) prun/1.3    4) ccs/lcc-users/jcyo222   6) intel/19.1.3.304
  Where:
   S:  Module is Sticky, requires --force to unload or purge
[0] MPI startup(): Intel(R) MPI Library, Version 2019 Update 9 Build 20200923 (id: abd58e492)
[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.10.1-impi
libfabric:107610:core:mr:ofi_default_cache_size():56 default cache size=2109042048
libfabric:22236:core:mr:ofi_default_cache_size():56 default cache size=2109042048
libfabric:107610:core:core:ofi_register_provider():418 registering provider: sockets (110.10)
libfabric:107610:core:core:ofi_register_provider():446 "sockets" filtered by provider include/exclude list, skipping
libfabric:22236:core:core:ofi_register_provider():418 registering provider: sockets (110.10)
libfabric:22236:core:core:ofi_register_provider():446 "sockets" filtered by provider include/exclude list, skipping
libfabric:107610:core:core:ofi_reg_dl_prov():578 dlopen(/opt/ohpc/pub/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so): libpsm2.so.2: cannot open shared object file: No such file or directory
libfabric:22236:core:core:ofi_reg_dl_prov():578 dlopen(/opt/ohpc/pub/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so): libpsm2.so.2: cannot open shared object file: No such file or directory
libfabric:107610:core:core:ofi_register_provider():418 registering provider: ofi_rxm (110.10)
libfabric:22236:core:core:ofi_register_provider():418 registering provider: ofi_rxm (110.10)
libfabric:107610:core:core:ofi_reg_dl_prov():578 dlopen(/opt/ohpc/pub/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib/prov/libefa-fi.so): libefa.so.1: cannot open shared object file: No such file or directory
libfabric:22236:core:core:ofi_reg_dl_prov():578 dlopen(/opt/ohpc/pub/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib/prov/libefa-fi.so): libefa.so.1: cannot open shared object file: No such file or directory
libfabric:107610:core:core:ofi_register_provider():418 registering provider: tcp (110.10)
libfabric:107610:core:core:ofi_register_provider():446 "tcp" filtered by provider include/exclude list, skipping
libfabric:22236:core:core:ofi_register_provider():418 registering provider: tcp (110.10)
libfabric:22236:core:core:ofi_register_provider():446 "tcp" filtered by provider include/exclude list, skipping
libfabric:107610:core:core:ofi_register_provider():418 registering provider: shm (110.10)
libfabric:107610:core:core:ofi_register_provider():446 "shm" filtered by provider include/exclude list, skipping
libfabric:22236:core:core:ofi_register_provider():418 registering provider: shm (110.10)
libfabric:22236:core:core:ofi_register_provider():446 "shm" filtered by provider include/exclude list, skipping
libfabric:107610:core:mr:ofi_default_cache_size():56 default cache size=2109042048
libfabric:107610:verbs:fabric:verbs_devs_print():869 list of verbs devices found for FI_EP_MSG:
libfabric:107610:verbs:fabric:verbs_devs_print():873 #1 mlx4_0 - IPoIB addresses:
libfabric:107610:verbs:fabric:verbs_devs_print():883 10.30.18.103
libfabric:107610:verbs:fabric:verbs_devs_print():883 fe80::202:c903:14:de71
libfabric:107610:verbs:fabric:vrb_get_device_attrs():615 device mlx4_0: first found active port is 1
libfabric:107610:verbs:fabric:vrb_get_device_attrs():615 device mlx4_0: first found active port is 1
libfabric:107610:verbs:fabric:vrb_get_device_attrs():615 device mlx4_0: first found active port is 1
libfabric:107610:core:core:ofi_register_provider():418 registering provider: verbs (110.10)
libfabric:107610:core:core:ofi_register_provider():446 "verbs" filtered by provider include/exclude list, skipping
libfabric:107610:core:core:ofi_register_provider():418 registering provider: mlx (1.4)
libfabric:107610:core:core:ofi_register_provider():418 registering provider: ofi_hook_noop (110.10)
libfabric:107610:core:core:fi_getinfo_():1092 Found provider with the highest priority mlx, must_use_util_prov = 0
libfabric:107610:mlx:core:mlx_getinfo():172 used inject size = 1024
libfabric:107610:mlx:core:mlx_getinfo():219 Loaded MLX version 1.6.0
libfabric:107610:mlx:core:mlx_getinfo():266 MLX: spawn support 0
libfabric:107610:core:core:fi_getinfo_():1092 Found provider with the highest priority mlx, must_use_util_prov = 0
libfabric:107610:mlx:core:mlx_getinfo():172 used inject size = 1024
libfabric:107610:mlx:core:mlx_getinfo():219 Loaded MLX version 1.6.0
libfabric:107610:mlx:core:mlx_getinfo():266 MLX: spawn support 0
[0] MPI startup(): libfabric provider: mlx
libfabric:107610:mlx:core:mlx_fabric_open():172
libfabric:107610:core:core:fi_fabric_():1372 Opened fabric: mlx
libfabric:107610:mlx:core:ofi_check_rx_attr():782 Tx only caps ignored in Rx caps
libfabric:107610:mlx:core:ofi_check_tx_attr():880 Rx only caps ignored in Tx caps
[0] MPI startup(): max_ch4_vcis: 1, max_reg_eps 1, enable_sep 0, enable_shared_ctxs 0, do_av_insert 1
libfabric:107610:mlx:core:ofi_check_rx_attr():782 Tx only caps ignored in Rx caps
libfabric:107610:mlx:core:ofi_check_tx_attr():880 Rx only caps ignored in Tx caps
[0] MPI startup(): addrnamelen: 1024
libfabric:107610:mlx:core:mlx_cm_getname_mlx_format():73 Loaded UCP address: [127]...
libfabric:22236:core:mr:ofi_default_cache_size():56 default cache size=2109042048
libfabric:22236:verbs:fabric:verbs_devs_print():869 list of verbs devices found for FI_EP_MSG:
libfabric:22236:verbs:fabric:verbs_devs_print():873 #1 mlx4_0 - IPoIB addresses:
libfabric:22236:verbs:fabric:verbs_devs_print():883 10.30.18.104
libfabric:22236:verbs:fabric:verbs_devs_print():883 fe80::202:c903:14:ddf1
libfabric:22236:verbs:fabric:vrb_get_device_attrs():615 device mlx4_0: first found active port is 1
libfabric:22236:verbs:fabric:vrb_get_device_attrs():615 device mlx4_0: first found active port is 1
libfabric:22236:verbs:fabric:vrb_get_device_attrs():615 device mlx4_0: first found active port is 1
libfabric:22236:core:core:ofi_register_provider():418 registering provider: verbs (110.10)
libfabric:22236:core:core:ofi_register_provider():446 "verbs" filtered by provider include/exclude list, skipping
libfabric:22236:core:core:ofi_register_provider():418 registering provider: mlx (1.4)
libfabric:22236:core:core:ofi_register_provider():418 registering provider: ofi_hook_noop (110.10)
libfabric:22236:core:core:fi_getinfo_():1092 Found provider with the highest priority mlx, must_use_util_prov = 0
libfabric:22236:mlx:core:mlx_getinfo():172 used inject size = 1024
libfabric:22236:mlx:core:mlx_getinfo():219 Loaded MLX version 1.6.0
libfabric:22236:mlx:core:mlx_getinfo():266 MLX: spawn support 0
libfabric:22236:core:core:fi_getinfo_():1092 Found provider with the highest priority mlx, must_use_util_prov = 0
libfabric:22236:mlx:core:mlx_getinfo():172 used inject size = 1024
libfabric:22236:mlx:core:mlx_getinfo():219 Loaded MLX version 1.6.0
libfabric:22236:mlx:core:mlx_getinfo():266 MLX: spawn support 0
libfabric:22236:mlx:core:mlx_fabric_open():172
libfabric:22236:core:core:fi_fabric_():1372 Opened fabric: mlx
libfabric:22236:mlx:core:ofi_check_rx_attr():782 Tx only caps ignored in Rx caps
libfabric:22236:mlx:core:ofi_check_tx_attr():880 Rx only caps ignored in Tx caps
libfabric:22236:mlx:core:ofi_check_rx_attr():782 Tx only caps ignored in Rx caps
libfabric:22236:mlx:core:ofi_check_tx_attr():880 Rx only caps ignored in Tx caps
libfabric:22236:mlx:core:mlx_cm_getname_mlx_format():73 Loaded UCP address: [127]...
libfabric:22236:mlx:core:mlx_av_insert():179 Try to insert address #0, offset=0 (size=2) fi_addr=0x7f2000132a00
[1605269643.201248] [cnode003:22236:0] select.c:410 UCX ERROR no active messages transport to : mm/posix - Destination is unreachable, mm/sysv - Destination is unreachable, self/self - Destination is unreachable
Abort(1091215) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136)........:
MPID_Init(1149)..............:
MPIDI_OFI_mpi_init_hook(1657): OFI get address vector map failed
[1605269643.201281] [cnode002:107610:0] select.c:410 UCX ERROR no active messages transport to : mm/posix - Destination is unreachable, mm/sysv - Destination is unreachable, self/self - Destination is unreachable
libfabric:107610:mlx:core:mlx_av_insert():179 Try to insert address #0, offset=0 (size=2) fi_addr=0x7f200002cb80
libfabric:107610:mlx:core:mlx_av_insert():189 address inserted
libfabric:107610:mlx:core:mlx_av_insert():179 Try to insert address #1, offset=1024 (size=2) fi_addr=0x7f200002cb80
Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136)........:
MPID_Init(1149)..............:
MPIDI_OFI_mpi_init_hook(1657): OFI get address vector map failed