Direct write is requested for job: 1524.lssd530-hs05, but the destination is not usecp-able from lssd530-cs09

:: initializing oneAPI environment ...
   BASH version = 4.2.46(2)-release
:: vpl -- latest
:: dpcpp-ct -- latest
:: vtune -- latest
:: dev-utilities -- latest
:: mkl -- latest
:: mpi -- latest
:: ippcp -- latest
:: debugger -- latest
:: tbb -- latest
:: dpl -- latest
:: compiler -- latest
:: ccl -- latest
:: dnnl -- latest
:: dal -- latest
:: intelpython -- latest
:: advisor -- latest
:: ipp -- latest
:: oneAPI environment initialized ::

Start Sun Feb 7 12:29:06 JST 2021

aps_result_20210206 aps_result_20210207 cluster.txt HPL.dat hpl_native.sbatch test_mpi.sh

PBS_ACCOUNT=MPI_Bench
PBS_JOBNAME=mpi_bench
PBS_ENVIRONMENT=PBS_BATCH
PBS_O_WORKDIR=/home/john/tpbs
PBS_TASKNUM=1
PBS_O_HOME=/home/john
PBS_MOMPORT=15003
PBS_O_QUEUE=xeon1600
PBS_O_LOGNAME=john
PBS_O_LANG=en_US.UTF-8
PBS_JOBCOOKIE=04F8D497309591785D5566334AE048F1
PBS_MPI_DEBUG=True
PBS_NODENUM=0
PBS_JOBDIR=/home/john
PBS_O_SHELL=/bin/bash
PBS_JOBID=1524.lssd530-hs05
PBS_O_HOST=lssd530-hs05
PBS_QUEUE=xeon1700
PBS_O_MAIL=/var/spool/mail/john
PBS_O_SYSTEM=Linux
PBS_NODEFILE=cluster.txt
PBS_O_PATH=/utils/spack/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/pbs/bin:/opt/intel/parallel_studio_xe_2019.4.070/bin:/home/john/.local/bin:/home/john/bin
PBS_ACCOUNT=MPI_Bench
I_MPI_FABRICS=ofi
I_MPI_NETMASK=eth
I_MPI_STATS=ipm
MPI_USE_IB=False
PBS_MPI_DEBUG=True
I_MPI_HYDRA_IFACE=eno1
I_MPI_ROOT=/utils/opt/intel/impi/2021.1.1/mpi/2021.1.1

Environment above, where are P2P and MPI1:
/utils/opt/intel/impi/2021.1.1/intelpython/latest/bin/IMB-P2P
/utils/opt/intel/impi/2021.1.1/intelpython/latest/bin/IMB-MPI1

# FI_LOG_LEVEL: String
#     Specify logging level: warn, trace, info, debug (default: warn)
# FI_LOG_PROV: String
#     Specify specific provider to log (default: all)
# FI_LOG_SUBSYS: String
#     Specify specific subsystem to log (default: all)
# FI_PERF_CNTR: String
#     Performance counter to analyze (default: cpu_instr). Options: cpu_instr, cpu_cycles.
# FI_HOOK: String
#     Intercept calls to underlying provider and apply the specified functionality to them. Hook option: perf (gather performance data)
# FI_MR_CACHE_MAX_SIZE: size_t
#     Defines the total number of bytes for all memory regions that may be tracked by the MR cache. Setting this will reduce the amount of memory not actively in use that may be registered. (default: total memory / number of cpu cores / 2)
# FI_MR_CACHE_MAX_COUNT: size_t
#     Defines the total number of memory regions that may be stored in the cache. Setting this will reduce the number of registered regions, regardless of their size, stored in the cache. Setting this to zero will disable MR caching. (default: 1024)
# FI_MR_CACHE_MONITOR: String
#     Define a default memory registration monitor. The monitor checks for virtual to physical memory address changes. Options are: userfaultfd, memhooks and disabled. Userfaultfd is a Linux kernel feature. Memhooks operates by intercepting memory allocation and free calls. Userfaultfd is the default if available on the system. The 'disabled' option disables memory caching.
# FI_MR_CUDA_CACHE_MONITOR_ENABLED: Boolean (0/1, on/off, true/false, yes/no)
#     Enable or disable the CUDA cache memory monitor. Monitor is enabled by default.
# FI_MR_ROCR_CACHE_MONITOR_ENABLED: Boolean (0/1, on/off, true/false, yes/no)
#     Enable or disable the ROCR cache memory monitor. Monitor is enabled by default.
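The environment dump above shows Intel MPI 2021.1.1 driven over OFI (I_MPI_FABRICS=ofi) with the hydra launcher bound to eno1, and the IMB binaries resolving under the intelpython prefix. A job-script fragment along the following lines would reproduce that setup; it is only a sketch: the setvars.sh location and the extra I_MPI_DEBUG line are assumptions, while the other values are copied from the dump above.

    #!/bin/bash
    # Sketch of the relevant part of a job script (cf. test_mpi.sh / hpl_native.sbatch above).
    # The setvars.sh path is an assumption based on the install prefix visible in the log.
    source /utils/opt/intel/impi/2021.1.1/setvars.sh   # produces the ":: ... -- latest" banner

    export I_MPI_FABRICS=ofi          # values copied from the environment dump above
    export I_MPI_HYDRA_IFACE=eno1
    export I_MPI_STATS=ipm
    export I_MPI_DEBUG=5              # assumption: extra Intel MPI verbosity for troubleshooting

    cd "$PBS_O_WORKDIR"               # /home/john/tpbs
    which IMB-P2P IMB-MPI1            # should print the two intelpython/latest/bin paths above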
# FI_PROVIDER: String
#     Only use specified provider (default: all available)
# FI_FORK_UNSAFE: Boolean (0/1, on/off, true/false, yes/no)
#     Whether use of fork() may be unsafe for some providers (default: no). Setting this to yes could improve performance at the expense of making fork() potentially unsafe
# FI_UNIVERSE_SIZE: size_t
#     Defines the maximum number of processes that will be used by a distributed OFI application. The provider uses this to optimize resource allocations (default: provider specific)
# FI_PROVIDER_PATH: String
#     Search for providers in specific path (default: /usr/local/lib/libfabric)
FI_PROVIDER_PATH=/utils/opt/intel/impi/2021.1.1/mpi/2021.1.1//libfabric/lib/prov:/usr/lib64/libfabric
# FI_SOCKETS_PE_WAITTIME: Integer
#     sockets: How many milliseconds to spin while waiting for progress
# FI_SOCKETS_CONN_TIMEOUT: Integer
#     sockets: How many milliseconds to wait for one connection establishment
# FI_SOCKETS_MAX_CONN_RETRY: Integer
#     sockets: Number of connection retries before reporting as failure
# FI_SOCKETS_DEF_CONN_MAP_SZ: Integer
#     sockets: Default connection map size
# FI_SOCKETS_DEF_AV_SZ: Integer
#     sockets: Default address vector size
# FI_SOCKETS_DEF_CQ_SZ: Integer
#     sockets: Default completion queue size
# FI_SOCKETS_DEF_EQ_SZ: Integer
#     sockets: Default event queue size
# FI_SOCKETS_PE_AFFINITY: String
#     sockets: If specified, bind the progress thread to the indicated range(s) of Linux virtual processor ID(s). This option is currently not supported on OS X and Windows. Usage: id_start[-id_end[:stride]][,]
# FI_SOCKETS_KEEPALIVE_ENABLE: Boolean (0/1, on/off, true/false, yes/no)
#     sockets: Enable keepalive support
# FI_SOCKETS_KEEPALIVE_TIME: Integer
#     sockets: Idle time in seconds before sending the first keepalive probe
# FI_SOCKETS_KEEPALIVE_INTVL: Integer
#     sockets: Time in seconds between individual keepalive probes
# FI_SOCKETS_KEEPALIVE_PROBES: Integer
#     sockets: Maximum number of keepalive probes sent before dropping the connection
# FI_SOCKETS_IFACE: String
#     sockets: Specify interface name
# FI_OFI_RXM_BUFFER_SIZE: size_t
#     ofi_rxm: Defines the transmit buffer size / inject size (default: 16 KB). Eager protocol would be used to transmit messages of size less than the eager limit (FI_OFI_RXM_BUFFER_SIZE - RxM header size (64 B)). Any message whose size is greater than the eager limit would be transmitted via rendezvous or SAR (Segmentation And Reassembly) protocol depending on the value of FI_OFI_RXM_SAR_LIMIT. Also, transmit data would be copied up to the eager limit.
# FI_OFI_RXM_COMP_PER_PROGRESS: Integer
#     ofi_rxm: Defines the maximum number of MSG provider CQ entries (default: 1) that would be read per progress (RxM CQ read).
# FI_OFI_RXM_SAR_LIMIT: size_t
#     ofi_rxm: Set this environment variable to enable and control the RxM SAR (Segmentation And Reassembly) protocol (default: 128 KB). This value should be set greater than the eager limit (FI_OFI_RXM_BUFFER_SIZE - RxM protocol header size (64 B)) for SAR to take effect. Messages of size greater than this would be transmitted via rendezvous protocol.
# FI_OFI_RXM_USE_SRX: Boolean (0/1, on/off, true/false, yes/no)
#     ofi_rxm: Set this environment variable to control the RxM receive path. If this variable is set to 1 (default: 0), RxM uses a Shared Receive Context. This mode improves memory consumption, but it may increase small message latency as a side-effect.
# FI_OFI_RXM_TX_SIZE: size_t
#     ofi_rxm: Defines default tx context size (default: 1024).
# FI_OFI_RXM_RX_SIZE: size_t
#     ofi_rxm: Defines default rx context size (default: 1024).
# FI_OFI_RXM_MSG_TX_SIZE: size_t
#     ofi_rxm: Defines FI_EP_MSG tx size that would be requested (default: 128). Setting this to 0 would get the default value defined by the MSG provider.
# FI_OFI_RXM_MSG_RX_SIZE: size_t
#     ofi_rxm: Defines FI_EP_MSG rx size that would be requested (default: 128). Setting this to 0 would get the default value defined by the MSG provider.
# FI_OFI_RXM_CM_PROGRESS_INTERVAL: Integer
#     ofi_rxm: Defines the number of microseconds to wait between function calls to the connection management progression functions during fi_cq_read calls. Higher values may decrease noise during cq polling, but may result in longer connection establishment times. (default: 10000)
# FI_OFI_RXM_CQ_EQ_FAIRNESS: Integer
#     ofi_rxm: Defines the maximum number of message provider CQ entries that can be consecutively read across progress calls without checking to see if the CM progress interval has been reached. (default: 128)
# FI_OFI_RXM_DATA_AUTO_PROGRESS: Boolean (0/1, on/off, true/false, yes/no)
#     ofi_rxm: Force auto-progress for data transfers even if the app requested manual progress (default: false/no).
# FI_OFI_RXM_DEF_WAIT_OBJ: String
#     ofi_rxm: Specifies the default wait object used for blocking operations (e.g. fi_cq_sread). Supported values are: fd and pollfd (default: fd).
# FI_OFI_RXM_DEF_TCP_WAIT_OBJ: String
#     ofi_rxm: See def_wait_obj for description. If set, this overrides def_wait_obj when running over the tcp provider.
# FI_MLX_CONFIG: String
#     mlx: MLX configuration file name
# FI_MLX_INJECT_LIMIT: Integer
#     mlx: Maximal tinject/inject message size
# FI_MLX_NS_PORT: Integer
#     mlx: MLX Name server port
# FI_MLX_NS_ENABLE: Boolean (0/1, on/off, true/false, yes/no)
#     mlx: Enforce usage of name server for MLX provider
# FI_MLX_EP_FLUSH: Boolean (0/1, on/off, true/false, yes/no)
#     mlx: Use EP flush (Disabled by default)
# FI_MLX_NS_IFACE: String
#     mlx: Specify IPv4 network interface for MLX provider's name server
# FI_MLX_EXTRA_DEBUG: Boolean (0/1, on/off, true/false, yes/no)
#     mlx: Output transport-level debug information
# FI_MLX_ENABLE_SPAWN: Boolean (0/1, on/off, true/false, yes/no)
#     mlx: Enable dynamic process support (Disabled by default)
# FI_MLX_TLS: String
#     mlx: Specifies transports available for MLX provider (Default: auto)
# FI_MR_CACHE_MAX_SIZE: size_t
#     Defines the total number of bytes for all memory regions that may be tracked by the MR cache. Setting this will reduce the amount of memory not actively in use that may be registered. (default: total memory / number of cpu cores / 2)
# FI_MR_CACHE_MAX_COUNT: size_t
#     Defines the total number of memory regions that may be stored in the cache. Setting this will reduce the number of registered regions, regardless of their size, stored in the cache. Setting this to zero will disable MR caching. (default: 1024)
# FI_MR_CACHE_MONITOR: String
#     Define a default memory registration monitor. The monitor checks for virtual to physical memory address changes. Options are: userfaultfd, memhooks and disabled. Userfaultfd is a Linux kernel feature. Memhooks operates by intercepting memory allocation and free calls. Userfaultfd is the default if available on the system. The 'disabled' option disables memory caching.
# FI_MR_CUDA_CACHE_MONITOR_ENABLED: Boolean (0/1, on/off, true/false, yes/no)
#     Enable or disable the CUDA cache memory monitor. Monitor is enabled by default.
# FI_MR_ROCR_CACHE_MONITOR_ENABLED: Boolean (0/1, on/off, true/false, yes/no)
#     Enable or disable the ROCR cache memory monitor. Monitor is enabled by default.
# FI_VERBS_TX_SIZE: Integer
#     verbs: Default maximum tx context size (default: 384)
# FI_VERBS_RX_SIZE: Integer
#     verbs: Default maximum rx context size (default: 384)
# FI_VERBS_TX_IOV_LIMIT: Integer
#     verbs: Default maximum tx iov_limit (default: 4)
# FI_VERBS_RX_IOV_LIMIT: Integer
#     verbs: Default maximum rx iov_limit (default: 4)
# FI_VERBS_INLINE_SIZE: Integer
#     verbs: Default maximum inline size. Actual inject size returned in fi_info may be greater (default: 256)
# FI_VERBS_MIN_RNR_TIMER: Integer
#     verbs: Set min_rnr_timer QP attribute (0 - 31) (default: 12)
# FI_VERBS_USE_ODP: Boolean (0/1, on/off, true/false, yes/no)
#     verbs: Enable on-demand paging memory registrations, if supported. This is currently required to register DAX file system mmapped memory. (default: 0)
# FI_VERBS_PREFER_XRC: Boolean (0/1, on/off, true/false, yes/no)
#     verbs: Order XRC transport fi_infos ahead of RC. Default orders RC first. (default: 0)
# FI_VERBS_XRCD_FILENAME: String
#     verbs: A file to associate with the XRC domain. (default: /tmp/verbs_xrcd)
# FI_VERBS_CQREAD_BUNCH_SIZE: Integer
#     verbs: The number of entries to be read from the verbs completion queue at a time (default: 8)
# FI_VERBS_GID_IDX: Integer
#     verbs: Set which gid index to use attribute (0 - 255) (default: 0)
# FI_VERBS_DEVICE_NAME: String
#     verbs: The prefix or the full name of the verbs device to use (default: )
# FI_VERBS_IFACE: String
#     verbs: The prefix or the full name of the network interface associated with the verbs device (default: )
# FI_VERBS_DGRAM_USE_NAME_SERVER: Boolean (0/1, on/off, true/false, yes/no)
#     verbs: The option that enables/disables the OFI Name Server thread that is used to resolve IP addresses to provider-specific addresses. If MPI is used, the NS is disabled by default. (default: 1)
# FI_VERBS_DGRAM_NAME_SERVER_PORT: Integer
#     verbs: The port on which the Name Server thread listens for incoming connections and requests
#     (default: 5678)
# FI_TCP_IFACE: String
#     tcp: Specify interface name
FI_TCP_IFACE=eno1
# FI_TCP_PORT_LOW_RANGE: Integer
#     tcp: define port low range
# FI_TCP_PORT_HIGH_RANGE: Integer
#     tcp: define port high range
# FI_SHM_SAR_THRESHOLD: size_t
#     shm: Max size to use for alternate SAR protocol if CMA is not available before switching to mmap protocol. Default: SIZE_MAX (18446744073709551615)
# FI_SHM_TX_SIZE: size_t
#     shm: Max number of outstanding tx operations. Default: 1024
# FI_SHM_RX_SIZE: size_t
#     shm: Max number of outstanding rx operations. Default: 1024

HPL Test Sun Feb 7 12:29:06 JST 2021
P2P 2 Node

[mpiexec@lssd530-cs09] Gtool options:
======================================
input: 0 count: 1; spawn: 0
--------------- tool set -------------
tool: {aps --collection-mode=omp,mpi}
ranks: all
arch: none
mode: 0
======================================
[mpiexec@lssd530-cs09] Launch arguments: /utils/opt/intel/impi/2021.1.1/intelpython/latest/bin//hydra_bstrap_proxy --upstream-host 10.2.1.73 --upstream-port 33339 --pgid 0 --launcher ssh --launcher-number 0 --base-path /utils/opt/intel/impi/2021.1.1/intelpython/latest/bin/ --tree-width 16 --tree-level 1 --iface eno1 --time-left -1 --collective-launch 1 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /utils/opt/intel/impi/2021.1.1/intelpython/latest/bin//hydra_pmi_proxy --usize -1 --gtool-node-wide-mode-exists 0 --gtool-count 1 --gtool-tool 2 aps --collection-mode=omp,mpi --gtool-mode 0 --gtool-ranks all --gtool-arch none --auto-cleanup 1 --abort-signal 9
[mpiexec@lssd530-cs09] Launch arguments: /bin/ssh -q -x lssd530-cs10 /utils/opt/intel/impi/2021.1.1/intelpython/latest/bin//hydra_bstrap_proxy --upstream-host 10.2.1.73 --upstream-port 33339 --pgid 0 --launcher ssh --launcher-number 0 --base-path /utils/opt/intel/impi/2021.1.1/intelpython/latest/bin/ --tree-width 16 --tree-level 1 --iface eno1 --time-left -1 --collective-launch 1 --debug --proxy-id 1 --node-id 1 --subtree-size 1 /utils/opt/intel/impi/2021.1.1/intelpython/latest/bin//hydra_pmi_proxy --usize -1 --gtool-node-wide-mode-exists 0 --gtool-count 1 --gtool-tool 2 aps --collection-mode=omp,mpi --gtool-mode 0 --gtool-ranks all --gtool-arch none --auto-cleanup 1 --abort-signal 9
[proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:0@lssd530-cs09] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=get_maxes
[proxy:0:0@lssd530-cs09] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=get_appnum
[proxy:0:0@lssd530-cs09] PMI response: cmd=appnum appnum=0
[proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=get_my_kvsname
[proxy:0:0@lssd530-cs09] PMI response: cmd=my_kvsname kvsname=kvs_195756_0
[proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=get kvsname=kvs_195756_0 key=PMI_process_mapping
[proxy:0:0@lssd530-cs09] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=barrier_in
[proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:1@lssd530-cs10] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=get_maxes
[proxy:0:1@lssd530-cs10] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=get_appnum
[proxy:0:1@lssd530-cs10] PMI response: cmd=appnum appnum=0
[proxy:0:1@lssd530-cs10]
pmi cmd from fd 4: cmd=get_my_kvsname [proxy:0:1@lssd530-cs10] PMI response: cmd=my_kvsname kvsname=kvs_195756_0 [proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=get kvsname=kvs_195756_0 key=PMI_process_mapping [proxy:0:1@lssd530-cs10] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1)) [proxy:0:0@lssd530-cs09] PMI response: cmd=barrier_out [proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=barrier_in [proxy:0:1@lssd530-cs10] PMI response: cmd=barrier_out [proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=put kvsname=kvs_195756_0 key=bc-0 value=mpi#00B7E6D54961F8A0774008335A254ECEEA612FA7D377CC2B32004C3E5077CCAB33004F430088B6FC02C0060000C04108335A254ECEEA612F8D5377CC2B32004C3E5077CCAB33004F4300881E800500000000002200637577CC2B3200F8D74F00000000004F0300886F0B0C0F869942A82508335A254ECEEA612F478B95BFD63400242E5077CCAB330092010084B6FC02002688335A254ECEEA612F7713BD378634009858D077CCAB33009201008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000$ [proxy:0:0@lssd530-cs09] PMI response: cmd=put_result rc=0 msg=success [proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=barrier_in [proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=put kvsname=kvs_195756_0 key=bc-1 
value=mpi#00BD6EDA5787F46B3C40089DE3CACF008B375FA7D377CC2B32004C3E5077CCAB33004F4300881E5C03C0040000C041089DE3CACF008B375F8D5377CC2B32004C3E5077CCAB33004F43008808800500000000002200637577CC2B3200F8D74F00000000004F03008880DEEECDEC60DE7F25089DE3CACF008B375F478B95BFD63400242E5077CCAB3300920100841E5C030026889DE3CACF008B375F7713BD378634009858D077CCAB33009201008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000$ [proxy:0:1@lssd530-cs10] PMI response: cmd=put_result rc=0 msg=success [proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=barrier_in [proxy:0:0@lssd530-cs09] PMI response: cmd=barrier_out [proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=get kvsname=kvs_195756_0 key=bc-0 [proxy:0:0@lssd530-cs09] PMI response: cmd=get_result rc=0 msg=success 
value=mpi#00B7E6D54961F8A0774008335A254ECEEA612FA7D377CC2B32004C3E5077CCAB33004F430088B6FC02C0060000C04108335A254ECEEA612F8D5377CC2B32004C3E5077CCAB33004F4300881E800500000000002200637577CC2B3200F8D74F00000000004F0300886F0B0C0F869942A82508335A254ECEEA612F478B95BFD63400242E5077CCAB330092010084B6FC02002688335A254ECEEA612F7713BD378634009858D077CCAB33009201008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000$ [proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=get kvsname=kvs_195756_0 key=bc-1 [proxy:0:0@lssd530-cs09] PMI response: cmd=get_result rc=0 msg=success 
value=mpi#00BD6EDA5787F46B3C40089DE3CACF008B375FA7D377CC2B32004C3E5077CCAB33004F4300881E5C03C0040000C041089DE3CACF008B375F8D5377CC2B32004C3E5077CCAB33004F43008808800500000000002200637577CC2B3200F8D74F00000000004F03008880DEEECDEC60DE7F25089DE3CACF008B375F478B95BFD63400242E5077CCAB3300920100841E5C030026889DE3CACF008B375F7713BD378634009858D077CCAB33009201008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000$ [0] [1612668549.219220] [lssd530-cs09:195766:0] select.c:445 UCX ERROR no active messages transport to : posix/memory - Destination is unreachable, sysv/memory - Destination is unreachable, self/memory - Destination is unreachable, sockcm/sockaddr - no am bcopy, rdmacm/sockaddr - no am bcopy, cma/memory - no am bcopy, knem/memory - no am bcopy [proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=abort exitcode=1091215 [0] Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack: [0] MPIR_Init_thread(138)........: [0] MPID_Init(1141)..............: [0] MPIDI_OFI_mpi_init_hook(1647): OFI get address vector map failed [proxy:0:1@lssd530-cs10] PMI response: cmd=barrier_out [proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=get kvsname=kvs_195756_0 key=bc-0 [proxy:0:1@lssd530-cs10] PMI response: cmd=get_result rc=0 msg=success 
value=mpi#00B7E6D54961F8A0774008335A254ECEEA612FA7D377CC2B32004C3E5077CCAB33004F430088B6FC02C0060000C04108335A254ECEEA612F8D5377CC2B32004C3E5077CCAB33004F4300881E800500000000002200637577CC2B3200F8D74F00000000004F0300886F0B0C0F869942A82508335A254ECEEA612F478B95BFD63400242E5077CCAB330092010084B6FC02002688335A254ECEEA612F7713BD378634009858D077CCAB33009201008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000$ =================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = RANK 1 PID 220188 RUNNING AT lssd530-cs10 = KILLED BY SIGNAL: 9 (Killed) =================================================================================== [mpiexec@lssd530-cs09] Gtool options: ====================================== input: 0 count: 1; spawn: 0 --------------- tool set ------------- tool: {aps --collection-mode=omp,mpi} ranks: all arch: none mode: 0 ====================================== [mpiexec@lssd530-cs09] Launch arguments: /utils/opt/intel/impi/2021.1.1/intelpython/latest/bin//hydra_bstrap_proxy --upstream-host 10.2.1.73 --upstream-port 45213 --pgid 0 --launcher ssh --launcher-number 0 --base-path /utils/opt/intel/impi/2021.1.1/intelpython/latest/bin/ --tree-width 16 --tree-level 1 --iface eno1 --time-left -1 --collective-launch 1 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /utils/opt/intel/impi/2021.1.1/intelpython/latest/bin//hydra_pmi_proxy --usize -1 --gtool-node-wide-mode-exists 0 --gtool-count 1 --gtool-tool 2 aps --collection-mode=omp,mpi --gtool-mode 0 --gtool-ranks all --gtool-arch none --auto-cleanup 1 --abort-signal 9 [mpiexec@lssd530-cs09] Launch arguments: /bin/ssh -q -x lssd530-cs10 /utils/opt/intel/impi/2021.1.1/intelpython/latest/bin//hydra_bstrap_proxy --upstream-host 10.2.1.73 --upstream-port 45213 --pgid 0 --launcher ssh --launcher-number 0 --base-path 
/utils/opt/intel/impi/2021.1.1/intelpython/latest/bin/ --tree-width 16 --tree-level 1 --iface eno1 --time-left -1 --collective-launch 1 --debug --proxy-id 1 --node-id 1 --subtree-size 1 /utils/opt/intel/impi/2021.1.1/intelpython/latest/bin//hydra_pmi_proxy --usize -1 --gtool-node-wide-mode-exists 0 --gtool-count 1 --gtool-tool 2 aps --collection-mode=omp,mpi --gtool-mode 0 --gtool-ranks all --gtool-arch none --auto-cleanup 1 --abort-signal 9
[proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:0@lssd530-cs09] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=get_maxes
[proxy:0:0@lssd530-cs09] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=get_appnum
[proxy:0:0@lssd530-cs09] PMI response: cmd=appnum appnum=0
[proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=get_my_kvsname
[proxy:0:0@lssd530-cs09] PMI response: cmd=my_kvsname kvsname=kvs_195782_0
[proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=get kvsname=kvs_195782_0 key=PMI_process_mapping
[proxy:0:0@lssd530-cs09] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=barrier_in
[proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:1@lssd530-cs10] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=get_maxes
[proxy:0:1@lssd530-cs10] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=get_appnum
[proxy:0:1@lssd530-cs10] PMI response: cmd=appnum appnum=0
[proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=get_my_kvsname
[proxy:0:1@lssd530-cs10] PMI response: cmd=my_kvsname kvsname=kvs_195782_0
[proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=get kvsname=kvs_195782_0 key=PMI_process_mapping
[proxy:0:1@lssd530-cs10] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:0@lssd530-cs09] PMI response: cmd=barrier_out
[proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=barrier_in
[proxy:0:1@lssd530-cs10] PMI response: cmd=barrier_out
[proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=put kvsname=kvs_195782_0 key=bc-0
value=mpi#006D70D28DAB62CC8A4008335A254ECEEA612FA7D377CC2B32004C3E5077CCAB33004F430088CEFC02C0060000C04108335A254ECEEA612F8D5377CC2B32004C3E5077CCAB33004F43008824800500000000002200637577CC2B3200F8D74F00000000004F0300889612DB27E85D13832508335A254ECEEA612F478B95BFD63400242E5077CCAB330092010084CEFC02002688335A254ECEEA612F7713BD378634009858D077CCAB33009201008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000$ [proxy:0:0@lssd530-cs09] PMI response: cmd=put_result rc=0 msg=success [proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=barrier_in [proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=put kvsname=kvs_195782_0 key=bc-1 
value=mpi#0043FA0F84A670454D40089DE3CACF008B375FA7D377CC2B32004C3E5077CCAB33004F430088585C03C0040000C041089DE3CACF008B375F8D5377CC2B32004C3E5077CCAB33004F4300880E800500000000002200637577CC2B3200F8D74F00000000004F030088501EF78AE932AC3825089DE3CACF008B375F478B95BFD63400242E5077CCAB330092010084585C030026889DE3CACF008B375F7713BD378634009858D077CCAB33009201008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000$ [proxy:0:1@lssd530-cs10] PMI response: cmd=put_result rc=0 msg=success [proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=barrier_in [proxy:0:0@lssd530-cs09] PMI response: cmd=barrier_out [proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=get kvsname=kvs_195782_0 key=bc-0 [proxy:0:0@lssd530-cs09] PMI response: cmd=get_result rc=0 msg=success 
value=mpi#006D70D28DAB62CC8A4008335A254ECEEA612FA7D377CC2B32004C3E5077CCAB33004F430088CEFC02C0060000C04108335A254ECEEA612F8D5377CC2B32004C3E5077CCAB33004F43008824800500000000002200637577CC2B3200F8D74F00000000004F0300889612DB27E85D13832508335A254ECEEA612F478B95BFD63400242E5077CCAB330092010084CEFC02002688335A254ECEEA612F7713BD378634009858D077CCAB33009201008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000$ [proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=get kvsname=kvs_195782_0 key=bc-1 [proxy:0:0@lssd530-cs09] PMI response: cmd=get_result rc=0 msg=success 
value=mpi#0043FA0F84A670454D40089DE3CACF008B375FA7D377CC2B32004C3E5077CCAB33004F430088585C03C0040000C041089DE3CACF008B375F8D5377CC2B32004C3E5077CCAB33004F4300880E800500000000002200637577CC2B3200F8D74F00000000004F030088501EF78AE932AC3825089DE3CACF008B375F478B95BFD63400242E5077CCAB330092010084585C030026889DE3CACF008B375F7713BD378634009858D077CCAB33009201008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000$ [0] [1612668550.311236] [lssd530-cs09:195790:0] select.c:445 UCX ERROR no active messages transport to : posix/memory - Destination is unreachable, sysv/memory - Destination is unreachable, self/memory - Destination is unreachable, sockcm/sockaddr - no am bcopy, rdmacm/sockaddr - no am bcopy, cma/memory - no am bcopy, knem/memory - no am bcopy [0] Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack: [0] MPIR_Init_thread(138)........: [0] MPID_Init(1141)..............: [0] MPIDI_OFI_mpi_init_hook(1647): OFI get address vector map failed [proxy:0:0@lssd530-cs09] pmi cmd from fd 6: cmd=abort exitcode=1091215 [proxy:0:1@lssd530-cs10] PMI response: cmd=barrier_out [proxy:0:1@lssd530-cs10] pmi cmd from fd 4: cmd=get kvsname=kvs_195782_0 key=bc-0 [proxy:0:1@lssd530-cs10] PMI response: cmd=get_result rc=0 msg=success 
value=mpi#006D70D28DAB62CC8A4008335A254ECEEA612FA7D377CC2B32004C3E5077CCAB33004F430088CEFC02C0060000C04108335A254ECEEA612F8D5377CC2B32004C3E5077CCAB33004F43008824800500000000002200637577CC2B3200F8D74F00000000004F0300889612DB27E85D13832508335A254ECEEA612F478B95BFD63400242E5077CCAB330092010084CEFC02002688335A254ECEEA612F7713BD378634009858D077CCAB33009201008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000$ =================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = RANK 1 PID 220246 RUNNING AT lssd530-cs10 = KILLED BY SIGNAL: 9 (Killed) =================================================================================== Sun Feb 7 12:29:10 JST 2021 [mpiexec@lssd530-cs09] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on lssd530-cs13 (pid 195812, exit code 65280) [mpiexec@lssd530-cs09] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error [mpiexec@lssd530-cs09] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error [mpiexec@lssd530-cs09] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:772): error waiting for event [mpiexec@lssd530-cs09] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1955): error setting up the boostrap proxies Sun Feb 7 12:30:12 JST 2021 End
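The failure pattern is the same in both runs: rank 0 on lssd530-cs09 cannot open any UCX transport (posix, sysv, self, sockcm, rdmacm, cma and knem are all rejected), MPI_Init then aborts with "OFI get address vector map failed", rank 1 on lssd530-cs10 is killed with signal 9, and the final attempt cannot even start a bstrap proxy on lssd530-cs13. A few follow-up checks along the lines below could narrow this down. This is only a sketch: the choice of the tcp provider, the I_MPI_OFI_PROVIDER override and the debug levels are assumptions for diagnosis, not settings taken from this job.

    # 1. What can libfabric and UCX actually offer on each node?
    fi_info -l                      # list the libfabric providers visible to this install
    ucx_info -d | grep Transport    # UCX transports, since the mlx path goes through UCX

    # 2. Can the head node still reach the node that failed bootstrap?
    ssh -q lssd530-cs13 hostname    # the last run died setting up bstrap_proxy there

    # 3. Re-run the 2-node IMB-P2P case with the provider pinned to TCP and full logging.
    export I_MPI_DEBUG=5
    export I_MPI_OFI_PROVIDER=tcp   # or: export FI_PROVIDER="tcp;ofi_rxm"
    export FI_LOG_LEVEL=debug       # levels documented above: warn, trace, info, debug
    export FI_TCP_IFACE=eno1        # same interface already used by I_MPI_HYDRA_IFACE
    mpiexec -n 2 -ppn 1 -f cluster.txt \
        /utils/opt/intel/impi/2021.1.1/intelpython/latest/bin/IMB-P2P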