Intel® MPI Library

MPI application hangs with a limited number of processes

Kheireddine_Yahyaoui

Hello,

The issue is similar to the following post:
https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/MPI-applications-hangs-with-a-limided-number-of-processes/td-p/1285767

 

I still get the same hang with oneAPI version 2022.1.2.


The same test program was run with this version in the same environment (24 CPUs per node). When the number of processes exceeds 2071, the program hangs.

 

Is there a known reason or explanation for this issue?

 

Please find below the program used in the test:

program main
  use mpi
  implicit none
  integer :: ierr, rank

  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
  if (rank.eq.0) print *, 'Start'
  call test_func(ierr)
  if (ierr.ne.0) call exit(ierr)
  call mpi_finalize(ierr)
  if (rank.eq.0) print *, 'Stop'

contains

  ! Each rank j in turn receives one real from every other rank (in rank
  ! order), while every other rank sends to j with a blocking MPI_SEND.
  subroutine test_func(ierr)
    integer, intent(out) :: ierr
    real :: send, recv
    integer :: i, j, status(MPI_STATUS_SIZE), mpi_rank, mpi_size, ires
    ! MPI_GET_PROCESSOR_NAME needs at least MPI_MAX_PROCESSOR_NAME characters
    character(len=MPI_MAX_PROCESSOR_NAME) :: procname
    real(kind=8) :: t1, t2

    ierr = 0
    send = 0.0  ! payload value is irrelevant; set so it is not uninitialized
    call mpi_comm_size(MPI_COMM_WORLD, mpi_size, ierr)
    call mpi_comm_rank(MPI_COMM_WORLD, mpi_rank, ierr)
    call mpi_get_processor_name(procname, ires, ierr)
    call mpi_barrier(MPI_COMM_WORLD, ierr)
    t1 = mpi_wtime()
    do j = 0, mpi_size-1
      if (mpi_rank.eq.j) then
        ! Receiver turn: collect one message from each other rank
        do i = 0, mpi_size-1
          if (i.eq.j) cycle
          call MPI_RECV(recv, 1, MPI_REAL, i, 0, MPI_COMM_WORLD, status, ierr)
          if (ierr.ne.0) return
        enddo
        ! Printed after the loop so the last rank also reports completion
        print *, 'Rank ', j, procname(1:ires), ' done'
      else
        ! Sender turn: one message to the current receiver j
        call MPI_SEND(send, 1, MPI_REAL, j, 0, MPI_COMM_WORLD, ierr)
        if (ierr.ne.0) return
      endif
    enddo
    call mpi_barrier(MPI_COMM_WORLD, ierr)
    t2 = mpi_wtime()
    if (mpi_rank.eq.0) print *, "time send/recv = ", t2-t1
  end subroutine test_func
end program main
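
For a sense of scale: in the test program above, each rank takes one turn as receiver and gets one message from every other rank, so a run on N ranks exchanges N*(N-1) point-to-point messages in total. A minimal shell sketch of that arithmetic follows; the function name `total_messages` is made up for illustration.

```shell
#!/bin/sh
# total_messages N: number of MPI_SEND/MPI_RECV pairs in the test program
# when it runs on N ranks (each of the N ranks receives from the other N-1).
total_messages() {
    echo $(( $1 * ($1 - 1) ))
}

total_messages 2071   # largest rank count reported to complete
total_messages 2072   # smallest rank count reported to hang
```

The two counts are 4286970 and 4291112, an increase of about 0.1%, so the message volume itself barely changes across the reported threshold.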

 

Thanks in advance.

Best regards,

Kheireddine

 

>>> Front-end node:

ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 514177
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) 30000000
open files (-n) 32768
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

uname -a
Linux  3.10.0-957.5.1.el7.x86_64 #1 SMP Tue Jan 29 10:14:19 CST 2019 x86_64 x86_64 x86_64 GNU/Linux

 

ucx_info -d |grep Transport
# Transport: self
# Transport: tcp
# Transport: rc
# Transport: ud
# Transport: mm
# Transport: mm
# Transport: cma

 

>>>> Compute node

ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 513931
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 32768
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 300000
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Linux  3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

 

 

SantoshY_Intel
Moderator

Hi,

 

Thanks for posting in the Intel forums.

 

Could you please provide the following details, which will help us investigate your issue?

  1. Operating system details, using the command:
    $ cat /etc/os-release
  2. Is hyperthreading enabled or disabled on your systems?
  3. Which job scheduler are you using?
  4. Which command are you using to launch the MPI job on multiple nodes with 2072 ranks?
  5. The output of the following command (make sure to initialize the oneAPI environment before running it):
    $ fi_info -l
  6. Which FI_PROVIDER (mlx, psm2, verbs, etc.) are you using?
  7. Which interconnect hardware (InfiniBand, Intel Omni-Path, etc.) are you using?
  8. Could you also confirm that you are using Intel(R) MPI Library for Linux* OS, Version 2021.5?

 

Thanks & Regards,

Santosh

 

 

Kheireddine_Yahyaoui

Hi Santosh,

Thank you very much for reaching out!

Please find the requested details below.

Best Regards,

Kheireddine

 

1- >>>> $cat /etc/os-release


cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.6 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.6"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.6 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.6:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.6"

 

2- >>>> Is hyperthreading disabled/enabled in your systems?

Hyperthreading is enabled (Thread(s) per core: 2).

Please find the output of the lscpu command below:


lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2 >>>>>>>>>>>>>>>>>>>>>
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Stepping: 2
CPU MHz: 2900.085
CPU max MHz: 3300.0000
CPU min MHz: 1200.0000
BogoMIPS: 4999.86
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47


3 >>>>> What is the job scheduler you are using?
IBM* Platform LSF*


4 >>>>> What is the command you are using to launch the MPI job on multi nodes with 2072 ranks?

 

I'm using the following script

 

#!/bin/sh
#BSUB -J MPIJob_test_CPU
#BSUB -q XXXXX
#BSUB -n 2100
#BSUB -R "span[ptile=24]"
#BSUB -B
#BSUB -N
#BSUB -o ./Output_%J.out
#BSUB -e ./Error_%J.err
mpirun -genv I_MPI_DEBUG=5 -genv I_MPI_DEBUG_OUTPUT=debug_output.txt -np $LSB_DJOB_NUMPROC ./a.out
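
As a quick check on the allocation this script requests: with `-n 2100` and `span[ptile=24]`, at most 24 tasks are placed on each host, so the job spans ceil(2100/24) hosts and the last host is only partly filled. The sketch below works that out; the function names are illustrative and not part of LSF.

```shell
#!/bin/sh
# hosts_needed TASKS PTILE: hosts required when at most PTILE tasks run per host
hosts_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# tasks_on_last_host TASKS PTILE: tasks placed on the final, possibly partial host
tasks_on_last_host() {
    r=$(( $1 % $2 ))
    if [ "$r" -eq 0 ]; then echo "$2"; else echo "$r"; fi
}

hosts_needed 2100 24         # 88
tasks_on_last_host 2100 24   # 12
```

With 24 physical cores and 48 logical CPUs per node (see the lscpu output above), ptile=24 corresponds to one rank per physical core.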


5 >>>>>>>>>>>>>>> Could you please provide us results of the command(Make sure to initialize oneAPI environment before running the command) : $fi_info -l

 

module list
Currently Loaded Modulefiles:
1) intel_OneAPI/2022.1.2

 

>>>> fi_info -l
psm2:
version: 113.20
mlx:
version: 1.4
psm3:
version: 1101.0
ofi_rxm:
version: 113.20
verbs:
version: 113.20
tcp:
version: 113.20
sockets:
version: 113.20
shm:
version: 113.20
ofi_hook_noop:
version: 113.20

 

6 >>>>>>>>>>> What is the FI_PROVIDER(mlx/psm2/verbs etc..) you are using?


I tested with FI_PROVIDER set to tcp, with FI_PROVIDER set to mlx, and with the default (FI_PROVIDER not set in the command); the issue is the same in all cases.
Please find an extract from the logs below:

libfabric:3672:core:core:fi_getinfo_():1201<info> Start regular provider search because provider with the highest priority verbs can not be initialized
libfabric:37912:verbs:fabric:vrb_get_matching_info():1512<info> checking domain: #1 mlx4_0
libfabric:37912:verbs:core:ofi_check_ep_type():658<info> unsupported endpoint type
libfabric:37912:verbs:core:ofi_check_ep_type():659<info> Supported: FI_EP_MSG
libfabric:37912:verbs:core:ofi_check_ep_type():659<info> Requested: FI_EP_RDM
libfabric:37912:verbs:fabric:vrb_get_matching_info():1512<info> checking domain: #2 mlx4_0-xrc
libfabric:37912:verbs:core:ofi_check_ep_type():658<info> unsupported endpoint type
libfabric:37912:verbs:core:ofi_check_ep_type():659<info> Supported: FI_EP_MSG
libfabric:22721:verbs:fabric:vrb_get_device_attrs():618<info> device mlx4_0: first found active port is 1
libfabric:40180:verbs:mr:vrb_domain():349<info> MR cache enabled for FI_HMEM_SYSTEM memory
libfabric:41635:core:core:fi_getinfo_():1138<info> Found provider with the highest priority verbs, must_use_util_prov = 1
libfabric:41635:core:core:ofi_layering_ok():1001<info> Need core provider, skipping ofi_rxm
libfabric:41635:verbs:fabric:vrb_get_matching_info():1512<info> checking domain: #1 mlx4_0
libfabric:41635:verbs:fabric:vrb_get_matching_info():1557<info> adding fi_info for domain: mlx4_0
libfabric:41635:verbs:fabric:vrb_get_matching_info():1512<info> checking domain: #2 mlx4_0-xrc
libfabric:41635:verbs:fabric:vrb_get_matching_info():1534<info> hints->ep_attr->rx_ctx_cnt != FI_SHARED_CONTEXT. Skipping XRC FI_EP_MSG endpoints
libfabric:41635:verbs:fabric:vrb_get_matching_info():1512<info> checking domain: #3 mlx4_0-dgram
libfabric:34513:core:core:fi_getinfo_():1123<warn> Can't find provider with the highest priority
libfabric:34513:core:core:fi_getinfo_():1138<info> Found provider with the highest priority verbs, must_use_util_prov = 1
libfabric:41079:core:core:ofi_register_provider():474<info> registering provider: shm (113.20)
libfabric:41079:core:core:ofi_register_provider():502<info> "shm" filtered by provider include/exclude list, skipping
libfabric:4788:verbs:fabric:vrb_get_device_attrs():618<info> device mlx4_0: first found active port is 1
libfabric:27385:verbs:fabric:vrb_get_device_attrs():618<info> device mlx4_0: first found active port is 1
libfabric:47520:verbs:fabric:vrb_get_matching_info():1534<info> hints->ep_attr->rx_ctx_cnt != FI_SHARED_CONTEXT. Skipping XRC FI_EP_MSG endpoints
libfabric:47520:verbs:fabric:vrb_get_matching_info():1512<info> checking domain: #3 mlx4_0-dgram
libfabric:47520:verbs:core:ofi_check_ep_type():658<info> unsupported endpoint type
libfabric:47520:verbs:core:ofi_check_ep_type():659<info> Supported: FI_EP_DGRAM
libfabric:47520:verbs:core:ofi_check_ep_type():659<info> Requested: FI_EP_MSG
libfabric:33901:core:core:ofi_register_provider():474<info> registering provider: shm (113.20)
libfabric:33901:core:core:ofi_register_provider():502<info> "shm" filtered by provider include/exclude list, skipping
libfabric:12023:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:12023:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:30066:verbs:fabric:vrb_get_matching_info():1512<info> checking domain: #3 mlx4_0-dgram
libfabric:30066:verbs:core:ofi_check_ep_type():658<info> unsupported endpoint type
libfabric:30066:verbs:core:ofi_check_ep_type():659<info> Supported: FI_EP_DGRAM
libfabric:30066:verbs:core:ofi_check_ep_type():659<info> Requested: FI_EP_MSG
libfabric:34978:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:34978:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:38585:core:core:fi_getinfo_():1138<info> Found provider with the highest priority verbs, must_use_util_prov = 1
libfabric:38585:verbs:fabric:vrb_get_matching_info():1512<info> checking domain: #1 mlx4_0
libfabric:38585:verbs:fabric:vrb_get_matching_info():1557<info> adding fi_info for domain: mlx4_0
libfabric:38585:verbs:fabric:vrb_get_matching_info():1512<info> checking domain: #2 mlx4_0-xrc
libfabric:38585:verbs:fabric:vrb_get_matching_info():1534<info> hints->ep_attr->rx_ctx_cnt != FI_SHARED_CONTEXT. Skipping XRC FI_EP_MSG endpoints
libfabric:38585:verbs:fabric:vrb_get_matching_info():1512<info> checking domain: #3 mlx4_0-dgram


7 >>>> >>>>>  What is the Interconnect hardware(Infiniband/Intel Omni-Path etc..) you are using?

Interconnect hardware used >> "Infiniband"

 

8 >>>>>>>>  Also, could you please confirm if you are using Intel(R) MPI Library for Linux* OS, Version 2021.5?


>>> I confirm. Please find the directory contents below:
ls -l 2022.1.2/mpi
total 45
drwxrwxr-x 14 root root 33280 Mar 15 2022 2021.5.1
lrwxrwxrwx 1 root root 8 Mar 15 2022 latest -> 2021.5.1

SantoshY_Intel
Moderator

Hi,

 

Thank you for providing all the requested details.

 

Could you please try running your sample code using the following ways?

Scenario 1:

export I_MPI_FABRICS=ofi
export FI_PROVIDER=tcp
I_MPI_DEBUG=10 mpirun -bootstrap ssh -n 2100 ./sample

Scenario 2:

export I_MPI_FABRICS=ofi

export UCX_TLS=sm,self,tcp

I_MPI_DEBUG=10 mpirun -bootstrap ssh -n 2100 ./sample

Scenario 3:

export I_MPI_FABRICS=ofi

export FI_PROVIDER=verbs

I_MPI_DEBUG=10 mpirun -bootstrap ssh -n 2100 ./sample

 

Please let us know if you could run your sample using any one of the above scenarios.
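
Because the failure mode is a hang, it may help to run each scenario under a wall-clock limit. The sketch below wraps the three scenarios above; `run_scenario`, `NP`, `LIMIT` and `DRY_RUN` are hypothetical names, the 4h limit is an assumed value, and `timeout` is the GNU coreutils utility. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical wrapper around the three scenarios listed above.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0
# inside a real allocation to launch them.
NP=${NP:-2100}        # number of ranks, as in the scenarios
LIMIT=${LIMIT:-4h}    # assumed wall-clock limit per attempt

run_scenario() {
    name=$1
    shift
    # Remaining arguments are the VAR=value settings specific to the scenario.
    cmd="env I_MPI_FABRICS=ofi I_MPI_DEBUG=10 $* timeout $LIMIT mpirun -bootstrap ssh -n $NP ./sample"
    echo "[$name] $cmd"
    [ "${DRY_RUN:-1}" = "1" ] && return 0
    $cmd
    rc=$?
    # GNU timeout exits with status 124 when the limit expires.
    if [ "$rc" -eq 124 ]; then
        echo "[$name] no completion within $LIMIT (possible hang)"
    else
        echo "[$name] exit status $rc"
    fi
}

run_scenario scenario1 FI_PROVIDER=tcp
run_scenario scenario2 UCX_TLS=sm,self,tcp
run_scenario scenario3 FI_PROVIDER=verbs
```

Whether mpirun cleans up every remote rank when `timeout` signals it is not confirmed here and should be checked on the target cluster.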

 

Thanks & Regards,

Santosh

 

Kheireddine_Yahyaoui

Hi Santosh,

Thank you for your support and for sharing these details!

 

I will provide an update once the tests are completed.

For information, I am now testing the first scenario. With 2100 tasks the program completed correctly (I ran the test twice and both runs completed). I am now trying to run it with more tasks; this will take some time, since each of the first attempts took about 4 hours.

 

The only issue detected so far: when I run the commands without the scheduler (LSF/bsub), the front-end node hangs (output from only two ranks is printed, then the program and the front-end node hang). When launched through LSF/bsub, the first tests completed correctly.

 

Best Regards,

Kheireddine

SantoshY_Intel
Moderator

Hi,

 

Thanks for providing your observations on Scenario 1.

 

>>"I will provide update once tests will be completed"

Could you please provide an update on Scenarios 2 and 3?

 

Thanks & Regards,

Santosh

 

Kheireddine_Yahyaoui

Hi Santosh,

I ran many tests using the scenarios you shared. Only Scenario 1 works, and only up to 2300 processes. With 2400 processes I interrupted the job after 24 hours; I can let the same job run longer if needed.


With Scenario 2 and Scenario 3 we get the same behavior: the job runs for a long time without any error or output (the program appears to hang).

 

Scenario 1 is also much slower. For a smaller job (tested with 900 processes), the same program runs in 30 minutes without these settings but took about 2 h 15 min with Scenario 1.

 

When I interrupt the job, I get the following outputs:

 

>>>>>>>>>>>>>>>

Error file /entries

>>>>>>>>>>>>>>>>

forrtl: error (69): process interrupted (SIGINT)
Image PC Routine Line Source
a.out 00000000004056BB Unknown Unknown Unknown
libpthread-2.17.s 00007FFFEB8E5630 Unknown Unknown Unknown
. 00007FFFEDB028E5 clock_gettime Unknown Unknown
libc-2.17.so 00007FFFEB1167ED __clock_gettime Unknown Unknown
librxm-fi.so 00007FFF672B7771 Unknown Unknown Unknown
librxm-fi.so 00007FFF672B77C9 Unknown Unknown Unknown
librxm-fi.so 00007FFF672AF315 Unknown Unknown Unknown
librxm-fi.so 00007FFF672AF3C9 Unknown Unknown Unknown
librxm-fi.so 00007FFF672C9FBD Unknown Unknown Unknown
librxm-fi.so 00007FFF672C9F47 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FFFEC3CAB30 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FFFEBF19D19 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FFFEC59048A Unknown Unknown Unknown
libmpi.so.12.0.0 00007FFFEC4C94C3 PMPI_Send Unknown Unknown
libmpifort.so.12. 00007FFFED6210E3 PMPI_SEND Unknown Unknown
a.out 0000000000404368 Unknown Unknown Unknown
a.out 0000000000403FE2 Unknown Unknown Unknown
libc-2.17.so 00007FFFEB024545 __libc_start_main Unknown Unknown
a.out 0000000000403EE9 Unknown Unknown Unknown
forrtl: error (69): process interrupted (SIGINT)
Image PC Routine Line Source
a.out 00000000004056BB Unknown Unknown Unknown
libpthread-2.17.s 00007FFFEB8E5630 Unknown Unknown Unknown
libpthread-2.17.s 00007FFFEB8E2573 pthread_spin_lock Unknown Unknown
forrtl: error (69): process interrupted (SIGINT)
Image PC Routine Line Source
a.out 00000000004056BB Unknown Unknown Unknown
libpthread-2.17.s 00007FFFEB8E5630 Unknown Unknown Unknown
libverbs-1.1-fi.s 00007FFF6A343200 Unknown Unknown Unknown
forrtl: error (69): process interrupted (SIGINT)
Image PC Routine Line Source
a.out 00000000004056BB Unknown Unknown Unknown
libpthread-2.17.s 00007FFFEB8E5630 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FFFEC4C94CA PMPI_Send Unknown Unknown
libmpifort.so.12. 00007FFFED6210E3 PMPI_SEND Unknown Unknown
a.out 0000000000404368 Unknown Unknown Unknown
a.out 0000000000403FE2 Unknown Unknown Unknown
libc-2.17.so 00007FFFEB024545 __libc_start_main Unknown Unknown
a.out 0000000000403EE9 Unknown Unknown Unknown
forrtl: error (69): process interrupted (SIGINT)
Image PC Routine Line Source
a.out 00000000004056BB Unknown Unknown Unknown
libpthread-2.17.s 00007FFFEB8E5630 Unknown Unknown Unknown
libmlx4.so.1.0.22 00007FFF68A1E3AA Unknown Unknown Unknown
libmlx4.so.1.0.22 00007FFF68A1F013 Unknown Unknown Unknown
libverbs-1.1-fi.s 00007FFF6A347190 Unknown Unknown Unknown
libverbs-1.1-fi.s 00007FFF6A3543C2 Unknown Unknown Unknown
librxm-fi.so 00007FFF672A6E58 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FFFEC4C670B PMPI_Send Unknown Unknown
libmpifort.so.12. 00007FFFED6210E3 PMPI_SEND Unknown Unknown
a.out 0000000000404368 Unknown Unknown Unknown
a.out 0000000000403FE2 Unknown Unknown Unknown
libc-2.17.so 00007FFFEB024545 __libc_start_main Unknown Unknown
a.out 0000000000403EE9 Unknown Unknown Unknown
forrtl: error (69): process interrupted (SIGINT)
Image PC Routine Line Source
a.out 00000000004056BB Unknown Unknown Unknown
libpthread-2.17.s 00007FFFEB8E5630 Unknown Unknown Unknown
libpthread-2.17.s 00007FFFEB8E2573 pthread_spin_lock Unknown Unknown
[mpiexec@xxxxxxx] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:362): write error (Bad file descriptor)
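
With thousands of ranks the error file gets long, so a rough tally of where the interrupted ranks were can be easier to read than the raw tracebacks. The sketch below counts tracebacks and how many contain a PMPI_Send or PMPI_Recv frame; the function name `tally_traces` is illustrative, and it assumes the ifort traceback layout shown above.

```shell
#!/bin/sh
# tally_traces FILE: count interrupted-rank tracebacks in an error file and
# how many lines name a PMPI_Send or PMPI_Recv frame (C-level entry points;
# the upper-case Fortran wrappers such as PMPI_SEND are not counted).
tally_traces() {
    traces=$(grep -c 'forrtl: error (69)' "$1" || true)
    in_send=$(grep -c -w 'PMPI_Send' "$1" || true)
    in_recv=$(grep -c -w 'PMPI_Recv' "$1" || true)
    echo "traces=$traces send=$in_send recv=$in_recv"
}

# Usage (substitute the job's own error file):
#   tally_traces ./Error_<jobid>.err
```

On the excerpt above this reports traces=6 send=3 recv=0. Tracebacks that were cut short by interleaved output undercount, so the send figure is a lower bound.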

>>>>>>>>>>>>>>>

Output file /entries

>>>>>>>>>>>>>>>>

Entries in the output file: the job starts on all ranks, but then no further output is received and it hangs:

more Output_739752.out
[0] MPI startup(): Intel(R) MPI Library, Version 2021.5 Build 20211102 (id: 9279b7d62)
[0] MPI startup(): Copyright (C) 2003-2021 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.13.2rc1-impi
[0] MPI startup(): libfabric provider: verbs;ofi_rxm
[0] MPI startup(): File "/xxxxxxxxxxxxxxxxxxxxxxxx/tuning_skx_ofi_verbs-ofi-rxm_56.dat" not found
[0] MPI startup(): Load tuning file: "/xxxxxxxxxxxxxxxxxxxxxxxxx/tuning_skx_ofi.dat"
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 47123 r57i5n3 {0,24}
[0] MPI startup(): 1 47124 r57i5n3 {1,25}
[0] MPI startup(): 2 47125 r57i5n3 {2,26}

--

--

[0] MPI startup(): 2297 23673 r60i5n10 {19,43}
[0] MPI startup(): 2298 23674 r60i5n10 {20,44}
[0] MPI startup(): 2299 23675 r60i5n10 {21,45}
[0] MPI startup(): I_MPI_CC=icc
[0] MPI startup(): I_MPI_CXX=icpc
[0] MPI startup(): I_MPI_FC=ifort
[0] MPI startup(): I_MPI_F90=ifort
[0] MPI startup(): I_MPI_F77=ifort
[0] MPI startup(): I_MPI_ROOT=/data_local/sw/intel_OneAPI/RHEL7/2022.1.2/mpi/2021.5.1
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_RMK=lsf
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_EXTRA_FILESYSTEM=1
[0] MPI startup(): I_MPI_EXTRA_FILESYSTEM_FORCE=lustre
[0] MPI startup(): I_MPI_FABRICS=ofi
[0] MPI startup(): I_MPI_DEBUG=10
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: 1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: is_threaded: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: num_pools: 64
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): threading: enable_sep: 0
[0] MPI startup(): threading: direct_recv: 1
[0] MPI startup(): threading: zero_op_flags: 0
[0] MPI startup(): threading: num_am_buffers: 8
[0] MPI startup(): threading: library is built with per-vci thread granularity
Start
Rank 0 receiving from other ranks...
receive from 0
receive from 1
receive from 2
receive from 3
--

--
receive from 24
receive from 25
receive from 26
[mpiexec@r57i5n3] Sending Ctrl-C to processes as requested
[mpiexec@r57i5n3] Press Ctrl-C again to force abort

 

Best Regards,

Kheireddine

 

SantoshY_Intel
Moderator

Hi,


Thanks for providing your observations. We are working on your issue and we will get back to you soon.


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,


Could you please try using the latest Intel MPI 2021.8 and get back to us if the issue still persists?


According to the latest Intel MPI Library system requirements, RHEL 7.6 is not a supported Linux OS. Please make sure you are using a supported operating system, as listed at the link below:

https://www.intel.com/content/www/us/en/developer/articles/system-requirements/mpi-library-system-requirements.html



Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,


Could you please provide us with an update on your issue?


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,


I have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks & Regards,

Santosh


