I have a Fortran code that uses both MPI and OpenMP. I am trying to run it on a cluster running Red Hat Enterprise Linux Server 7.8 with Intel Parallel Studio XE 2020 (1.217) installed. The system uses Sun Grid Engine as the job scheduler. I can successfully run my job on some of the newer nodes, but I keep getting the following message when I try it on the older nodes:
/net/ihn02/opt/intel/compilers_and_libraries_2020.1.217/linux/mpi/intel64/bin/mpiexec.hydra
[mpiexec@cn28] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on cn08 (pid 3745, exit code 65280)
[mpiexec@cn28] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec@cn28] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec@cn28] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:770): error waiting for event
[mpiexec@cn28] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1956): error setting up the boostrap proxies
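In case it helps with reading the log: the reported exit code 65280 looks like a raw wait status rather than a plain exit code. Below is a minimal sketch of how I would decode it, assuming the usual Linux layout of the status word (exit code in the high byte, signal number in the low 7 bits). The function name decode_wait_status is just mine for illustration, not anything from Intel MPI.

```python
def decode_wait_status(status: int) -> dict:
    """Split a raw 16-bit wait status into its parts.

    Assumes the conventional Linux encoding: bits 8-15 hold the exit code,
    bits 0-6 hold the terminating signal, bit 7 is the core-dump flag.
    """
    return {
        "exit_code": (status >> 8) & 0xFF,
        "signal": status & 0x7F,
        "core_dumped": bool(status & 0x80),
    }


# 65280 is 0xFF00, so the high byte (the exit code) is 255.
print(decode_wait_status(65280))
```

Under that assumption, bstrap_proxy (or whatever launched it on cn08) exited with code 255 and was not killed by a signal. If the bootstrap launcher is ssh, which I have not confirmed for this setup, 255 is the status ssh returns when ssh itself fails rather than the remote command.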
I checked that all the nodes are running the same OS version and that the Intel tools are installed on a shared filesystem that all the nodes can access. Does anyone know what might cause this failure? Occasionally I get a successful run on the older nodes, but it fails most of the time (I would estimate more than 95% of attempts). Thanks in advance for any help.