Intel® MPI Library

MPI_Mprobe() makes no progress for internode communicator

Hi all,

My understanding (correct me if I'm wrong) is that MPI_Mprobe() has to guarantee progress if a matching send has been posted. The minimal working example below, however, runs to completion on a single Phi node of stampede2, while deadlocking on more than one node.

Thanks,
Toby

impi version: Intel(R) MPI Library for Linux* OS, Version 2017 Update 3 Build 20170405 (id: 17193)

mwe.c (attached)

~~~
#!/bin/sh
#SBATCH -J mwe # Job name
#SBATCH -p development # Queue (development or normal)
#SBATCH -N 2 # Number of nodes
#SBATCH --tasks-per-node 1 # Number of tasks per node
#SBATCH -t 00:01:00 # Time limit hrs:min:sec
#SBATCH -o mwe-%j.out # Standard output and error log
~~~

mwe-341107.out

~~~
TACC: Starting up job 341107
TACC: Starting parallel tasks...
[0]: post Isend
[1]: post Isend
slurmstepd: error: *** JOB 341107 ON c455-084 CANCELLED AT 2017-10-16T10:59:26 DUE TO TIME LIMIT ***
[] control_cb (../../pm/pmiserv/pmiserv_cb.c:857): assert (!closed) failed
[] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:501): error waiting for event
[] main (../../ui/mpich/mpiexec.c:1147): process manager error waiting for completion
~~~

Hello Toby,

MPI_Mprobe is not supported by the TMI and OFI fabrics in Intel MPI 2018 and earlier.

The "Known Issues and Limitations" section states:

  • MPI_Mprobe, MPI_Improbe, and MPI_Cancel are not supported by the TMI and OFI fabrics.

I was able to run your test successfully with IMPI 2018 (using the TCP fabric) and with IMPI 2019 (where OFI is the only fabric available).
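For anyone who wants to try the TCP route on a 2017 or 2018 release, the fabric can be selected through the `I_MPI_FABRICS` environment variable. The lines below are only a sketch of what could be added to the job script before the launch command; the launch command itself is not shown in the original post, so it is left out here. The debug setting is an optional assumption on my part, intended to make the startup output report which fabric was actually picked.

```shell
#!/bin/sh
# Sketch, assuming Intel MPI 2017/2018 fabric names:
# "shm:tcp" = shared memory inside a node, TCP sockets between nodes.
export I_MPI_FABRICS=shm:tcp

# Optional (assumption): raise the debug level so the MPI startup
# messages report the data transfer modes that were selected.
export I_MPI_DEBUG=2

# Show the setting in the job log for later reference.
echo "I_MPI_FABRICS=${I_MPI_FABRICS}"
```

TCP is usually much slower than the native fabric, so this is more of a way to confirm the diagnosis or to get unblocked than a setting to keep for production runs.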
