Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

PSM2_MQ_RECVREQS_MAX limit reached

stefano_t_
Beginner

Hi,

one of our users reported a problem with MPI_Gatherv in Intel MPI 2017.

The problem is related to the maximum number of irecv requests in flight.

To reproduce the problem we set up a test case and run it (the code used is shown below) with 72 MPI tasks on two nodes, each one containing 2x Broadwell processors (18 cores per socket). The inter-node communication fabric is Omni-Path.

At runtime the program crashes returning the following error message:

Exhausted 1048576 MQ irecv request descriptors, which
usually indicates a user program error or insufficient request
descriptors (PSM2_MQ_RECVREQS_MAX=1048576)

Setting the variable PSM2_MQ_RECVREQS_MAX to a higher value seems to solve the problem.
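For reference, raising the limit at job launch would look something like the sketch below; the chosen value, process count, and binary name are placeholders, not recommendations (the default reported in the error above is 1048576):

```shell
# Placeholder sketch: quadruple the default receive-request limit before
# launching. Tune the value to your own workload.
export PSM2_MQ_RECVREQS_MAX=4194304
mpirun -np 72 ./gatherv_test
```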

Also, putting an MPI barrier after the gatherv call solves the problem, although with the side effect of forcing task synchronization.
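A middle ground between the two workarounds (my own untested sketch, not something we have validated) would be to synchronize only every K iterations, so that the number of irecv requests outstanding on rank 0 stays bounded without paying for a barrier on every call:

```c
/* Sketch only: flush outstanding gather traffic every FLUSH_EVERY
 * iterations instead of after each call. FLUSH_EVERY is an assumed
 * tuning knob; pick it so size * FLUSH_EVERY stays safely below
 * PSM2_MQ_RECVREQS_MAX. */
#define FLUSH_EVERY 10000
for (int i = 0; i < iterations; i++) {
  MPI_Gatherv(send_buf, 1, MPI_INT, recv_buf, recvcounts, displs,
              MPI_INT, 0, MPI_COMM_WORLD);
  if ((i + 1) % FLUSH_EVERY == 0)
    MPI_Barrier(MPI_COMM_WORLD); /* bounds in-flight requests on rank 0 */
}
```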

Two questions now arise:

1. Are there any known side effects of setting PSM2_MQ_RECVREQS_MAX to a very large value?
Can that affect the resource requirements of my program, for example its memory usage?

2. Alternatively, is there a more robust way to limit the maximum number of irecv requests in flight, so that the program does not fail?

Best Regards,
 

Stefano

 

Here is the code:

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  int size, rank;
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  int iterations = 100000;

  int send_buf[1] = {rank};

  int *recv_buf = NULL;
  int *recvcounts = NULL;
  int *displs = NULL;

  int recv_buf_size = size;
  if (rank == 0) {
    recv_buf = calloc(recv_buf_size, sizeof(*recv_buf));
    for (int i = 0; i < recv_buf_size; i++) {
      recv_buf[i] = -1;
    }
    recvcounts = calloc(size, sizeof(*recvcounts));
    displs = calloc(size, sizeof(*displs));
    for (int i = 0; i < size; i++) {
      recvcounts[i] = 1;
      displs[i] = i;
    }
  }
  int ten_percent = iterations / 10;
  int progress = 0;
  MPI_Barrier(MPI_COMM_WORLD);
  for (int i = 0; i < iterations; i++) {
    if (i >= progress) {
      if (rank == 0) printf("Starting iteration %d\n", i);
      progress += ten_percent;
    }
    MPI_Gatherv(send_buf, 1, MPI_INT, recv_buf, recvcounts, displs, MPI_INT, 0, MPI_COMM_WORLD);
  }
  if (rank == 0) {
    for (int i = 0; i < recv_buf_size; i++) {
      assert(recv_buf[i] == i);
    }
  }

  free(recv_buf);
  free(recvcounts);
  free(displs);

  MPI_Finalize();
}

 

Dmitry_S_Intel
Moderator

Please try with I_MPI_ADJUST_GATHERV=3
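Set in the job environment before launch, this would look like the sketch below; the process count and binary name are placeholders for your own job:

```shell
# I_MPI_ADJUST_GATHERV selects which internal algorithm Intel MPI uses
# for MPI_Gatherv; here algorithm #3 is requested.
export I_MPI_ADJUST_GATHERV=3
mpirun -np 72 ./gatherv_test
```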

--

Dmitry

stefano_t_
Beginner

Thanks for the reply,

it seems indeed that with I_MPI_ADJUST_GATHERV=3 the code works without the need to increase PSM2_MQ_RECVREQS_MAX for the number of iterations specified in the posted code, and execution is faster. However, if I try 10000000 iterations the problem arises again. Please note that the posted code is a reproducer of what the user application can do, so the number of iterations can vary.

Hence, for a higher number of iterations one still needs to increase PSM2_MQ_RECVREQS_MAX. This might not be an issue, unless there are known side effects of setting PSM2_MQ_RECVREQS_MAX to a very large value. Are you aware of any?

Thank you in advance.

Stefano.

Dmitry_S_Intel
Moderator

Hi,

Increasing PSM2_MQ_RECVREQS_MAX will increase memory consumption, but for 72 MPI tasks it should not be significant.

--

Dmitry
