Incorrect program or MPI implementation bug?

Adrian_I_ · 07-06-2014

Hi,

Below is a simple reproduction case for the issue we're facing:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        printf("rank 0: about to send\n");
        MPI_Ssend(NULL, 0, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("rank 0: send completed\n");
    } else {
        MPI_Request req[2];
        int which;

        MPI_Isend(NULL, 0, MPI_INT, 0, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(NULL, 0, MPI_INT, 0, 0, MPI_COMM_WORLD, &req[1]);

        MPI_Waitany(2, req, &which, MPI_STATUS_IGNORE);

        if (which == 0) {
            printf("rank 1: send succeeded; cancelling receive request\n");
            MPI_Cancel(&req[1]);
            MPI_Wait(&req[1], MPI_STATUS_IGNORE);
        } else {
            printf("rank 1: receive succeeded; cancelling send request\n");
            MPI_Cancel(&req[0]);
            MPI_Wait(&req[0], MPI_STATUS_IGNORE);
        }
    }

    MPI_Finalize();
    return 0;
}

This program outputs the following, after which it hangs indefinitely:

rank 0: about to send
rank 1: send succeeded; cancelling receive request

I understand that this is caused by the "eager completion" of MPI_Isend() on rank 1. I also understand that the behaviour of a program that initiates an unmatched operation is undefined. However, I don't believe that is the case here, as I do eventually call MPI_Cancel() on the request. If that were not enough, wouldn't it imply that a program that simply does MPI_Isend(...); MPI_Cancel(...); MPI_Wait(...); is also incorrect?

I also noticed that changing the MPI_Isend() into MPI_Issend() makes the program work as expected:

rank 0: about to send
rank 0: send completed
rank 1: receive succeeded; cancelling send request

So, to keep it short, my questions are:

1. Is the initial (MPI_Isend()) version of my program an incorrect MPI program, whose behaviour is undefined?
2. If so, could you please explain why and point me to the relevant section of the MPI standard, or to any other resources that would clarify these matters for me?
3. Is the MPI_Issend() version of my program also incorrect?
4. If MPI_Issend() still doesn't make the program correct, can I at least be sure that, with the Intel implementation, it will always work as expected? Or is it just a coincidence that it does?

Many thanks to anyone willing to help me with this!

- Adrian

James_T_Intel · 07-08-2014

Hi Adrian,

The MPI_Issend version is a correct program. The original MPI_Isend version is incorrect: whether it works depends on implementation details. Here's what's going on, based on the MPI standard (see section 3.4, Communication Modes, for full details). There are four communication modes that a send can use.

Standard. MPI_Send, MPI_Isend. This lets the implementation decide whether to buffer the outgoing message, in which case the send can complete before a matching receive is posted, or to wait for the matching receive as in synchronous mode. I'll need to confirm with our developers, but in every instance I've observed, the Intel® MPI Library has chosen to buffer.
Buffered. MPI_Bsend, MPI_Ibsend. In this mode, the send is allowed to start at any time, and completes as soon as the data has been copied into a buffer.
Synchronous. MPI_Ssend, MPI_Issend. In this mode, the send is allowed to start at any time, but cannot complete until the matching receive is posted.
Ready. MPI_Rsend, MPI_Irsend. In this mode, the send should not be started before the matching receive is posted, otherwise the program is incorrect.
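
To restate the list compactly, here is a toy model in plain C. It is only a sketch of the completion rules, not MPI code: the enum and function names are made up for illustration, and the impl_buffers flag is an assumption that stands in for whatever the implementation decides to do in standard mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; these are not MPI identifiers. */
typedef enum { MODE_STANDARD, MODE_BUFFERED, MODE_SYNCHRONOUS, MODE_READY } send_mode;
typedef enum { SEND_MUST_WAIT, SEND_CAN_COMPLETE, SEND_ERRONEOUS } send_state;

/*
 * Toy model: can a send complete, given whether a matching receive has
 * already been posted? impl_buffers models the implementation's choice
 * in standard mode (an assumption; it is ignored for the other modes).
 */
send_state send_state_for(send_mode mode, bool receive_posted, bool impl_buffers)
{
    assert((int)mode >= 0 && (int)mode <= (int)MODE_READY);
    switch (mode) {
    case MODE_BUFFERED:
        /* Completes once the data is in a buffer, receive or not. */
        return SEND_CAN_COMPLETE;
    case MODE_SYNCHRONOUS:
        /* Needs the matching receive before it can complete. */
        return receive_posted ? SEND_CAN_COMPLETE : SEND_MUST_WAIT;
    case MODE_READY:
        /* Starting without a posted receive makes the program incorrect. */
        return receive_posted ? SEND_CAN_COMPLETE : SEND_ERRONEOUS;
    case MODE_STANDARD:
    default:
        /* Behaves like buffered or like synchronous, at the implementation's choice. */
        return (impl_buffers || receive_posted) ? SEND_CAN_COMPLETE : SEND_MUST_WAIT;
    }
}
```

For example, send_state_for(MODE_STANDARD, false, true) yields SEND_CAN_COMPLETE, while send_state_for(MODE_SYNCHRONOUS, false, true) yields SEND_MUST_WAIT. That difference is the one that matters for the two versions of the program.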

In your program, MPI_Isend can complete as soon as its data is in a buffer, so either request can satisfy MPI_Waitany, and here the MPI_Isend is the first one detected as complete. Rank 1 then cancels the MPI_Irecv, which was the only receive that could match rank 0's MPI_Ssend. The MPI_Ssend can therefore never complete, the message from the MPI_Isend sits in the buffer with no receive to match it, and the program hangs.

When you switch to MPI_Issend, the send requires a matching receive before it can complete, and rank 0 never posts one. MPI_Waitany therefore always returns the MPI_Irecv, which matches rank 0's MPI_Ssend and lets it complete. Rank 1 then cancels the MPI_Issend, and the program finishes.
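
The two cases can be put side by side in a small plain-C model of rank 1's decision. This is a sketch under stated assumptions, not MPI code: the names are hypothetical, and the buffered MPI_Isend case is modeled as always reporting the send first, which is the behaviour observed in this thread rather than something the standard guarantees.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; nothing here is an MPI call. */
typedef enum { OUTCOME_FINISHES, OUTCOME_HANGS } outcome;

/*
 * Request index that MPI_Waitany is modeled to report on rank 1:
 * 0 = the send to rank 0, 1 = the receive from rank 0.
 * send_needs_receive is true for MPI_Issend and false for an MPI_Isend
 * that the implementation buffers (an assumption about the implementation).
 */
int modeled_waitany_index(bool send_needs_receive)
{
    /* Rank 0 never posts a receive, so a send that needs one cannot complete. */
    return send_needs_receive ? 1 : 0;
}

/* What happens to the whole program once rank 1 cancels the other request. */
outcome modeled_outcome(bool send_needs_receive)
{
    int which = modeled_waitany_index(send_needs_receive);
    assert(which == 0 || which == 1);
    /* Index 0: the receive is cancelled, so rank 0's MPI_Ssend is never matched. */
    /* Index 1: the receive matched MPI_Ssend, and only the send is cancelled. */
    return which == 0 ? OUTCOME_HANGS : OUTCOME_FINISHES;
}
```

Here modeled_outcome(false) gives OUTCOME_HANGS and modeled_outcome(true) gives OUTCOME_FINISHES. In the real program, passing a status to MPI_Wait instead of MPI_STATUS_IGNORE and then calling MPI_Test_cancelled on it tells you whether a cancel actually took effect.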

Does this make sense?

James.

Adrian_I_ · 07-08-2014

Hi James,

It makes total sense - in fact, it's just what I thought. Still, it's great to have the confirmation that, with MPI_Issend, the program is correct.
Many thanks for your detailed answer!

Regards,
Adrian