
MPI_Recv blocks for a long time

Hello,
    I run into trouble when using MPI_Recv in my program.
    My program starts 3 subprocesses and binds them to CPUs 1-3 respectively. In each subprocess I first disable interrupts, then send a message to every other process and receive one from each of them, repeating this about a billion times.
    I expect MPI_Recv to return within a fixed time, which is why I use it instead of MPI_Irecv.
    To make the timing deterministic, I disable interrupts and cancel the scheduler tick on CPUs 1-3, migrate all other processes from CPUs 1-3 to CPU 0, and bind all interrupts to CPU 0.
    But I found that very occasionally (roughly once in a billion iterations) MPI_Recv blocks for more than 600 ms, while normally it takes less than 10 ms.
    I don't know why MPI_Recv sometimes blocks for so long. Is there any way to find the cause and solve the problem?


   I launch the program with mpirun -n 3, using the Hydra process manager and the shared-memory fabric.
   Environment: parallel_studio_xe_2015_update2, Linux 3.10
=====  Processor composition  =====
Processor name    : Intel(R) Core(TM) i5-4590  
Packages(sockets) : 1
Cores             : 4
Processors(CPUs)  : 4
Cores per package : 4
Threads per core  : 1

void emt_comm()
{
    ......
    for (i=0; i<ProcInfo.NumProc; i++)
    {
        if (i != ProcInfo.Id)
            MPI_Send(SendBuf, EMT_COMM_NUM, MPI_DOUBLE_PRECISION, i, i+1, ProcInfo.CommEMTCAL);
    }

    for (i=0; i<ProcInfo.NumProc; i++)
    {
        if (i != ProcInfo.Id)
            MPI_Recv(buf22, EMT_COMM_NUM, MPI_DOUBLE_PRECISION, i, ProcInfo.Id+1, ProcInfo.CommEMTCAL, &MpiStt);    
    }
}
    
    
    

void *thread_emt(__attribute__((unused)) void *arg)
{
    ......
    set_thread_affinity(core_id);
    MPI_Barrier(ProcInfo.CommCAL);
    disabled_inter();
    for(step=1; step<=10000000; step++)
    {
        emt_comm();    
        MPI_Barrier(ProcInfo.CommCAL);    
    }
    open_inter();
    return NULL;
}
    
    
int main(int argc,char *argv[])
{
    ......
    isCalculationOver = 0;
    set_thread_affinity(0);
    MPI_Init(&argc, &argv);
    MPI_Comm_rank( MPI_COMM_WORLD, &ProcInfo.Id);
    MPI_Comm_size( MPI_COMM_WORLD, &ProcInfo.NumProc);
    core = ProcInfo.Id+1;
    MPI_Barrier(MPI_COMM_WORLD);
    ......
    pthread_create(&thread, NULL, thread_emt, &core);
    ......
    while(1 != isCalculationOver)
        usleep(100*1000);

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();

    return 0;
}


 

2 Replies

Your code is semantically deadlocking: every rank starts by sending to all the others. Since MPI_Send is a blocking call that may not return until a matching receive is posted, no rank is guaranteed to progress to its receives, and you have a deadlock.

The reason your code actually works most of the time is that small messages are usually sent eagerly, without going through the rendezvous protocol. However, that is at the mercy of the network and of the buffer space available on the NIC. Since you're repeating this exchange many times, I imagine that every once in a while this "eager" send does not succeed, and the send blocks until the matching receive appears.

 

Victor.

 

Moderator

To clarify what Victor said: there are multiple ways an implementation may handle MPI_Send (and MPI_Recv), including buffering small messages eagerly or blocking until the receiver is ready, and that choice is up to the implementation. I would highly recommend switching to the explicitly nonblocking MPI_Isend instead.

Also, are you intending to make MPI calls from multiple threads? If so, you should initialize with MPI_Init_thread instead of MPI_Init and request the thread-support level you need.
