Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Intel MPI hangs when receiving a message sent from the same rank

jellie_
Beginner

Suppose I have an MPI program that sends and receives within the same rank:

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Non-blocking send to my own rank (rank 0).
    int i1 = 2;
    MPI_Request req1;
    MPI_Isend(&i1, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req1);

    // Wait for the send to complete before the matching receive is posted.
    MPI_Status status1;
    MPI_Wait(&req1, &status1);
    std::cout << "sent\n";

    // Blocking receive of the self-sent message.
    int i2;
    MPI_Status status2;
    MPI_Recv(&i2, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status2);

    std::cout << i2 << "\n";
    MPI_Finalize();
}

Run it with mpirun -np 1 <executable>.

When linked against Open MPI, it works fine. However, when compiled and linked against Intel MPI, it hangs after printing "sent". If it is linked against Open MPI but launched with Intel's mpirun, it also works.
 
Initially I suspected that an MPI rank communicating with itself was undefined behaviour, but I found this on Stack Overflow:
 

https://stackoverflow.com/questions/11385395/is-the-behavior-of-mpi-communication-of-a-rank-with-itself-well-defined

 

So this might be a bug in Intel MPI.
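
For reference, a self-exchange that does not depend on the library buffering the send would post the receive before completing the send. Here is a minimal sketch of that pattern (my own rewrite with placeholder variable names, not the reproducer above):

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int out = 2, in = 0;
    MPI_Request reqs[2];

    // Post the receive first so the Isend always has a matching receive
    // and never depends on internal buffering.
    MPI_Irecv(&in, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&out, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &reqs[1]);

    // Complete both operations together.
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    std::cout << in << "\n";
    MPI_Finalize();
}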

 

Open MPI version:

 

$ mpirun --version
mpirun (Open MPI) 4.1.4

Report bugs to http://www.open-mpi.org/community/help/

 

Intel MPI version:

$ mpirun --version
Intel(R) MPI Library for Linux* OS, Version 2021.11 Build 20231005 (id: 74c4a23)
Copyright 2003-2023, Intel Corporation.

 

jellie_
Beginner

Additional information:

 

$ ldd test
        linux-vdso.so.1 (0x00007ffe1bdf2000)
        libc++abi.so.1 => /lib/x86_64-linux-gnu/libc++abi.so.1 (0x00007febf62f2000)
        libmpi.so.12 => /opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12 (0x00007febf4600000)
        libc++.so.1 => /lib/x86_64-linux-gnu/libc++.so.1 (0x00007febf61ec000)
        libunwind.so.1 => /lib/x86_64-linux-gnu/libunwind.so.1 (0x00007febf61dd000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007febf4521000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007febf61bd000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007febf4340000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007febf61b8000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007febf61b1000)
        /lib64/ld-linux-x86-64.so.2 (0x00007febf635a000)
jellie_
Beginner

Another example:

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Non-blocking send to my own rank (rank 0).
    int i1 = 2;
    MPI_Request req1;
    MPI_Isend(&i1, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req1);

    MPI_Status status1;
    MPI_Wait(&req1, &status1);
    std::cout << "sent\n";

    // Poll for the self-sent message; i2 is reused as the Iprobe flag.
    int i2;
    MPI_Status status2;
    while (true) {
        MPI_Iprobe(0, 0, MPI_COMM_WORLD, &i2, &status2);
        std::cout << i2 << '\n';
        if (i2 != 0) {
            break;
        }
    }

    std::cout << "iprobe success\n";
    MPI_Finalize();
}

When linked against Open MPI, it finishes normally.

When linked against Intel MPI, MPI_Iprobe never succeeds, and the std::cout << i2 statement inside the loop always prints 0.

TobiasK
Moderator

@jellie_


Both of your codes work for me. Can you please run with I_MPI_DEBUG=10 mpirun ... and post the output?


jellie_
Beginner

Hello,

The output of the first example looks like this:

$ I_MPI_DEBUG=10 mpirun -np 1 executable
[0] MPI startup(): Intel(R) MPI Library, Version 2021.11  Build 20231005 (id: 74c4a23)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric loaded: libfabric.so.1 
[0] MPI startup(): libfabric version: 1.18.1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: tcp
[0] MPI startup(): File "" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.11/opt/mpi/etc/tuning_icx_shm-ofi.dat"
[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.11/opt/mpi/etc/tuning_icx_shm-ofi.dat" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.11/opt/mpi/etc//tuning_clx-ap_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 19 (TAG_UB value: 524287) 
[0] MPI startup(): source bits available: 20 (Maximal number of rank: 1048575) 
[0] MPI startup(): ===== Nic pinning on X1-Nano =====
[0] MPI startup(): Rank Pin nic
[0] MPI startup(): 0    ve-ArchLinux
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       143295   X1-Nano    {0,1,2,3,4,5,6,7}
[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.11
[0] MPI startup(): ONEAPI_ROOT=/opt/intel/oneapi
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=-1
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=10
sent

Then it hangs.

TobiasK
Moderator
jellie_
Beginner

I'm using Debian 12 (the current stable Debian release); is this too new?

I tested with both clang 17 and gcc 12.2.0, and the program always hangs.

TobiasK
Moderator

Debian 12 is currently not validated / supported.
On our machines the code works fine. However, the code itself is unsafe, and after talking to my colleagues, we strongly recommend avoiding this pattern, as it may or may not work the way you want it to.
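
If the goal is simply a round trip to your own rank, one pattern that avoids relying on buffering entirely is the combined send-receive call. A minimal sketch, purely illustrative and not a validated workaround:

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int out = 2, in = 0;
    // MPI_Sendrecv matches the send and receive internally, as if they
    // were executed concurrently, so it does not rely on the message
    // being buffered the way a send followed by a receive does.
    MPI_Sendrecv(&out, 1, MPI_INT, rank, 0,
                 &in, 1, MPI_INT, rank, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    std::cout << in << "\n";
    MPI_Finalize();
}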
