Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Intel MPI hangs when receiving a message sent from the same rank

jellie_
Beginner

Suppose I have an MPI program that sends and receives within the same rank:

 

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Non-blocking send to this same rank (rank 0)
    int i1 = 2;
    MPI_Request req1;
    MPI_Isend(&i1, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req1);

    // Wait for the send to complete before the matching receive is posted
    MPI_Status status1;
    MPI_Wait(&req1, &status1);
    std::cout << "sent\n";

    // Blocking receive of the self-sent message
    int i2;
    MPI_Status status2;
    MPI_Recv(&i2, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status2);

    std::cout << i2 << "\n";
    MPI_Finalize();
}

 

 

Run it with: mpirun -np 1 <executable>
 
When linked against Open MPI, it works well. However, when compiled and linked against Intel MPI, it hangs after printing "sent".
If it is linked against Open MPI and run with Intel's mpirun, it works.
 
Initially I suspected that an MPI rank communicating with itself was undefined behaviour, but I found this on Stack Overflow:
 

https://stackoverflow.com/questions/11385395/is-the-behavior-of-mpi-communication-of-a-rank-with-itself-well-defined

 

so this might be a bug in Intel MPI.
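
For comparison, the pattern that does not depend on the library buffering a self-send is to match the send and the receive inside a single call, e.g. with MPI_Sendrecv. The following is only a minimal sketch of that pattern, not code taken from the report above:

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int i1 = 2;
    int i2 = 0;
    MPI_Status status;

    // Send and receive are matched inside one call, so completion does not
    // depend on the implementation buffering the message sent to self.
    MPI_Sendrecv(&i1, 1, MPI_INT, 0, 0,
                 &i2, 1, MPI_INT, 0, 0,
                 MPI_COMM_WORLD, &status);

    std::cout << i2 << "\n";   // prints 2
    MPI_Finalize();
}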

 

Open MPI version:

 

$ mpirun --version
mpirun (Open MPI) 4.1.4

Report bugs to http://www.open-mpi.org/community/help/

 

Intel MPI version:

$ mpirun --version
Intel(R) MPI Library for Linux* OS, Version 2021.11 Build 20231005 (id: 74c4a23)
Copyright 2003-2023, Intel Corporation.

 

7 Replies
jellie_
Beginner

Additional information:

 

$ ldd test
        linux-vdso.so.1 (0x00007ffe1bdf2000)
        libc++abi.so.1 => /lib/x86_64-linux-gnu/libc++abi.so.1 (0x00007febf62f2000)
        libmpi.so.12 => /opt/intel/oneapi/mpi/2021.11/lib/libmpi.so.12 (0x00007febf4600000)
        libc++.so.1 => /lib/x86_64-linux-gnu/libc++.so.1 (0x00007febf61ec000)
        libunwind.so.1 => /lib/x86_64-linux-gnu/libunwind.so.1 (0x00007febf61dd000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007febf4521000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007febf61bd000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007febf4340000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007febf61b8000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007febf61b1000)
        /lib64/ld-linux-x86-64.so.2 (0x00007febf635a000)
jellie_
Beginner

Another example:

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Non-blocking send to this same rank (rank 0)
    int i1 = 2;
    MPI_Request req1;
    MPI_Isend(&i1, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req1);

    // Wait for the send to complete before probing for the message
    MPI_Status status1;
    MPI_Wait(&req1, &status1);
    std::cout << "sent\n";

    // Poll until the self-sent message becomes visible (i2 is the probe flag)
    int i2;
    MPI_Status status2;
    while (true) {
        MPI_Iprobe(0, 0, MPI_COMM_WORLD, &i2, &status2);
        std::cout << i2 << '\n';
        if (i2 != 0) {
            break;
        }
    }

    std::cout << "iprobe success\n";
    MPI_Finalize();
}

When linked against Open MPI, it ends normally.

When linked against Intel MPI, MPI_Iprobe never succeeds, and the std::cout << i2 line always prints 0.
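
A variant that does not rely on the self-send being buffered is to post the matching receive with MPI_Irecv before waiting on the send, and then complete both requests together. This is a minimal sketch of that workaround, not code from the original report:

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int i1 = 2;
    int i2 = 0;
    MPI_Request reqs[2];

    // Post the send and the matching receive before waiting on either,
    // so completion does not depend on internal buffering of the self-send.
    MPI_Isend(&i1, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&i2, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    std::cout << i2 << "\n";   // prints 2
    MPI_Finalize();
}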

TobiasK
Moderator

@jellie_


Both of your codes work for me. Can you please run with I_MPI_DEBUG=10 mpirun ... and post the output?


jellie_
Beginner

Hello,

The output of the first example looks like this:

$ I_MPI_DEBUG=10 mpirun -np 1 executable
[0] MPI startup(): Intel(R) MPI Library, Version 2021.11  Build 20231005 (id: 74c4a23)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric loaded: libfabric.so.1 
[0] MPI startup(): libfabric version: 1.18.1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: tcp
[0] MPI startup(): File "" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.11/opt/mpi/etc/tuning_icx_shm-ofi.dat"
[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.11/opt/mpi/etc/tuning_icx_shm-ofi.dat" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.11/opt/mpi/etc//tuning_clx-ap_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 19 (TAG_UB value: 524287) 
[0] MPI startup(): source bits available: 20 (Maximal number of rank: 1048575) 
[0] MPI startup(): ===== Nic pinning on X1-Nano =====
[0] MPI startup(): Rank Pin nic
[0] MPI startup(): 0    ve-ArchLinux
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       143295   X1-Nano    {0,1,2,3,4,5,6,7}
[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.11
[0] MPI startup(): ONEAPI_ROOT=/opt/intel/oneapi
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=-1
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=10
sent

Then it hangs.

TobiasK
Moderator
jellie_
Beginner

I'm using Debian 12 (which is the latest stable Debian release). Is this too new?

I tested with both Clang 17 and GCC 12.2.0, and the program always hangs.

TobiasK
Moderator

Debian 13 is currently not validated / supported.
On our machines it works fine. However, the code itself is unsafe; after talking to my colleagues, we strongly recommend avoiding this pattern, as it may or may not work the way you expect.
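
For context on why the pattern is unsafe: MPI_Wait on a standard-mode MPI_Isend is only guaranteed to return before the matching receive is posted if the implementation buffers the message, and it is not required to do so. One standard-conforming way to keep the original ordering (send completes before the receive is posted) is a buffered-mode send with a user-attached buffer. This is a minimal sketch of that pattern, not something suggested in this thread:

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Attach a user buffer so a buffered-mode send completes locally,
    // even though the matching receive has not been posted yet.
    int buf_size = sizeof(int) + MPI_BSEND_OVERHEAD;
    char* buffer = new char[buf_size];
    MPI_Buffer_attach(buffer, buf_size);

    int i1 = 2;
    MPI_Bsend(&i1, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    std::cout << "sent\n";

    // The message is already buffered, so posting the receive now is safe.
    int i2 = 0;
    MPI_Recv(&i2, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    std::cout << i2 << "\n";   // prints 2

    MPI_Buffer_detach(&buffer, &buf_size);
    delete[] buffer;
    MPI_Finalize();
}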
