Intel® MPI Library

One-sided MPI communication never returns in some cases

Sebastian_R_1
Beginner
1,145 Views

Hi,

I tried running the following code on a Linux cluster with Intel MPI (Version 4.0 Update 3 Build 20110824) and SLURM 2.2.7 on 2 nodes with 8 cores each (16 tasks).

Unfortunately, it hangs in the MPI_Win_unlock call during the 11th or 12th iteration. I have tried both the Intel compiler and gcc, with the same result.
[cpp]#include <mpi.h>
#include <iostream>

#define USE_BARRIER 1
#define LOCAL_RANK 10
#define REMOTE_RANK 3

int main(int argc, char** argv)
{
    int rank, error;
    MPI_Win win;
    double* value;
    double local_value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Every rank exposes one double through the window.
    error = MPI_Alloc_mem(sizeof(double), MPI_INFO_NULL, &value);
    if (error != MPI_SUCCESS) MPI_Abort(MPI_COMM_WORLD, error);
    error = MPI_Win_create(value, sizeof(double), sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    if (error != MPI_SUCCESS) MPI_Abort(MPI_COMM_WORLD, error);

    // Only LOCAL_RANK reads from REMOTE_RANK, using passive-target
    // synchronization (lock / get / unlock).
    if (rank == LOCAL_RANK)
        for (int i = 0; i < 25; i++) {
            std::cout << "Iteration " << i << " in rank " << rank << std::endl;
            error = MPI_Win_lock(MPI_LOCK_SHARED, REMOTE_RANK, 0, win);
            if (error != MPI_SUCCESS) MPI_Abort(MPI_COMM_WORLD, error);
            error = MPI_Get(&local_value, 1, MPI_DOUBLE, REMOTE_RANK, 0, 1, MPI_DOUBLE, win);
            if (error != MPI_SUCCESS) MPI_Abort(MPI_COMM_WORLD, error);
            error = MPI_Win_unlock(REMOTE_RANK, win);
            if (error != MPI_SUCCESS) MPI_Abort(MPI_COMM_WORLD, error);
        }

// #if rather than #ifdef, so that "#define USE_BARRIER 0" really disables the barrier.
#if USE_BARRIER
    MPI_Barrier(MPI_COMM_WORLD);
#endif

    MPI_Win_free(&win);
    MPI_Free_mem(value);
    MPI_Finalize();
}[/cpp]
Other MPI libraries work as expected, and other "configurations" work as well, e.g.:
[cpp]#define USE_BARRIER 0
#define LOCAL_RANK 10
#define REMOTE_RANK 3[/cpp]
or
[cpp]#define USE_BARRIER 1
#define LOCAL_RANK 2
#define REMOTE_RANK 3[/cpp]
If you need more information, let me know.

Thanks for your help,
Sebastian

11 Replies
James_T_Intel
Moderator
Hi Sebastian,

Try adding "-env I_MPI_DEBUG 5" to the mpirun command. This will generate additional debug information and might provide some indication of what is causing the hang. I am able to run the original program you provided without any hangs. I will try some other combinations and see if I can cause the hang.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
Sebastian_R_1
Beginner
"srun" does not support -env

This is the output of "I_MPI_DEBUG=5 srun ./test":
[bash][-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[8] MPI startup(): shm and ofa data transfer modes
[9] MPI startup(): shm and ofa data transfer modes
[2] MPI startup(): shm and ofa data transfer modes
[6] MPI startup(): shm and ofa data transfer modes
[4] MPI startup(): shm and ofa data transfer modes
[5] MPI startup(): shm and ofa data transfer modes
[0] MPI startup(): shm and ofa data transfer modes
[1] MPI startup(): shm and ofa data transfer modes
[3] MPI startup(): shm and ofa data transfer modes
[10] MPI startup(): shm and ofa data transfer modes
[7] MPI startup(): shm and ofa data transfer modes
[14] MPI startup(): shm and ofa data transfer modes
[11] MPI startup(): shm and ofa data transfer modes
[12] MPI startup(): shm and ofa data transfer modes
[13] MPI startup(): shm and ofa data transfer modes
[15] MPI startup(): shm and ofa data transfer modes
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       22239    r1i0n0     +1
[0] MPI startup(): 1       22240    r1i0n0     +1
[0] MPI startup(): 2       22241    r1i0n0     +1
[0] MPI startup(): 3       22242    r1i0n0     +1
[0] MPI startup(): 4       22243    r1i0n0     +1
[0] MPI startup(): 5       22244    r1i0n0     +1
[0] MPI startup(): 6       22245    r1i0n0     +1
[0] MPI startup(): 7       22246    r1i0n0     +1
[0] MPI startup(): 8       14354    r1i1n0     +1
[0] MPI startup(): 9       14355    r1i1n0     +1
[0] MPI startup(): 10      14356    r1i1n0     +1
[0] MPI startup(): 11      14357    r1i1n0     +1
[0] MPI startup(): 12      14358    r1i1n0     +1
[0] MPI startup(): 13      14359    r1i1n0     +1
[0] MPI startup(): 14      14360    r1i1n0     +1
[0] MPI startup(): 15      14361    r1i1n0     +1
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_FABRICS=shm:ofa
Iteration 0 in rank 10
Iteration 1 in rank 10
Iteration 2 in rank 10
Iteration 3 in rank 10
Iteration 4 in rank 10
Iteration 5 in rank 10
Iteration 6 in rank 10
Iteration 7 in rank 10
Iteration 8 in rank 10
Iteration 9 in rank 10
Iteration 10 in rank 10
Iteration 11 in rank 10[/bash]
James_T_Intel
Moderator
Hi Sebastian,

Are you able to test outside of SLURM? What distribution are you using? Please try these configurations:

[cpp]#define LOCAL_RANK 11
#define REMOTE_RANK 3[/cpp]
[cpp]#define LOCAL_RANK 11
#define REMOTE_RANK 4[/cpp]
[cpp]#define LOCAL_RANK 3
#define REMOTE_RANK 10[/cpp]

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
Sebastian_R_1
Beginner
[cpp]#define LOCAL_RANK 11
#define REMOTE_RANK 3[/cpp]
Works.

[cpp]#define LOCAL_RANK 11
#define REMOTE_RANK 4[/cpp]
Hangs.

[cpp]#define LOCAL_RANK 3
#define REMOTE_RANK 10[/cpp]
Hangs as well.

The distribution is a SUSE Linux Enterprise Server 11.

I wasn't able to run the program outside of SLURM, at least not on this cluster. If you need this information, I can contact the help desk; maybe they know a way to run the program without SLURM.
James_T_Intel
Moderator
Hi Sebastian,

I'll set up some virtual machines here to replicate your setup. Would you be able to run all of the processes on a single node? (Technically this oversubscribes the node, but for this program it shouldn't cause a problem.)

For the two new configurations that hang, do they hang at the same iteration as the original?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
Sebastian_R_1
Beginner
Adding the option "--ntasks-per-core=2" (which means that all of the 16 tasks run on one node) solves the problem, too.

And yes, they all hang in the same iteration.
James_T_Intel
Moderator
Hi Sebastian,

It definitely appears to be related to having the tasks involved in the communication on different nodes. Are you able to reliably run other MPI programs involving these two nodes? Have you tried using a different fabric for your connection? What is the output from "env | grep I_MPI"?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
Sebastian_R_1
Beginner
"I_MPI_FABRICS=shm:ofa" is the default. However, I'm unable to reproduce the error when the fabric is set to "(shm:)dapl" or "(shm:)tcp". (tmi does not work at all.)

Output of "env | grep I_MPI":
[bash]I_MPI_FABRICS=shm:ofa
I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
I_MPI_JOB_FAST_STARTUP=0
I_MPI_HOSTFILE=/tmp/di56zem/mpd.hosts.11693
I_MPI_ROOT=/lrz/sys/intel/mpi_40_3_00[/bash]
I haven't tried any other MPI programs, but according to the service provider, the Intel MPI library should work.
James_T_Intel
Moderator
Hi Sebastian,

I have been able to reproduce the hang you are seeing by matching the fabric. I'm going to make some more modifications to your code to see if I can get a more general reproducer, and I'll be submitting a defect report for this.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
Sebastian_R_1
Beginner
Thanks a lot.

It would be nice if you could post an update here as soon as this is fixed in a release.
James_T_Intel
Moderator

If this is still an issue for you, we have an engineering build which our developers have verified fixes this issue. Please let me know if you are still encountering this, and I will send you the engineering build.
