I observe a crash on the Lomonosov-2 supercomputer (http://hpc.msu.ru/node/159, partition: "pascal") with IMPI version 2019.4.243. The reproducer code is:
------
#include <iostream>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Request request[1];
    MPI_Status status;
    MPI_Init(&argc, &argv);
    constexpr int size = 512;
    constexpr size_t N = 10000;
    //---
    char sbuf[size], rbuf[size];
    for (size_t i = 0; i < N; i++) {
        MPI_Iallreduce(sbuf, rbuf, size, MPI_CHAR, MPI_SUM, MPI_COMM_WORLD, request);
        MPI_Wait(request, &status);
    }
    //---
    MPI_Finalize();
    return 0;
}
--------
Running it like this:
# cat hosts
n54229
n54230
# mpiexec.hydra -f hosts -np 4 -ppn 2 --errfile-pattern=err.%r --outfile-pattern=out.%r ./simple-iallreduce
=> works ok
# I_MPI_ASYNC_PROGRESS=1 mpiexec.hydra -f hosts -np 4 -ppn 2 --errfile-pattern=err.%r --outfile-pattern=out.%r ./simple-iallreduce
=> crashes with segfault
The backtrace with a debug version of IMPI on one of the ranks is:
---
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff660d020 in MPIDI_isend_unsafe (buf=0x7fffffffa474, count=4, datatype=1275068673, rank=5, tag=287, comm=0x7ffff753d520 <MPIR_Comm_builtin>, context_offset=1, av=0x7f2000083808, request=0x7f2000452d38) at ../../src/mpid/ch4/src/ch4_send.h:278
#0 0x00007ffff660d020 in MPIDI_isend_unsafe (buf=0x7fffffffa474, count=4, datatype=1275068673, rank=5, tag=287, comm=0x7ffff753d520 <MPIR_Comm_builtin>, context_offset=1, av=0x7f2000083808, request=0x7f2000452d38) at ../../src/mpid/ch4/src/ch4_send.h:278
#1 0x00007ffff660da87 in MPIDI_isend_safe (buf=0x7fffffffa474, count=4, datatype=1275068673, rank=5, tag=287, comm=0x7ffff753d520 <MPIR_Comm_builtin>, context_offset=1, av=0x7f2000083808, req=0x7f2000452d38) at ../../src/mpid/ch4/src/ch4_send.h:487
#2 0x00007ffff660e454 in MPID_Isend (buf=0x7fffffffa474, count=4, datatype=1275068673, rank=5, tag=287, comm=0x7ffff753d520 <MPIR_Comm_builtin>, context_offset=1, request=0x7f2000452d38) at ../../src/mpid/ch4/src/ch4_send.h:659
#3 0x00007ffff6612212 in MPIC_Isend (buf=0x7fffffffa474, count=4, datatype=1275068673, dest=5, tag=287, comm_ptr=0x7ffff753d520 <MPIR_Comm_builtin>, request_ptr=0x7f2000452d38, errflag=0x7ffff7577108 <MPIR_Request_direct+1096>) at ../../src/mpi/coll/helper_fns.c:518
#4 0x00007ffff681081d in MPIDU_Sched_start_entry (s=0x7f2000076800, idx=8, e=0x7f2000452d00) at ../../src/mpid/common/sched/mpidu_sched.c:263
#5 0x00007ffff6811861 in MPIDU_Sched_continue (s=0x7f2000076800) at ../../src/mpid/common/sched/mpidu_sched.c:407
#6 0x00007ffff6814a1d in MPIDU_Sched_progress_state (state=0x7ffff7576660 <all_schedules>, made_progress=0x7fffffffa08c) at ../../src/mpid/common/sched/mpidu_sched.c:1080
#7 0x00007ffff6814e24 in MPIDU_Sched_progress (made_progress=0x7fffffffa08c) at ../../src/mpid/common/sched/mpidu_sched.c:1171
#8 0x00007ffff63d074e in MPIDI_Progress_test_impl (flags=7) at ../../src/mpid/ch4/src/ch4_progress.h:132
#9 0x00007ffff63d0d64 in MPIDI_Progress_test (flags=7) at ../../src/mpid/ch4/src/intel/ch4_progress.c:28
#10 0x00007ffff6bb22c7 in MPID_Progress_test () at ../../src/mpid/ch4/src/ch4_progress.h:233
#11 0x00007ffff6bb232f in MPID_Progress_wait (state=0x7fffffffa1b4) at ../../src/mpid/ch4/src/ch4_progress.h:294
#12 0x00007ffff6bb246d in MPIR_Wait_impl (request_ptr=0x7ffff75770c0 <MPIR_Request_direct+1024>, status=0x7fffffffa488) at ../../src/mpi/request/wait.c:44
#13 0x00007ffff6bb240d in MPID_Wait (request_ptr=0x7ffff75770c0 <MPIR_Request_direct+1024>, status=0x7fffffffa488) at ../../src/mpid/ch4/include/mpidpost.h:178
#14 0x00007ffff6bb2b7a in MPIR_Wait (request=0x7fffffffa47c, status=0x7fffffffa488) at ../../src/mpi/request/wait.c:104
#15 0x00007ffff6bb33e9 in PMPI_Wait (request=0x7fffffffa47c, status=0x7fffffffa488) at ../../src/mpi/request/wait.c:205
#16 0x0000000000400de3 in main (argc=9, argv=0x7fffffffa5b8) at simple-iallreduce.cpp:12
---
This can be reproduced with other ppn values greater than 1.
If I add the following snippet before the for-loop, the code magically starts working:
----
char wbuf[size];
MPI_Win win;
MPI_Win_create(wbuf, size, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
----
This is strange, because MPI_Win_create() has nothing to do with MPI_Iallreduce().
Could you explain this, or file a ticket to fix it if it turns out to be a bug?
--
Regards,
Alexey
Hi Alexey,
Thanks for reaching out to us.
We have observed similar behaviour when using asynchronous progress control.
Could you please try the workaround below and let us know if it works for you?
Replace MPI_Wait(request, &status) with MPI_Waitall(0, request, &status);
or wait once after the for loop with MPI_Waitall(N, request, &status); (see the sketch below).
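For the second variant, here is a minimal sketch of what I mean (an illustration only, not tested against your setup; it assumes per-operation requests and receive buffers, since concurrent nonblocking collectives must not share a receive buffer):
----
// requires #include <vector> in addition to the headers above
std::vector<MPI_Request> requests(N);
std::vector<MPI_Status> statuses(N);
std::vector<char> rbufs(N * size); // one receive buffer per outstanding operation
for (size_t i = 0; i < N; i++) {
    // post all N reductions without waiting on them individually
    MPI_Iallreduce(sbuf, &rbufs[i * size], size, MPI_CHAR, MPI_SUM, MPI_COMM_WORLD, &requests[i]);
}
// complete all outstanding operations with a single call after the loop
MPI_Waitall((int)N, requests.data(), statuses.data());
----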
Also, why are you calling MPI_Iallreduce inside a for loop?
Regarding the error disappearing after creating an MPI window, we will get back to you after discussing it with the SME.
Regards
Prasanth
Hi Prasanth,
1) I don't think MPI_Waitall(0, request, &status); is a functional equivalent of MPI_Wait(); it seems more like a no-op to me. I am not sure it can be called a "workaround".
2) As for the for-loop, it is not an important part of this reproducer. The snippet was extracted from working code that originally contained such a loop, which is why the loop is there. I managed to reduce the reproducer to the small snippet below, and it still behaves as described:
---
#include <iostream>
#include <mpi.h>
#include <assert.h>

// export I_MPI_ASYNC_PROGRESS=1; nnodes=2; ppn=2 --> segfault in MPI_Waitall
// export I_MPI_ASYNC_PROGRESS=0; nnodes=2; ppn=2 --> OK
int main(int argc, char **argv)
{
    MPI_Request request[1];
    MPI_Status status;
    int thrlev;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &thrlev);
    assert(thrlev == MPI_THREAD_MULTIPLE);
    constexpr int size = 512;
    char sbuf[size], rbuf[size];
    MPI_Iallreduce(sbuf, rbuf, size, MPI_CHAR, MPI_SUM, MPI_COMM_WORLD, request);
    MPI_Waitall(1, request, &status);
    MPI_Finalize();
    return 0;
}
---
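For completeness, I build and run it like this (just a sketch, assuming the Intel mpiicpc compiler wrapper; paths and the host file are from my setup):
---
# mpiicpc -o simple-iallreduce simple-iallreduce.cpp
# I_MPI_ASYNC_PROGRESS=1 mpiexec.hydra -f hosts -np 4 -ppn 2 ./simple-iallreduce
---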
Please confirm this is a bug, or explain why this code is wrong.
Thanks for the feedback.
--
Regards,
Alexey
Hi Alexey,
- I agree that using Waitall was neither a solution nor a valid workaround.
- I asked about the for loop because, in this case, it does not make much of a difference to the result. I understand why you kept it.
- I had previously tested with a beta version, but I have now tested this code with the latest versions (2019u7 and 2019u8) and did not get any error. There might be a bug in 2019u4, which I will report to the internal team.
- Could you please upgrade to the latest version (2019u8) and check whether the error persists?
Regards
Prasanth
Hi Prasanth,
I tried switching to the "release_mt" I_MPI_KIND (the output described above was taken with the default "release" kind).
The segfault disappeared.
It seems I_MPI_ASYNC_PROGRESS=1 works well only with the release_mt and debug_mt kinds, even though the "release" and "debug" kinds work fine in MPI_THREAD_MULTIPLE mode. In IMPI 2019, "release_mt" must be set explicitly (this seems to differ from IMPI 2018). I did not gather this from the docs, and there are no diagnostics for wrong usage (at least in 2019.4), which is misleading.
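For reference, here is how I select the kind now (a sketch; mpivars.sh accepts the library kind as an argument, and <installdir> stands for the IMPI installation path):
----
# source <installdir>/intel64/bin/mpivars.sh release_mt
# export I_MPI_ASYNC_PROGRESS=1
# mpiexec.hydra -f hosts -np 4 -ppn 2 ./simple-iallreduce
----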
Thanks for your help!
--
Regards,
Alexey
Hi Alexey,
It has been mentioned in the developer reference (https://software.intel.com/content/www/us/en/develop/documentation/mpi-developer-guide-linux/top/add...) that only release_mt and debug_mt versions support asynchronous progress threads.
"Intel® MPI Library supports asynchronous progress threads that allow you to manage communication in parallel with application computation and, as a result, achieve better communication/computation overlapping. This feature is supported for the release_mt and debug_mt versions only."
However, in the latest versions I have observed that the release_mt version is used implicitly when these additional features are accessed.
Since your issue has been resolved, could you please confirm so that we can close this thread?
Regards
Prasanth
Hi Prasanth,
It is nice to hear that the release_mt/debug_mt kind is now selected implicitly, since setting it explicitly is a really error-prone process: mpivars.sh is normally sourced somewhere outside the scripts that directly start the application, so it is easy to forget to swap the release/release_mt kinds when enabling or disabling the async progress feature.
Could you tell us from which update this works the way you describe? I think this information is important for the community.
--
Regards,
Alexey
Hi Alexey,
Sorry for the miscommunication regarding the implicit selection of the release_mt libraries; I had checked only for the segfault.
After contacting the internal team, they informed me that using release_mt is mandatory when using these additional features (multiple endpoints, async progress threads), as mentioned in the developer reference.
Regards
Prasanth
OK, thanks for the explanation. I also noticed that on 2019u9, setting the "release_mt" kind is absolutely required for I_MPI_ASYNC_PROGRESS=1.
I would only suggest adding explicit diagnostics for the situation when this functionality is attempted with the "release" library kind, since this currently sometimes leads to segfaults.
Thanks for the help.
--
Regards,
Alexey
Hi Alexey,
Glad we could be of help.
I will pass your suggestion to issue a warning when users attempt to use multi-EP features in release mode on to the internal team.
Since your issue is resolved, we will be closing this thread for now. Please raise a new thread for any further queries.
Regards
Prasanth