Hi,
I have written a multithreaded program in which each thread makes an offload call. I observed that my program hangs (if I check top on the MIC, offload_main is getting killed, but the program on the CPU is not terminated). Does that mean it entered a deadlock? Is there any tool that gives insight into this kind of issue?
Thanks
sivaramakrishna
A deadlock would not kill the process on the MIC -- it would just sit there spinning forever.
Things that are likely to kill the offload task on the MIC are out-of-memory errors, out-of-range memory access errors, illegal instructions, etc.
Hi John,
What is happening here is that both the CPU threads and the Xeon Phi threads are sleeping (I noticed this in top).
I have also seen that the memory allocated on the Xeon Phi is 6 GB (1.5 GB is free). I think that if this were due to out-of-memory errors, it should happen every time I run the program.
If it were due to out-of-range memory errors, it should throw a segmentation fault. (I have seen the MIC throw a segmentation fault when I access memory out of range.)
Similarly, when I wrote illegal instructions in the code (vectorized instructions), the MIC threw an illegal instruction error.
Can the MIC kill a thread that does something wrong (out-of-memory errors, illegal instructions, ...) without notifying us?
Please correct me if my understanding is wrong.
Thanks
sivaramakrishna
Not knowing anything about your application we can only guess as to what is going on.
Any memory access fault not trapped by the application, either on the host or on the MIC, would terminate the process. GP faults (illegal instructions, stack-related issues), from my understanding, are not trappable and would also terminate the process. Therefore this leaves you with a deadlock-like coding error.
From your description (threads sleeping), it sounds like the host is in an "are you (MIC) done yet?" wait, and the MIC is in a "what do I do next?" wait.
The usual culprit for that is throwing an asynchronous offload at the MIC with a signal(N), and then later performing a wait(M). Note that N and M are different.
What I do not know offhand (maybe someone here can answer) is what happens if you throw multiple offloads with the same N in the signal(N) while the prior offload with signal(N) is still pending. The doc does not define this behavior. If you have this coding error, use a different N.
Jim Dempsey
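As an illustration of the signal/wait pairing described above, here is a minimal sketch, not taken from the original poster's application. It assumes the Intel compiler's `#pragma offload ... signal(tag)` and `#pragma offload_wait ... wait(tag)` syntax. The names `double_all`, `NUM_OFFLOADS`, `CHUNK`, and `tags` are hypothetical. Each pending offload gets its own tag address, and every wait names exactly the tag that was signaled. A compiler without offload support should simply ignore the pragmas and run the loops on the host. This has not been verified on coprocessor hardware.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_OFFLOADS 4   /* hypothetical number of concurrent offloads */
#define CHUNK        8   /* hypothetical number of elements per offload */

/* Doubles every element of `data` (NUM_OFFLOADS * CHUNK floats), issuing one
   asynchronous offload per chunk. Returns the number of offloads waited on. */
int double_all(float *data)
{
    /* One tag per pending offload. Only the addresses matter: each
       signal()/wait() pair must name the same address, and no address is
       reused while an offload that signaled it is still pending. */
    char tags[NUM_OFFLOADS];
    int waited = 0;

    assert(data != NULL);
    (void)tags;  /* keeps builds that ignore the pragmas free of unused warnings */

    for (int i = 0; i < NUM_OFFLOADS; ++i) {
        float *chunk = data + (size_t)i * CHUNK;

        /* Asynchronous offload: control returns to the host immediately. */
        #pragma offload target(mic:0) signal(&tags[i]) inout(chunk : length(CHUNK))
        {
            for (int j = 0; j < CHUNK; ++j)
                chunk[j] *= 2.0f;
        }
    }

    for (int i = 0; i < NUM_OFFLOADS; ++i) {
        /* Wait on the SAME tag that offload i signaled. Waiting on a tag
           that nothing signals is the signal(N)/wait(M) mismatch that can
           leave the host blocked. */
        #pragma offload_wait target(mic:0) wait(&tags[i])
        ++waited;
    }
    return waited;
}
```

In a multithreaded host program, the same idea would apply per thread: each thread would signal and wait on a tag that no other thread uses, for example the address of a thread-local variable.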
