
Command-line tool to find data races and deadlocks on Xeon Phi

shiva_rama_krishna_b

Hi,

I have written a multithreaded program in which each thread makes an offload call. I observed that my program hangs (if I check top on the MIC, offload_main is getting killed, but the program on the CPU is not terminated). Does that mean it entered a deadlock? Is there any tool that gives insight into these kinds of issues?

Thanks

sivaramakrishna

3 Replies
McCalpinJohn
Honored Contributor III

A deadlock would not kill the process on the MIC -- it would just sit there spinning forever.

Things that are likely to kill the offload task on the MIC are out-of-memory errors, out-of-range memory access errors, illegal instructions, etc.

shiva_rama_krishna_b

Hi John,

What is happening here is that both the CPU threads and the Xeon Phi threads are sleeping (I noticed this in top).

I have also seen that the memory allocated on the Xeon Phi is 6 GB (1.5 GB is empty). I think that if this were due to out-of-memory errors, it should happen every time I run the program.

If it were due to out-of-range memory errors, it should throw a segmentation fault. (I have seen the MIC throw a segmentation fault when I access memory out of range.)

Similarly, if I write any illegal instructions in the code (vectorized instructions), the MIC also throws an illegal instruction error.

Can the MIC kill a thread that does something wrong (such as out-of-memory errors, illegal instructions, ...) without notifying us?

Please correct me if my understanding is wrong.

 

Thanks

sivaramakrishna

jimdempseyatthecove
Honored Contributor III

Not knowing anything about your application we can only guess as to what is going on.

Any memory access fault that is not trapped by the application, either on the host or on the MIC, would terminate the process. GP faults (illegal instructions, stack-related issues) are, from my understanding, not trappable and would also terminate the process. Therefore this leaves you with a deadlock-like coding error.

From your description (threads sleeping), it sounds like the host is in an "are you (MIC) done yet?" wait, and the MIC is in a "what do I do next?" wait.

The general culprit for that is throwing an asynchronous offload at the MIC with a signal(N), and then later performing a wait(M). Note N and M are different.
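To make the signal/wait pairing concrete, here is a minimal sketch in C using the Intel compiler's offload pragmas. The function name `offload_double_sum`, the device number `mic:0`, and the `tag` variable are assumptions chosen for illustration; none of them come from the original poster's code. A compiler that does not recognize the offload pragmas ignores them and runs the loop on the host, which is enough to check the logic but not the asynchronous behavior.

```c
/* Hypothetical helper (not from the original post): doubles src[0..n)
 * into dst[0..n) with an asynchronous offload, then returns the sum of dst. */
double offload_double_sum(double *src, double *dst, int n)
{
    char tag;          /* only the address is used, as the signal/wait tag */
    double sum = 0.0;
    int i;

    /* Asynchronous launch: the host continues immediately after this. */
    #pragma offload target(mic:0) signal(&tag) in(src : length(n)) out(dst : length(n))
    {
        int j;
        for (j = 0; j < n; j++)
            dst[j] = 2.0 * src[j];
    }

    /* The wait must name the SAME tag as the signal above. Waiting on a
     * different address is the N-versus-M mismatch that can leave the
     * host blocked indefinitely. */
    #pragma offload_wait target(mic:0) wait(&tag)

    /* dst is only safe to read after the matching wait has returned. */
    for (i = 0; i < n; i++)
        sum += dst[i];
    return sum;
}
```

Calling `offload_double_sum(src, dst, 4)` with `src = {1, 2, 3, 4}` should leave `dst` holding the doubled values. If the program hangs at the `offload_wait` line, the tag expressions are the first thing to compare.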

I do not know offhand (maybe someone here can answer) what happens if you throw multiple offloads with the same N in signal(N) while a prior offload with signal(N) is still pending. The documentation does not define this behavior. If you have this coding error, use a different N for each offload.
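One way to avoid reusing a tag while an earlier offload is still pending is to give each in-flight offload its own tag address. The sketch below assumes a fixed number of chunks (`NCHUNKS`), a hypothetical function name `offload_chunks`, and device `mic:0`; all three are illustrative. It has not been verified on a coprocessor, so treat it as a starting point rather than a confirmed fix. In a multithreaded program the same idea would apply with one tag per thread.

```c
#define NCHUNKS 4   /* assumed number of concurrent offloads */

/* Hypothetical helper: adds 1.0 to every element of data, which must hold
 * NCHUNKS * chunk_len doubles. One asynchronous offload is launched per
 * chunk. Returns the number of chunks launched and waited on. */
int offload_chunks(double *data, int chunk_len)
{
    char tags[NCHUNKS];   /* one distinct tag address per pending offload */
    int c;

    for (c = 0; c < NCHUNKS; c++) {
        double *p = data + c * chunk_len;

        /* Each launch signals its own element of tags[], so no tag is
         * reused while an earlier offload may still be running. */
        #pragma offload target(mic:0) signal(&tags[c]) inout(p : length(chunk_len))
        {
            int j;
            for (j = 0; j < chunk_len; j++)
                p[j] += 1.0;
        }
    }

    /* Wait on every tag that was signalled, exactly once each. */
    for (c = 0; c < NCHUNKS; c++) {
        #pragma offload_wait target(mic:0) wait(&tags[c])
    }
    return NCHUNKS;
}
```

The tags live in an array only so that each pending offload has a unique address; their contents are never read.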

Jim Dempsey
